diff mbox series

[bpf-next,v4,01/15] bpf: Document XDP RX metadata

Message ID 20221213023605.737383-2-sdf@google.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series xdp: hints via kfuncs | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 4 maintainers not CCed: linux-doc@vger.kernel.org davem@davemloft.net hawk@kernel.org corbet@lwn.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix

Commit Message

Stanislav Fomichev Dec. 13, 2022, 2:35 a.m. UTC
Document all current use-cases and assumptions.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst

Comments

David Vernet Dec. 13, 2022, 4:37 p.m. UTC | #1
On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> Document all current use-cases and assumptions.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
>  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> 
> diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> new file mode 100644
> index 000000000000..498eae718275
> --- /dev/null
> +++ b/Documentation/bpf/xdp-rx-metadata.rst

I think you need to add this to Documentation/bpf/index.rst. Or even
better, maybe it's time to add an xdp/ subdirectory and put all docs
there? Don't want to block your patchset from bikeshedding on this
point, so for now it's fine to just put it in
Documentation/bpf/index.rst until we figure that out.

> @@ -0,0 +1,90 @@
> +===============
> +XDP RX Metadata
> +===============
> +
> +XDP programs support creating and passing custom metadata via
> +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> +entities:

Can you add a couple of sentences to this intro section that explains
what metadata is at a high level?

> +
> +1. ``AF_XDP`` consumer.
> +2. Kernel core stack via ``XDP_PASS``.
> +3. Another device via ``bpf_redirect_map``.
> +4. Other BPF programs via ``bpf_tail_call``.
> +
> +General Design
> +==============
> +
> +XDP has access to a set of kfuncs to manipulate the metadata. Every

"...to manipulate the metadata in an XDP frame." ?

> +device driver implements these kfuncs. The set of kfuncs is

"Every device driver implements these kfuncs" can you be a bit more
specific about which types of device drivers will implement these?

> +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.

Why is it suffixed with _xxx?

> +
> +Currently, the following kfuncs are supported. In the future, as more
> +metadata is supported, this set will grow:
> +
> +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> +  indicate whether the device supports RX timestamps
> +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp

s/returns packet/returns a packet's

> +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> +  indicate whether the device supports RX hash

I don't see bpf_xdp_metadata_rx_timestamp_supported() or
bpf_xdp_metadata_rx_hash_supported() being added in your patch set. Can
you remove these entries until they're actually implemented?

> +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash

We should probably also add a note that these kfuncs currently just
return -EOPNOTSUPP.

Finally, should we add either some example code showing how to use these
kfuncs, or at the very least some links to their selftests so readers
have example code they can refer to?

> +
> +Within the XDP frame, the metadata layout is as follows::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +             ^                 ^
> +             |                 |
> +   xdp_buff->data_meta   xdp_buff->data
> +
> +AF_XDP
> +======
> +
> +``AF_XDP`` use-case implies that there is a contract between the BPF program
> +that redirects XDP frames into the ``XSK`` and the final consumer.

Can you fully spell out what XSK stands for the first time it's used?
Something like "...that redirects XDP frames into the ``AF_XDP`` socket
(``XSK``) and the final consumer." Applies anywhere else you think
appropriate as well.

> +Thus the BPF program manually allocates a fixed number of
> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> +of kfuncs to populate it. User-space ``XSK`` consumer, looks

s/User-space/The user-space

Also, it feels like it might read better without the comma, and by
doing something like s/looks at/computes. Wdyt?

> +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> +
> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +                               ^
> +                               |
> +                        rx_desc->address
> +
> +XDP_PASS
> +========
> +
> +This is the path where the packets processed by the XDP program are passed
> +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.

s/creates ``skb``/creates the ``skb``

> +Currently, every driver has a custom kernel code to parse the descriptors and
> +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
> +In the future, we'd like to support a case where XDP program can override

s/where XDP program/where an XDP program

> +some of that metadata.
> +
> +The plan of record is to make this path similar to ``bpf_redirect_map``
> +so the program can control which metadata is passed to the skb layer.
> +
> +bpf_redirect_map
> +================
> +
> +``bpf_redirect_map`` can redirect the frame to a different device.
> +In this case we don't know ahead of time whether that final consumer
> +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> +Additionally, the final consumer doesn't have access to the original
> +hardware descriptor and can't access any of the original metadata.
> +
> +For this use-case, only custom metadata is currently supported. If
> +the frame is eventually passed to the kernel, the skb created from such
> +a frame won't have any skb metadata. The ``XSK`` consumer will only
> +have access to the custom metadata.
> +
> +bpf_tail_call
> +=============
> +
> +No special handling here. Tail-called program operates on the same context

s/Tail-called program/A tail-called program

> +as the original one.
> -- 
> 2.39.0.rc1.256.g54fd8350bd-goog
>
Stanislav Fomichev Dec. 13, 2022, 8:42 p.m. UTC | #2
On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>
> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> > Document all current use-cases and assumptions.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
> >  1 file changed, 90 insertions(+)
> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> >
> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> > new file mode 100644
> > index 000000000000..498eae718275
> > --- /dev/null
> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>
> I think you need to add this to Documentation/bpf/index.rst. Or even
> better, maybe it's time to add an xdp/ subdirectory and put all docs
> there? Don't want to block your patchset from bikeshedding on this
> point, so for now it's fine to just put it in
> Documentation/bpf/index.rst until we figure that out.

Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
and reference form Documentation/networking/index.rst? Since it's more
relevant to networking than the core bpf?

> > @@ -0,0 +1,90 @@
> > +===============
> > +XDP RX Metadata
> > +===============
> > +
> > +XDP programs support creating and passing custom metadata via
> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> > +entities:
>
> Can you add a couple of sentences to this intro section that explains
> what metadata is at a high level?

I'm gonna copy-paste here what I'm adding, feel free to reply back if
still unclear. (so we don't have to wait another week to discuss the
changes)

XDP programs support creating and passing custom metadata via
``bpf_xdp_adjust_meta``. The metadata can contain some extra information
about the packet: timestamps, hash, vlan and tunneling information, etc.
This metadata can be consumed by the following entities:

> > +
> > +1. ``AF_XDP`` consumer.
> > +2. Kernel core stack via ``XDP_PASS``.
> > +3. Another device via ``bpf_redirect_map``.
> > +4. Other BPF programs via ``bpf_tail_call``.
> > +
> > +General Design
> > +==============
> > +
> > +XDP has access to a set of kfuncs to manipulate the metadata. Every
>
> "...to manipulate the metadata in an XDP frame." ?

SG!

> > +device driver implements these kfuncs. The set of kfuncs is
>
> "Every device driver implements these kfuncs" can you be a bit more
> specific about which types of device drivers will implement these?

How about the following?
Every device driver that wishes to expose additional packet metadata
can implement these kfuncs.

> > +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
>
> Why is it suffixed with _xxx?

I'm following the precedent of BTF_SOCK_TYPE_xxx and
BTF_TRACING_TYPE_xxx. LMK if you prefer a better name here.

> > +
> > +Currently, the following kfuncs are supported. In the future, as more
> > +metadata is supported, this set will grow:
> > +
> > +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> > +  indicate whether the device supports RX timestamps
> > +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
>
> s/returns packet/returns a packet's

ty!

> > +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> > +  indicate whether the device supports RX hash
>
> I don't see bpf_xdp_metadata_rx_timestamp_supported() or
> bpf_xdp_metadata_rx_hash_supported() being added in your patch set. Can
> you remove these entries until they're actually implemented?

Ooh, good catch, I've removed them for this round. Will remove from
the doc as well.

> > +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
>
> We should probably also add a note that these kfuncs currently just
> return -EOPNOTSUPP.

The default ones return EOPNOTSUPP. Maybe I can clarify with the following?

Not all kfuncs have to be implemented by the device driver; when not
implemented, the default ones that return ``-EOPNOTSUPP`` will be
used.

> Finally, should we add either some example code showing how to use these
> kfuncs, or at the very least some links to their selftests so readers
> have example code they can refer to?

Good idea, will reference
tools/testing/selftests/bpf/progs/xdp_metadata.c and
tools/testing/selftests/bpf/prog_tests/xdp_metadata.c.

Example
=======
See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
BPF program that handles XDP metadata.

> > +
> > +Within the XDP frame, the metadata layout is as follows::
> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +             ^                 ^
> > +             |                 |
> > +   xdp_buff->data_meta   xdp_buff->data
> > +
> > +AF_XDP
> > +======
> > +
> > +``AF_XDP`` use-case implies that there is a contract between the BPF program
> > +that redirects XDP frames into the ``XSK`` and the final consumer.
>
> Can you fully spell out what XSK stands for the first time it's used?
> Something like "...that redirects XDP frames into the ``AF_XDP`` socket
> (``XSK``) and the final consumer." Applies anywhere else you think
> appropriate as well.

SG!

> > +Thus the BPF program manually allocates a fixed number of
> > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> > +of kfuncs to populate it. User-space ``XSK`` consumer, looks
>
> s/User-space/The user-space
>
> Also, it feels like it might read better without the comma, and by
> doing something like s/looks at/computes. Wdyt?

Ageed.

> > +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> > +
> > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +                               ^
> > +                               |
> > +                        rx_desc->address
> > +
> > +XDP_PASS
> > +========
> > +
> > +This is the path where the packets processed by the XDP program are passed
> > +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
>
> s/creates ``skb``/creates the ``skb``

Ack.

> > +Currently, every driver has a custom kernel code to parse the descriptors and
> > +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
> > +In the future, we'd like to support a case where XDP program can override
>
> s/where XDP program/where an XDP program

Same here, will fix, thanks!

> > +some of that metadata.
> > +
> > +The plan of record is to make this path similar to ``bpf_redirect_map``
> > +so the program can control which metadata is passed to the skb layer.
> > +
> > +bpf_redirect_map
> > +================
> > +
> > +``bpf_redirect_map`` can redirect the frame to a different device.
> > +In this case we don't know ahead of time whether that final consumer
> > +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> > +Additionally, the final consumer doesn't have access to the original
> > +hardware descriptor and can't access any of the original metadata.
> > +
> > +For this use-case, only custom metadata is currently supported. If
> > +the frame is eventually passed to the kernel, the skb created from such
> > +a frame won't have any skb metadata. The ``XSK`` consumer will only
> > +have access to the custom metadata.
> > +
> > +bpf_tail_call
> > +=============
> > +
> > +No special handling here. Tail-called program operates on the same context
>
> s/Tail-called program/A tail-called program

Ty!


> > +as the original one.
> > --
> > 2.39.0.rc1.256.g54fd8350bd-goog
> >
Toke Høiland-Jørgensen Dec. 14, 2022, 10:34 a.m. UTC | #3
Stanislav Fomichev <sdf@google.com> writes:

> On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>>
>> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
>> > Document all current use-cases and assumptions.
>> >
>> > Cc: John Fastabend <john.fastabend@gmail.com>
>> > Cc: David Ahern <dsahern@gmail.com>
>> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> > Cc: Jakub Kicinski <kuba@kernel.org>
>> > Cc: Willem de Bruijn <willemb@google.com>
>> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>> > Cc: Maryam Tahhan <mtahhan@redhat.com>
>> > Cc: xdp-hints@xdp-project.net
>> > Cc: netdev@vger.kernel.org
>> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> > ---
>> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>> >  1 file changed, 90 insertions(+)
>> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>> >
>> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
>> > new file mode 100644
>> > index 000000000000..498eae718275
>> > --- /dev/null
>> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>>
>> I think you need to add this to Documentation/bpf/index.rst. Or even
>> better, maybe it's time to add an xdp/ subdirectory and put all docs
>> there? Don't want to block your patchset from bikeshedding on this
>> point, so for now it's fine to just put it in
>> Documentation/bpf/index.rst until we figure that out.
>
> Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
> and reference form Documentation/networking/index.rst? Since it's more
> relevant to networking than the core bpf?
>
>> > @@ -0,0 +1,90 @@
>> > +===============
>> > +XDP RX Metadata
>> > +===============
>> > +
>> > +XDP programs support creating and passing custom metadata via
>> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
>> > +entities:
>>
>> Can you add a couple of sentences to this intro section that explains
>> what metadata is at a high level?
>
> I'm gonna copy-paste here what I'm adding, feel free to reply back if
> still unclear. (so we don't have to wait another week to discuss the
> changes)
>
> XDP programs support creating and passing custom metadata via
> ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
> about the packet: timestamps, hash, vlan and tunneling information, etc.
> This metadata can be consumed by the following entities:

This is not really accurate, though? The metadata area itself can
contain whatever the XDP program wants it to, and I think you're
conflating the "old" usage for arbitrary storage with the driver-kfunc
metadata support.

I think we should clear separate the two: the metadata area is just a
place to store data (and is not consumed by the stack, except that
TC-BPF programs can access it), and the driver kfuncs are just a general
way to get data out of the drivers (and has nothing to do with the
metadata area, you can just get the data into stack variables).

While it would be good to have a documentation of the general metadata
area stuff somewhere, I don't think it necessarily have to be part of
this series, so maybe just stick to documenting the kfuncs?

-Toke
Stanislav Fomichev Dec. 14, 2022, 6:42 p.m. UTC | #4
On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
> >>
> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> >> > Document all current use-cases and assumptions.
> >> >
> >> > Cc: John Fastabend <john.fastabend@gmail.com>
> >> > Cc: David Ahern <dsahern@gmail.com>
> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >> > Cc: Jakub Kicinski <kuba@kernel.org>
> >> > Cc: Willem de Bruijn <willemb@google.com>
> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> >> > Cc: xdp-hints@xdp-project.net
> >> > Cc: netdev@vger.kernel.org
> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> > ---
> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
> >> >  1 file changed, 90 insertions(+)
> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> >> >
> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> >> > new file mode 100644
> >> > index 000000000000..498eae718275
> >> > --- /dev/null
> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
> >>
> >> I think you need to add this to Documentation/bpf/index.rst. Or even
> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
> >> there? Don't want to block your patchset from bikeshedding on this
> >> point, so for now it's fine to just put it in
> >> Documentation/bpf/index.rst until we figure that out.
> >
> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
> > and reference form Documentation/networking/index.rst? Since it's more
> > relevant to networking than the core bpf?
> >
> >> > @@ -0,0 +1,90 @@
> >> > +===============
> >> > +XDP RX Metadata
> >> > +===============
> >> > +
> >> > +XDP programs support creating and passing custom metadata via
> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> >> > +entities:
> >>
> >> Can you add a couple of sentences to this intro section that explains
> >> what metadata is at a high level?
> >
> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
> > still unclear. (so we don't have to wait another week to discuss the
> > changes)
> >
> > XDP programs support creating and passing custom metadata via
> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
> > about the packet: timestamps, hash, vlan and tunneling information, etc.
> > This metadata can be consumed by the following entities:
>
> This is not really accurate, though? The metadata area itself can
> contain whatever the XDP program wants it to, and I think you're
> conflating the "old" usage for arbitrary storage with the driver-kfunc
> metadata support.
>
> I think we should clear separate the two: the metadata area is just a
> place to store data (and is not consumed by the stack, except that
> TC-BPF programs can access it), and the driver kfuncs are just a general
> way to get data out of the drivers (and has nothing to do with the
> metadata area, you can just get the data into stack variables).
>
> While it would be good to have a documentation of the general metadata
> area stuff somewhere, I don't think it necessarily have to be part of
> this series, so maybe just stick to documenting the kfuncs?

Maybe I can reword to something like below?

The metadata can be used to store some extra information about the
packet timestamps, hash, vlan and tunneling information, etc.

This way we are not actually defining what it is, but hinting about
how it's commonly used?

> -Toke
>
Toke Høiland-Jørgensen Dec. 14, 2022, 11:46 p.m. UTC | #5
Stanislav Fomichev <sdf@google.com> writes:

> Document all current use-cases and assumptions.

Below is a set of slightly more constructive suggestions for how to edit
this so it's not confusing the metadata area description with the kfunc
list:

> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
>  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>
> diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> new file mode 100644
> index 000000000000..498eae718275
> --- /dev/null
> +++ b/Documentation/bpf/xdp-rx-metadata.rst
> @@ -0,0 +1,90 @@
> +===============
> +XDP RX Metadata
> +===============
> +
> +XDP programs support creating and passing custom metadata via
> +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> +entities:
> +
> +1. ``AF_XDP`` consumer.
> +2. Kernel core stack via ``XDP_PASS``.
> +3. Another device via ``bpf_redirect_map``.
> +4. Other BPF programs via ``bpf_tail_call``.

I'd replace the above with a short introduction, like:

"This document describes how an XDP program can access hardware metadata
related to a packet using a set of helper functions, and how it can pass
that metadata on to other consumers."

> +General Design
> +==============
> +
> +XDP has access to a set of kfuncs to manipulate the metadata. Every
> +device driver implements these kfuncs. The set of kfuncs is
> +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
> +
> +Currently, the following kfuncs are supported. In the future, as more
> +metadata is supported, this set will grow:
> +
> +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> +  indicate whether the device supports RX timestamps
> +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
> +- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
> +  indicate whether the device supports RX hash
> +- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash

Keep the above (with David's comments), then add a bit of extra text,
here:

"The XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
consumers, an XDP program can store it into the metadata area carried
ahead of the packet.

> +Within the XDP frame, the metadata layout is as follows::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +             ^                 ^
> +             |                 |
> +   xdp_buff->data_meta   xdp_buff->data

Add:

"The XDP program can store individual metadata items into this data_meta
area in whichever format it chooses. Later consumers of the metadata
will have to agree on the format by some out of band contract (like for
the AF_XDP use case, see below)."

> +AF_XDP
> +======
> +
> +``AF_XDP`` use-case implies that there is a contract between the BPF program
> +that redirects XDP frames into the ``XSK`` and the final consumer.
> +Thus the BPF program manually allocates a fixed number of
> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> +of kfuncs to populate it. User-space ``XSK`` consumer, looks
> +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> +
> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +                               ^
> +                               |
> +                        rx_desc->address
> +
> +XDP_PASS
> +========
> +
> +This is the path where the packets processed by the XDP program are passed
> +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
> +Currently, every driver has a custom kernel code to parse the descriptors and
> +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.

Add: ", and the XDP metadata is not used by the kernel when building
skbs. However, TC-BPF programs can access the XDP metadata area using
the data_meta pointer."

> +In the future, we'd like to support a case where XDP program can override
> +some of that metadata.

s/some of that metadata/some of the metadata used for building skbs/.

> +The plan of record is to make this path similar to ``bpf_redirect_map``
> +so the program can control which metadata is passed to the skb layer.

I'm not sure we are quite agreed on this part, just drop for now (it's
sorta covered by the above)?

> +bpf_redirect_map
> +================
> +
> +``bpf_redirect_map`` can redirect the frame to a different device.
> +In this case we don't know ahead of time whether that final consumer
> +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> +Additionally, the final consumer doesn't have access to the original
> +hardware descriptor and can't access any of the original metadata.

Replace this paragraph with: "``bpf_redirect_map`` can redirect the
frame to a different device. Some devices (like virtual ethernet links)
support running a second XDP program after the redirect. However, the
final consumer doesn't have access to the original hardware descriptor
and can't access any of the original metadata. The same applies to XDP
programs installed into devmaps and cpumaps."

> +For this use-case, only custom metadata is currently supported. If
> +the frame is eventually passed to the kernel, the skb created from such
> +a frame won't have any skb metadata. The ``XSK`` consumer will only
> +have access to the custom metadata.

Reword as:

"This means that for redirected packets only custom metadata is
currently supported, which has to be prepared by the initial XDP program
before redirect. If +the frame is eventually passed to the kernel, the
skb created from such a frame won't have any hardware metadata populated
in its skb. And if such a packet is later redirected into an ``XSK``,
that will also only have access to the custom metadata."

> +bpf_tail_call
> +=============
> +
> +No special handling here. Tail-called program operates on the same context
> +as the original one.

Replace this with a statement that it is in fact *not* supported in tail
maps :)

-Toke
Toke Høiland-Jørgensen Dec. 14, 2022, 11:46 p.m. UTC | #6
Stanislav Fomichev <sdf@google.com> writes:

> On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>> >>
>> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
>> >> > Document all current use-cases and assumptions.
>> >> >
>> >> > Cc: John Fastabend <john.fastabend@gmail.com>
>> >> > Cc: David Ahern <dsahern@gmail.com>
>> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> >> > Cc: Jakub Kicinski <kuba@kernel.org>
>> >> > Cc: Willem de Bruijn <willemb@google.com>
>> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
>> >> > Cc: xdp-hints@xdp-project.net
>> >> > Cc: netdev@vger.kernel.org
>> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> >> > ---
>> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>> >> >  1 file changed, 90 insertions(+)
>> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>> >> >
>> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
>> >> > new file mode 100644
>> >> > index 000000000000..498eae718275
>> >> > --- /dev/null
>> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>> >>
>> >> I think you need to add this to Documentation/bpf/index.rst. Or even
>> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
>> >> there? Don't want to block your patchset from bikeshedding on this
>> >> point, so for now it's fine to just put it in
>> >> Documentation/bpf/index.rst until we figure that out.
>> >
>> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
>> > and reference form Documentation/networking/index.rst? Since it's more
>> > relevant to networking than the core bpf?
>> >
>> >> > @@ -0,0 +1,90 @@
>> >> > +===============
>> >> > +XDP RX Metadata
>> >> > +===============
>> >> > +
>> >> > +XDP programs support creating and passing custom metadata via
>> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
>> >> > +entities:
>> >>
>> >> Can you add a couple of sentences to this intro section that explains
>> >> what metadata is at a high level?
>> >
>> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
>> > still unclear. (so we don't have to wait another week to discuss the
>> > changes)
>> >
>> > XDP programs support creating and passing custom metadata via
>> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
>> > about the packet: timestamps, hash, vlan and tunneling information, etc.
>> > This metadata can be consumed by the following entities:
>>
>> This is not really accurate, though? The metadata area itself can
>> contain whatever the XDP program wants it to, and I think you're
>> conflating the "old" usage for arbitrary storage with the driver-kfunc
>> metadata support.
>>
>> I think we should clear separate the two: the metadata area is just a
>> place to store data (and is not consumed by the stack, except that
>> TC-BPF programs can access it), and the driver kfuncs are just a general
>> way to get data out of the drivers (and has nothing to do with the
>> metadata area, you can just get the data into stack variables).
>>
>> While it would be good to have a documentation of the general metadata
>> area stuff somewhere, I don't think it necessarily have to be part of
>> this series, so maybe just stick to documenting the kfuncs?
>
> Maybe I can reword to something like below?
>
> The metadata can be used to store some extra information about the
> packet timestamps, hash, vlan and tunneling information, etc.
>
> This way we are not actually defining what it is, but hinting about
> how it's commonly used?

Sent another reply to the original patch with some comments that are
hopefully a bit more constructive :)

-Toke
Stanislav Fomichev Dec. 15, 2022, 3:48 a.m. UTC | #7
On Wed, Dec 14, 2022 at 3:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@google.com> writes:
> >>
> >> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
> >> >>
> >> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
> >> >> > Document all current use-cases and assumptions.
> >> >> >
> >> >> > Cc: John Fastabend <john.fastabend@gmail.com>
> >> >> > Cc: David Ahern <dsahern@gmail.com>
> >> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >> >> > Cc: Jakub Kicinski <kuba@kernel.org>
> >> >> > Cc: Willem de Bruijn <willemb@google.com>
> >> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> >> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> >> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> >> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> >> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> >> >> > Cc: xdp-hints@xdp-project.net
> >> >> > Cc: netdev@vger.kernel.org
> >> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> >> > ---
> >> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
> >> >> >  1 file changed, 90 insertions(+)
> >> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
> >> >> >
> >> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
> >> >> > new file mode 100644
> >> >> > index 000000000000..498eae718275
> >> >> > --- /dev/null
> >> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
> >> >>
> >> >> I think you need to add this to Documentation/bpf/index.rst. Or even
> >> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
> >> >> there? Don't want to block your patchset from bikeshedding on this
> >> >> point, so for now it's fine to just put it in
> >> >> Documentation/bpf/index.rst until we figure that out.
> >> >
> >> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
> >> > and reference form Documentation/networking/index.rst? Since it's more
> >> > relevant to networking than the core bpf?
> >> >
> >> >> > @@ -0,0 +1,90 @@
> >> >> > +===============
> >> >> > +XDP RX Metadata
> >> >> > +===============
> >> >> > +
> >> >> > +XDP programs support creating and passing custom metadata via
> >> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> >> >> > +entities:
> >> >>
> >> >> Can you add a couple of sentences to this intro section that explains
> >> >> what metadata is at a high level?
> >> >
> >> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
> >> > still unclear. (so we don't have to wait another week to discuss the
> >> > changes)
> >> >
> >> > XDP programs support creating and passing custom metadata via
> >> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
> >> > about the packet: timestamps, hash, vlan and tunneling information, etc.
> >> > This metadata can be consumed by the following entities:
> >>
> >> This is not really accurate, though? The metadata area itself can
> >> contain whatever the XDP program wants it to, and I think you're
> >> conflating the "old" usage for arbitrary storage with the driver-kfunc
> >> metadata support.
> >>
> >> I think we should clear separate the two: the metadata area is just a
> >> place to store data (and is not consumed by the stack, except that
> >> TC-BPF programs can access it), and the driver kfuncs are just a general
> >> way to get data out of the drivers (and has nothing to do with the
> >> metadata area, you can just get the data into stack variables).
> >>
> >> While it would be good to have a documentation of the general metadata
> >> area stuff somewhere, I don't think it necessarily have to be part of
> >> this series, so maybe just stick to documenting the kfuncs?
> >
> > Maybe I can reword to something like below?
> >
> > The metadata can be used to store some extra information about the
> > packet timestamps, hash, vlan and tunneling information, etc.
> >
> > This way we are not actually defining what it is, but hinting about
> > how it's commonly used?
>
> Sent another reply to the original patch with some comments that are
> hopefully a bit more constructive :)

Thanks, everything makes sense, will incorporate. I'll also try to
share the patches privately with you sometime tomorrow maybe; not
super comfortable sending them out with a bunch of changes on top of
your authorship without some kind of ack from you :-)

> -Toke
>
Toke Høiland-Jørgensen Dec. 15, 2022, 2:04 p.m. UTC | #8
Stanislav Fomichev <sdf@google.com> writes:

> On Wed, Dec 14, 2022 at 3:46 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Wed, Dec 14, 2022 at 2:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > On Tue, Dec 13, 2022 at 8:37 AM David Vernet <void@manifault.com> wrote:
>> >> >>
>> >> >> On Mon, Dec 12, 2022 at 06:35:51PM -0800, Stanislav Fomichev wrote:
>> >> >> > Document all current use-cases and assumptions.
>> >> >> >
>> >> >> > Cc: John Fastabend <john.fastabend@gmail.com>
>> >> >> > Cc: David Ahern <dsahern@gmail.com>
>> >> >> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
>> >> >> > Cc: Jakub Kicinski <kuba@kernel.org>
>> >> >> > Cc: Willem de Bruijn <willemb@google.com>
>> >> >> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> >> >> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>> >> >> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
>> >> >> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
>> >> >> > Cc: Maryam Tahhan <mtahhan@redhat.com>
>> >> >> > Cc: xdp-hints@xdp-project.net
>> >> >> > Cc: netdev@vger.kernel.org
>> >> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> >> >> > ---
>> >> >> >  Documentation/bpf/xdp-rx-metadata.rst | 90 +++++++++++++++++++++++++++
>> >> >> >  1 file changed, 90 insertions(+)
>> >> >> >  create mode 100644 Documentation/bpf/xdp-rx-metadata.rst
>> >> >> >
>> >> >> > diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
>> >> >> > new file mode 100644
>> >> >> > index 000000000000..498eae718275
>> >> >> > --- /dev/null
>> >> >> > +++ b/Documentation/bpf/xdp-rx-metadata.rst
>> >> >>
>> >> >> I think you need to add this to Documentation/bpf/index.rst. Or even
>> >> >> better, maybe it's time to add an xdp/ subdirectory and put all docs
>> >> >> there? Don't want to block your patchset from bikeshedding on this
>> >> >> point, so for now it's fine to just put it in
>> >> >> Documentation/bpf/index.rst until we figure that out.
>> >> >
>> >> > Maybe let's put it under Documentation/networking/xdp-rx-metadata.rst
>> >> > and reference form Documentation/networking/index.rst? Since it's more
>> >> > relevant to networking than the core bpf?
>> >> >
>> >> >> > @@ -0,0 +1,90 @@
>> >> >> > +===============
>> >> >> > +XDP RX Metadata
>> >> >> > +===============
>> >> >> > +
>> >> >> > +XDP programs support creating and passing custom metadata via
>> >> >> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
>> >> >> > +entities:
>> >> >>
>> >> >> Can you add a couple of sentences to this intro section that explains
>> >> >> what metadata is at a high level?
>> >> >
>> >> > I'm gonna copy-paste here what I'm adding, feel free to reply back if
>> >> > still unclear. (so we don't have to wait another week to discuss the
>> >> > changes)
>> >> >
>> >> > XDP programs support creating and passing custom metadata via
>> >> > ``bpf_xdp_adjust_meta``. The metadata can contain some extra information
>> >> > about the packet: timestamps, hash, vlan and tunneling information, etc.
>> >> > This metadata can be consumed by the following entities:
>> >>
>> >> This is not really accurate, though? The metadata area itself can
>> >> contain whatever the XDP program wants it to, and I think you're
>> >> conflating the "old" usage for arbitrary storage with the driver-kfunc
>> >> metadata support.
>> >>
>> >> I think we should clear separate the two: the metadata area is just a
>> >> place to store data (and is not consumed by the stack, except that
>> >> TC-BPF programs can access it), and the driver kfuncs are just a general
>> >> way to get data out of the drivers (and has nothing to do with the
>> >> metadata area, you can just get the data into stack variables).
>> >>
>> >> While it would be good to have a documentation of the general metadata
>> >> area stuff somewhere, I don't think it necessarily have to be part of
>> >> this series, so maybe just stick to documenting the kfuncs?
>> >
>> > Maybe I can reword to something like below?
>> >
>> > The metadata can be used to store some extra information about the
>> > packet timestamps, hash, vlan and tunneling information, etc.
>> >
>> > This way we are not actually defining what it is, but hinting about
>> > how it's commonly used?
>>
>> Sent another reply to the original patch with some comments that are
>> hopefully a bit more constructive :)
>
> Thanks, everything makes sense, will incorporate. I'll also try to
> share the patches privately with you sometime tomorrow maybe; not
> super comfortable sending them out with a bunch of changes on top of
> your authorship without some kind of ack from you :-)

OK, sounds good. Tomorrow (Friday) is my last day before the holidays,
but happy to look things over before I sign off :)

-Toke
kernel test robot Dec. 17, 2022, 4:20 a.m. UTC | #9
Hi Stanislav,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Fomichev/xdp-hints-via-kfuncs/20221213-103902
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20221213023605.737383-2-sdf%40google.com
patch subject: [PATCH bpf-next v4 01/15] bpf: Document XDP RX metadata
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/7f9cb6dfe1965bc856540839d159aa81314c7ce6
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Stanislav-Fomichev/xdp-hints-via-kfuncs/20221213-103902
        git checkout 7f9cb6dfe1965bc856540839d159aa81314c7ce6
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> Documentation/bpf/xdp-rx-metadata.rst: WARNING: document isn't included in any toctree
diff mbox series

Patch

diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
new file mode 100644
index 000000000000..498eae718275
--- /dev/null
+++ b/Documentation/bpf/xdp-rx-metadata.rst
@@ -0,0 +1,90 @@ 
+===============
+XDP RX Metadata
+===============
+
+XDP programs support creating and passing custom metadata via
+``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
+entities:
+
+1. ``AF_XDP`` consumer.
+2. Kernel core stack via ``XDP_PASS``.
+3. Another device via ``bpf_redirect_map``.
+4. Other BPF programs via ``bpf_tail_call``.
+
+General Design
+==============
+
+XDP has access to a set of kfuncs to manipulate the metadata. Every
+device driver implements these kfuncs. The set of kfuncs is
+declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
+
+Currently, the following kfuncs are supported. In the future, as more
+metadata is supported, this set will grow:
+
+- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
+  indicate whether the device supports RX timestamps
+- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp
+- ``bpf_xdp_metadata_rx_hash_supported`` returns true/false to
+  indicate whether the device supports RX hash
+- ``bpf_xdp_metadata_rx_hash`` returns packet RX hash
+
+Within the XDP frame, the metadata layout is as follows::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+             ^                 ^
+             |                 |
+   xdp_buff->data_meta   xdp_buff->data
+
+AF_XDP
+======
+
+``AF_XDP`` use-case implies that there is a contract between the BPF program
+that redirects XDP frames into the ``XSK`` and the final consumer.
+Thus the BPF program manually allocates a fixed number of
+bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate it. User-space ``XSK`` consumer, looks
+at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
+
+Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+                               ^
+                               |
+                        rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
+Currently, every driver has a custom kernel code to parse the descriptors and
+populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
+In the future, we'd like to support a case where XDP program can override
+some of that metadata.
+
+The plan of record is to make this path similar to ``bpf_redirect_map``
+so the program can control which metadata is passed to the skb layer.
+
+bpf_redirect_map
+================
+
+``bpf_redirect_map`` can redirect the frame to a different device.
+In this case we don't know ahead of time whether that final consumer
+will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
+Additionally, the final consumer doesn't have access to the original
+hardware descriptor and can't access any of the original metadata.
+
+For this use-case, only custom metadata is currently supported. If
+the frame is eventually passed to the kernel, the skb created from such
+a frame won't have any skb metadata. The ``XSK`` consumer will only
+have access to the custom metadata.
+
+bpf_tail_call
+=============
+
+No special handling here. Tail-called program operates on the same context
+as the original one.