diff mbox series

[bpf-next,v5,01/17] bpf: Document XDP RX metadata

Message ID 20221220222043.3348718-2-sdf@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series xdp: hints via kfuncs | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count fail Series longer than 15 patches (and no cover letter)
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 6 maintainers not CCed: edumazet@google.com davem@davemloft.net linux-doc@vger.kernel.org pabeni@redhat.com corbet@lwn.net hawk@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Stanislav Fomichev Dec. 20, 2022, 10:20 p.m. UTC
Document all current use-cases and assumptions.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/networking/index.rst           |   1 +
 Documentation/networking/xdp-rx-metadata.rst | 107 +++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/networking/xdp-rx-metadata.rst

Comments

David Vernet Dec. 28, 2022, 5:25 p.m. UTC | #1
On Tue, Dec 20, 2022 at 02:20:27PM -0800, Stanislav Fomichev wrote:
> Document all current use-cases and assumptions.
> 
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> Cc: Maryam Tahhan <mtahhan@redhat.com>
> Cc: xdp-hints@xdp-project.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  Documentation/networking/index.rst           |   1 +
>  Documentation/networking/xdp-rx-metadata.rst | 107 +++++++++++++++++++
>  2 files changed, 108 insertions(+)
>  create mode 100644 Documentation/networking/xdp-rx-metadata.rst
> 
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 4f2d1f682a18..4ddcae33c336 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
>     xfrm_proc
>     xfrm_sync
>     xfrm_sysctl
> +   xdp-rx-metadata
>  
>  .. only::  subproject and html
>  
> diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> new file mode 100644
> index 000000000000..37e8192d9b60
> --- /dev/null
> +++ b/Documentation/networking/xdp-rx-metadata.rst

Hey Stanislav,

This is looking excellent. Left a few more minor comments and
suggestions.

> @@ -0,0 +1,107 @@
> +===============
> +XDP RX Metadata
> +===============
> +
> +This document describes how an XDP program can access hardware metadata

In similar fashion to LWN articles, can we spell out what XDP means the
first time it's used, e.g.:

...describes how an eXpress Data Path (XDP) program...

In general this applies to other acronyms unless they're super obvious,
like "CPU" (thanks for already having done it for XSK).

> +related to a packet using a set of helper functions, and how it can pass
> +that metadata on to other consumers.
> +
> +General Design
> +==============
> +
> +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame.
> +Every device driver that wishes to expose additional packet metadata can
> +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h``
> +via ``XDP_METADATA_KFUNC_xxx``.
> +
> +Currently, the following kfuncs are supported. In the future, as more
> +metadata is supported, this set will grow:
> +
> +- ``bpf_xdp_metadata_rx_timestamp`` returns a packet's RX timestamp
> +- ``bpf_xdp_metadata_rx_hash`` returns a packet's RX hash

So, I leave this up to you as to whether or not you want to do this, but
there is a built-in mechanism in sphinx that converts doxygen comments
to rendered documentation for a function, struct, etc, and also
automatically links other places in documentation where the function is
referenced. See [0] for an example of this in code, and [1] for an
example of how it's rendered.

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/bpf/kfuncs.rst#n239
[1]: https://docs.kernel.org/bpf/kfuncs.html#c.bpf_task_acquire

So you would do something like add function headers to the kfuncs, and
then do:

.. kernel-doc:: net/core/xdp.c
   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash

At some point we will need a consistent story for how we document
kfuncs. That's not set in stone yet, which is why I'm saying it's up to
you whether or not you want to do this or just leave it as teletype with
a written description next to it.  Later on when we settle on a
documentation story for kfuncs, we can update all of them to be
documented in the same way.

> +The XDP program can use these kfuncs to read the metadata into stack

s/The/An

> +variables for its own consumption. Or, to pass the metadata on to other
> +consumers, an XDP program can store it into the metadata area carried
> +ahead of the packet.
> +
> +Not all kfuncs have to be implemented by the device driver; when not
> +implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
> +
> +Within the XDP frame, the metadata layout is as follows::

s/the/an

> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +             ^                 ^
> +             |                 |
> +   xdp_buff->data_meta   xdp_buff->data
> +
> +The XDP program can store individual metadata items into this data_meta

Can we teletype ``data_meta``? Also, s/The XDP program/An XDP program

> +area in whichever format it chooses. Later consumers of the metadata
> +will have to agree on the format by some out of band contract (like for
> +the AF_XDP use case, see below).
> +
> +AF_XDP
> +======
> +
> +``AF_XDP`` use-case implies that there is a contract between the BPF program
> +that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and the final
> +consumer. Thus the BPF program manually allocates a fixed number of

Can we have ``AF_XDP`` link to the ``AF_XDP`` docs page for any reader
that's not familiar with it? I think you can just do the following and
it should just work:

s/``AF_XDP``/:doc:`af_xdp`

> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> +of kfuncs to populate it. The userspace ``XSK`` consumer computes
> +``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.

s/to locate its metadata/to locate that metadata

to make it clear that the consumer is consuming the metadata that was
populated by bpf_xdp_adjust_meta()? It's pretty clear from context but I
think "its metadata" may confuse some people as the metadata is not
really owned by the consumer. Also, should we mention that
xsk_umem__get_data() is defined in libxdp?

One other probably dumb question: is METADATA_SIZE static? If so, should
we provide a libxdp wrapper to access it?

> +
> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> +
> +  +----------+-----------------+------+
> +  | headroom | custom metadata | data |
> +  +----------+-----------------+------+
> +                               ^
> +                               |
> +                        rx_desc->address
> +
> +XDP_PASS
> +========
> +
> +This is the path where the packets processed by the XDP program are passed
> +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
> +contents. Currently, every driver has custom kernel code to parse
> +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
> +conversion, and the XDP metadata is not used by the kernel when building
> +skbs. However, TC-BPF programs can access the XDP metadata area using
> +the data_meta pointer.

``data_meta``

> +
> +In the future, we'd like to support a case where an XDP program
> +can override some of the metadata used for building skbs.
> +
> +bpf_redirect_map
> +================
> +
> +``bpf_redirect_map`` can redirect the frame to a different device.
> +Some devices (like virtual ethernet links) support running a second XDP
> +program after the redirect. However, the final consumer doesn't have
> +access to the original hardware descriptor and can't access any of
> +the original metadata. The same applies to XDP programs installed
> +into devmaps and cpumaps.
> +
> +This means that for redirected packets only custom metadata is
> +currently supported, which has to be prepared by the initial XDP program
> +before redirect. If the frame is eventually passed to the kernel, the
> +skb created from such a frame won't have any hardware metadata populated
> +in its skb. And if such a packet is later redirected into an ``XSK``,

s/And if/If

Also, can you add teletype ``skb`` in this paragraph?

> +that will also only have access to the custom metadata.
> +
> +
> +bpf_tail_call
> +=============
> +
> +Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY``
> +is currently not supported.
> +
> +Example
> +=======
> +
> +See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
> +``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
> +BPF program that handles XDP metadata.
> -- 
> 2.39.0.314.g84b9a713c41-goog

Looks great otherwise, thanks.
Stanislav Fomichev Jan. 3, 2023, 10:23 p.m. UTC | #2
On Wed, Dec 28, 2022 at 9:25 AM David Vernet <void@manifault.com> wrote:
>
> On Tue, Dec 20, 2022 at 02:20:27PM -0800, Stanislav Fomichev wrote:
> > Document all current use-cases and assumptions.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: David Ahern <dsahern@gmail.com>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > Cc: xdp-hints@xdp-project.net
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  Documentation/networking/index.rst           |   1 +
> >  Documentation/networking/xdp-rx-metadata.rst | 107 +++++++++++++++++++
> >  2 files changed, 108 insertions(+)
> >  create mode 100644 Documentation/networking/xdp-rx-metadata.rst
> >
> > diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> > index 4f2d1f682a18..4ddcae33c336 100644
> > --- a/Documentation/networking/index.rst
> > +++ b/Documentation/networking/index.rst
> > @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
> >     xfrm_proc
> >     xfrm_sync
> >     xfrm_sysctl
> > +   xdp-rx-metadata
> >
> >  .. only::  subproject and html
> >
> > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > new file mode 100644
> > index 000000000000..37e8192d9b60
> > --- /dev/null
> > +++ b/Documentation/networking/xdp-rx-metadata.rst
>
> Hey Stanislav,
>
> This is looking excellent. Left a few more minor comments and
> suggestions.
>
> > @@ -0,0 +1,107 @@
> > +===============
> > +XDP RX Metadata
> > +===============
> > +
> > +This document describes how an XDP program can access hardware metadata
>
> In similar fashion to LWN articles, can we spell out what XDP means the
> first time it's used, e.g.:
>
> ...describes how an eXpress Data Path (XDP) program...
>
> In general this applies to other acronyms unless they're super obvious,
> like "CPU" (thanks for already having done it for XSK).

Sure. Hopefully no need to explain RX below? Don't see anything else..
LMK if I missed something

> > +related to a packet using a set of helper functions, and how it can pass
> > +that metadata on to other consumers.
> > +
> > +General Design
> > +==============
> > +
> > +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame.
> > +Every device driver that wishes to expose additional packet metadata can
> > +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h``
> > +via ``XDP_METADATA_KFUNC_xxx``.
> > +
> > +Currently, the following kfuncs are supported. In the future, as more
> > +metadata is supported, this set will grow:
> > +
> > +- ``bpf_xdp_metadata_rx_timestamp`` returns a packet's RX timestamp
> > +- ``bpf_xdp_metadata_rx_hash`` returns a packet's RX hash
>
> So, I leave this up to you as to whether or not you want to do this, but
> there is a built-in mechanism in sphinx that converts doxygen comments
> to rendered documentation for a function, struct, etc, and also
> automatically links other places in documentation where the function is
> referenced. See [0] for an example of this in code, and [1] for an
> example of how it's rendered.
>
> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/bpf/kfuncs.rst#n239
> [1]: https://docs.kernel.org/bpf/kfuncs.html#c.bpf_task_acquire
>
> So you would do something like add function headers to the kfuncs, and
> then do:
>
> .. kernel-doc:: net/core/xdp.c
>    :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
>
> At some point we will need a consistent story for how we document
> kfuncs. That's not set in stone yet, which is why I'm saying it's up to
> you whether or not you want to do this or just leave it as teletype with
> a written description next to it.  Later on when we settle on a
> documentation story for kfuncs, we can update all of them to be
> documented in the same way.

Let me try and see how it looks in the html doc. I like the idea of
referencing the code directly, hopefully less chance it goes stale.

> > +The XDP program can use these kfuncs to read the metadata into stack
>
> s/The/An

Ack, ty!

> > +variables for its own consumption. Or, to pass the metadata on to other
> > +consumers, an XDP program can store it into the metadata area carried
> > +ahead of the packet.
> > +
> > +Not all kfuncs have to be implemented by the device driver; when not
> > +implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
> > +
> > +Within the XDP frame, the metadata layout is as follows::
>
> s/the/an

Thx!

> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +             ^                 ^
> > +             |                 |
> > +   xdp_buff->data_meta   xdp_buff->data
> > +
> > +The XDP program can store individual metadata items into this data_meta
>
> Can we teletype ``data_meta``? Also, s/The XDP program/An XDP program

Sure.

> > +area in whichever format it chooses. Later consumers of the metadata
> > +will have to agree on the format by some out of band contract (like for
> > +the AF_XDP use case, see below).
> > +
> > +AF_XDP
> > +======
> > +
> > +``AF_XDP`` use-case implies that there is a contract between the BPF program
> > +that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and the final
> > +consumer. Thus the BPF program manually allocates a fixed number of
>
> Can we have ``AF_XDP`` link to the ``AF_XDP`` docs page for any reader
> that's not familiar with it? I think you can just do the following and
> it should just work:
>
> s/``AF_XDP``/:doc:`af_xdp`

Will try.

> > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> > +of kfuncs to populate it. The userspace ``XSK`` consumer computes
> > +``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
>
> s/to locate its metadata/to locate that metadata

SG.

> to make it clear that the consumer is consuming the metadata that was
> populated by bpf_xdp_adjust_meta()? It's pretty clear from context but I
> think "its metadata" may confuse some people as the metadata is not

[..]

> really owned by the consumer. Also, should we mention that
> xsk_umem__get_data() is defined in libxdp?

Add something like:

Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
``METADATA_SIZE`` is an application-specific constant.

?

> One other probably dumb question: is METADATA_SIZE static? If so, should
> we provide a libxdp wrapper to access it?

(added a note above)

> > +
> > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> > +
> > +  +----------+-----------------+------+
> > +  | headroom | custom metadata | data |
> > +  +----------+-----------------+------+
> > +                               ^
> > +                               |
> > +                        rx_desc->address
> > +
> > +XDP_PASS
> > +========
> > +
> > +This is the path where the packets processed by the XDP program are passed
> > +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
> > +contents. Currently, every driver has custom kernel code to parse
> > +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
> > +conversion, and the XDP metadata is not used by the kernel when building
> > +skbs. However, TC-BPF programs can access the XDP metadata area using
> > +the data_meta pointer.
>
> ``data_meta``

Ack.

> > +
> > +In the future, we'd like to support a case where an XDP program
> > +can override some of the metadata used for building skbs.
> > +
> > +bpf_redirect_map
> > +================
> > +
> > +``bpf_redirect_map`` can redirect the frame to a different device.
> > +Some devices (like virtual ethernet links) support running a second XDP
> > +program after the redirect. However, the final consumer doesn't have
> > +access to the original hardware descriptor and can't access any of
> > +the original metadata. The same applies to XDP programs installed
> > +into devmaps and cpumaps.
> > +
> > +This means that for redirected packets only custom metadata is
> > +currently supported, which has to be prepared by the initial XDP program
> > +before redirect. If the frame is eventually passed to the kernel, the
> > +skb created from such a frame won't have any hardware metadata populated
> > +in its skb. And if such a packet is later redirected into an ``XSK``,
>
> s/And if/If
>
> Also, can you add teletype ``skb`` in this paragraph?

Sure, thanks.

> > +that will also only have access to the custom metadata.
> > +
> > +
> > +bpf_tail_call
> > +=============
> > +
> > +Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY``
> > +is currently not supported.
> > +
> > +Example
> > +=======
> > +
> > +See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
> > +``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
> > +BPF program that handles XDP metadata.
> > --
> > 2.39.0.314.g84b9a713c41-goog
>
> Looks great otherwise, thanks.
David Vernet Jan. 4, 2023, 4:02 p.m. UTC | #3
On Tue, Jan 03, 2023 at 02:23:03PM -0800, Stanislav Fomichev wrote:
> On Wed, Dec 28, 2022 at 9:25 AM David Vernet <void@manifault.com> wrote:
> >
> > On Tue, Dec 20, 2022 at 02:20:27PM -0800, Stanislav Fomichev wrote:
> > > Document all current use-cases and assumptions.
> > >
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: David Ahern <dsahern@gmail.com>
> > > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Willem de Bruijn <willemb@google.com>
> > > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > > Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
> > > Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
> > > Cc: Maryam Tahhan <mtahhan@redhat.com>
> > > Cc: xdp-hints@xdp-project.net
> > > Cc: netdev@vger.kernel.org
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  Documentation/networking/index.rst           |   1 +
> > >  Documentation/networking/xdp-rx-metadata.rst | 107 +++++++++++++++++++
> > >  2 files changed, 108 insertions(+)
> > >  create mode 100644 Documentation/networking/xdp-rx-metadata.rst
> > >
> > > diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> > > index 4f2d1f682a18..4ddcae33c336 100644
> > > --- a/Documentation/networking/index.rst
> > > +++ b/Documentation/networking/index.rst
> > > @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
> > >     xfrm_proc
> > >     xfrm_sync
> > >     xfrm_sysctl
> > > +   xdp-rx-metadata
> > >
> > >  .. only::  subproject and html
> > >
> > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > new file mode 100644
> > > index 000000000000..37e8192d9b60
> > > --- /dev/null
> > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> >
> > Hey Stanislav,
> >
> > This is looking excellent. Left a few more minor comments and
> > suggestions.
> >
> > > @@ -0,0 +1,107 @@
> > > +===============
> > > +XDP RX Metadata
> > > +===============
> > > +
> > > +This document describes how an XDP program can access hardware metadata
> >
> > In similar fashion to LWN articles, can we spell out what XDP means the
> > first time it's used, e.g.:
> >
> > ...describes how an eXpress Data Path (XDP) program...
> >
> > In general this applies to other acronyms unless they're super obvious,
> > like "CPU" (thanks for already having done it for XSK).
> 
> Sure. Hopefully no need to explain RX below? Don't see anything else..
> LMK if I missed something

Yeah, I think we can forego RX.

> 
> > > +related to a packet using a set of helper functions, and how it can pass
> > > +that metadata on to other consumers.
> > > +
> > > +General Design
> > > +==============
> > > +
> > > +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame.
> > > +Every device driver that wishes to expose additional packet metadata can
> > > +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h``
> > > +via ``XDP_METADATA_KFUNC_xxx``.
> > > +
> > > +Currently, the following kfuncs are supported. In the future, as more
> > > +metadata is supported, this set will grow:
> > > +
> > > +- ``bpf_xdp_metadata_rx_timestamp`` returns a packet's RX timestamp
> > > +- ``bpf_xdp_metadata_rx_hash`` returns a packet's RX hash
> >
> > So, I leave this up to you as to whether or not you want to do this, but
> > there is a built-in mechanism in sphinx that converts doxygen comments
> > to rendered documentation for a function, struct, etc, and also
> > automatically links other places in documentation where the function is
> > referenced. See [0] for an example of this in code, and [1] for an
> > example of how it's rendered.
> >
> > [0]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/bpf/kfuncs.rst#n239
> > [1]: https://docs.kernel.org/bpf/kfuncs.html#c.bpf_task_acquire
> >
> > So you would do something like add function headers to the kfuncs, and
> > then do:
> >
> > .. kernel-doc:: net/core/xdp.c
> >    :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
> >
> > At some point we will need a consistent story for how we document
> > kfuncs. That's not set in stone yet, which is why I'm saying it's up to
> > you whether or not you want to do this or just leave it as teletype with
> > a written description next to it.  Later on when we settle on a
> > documentation story for kfuncs, we can update all of them to be
> > documented in the same way.
> 
> Let me try and see how it looks in the html doc. I like the idea of
> referencing the code directly, hopefully less chance it goes stale.

Sounds good!

[...]

Thanks for making all these changes.

- David
diff mbox series

Patch

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 4f2d1f682a18..4ddcae33c336 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -120,6 +120,7 @@  Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
    xfrm_proc
    xfrm_sync
    xfrm_sysctl
+   xdp-rx-metadata
 
 .. only::  subproject and html
 
diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
new file mode 100644
index 000000000000..37e8192d9b60
--- /dev/null
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -0,0 +1,107 @@ 
+===============
+XDP RX Metadata
+===============
+
+This document describes how an XDP program can access hardware metadata
+related to a packet using a set of helper functions, and how it can pass
+that metadata on to other consumers.
+
+General Design
+==============
+
+XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame.
+Every device driver that wishes to expose additional packet metadata can
+implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h``
+via ``XDP_METADATA_KFUNC_xxx``.
+
+Currently, the following kfuncs are supported. In the future, as more
+metadata is supported, this set will grow:
+
+- ``bpf_xdp_metadata_rx_timestamp`` returns a packet's RX timestamp
+- ``bpf_xdp_metadata_rx_hash`` returns a packet's RX hash
+
+The XDP program can use these kfuncs to read the metadata into stack
+variables for its own consumption. Or, to pass the metadata on to other
+consumers, an XDP program can store it into the metadata area carried
+ahead of the packet.
+
+Not all kfuncs have to be implemented by the device driver; when not
+implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
+
+Within the XDP frame, the metadata layout is as follows::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+             ^                 ^
+             |                 |
+   xdp_buff->data_meta   xdp_buff->data
+
+The XDP program can store individual metadata items into this data_meta
+area in whichever format it chooses. Later consumers of the metadata
+will have to agree on the format by some out of band contract (like for
+the AF_XDP use case, see below).
+
+AF_XDP
+======
+
+``AF_XDP`` use-case implies that there is a contract between the BPF program
+that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and the final
+consumer. Thus the BPF program manually allocates a fixed number of
+bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate it. The userspace ``XSK`` consumer computes
+``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
+
+Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+                               ^
+                               |
+                        rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
+contents. Currently, every driver has custom kernel code to parse
+the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
+conversion, and the XDP metadata is not used by the kernel when building
+skbs. However, TC-BPF programs can access the XDP metadata area using
+the data_meta pointer.
+
+In the future, we'd like to support a case where an XDP program
+can override some of the metadata used for building skbs.
+
+bpf_redirect_map
+================
+
+``bpf_redirect_map`` can redirect the frame to a different device.
+Some devices (like virtual ethernet links) support running a second XDP
+program after the redirect. However, the final consumer doesn't have
+access to the original hardware descriptor and can't access any of
+the original metadata. The same applies to XDP programs installed
+into devmaps and cpumaps.
+
+This means that for redirected packets only custom metadata is
+currently supported, which has to be prepared by the initial XDP program
+before redirect. If the frame is eventually passed to the kernel, the
+skb created from such a frame won't have any hardware metadata populated
+in its skb. And if such a packet is later redirected into an ``XSK``,
+that will also only have access to the custom metadata.
+
+
+bpf_tail_call
+=============
+
+Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY``
+is currently not supported.
+
+Example
+=======
+
+See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
+``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
+BPF program that handles XDP metadata.