[bpf-next,01/11] bpf: Document XDP RX metadata

Message ID 20221115030210.3159213-2-sdf@google.com (mailing list archive)
State Superseded
Delegated to: BPF
Series xdp: hints via kfuncs

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for bpf-next, async
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Series has a cover letter
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           warning  6 maintainers not CCed: hawk@kernel.org corbet@lwn.net davem@davemloft.net kuba@kernel.org netdev@vger.kernel.org linux-doc@vger.kernel.org
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               warning  WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
bpf/vmtest-bpf-next-PR          fail     PR summary
bpf/vmtest-bpf-next-VM_Test-1   success  Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-2   success  Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3   fail     Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4   fail     Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5   fail     Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6   fail     Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7   fail     Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8   success  Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-9   success  Logs for set-matrix

Commit Message

Stanislav Fomichev Nov. 15, 2022, 3:02 a.m. UTC
Document all current use-cases and assumptions.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/xdp-rx-metadata.rst | 109 ++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100644 Documentation/bpf/xdp-rx-metadata.rst

Comments

Zvi Effron Nov. 15, 2022, 10:31 p.m. UTC | #1
On Mon, Nov 14, 2022 at 7:04 PM Stanislav Fomichev <sdf@google.com> wrote:
> +XDP programs support creating and passing custom metadata via
> +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> +entities:
> +
> +1. ``AF_XDP`` consumer.
> +2. Kernel core stack via ``XDP_PASS``.
> +3. Another device via ``bpf_redirect_map``.

4. Other eBPF programs via eBPF tail calls.

Stanislav Fomichev Nov. 15, 2022, 10:43 p.m. UTC | #2
On Tue, Nov 15, 2022 at 2:31 PM Zvi Effron <zeffron@riotgames.com> wrote:
>
> On Mon, Nov 14, 2022 at 7:04 PM Stanislav Fomichev <sdf@google.com> wrote:
> > +XDP programs support creating and passing custom metadata via
> > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> > +entities:
> > +
> > +1. ``AF_XDP`` consumer.
> > +2. Kernel core stack via ``XDP_PASS``.
> > +3. Another device via ``bpf_redirect_map``.
>
> 4. Other eBPF programs via eBPF tail calls.

Don't think a tail call is a special case here?
Correct me if I'm wrong, but with a tail call, we retain the original
xdp_buff ctx, so the tail call can still use the same kfuncs as if the
original bpf prog was running.

Zvi Effron Nov. 15, 2022, 11:34 p.m. UTC | #3
On Tue, Nov 15, 2022 at 2:44 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Nov 15, 2022 at 2:31 PM Zvi Effron <zeffron@riotgames.com> wrote:
> >
> > On Mon, Nov 14, 2022 at 7:04 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > +XDP programs support creating and passing custom metadata via
> > > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> > > +entities:
> > > +
> > > +1. ``AF_XDP`` consumer.
> > > +2. Kernel core stack via ``XDP_PASS``.
> > > +3. Another device via ``bpf_redirect_map``.
> >
> > 4. Other eBPF programs via eBPF tail calls.
>
> Don't think a tail call is a special case here?
> Correct me if I'm wrong, but with a tail call, we retain the original
> xdp_buff ctx, so the tail call can still use the same kfuncs as if the
> original bpf prog was running.
>

That's correct, but it's still a separate program that consumes the metadata,
unrelated to anything kfuncs. Prior to the existence of kfuncs and AF_XDP, this
was (to my knowledge) the primary consumer (outside of the original program, of
course) of the metadata.

From the name of the file and commit message, it sounds like this is the
documentation for XDP metadata, not the documentation for XDP metadata as used
by kfuncs to implement xdp-hints. Is that correct?

Stanislav Fomichev Nov. 16, 2022, 3:50 a.m. UTC | #4
On Tue, Nov 15, 2022 at 3:34 PM Zvi Effron <zeffron@riotgames.com> wrote:
>
> On Tue, Nov 15, 2022 at 2:44 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Tue, Nov 15, 2022 at 2:31 PM Zvi Effron <zeffron@riotgames.com> wrote:
> > >
> > > On Mon, Nov 14, 2022 at 7:04 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > +XDP programs support creating and passing custom metadata via
> > > > +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
> > > > +entities:
> > > > +
> > > > +1. ``AF_XDP`` consumer.
> > > > +2. Kernel core stack via ``XDP_PASS``.
> > > > +3. Another device via ``bpf_redirect_map``.
> > >
> > > 4. Other eBPF programs via eBPF tail calls.
> >
> > Don't think a tail call is a special case here?
> > Correct me if I'm wrong, but with a tail call, we retain the original
> > xdp_buff ctx, so the tail call can still use the same kfuncs as if the
> > original bpf prog was running.
> >
>
> That's correct, but it's still a separate program that consumes the metadata,
> unrelated to anything kfuncs. Prior to the existence of kfuncs and AF_XDP, this
> was (to my knowledge) the primary consumer (outside of the original program, of
> course) of the metadata.

SG. I'll add this #4 in the respin and will add a short note that the
tail call operates on the same ctx.
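
For illustration only, the tail-call point can be modelled in plain
user-space C. This is a sketch, not BPF code: ``model_xdp_md``,
``prog_producer``, ``prog_consumer`` and ``prog_array`` are hypothetical
names, the pointer arithmetic stands in for ``bpf_xdp_adjust_meta``, and a
function-pointer call stands in for ``bpf_tail_call`` (a real tail call
does not return to the caller). The only thing it tries to show is that
the second program sees the same context, so metadata stored by the first
one is still in front of ``data``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the XDP context; only two pointers are modelled. */
struct model_xdp_md {
    uint8_t *data_meta; /* start of custom metadata */
    uint8_t *data;      /* start of packet data */
};

enum { MODEL_DROP = 1, MODEL_PASS = 2 };

typedef int (*model_prog)(struct model_xdp_md *ctx);

/* Records what the second program read, so the effect can be observed. */
static uint32_t seen_mark;

/* Second program: consumes metadata left by whoever ran before it. */
static int prog_consumer(struct model_xdp_md *ctx)
{
    if ((size_t)(ctx->data - ctx->data_meta) < sizeof(seen_mark))
        return MODEL_DROP; /* no metadata present */
    memcpy(&seen_mark, ctx->data_meta, sizeof(seen_mark));
    return MODEL_PASS;
}

/* Models a one-slot program array. */
static model_prog prog_array[1] = { prog_consumer };

/* First program: reserves 4 bytes of metadata, fills them, then hands the
 * very same ctx to the next program. */
static int prog_producer(struct model_xdp_md *ctx)
{
    uint32_t mark = 0xcafe;

    ctx->data_meta -= sizeof(mark); /* models bpf_xdp_adjust_meta(ctx, -4) */
    memcpy(ctx->data_meta, &mark, sizeof(mark));
    return prog_array[0](ctx);      /* models the tail call */
}
```

Because both functions receive the same ``ctx`` pointer, no copy or export
step is needed between them, which is the sense in which a tail call is
not a separate transport for metadata.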

> From the name of the file and commit message, it sounds like this is the
> documentation for XDP metadata, not the documentation for XDP metadata as used
> by kfuncs to implement xdp-hints. Is that correct?

I'm mostly focused on the kfunc-related details for now.



> > > > +
> > > > +General Design
> > > > +==============
> > > > +
> > > > +XDP has access to a set of kfuncs to manipulate the metadata. Every
> > > > +device driver implements these kfuncs by generating BPF bytecode
> > > > +to parse it out from the hardware descriptors. The set of kfuncs is
> > > > +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
> > > > +
> > > > +Currently, the following kfuncs are supported. In the future, as more
> > > > +metadata is supported, this set will grow:
> > > > +
> > > > +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
> > > > + indicate whether the device supports RX timestamps in general
> > > > +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp or 0
> > > > +- ``bpf_xdp_metadata_export_to_skb`` prepares metadata layout that
> > > > + the kernel will be able to consume. See ``bpf_redirect_map`` section
> > > > + below for more details.
> > > > +
> > > > +Within the XDP frame, the metadata layout is as follows::
> > > > +
> > > > + +----------+------------------+-----------------+------+
> > > > + | headroom | xdp_skb_metadata | custom metadata | data |
> > > > + +----------+------------------+-----------------+------+
> > > > + ^ ^
> > > > + | |
> > > > + xdp_buff->data_meta xdp_buff->data
> > > > +
> > > > +Where ``xdp_skb_metadata`` is the metadata prepared by
> > > > +``bpf_xdp_metadata_export_to_skb``. And ``custom metadata``
> > > > +is prepared by the BPF program via calls to ``bpf_xdp_adjust_meta``.
> > > > +
> > > > +Note that ``bpf_xdp_metadata_export_to_skb`` doesn't adjust
> > > > +``xdp->data_meta`` pointer. To access the metadata generated
> > > > +by ``bpf_xdp_metadata_export_to_skb`` use ``xdp_buf->skb_metadata``.
> > > > +
> > > > +AF_XDP
> > > > +======
> > > > +
> > > > +``AF_XDP`` use-case implies that there is a contract between the BPF program
> > > > +that redirects XDP frames into the ``XSK`` and the final consumer.
> > > > +Thus the BPF program manually allocates a fixed number of
> > > > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> > > > +of kfuncs to populate it. User-space ``XSK`` consumer, looks
> > > > +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
> > > > +
> > > > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> > > > +
> > > > + +----------+------------------+-----------------+------+
> > > > + | headroom | xdp_skb_metadata | custom metadata | data |
> > > > + +----------+------------------+-----------------+------+
> > > > + ^
> > > > + |
> > > > + rx_desc->address
> > > > +
> > > > +XDP_PASS
> > > > +========
> > > > +
> > > > +This is the path where the packets processed by the XDP program are passed
> > > > +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents.
> > > > +Currently, every driver has a custom kernel code to parse the descriptors and
> > > > +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
> > > > +In the future, we'd like to support a case where XDP program can override
> > > > +some of that metadata.
> > > > +
> > > > +The plan of record is to make this path similar to ``bpf_redirect_map``
> > > > +below where the program would call ``bpf_xdp_metadata_export_to_skb``,
> > > > +override the metadata and return ``XDP_PASS``. Additional work in
> > > > +the drivers will be required to enable this (for example, to skip
> > > > +populating ``skb`` metadata from the descriptors when
> > > > +``bpf_xdp_metadata_export_to_skb`` has been called).
> > > > +
> > > > +bpf_redirect_map
> > > > +================
> > > > +
> > > > +``bpf_redirect_map`` can redirect the frame to a different device.
> > > > +In this case we don't know ahead of time whether that final consumer
> > > > +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
> > > > +Additionally, the final consumer doesn't have access to the original
> > > > +hardware descriptor and can't access any of the original metadata.
> > > > +
> > > > +To support passing metadata via ``bpf_redirect_map``, there is a
> > > > +``bpf_xdp_metadata_export_to_skb`` kfunc that populates a subset
> > > > +of metadata into ``xdp_buff``. The layout is defined in
> > > > +``struct xdp_skb_metadata``.
> > > > +
> > > > +Mixing custom metadata and xdp_skb_metadata
> > > > +===========================================
> > > > +
> > > > +For the cases of ``bpf_redirect_map``, where the final consumer isn't
> > > > +known ahead of time, the program can store both, custom metadata
> > > > +and ``xdp_skb_metadata`` for the kernel consumption.
> > > > +
> > > > +Current limitation is that the program cannot adjust ``data_meta`` (via
> > > > +``bpf_xdp_adjust_meta``) after a call to ``bpf_xdp_metadata_export_to_skb``.
> > > > +So it has to, first, prepare its custom metadata layout and only then,
> > > > +optionally, store ``xdp_skb_metadata`` via a call to
> > > > +``bpf_xdp_metadata_export_to_skb``.
> > > > --
> > > > 2.38.1.431.g37b22c650d-goog
> > > >
Patch

diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst
new file mode 100644
index 000000000000..5ddaaab8de31
--- /dev/null
+++ b/Documentation/bpf/xdp-rx-metadata.rst
@@ -0,0 +1,109 @@ 
+===============
+XDP RX Metadata
+===============
+
+XDP programs support creating and passing custom metadata via
+``bpf_xdp_adjust_meta``. This metadata can be consumed by the following
+entities:
+
+1. ``AF_XDP`` consumer.
+2. Kernel core stack via ``XDP_PASS``.
+3. Another device via ``bpf_redirect_map``.
+
+General Design
+==============
+
+XDP programs have access to a set of kfuncs to manipulate the metadata. Every
+device driver implements these kfuncs by generating BPF bytecode
+that parses the metadata out of the hardware descriptors. The set of kfuncs is
+declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``.
+
+Currently, the following kfuncs are supported. In the future, as more
+metadata is supported, this set will grow:
+
+- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to
+  indicate whether the device supports RX timestamps in general.
+- ``bpf_xdp_metadata_rx_timestamp`` returns the packet RX timestamp or 0.
+- ``bpf_xdp_metadata_export_to_skb`` prepares a metadata layout that
+  the kernel will be able to consume. See the ``bpf_redirect_map`` section
+  below for more details.
+
+Within the XDP frame, the metadata layout is as follows::
+
+  +----------+------------------+-----------------+------+
+  | headroom | xdp_skb_metadata | custom metadata | data |
+  +----------+------------------+-----------------+------+
+                                ^                 ^
+                                |                 |
+                      xdp_buff->data_meta   xdp_buff->data
+
+Here ``xdp_skb_metadata`` is the metadata prepared by
+``bpf_xdp_metadata_export_to_skb``, and ``custom metadata``
+is prepared by the BPF program via calls to ``bpf_xdp_adjust_meta``.
+
+Note that ``bpf_xdp_metadata_export_to_skb`` doesn't adjust the
+``xdp_buff->data_meta`` pointer. To access the metadata generated
+by ``bpf_xdp_metadata_export_to_skb``, use ``xdp_buff->skb_metadata``.
+
+AF_XDP
+======
+
+The ``AF_XDP`` use case implies a contract between the BPF program
+that redirects XDP frames into the ``XSK`` and the final consumer.
+Thus the BPF program manually allocates a fixed number of
+bytes of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate it. The user-space ``XSK`` consumer looks
+at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata.
+
+Here is the ``AF_XDP`` consumer layout (note the missing ``data_meta`` pointer)::
+
+  +----------+------------------+-----------------+------+
+  | headroom | xdp_skb_metadata | custom metadata | data |
+  +----------+------------------+-----------------+------+
+                                                  ^
+                                                  |
+                                           rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates an ``skb`` out of the ``xdp_buff`` contents.
+Currently, every driver has custom kernel code to parse the descriptors and
+populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion.
+In the future, we'd like to support a case where an XDP program can override
+some of that metadata.
+
+The plan of record is to make this path similar to ``bpf_redirect_map``
+below, where the program would call ``bpf_xdp_metadata_export_to_skb``,
+override the metadata and return ``XDP_PASS``. Additional work in
+the drivers will be required to enable this (for example, to skip
+populating ``skb`` metadata from the descriptors when
+``bpf_xdp_metadata_export_to_skb`` has been called).
+
+bpf_redirect_map
+================
+
+``bpf_redirect_map`` can redirect the frame to a different device.
+In this case, we don't know ahead of time whether the final consumer
+will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``.
+Additionally, the final consumer doesn't have access to the original
+hardware descriptor and can't access any of the original metadata.
+
+To support passing metadata via ``bpf_redirect_map``, there is a
+``bpf_xdp_metadata_export_to_skb`` kfunc that populates a subset
+of metadata into ``xdp_buff``. The layout is defined in
+``struct xdp_skb_metadata``.
+
+Mixing custom metadata and xdp_skb_metadata
+===========================================
+
+For the cases of ``bpf_redirect_map`` where the final consumer isn't
+known ahead of time, the program can store both custom metadata
+and ``xdp_skb_metadata`` for kernel consumption.
+
+The current limitation is that the program cannot adjust ``data_meta`` (via
+``bpf_xdp_adjust_meta``) after a call to ``bpf_xdp_metadata_export_to_skb``.
+So it has to first prepare its custom metadata layout and only then,
+optionally, store ``xdp_skb_metadata`` via a call to
+``bpf_xdp_metadata_export_to_skb``.
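
As a rough illustration of the layout and the ``AF_XDP`` contract described
in the patch, here is a user-space C model. It is a sketch under stated
assumptions, not kernel or BPF code: ``model_xdp_buff``, ``custom_meta``,
``model_adjust_meta``, ``model_prog_store`` and ``model_consumer_load`` are
hypothetical names, and ``METADATA_SIZE`` is simply the size of the
hypothetical struct. ``model_adjust_meta`` mimics only the pointer movement
of ``bpf_xdp_adjust_meta`` (a negative delta grows the metadata area
towards the headroom); the consumer mimics the
``xsk_umem__get_data() - METADATA_SIZE`` lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xdp_buff; only metadata-related pointers. */
struct model_xdp_buff {
    uint8_t *data_hard_start; /* start of headroom */
    uint8_t *data_meta;       /* start of custom metadata */
    uint8_t *data;            /* start of packet data */
};

/* Hypothetical layout agreed between the BPF program and the consumer. */
struct custom_meta {
    uint64_t rx_timestamp;
};

#define METADATA_SIZE sizeof(struct custom_meta)

/* Models the pointer movement of bpf_xdp_adjust_meta().
 * Returns 0 on success, -1 if the result would leave [headroom, data]. */
static int model_adjust_meta(struct model_xdp_buff *xdp, int delta)
{
    ptrdiff_t off = (xdp->data_meta - xdp->data_hard_start) + delta;

    if (off < 0 || off > xdp->data - xdp->data_hard_start)
        return -1;
    xdp->data_meta = xdp->data_hard_start + off;
    return 0;
}

/* Producer side: reserve METADATA_SIZE bytes, then populate them. */
static int model_prog_store(struct model_xdp_buff *xdp, uint64_t ts)
{
    struct custom_meta m = { .rx_timestamp = ts };

    if (model_adjust_meta(xdp, -(int)METADATA_SIZE))
        return -1;
    memcpy(xdp->data_meta, &m, sizeof(m));
    return 0;
}

/* Consumer side: there is no data_meta pointer here, only the address of
 * the packet data, so step back by the agreed METADATA_SIZE. */
static uint64_t model_consumer_load(const uint8_t *pkt_data)
{
    struct custom_meta m;

    memcpy(&m, pkt_data - METADATA_SIZE, sizeof(m));
    return m.rx_timestamp;
}
```

The consumer never learns how much metadata was actually reserved; it
relies on both sides agreeing on ``METADATA_SIZE`` ahead of time, which is
the contract the ``AF_XDP`` section refers to.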