Message ID | 20230112003230.3779451-2-sdf@google.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | xdp: hints via kfuncs | expand |
On 12/01/2023 01.32, Stanislav Fomichev wrote: > Document all current use-cases and assumptions. > > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: David Ahern <dsahern@gmail.com> > Cc: Martin KaFai Lau <martin.lau@linux.dev> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Willem de Bruijn <willemb@google.com> > Cc: Jesper Dangaard Brouer <brouer@redhat.com> > Cc: Anatoly Burakov <anatoly.burakov@intel.com> > Cc: Alexander Lobakin <alexandr.lobakin@intel.com> > Cc: Magnus Karlsson <magnus.karlsson@gmail.com> > Cc: Maryam Tahhan <mtahhan@redhat.com> > Cc: xdp-hints@xdp-project.net > Cc: netdev@vger.kernel.org > Acked-by: David Vernet <void@manifault.com> > Signed-off-by: Stanislav Fomichev <sdf@google.com> > --- > Documentation/networking/index.rst | 1 + > Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++ > 2 files changed, 109 insertions(+) > create mode 100644 Documentation/networking/xdp-rx-metadata.rst > > diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst > index 4f2d1f682a18..4ddcae33c336 100644 > --- a/Documentation/networking/index.rst > +++ b/Documentation/networking/index.rst > @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics. > xfrm_proc > xfrm_sync > xfrm_sysctl > + xdp-rx-metadata > > .. only:: subproject and html > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst > new file mode 100644 > index 000000000000..b6c8c77937c4 > --- /dev/null > +++ b/Documentation/networking/xdp-rx-metadata.rst > @@ -0,0 +1,108 @@ > +=============== > +XDP RX Metadata > +=============== > + > +This document describes how an eXpress Data Path (XDP) program can access > +hardware metadata related to a packet using a set of helper functions, > +and how it can pass that metadata on to other consumers. > + > +General Design > +============== > + > +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame. > +Every device driver that wishes to expose additional packet metadata can > +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h`` > +via ``XDP_METADATA_KFUNC_xxx``. > + > +Currently, the following kfuncs are supported. In the future, as more > +metadata is supported, this set will grow: > + > +.. kernel-doc:: net/core/xdp.c > + :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash > + > +An XDP program can use these kfuncs to read the metadata into stack > +variables for its own consumption. Or, to pass the metadata on to other > +consumers, an XDP program can store it into the metadata area carried > +ahead of the packet. > + > +Not all kfuncs have to be implemented by the device driver; when not > +implemented, the default ones that return ``-EOPNOTSUPP`` will be used. > + > +Within an XDP frame, the metadata layout is as follows:: Below diagram describes XDP buff (xdp_buff), but text says 'XDP frame'. So XDP frame isn't referring literally to xdp_frame, which I find slightly confusing. It is likely because I think too much about the code and the different objects, xdp_frame, xdp_buff, xdp_md (xdp ctx seen be bpf-prog). I tried to grep in the (recent added) bpf/xdp docs to see if there is a definition of a XDP "packet" or "frame". Nothing popped up, except that Documentation/bpf/map_cpumap.rst talks about raw ``xdp_frame`` objects. Perhaps we can improve this doc by calling out xdp_buff here, like: Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) is as follows:: > + > + +----------+-----------------+------+ > + | headroom | custom metadata | data | > + +----------+-----------------+------+ > + ^ ^ > + | | > + xdp_buff->data_meta xdp_buff->data > + > +An XDP program can store individual metadata items into this ``data_meta`` > +area in whichever format it chooses. Later consumers of the metadata > +will have to agree on the format by some out of band contract (like for > +the AF_XDP use case, see below). > + > +AF_XDP > +====== > + > +:doc:`af_xdp` use-case implies that there is a contract between the BPF > +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and > +the final consumer. Thus the BPF program manually allocates a fixed number of > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset > +of kfuncs to populate it. The userspace ``XSK`` consumer computes > +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata. > +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and > +``METADATA_SIZE`` is an application-specific constant. The main problem with AF_XDP and metadata is that, the AF_XDP descriptor doesn't contain any info about the length METADATA_SIZE. The text does says this, but in a very convoluted way. I think this challenge should be more clearly spelled out. (p.s. This was something that XDP-hints via BTF have a proposed solution for) > + > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer):: The "note" also hint to this issue. > + > + +----------+-----------------+------+ > + | headroom | custom metadata | data | > + +----------+-----------------+------+ > + ^ > + | > + rx_desc->address > + > +XDP_PASS > +======== > + > +This is the path where the packets processed by the XDP program are passed > +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff`` > +contents. Currently, every driver has custom kernel code to parse > +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb`` > +conversion, and the XDP metadata is not used by the kernel when building > +``skbs``. However, TC-BPF programs can access the XDP metadata area using > +the ``data_meta`` pointer. > + > +In the future, we'd like to support a case where an XDP program > +can override some of the metadata used for building ``skbs``. Happy this is mentioned as future work. > + > +bpf_redirect_map > +================ > + > +``bpf_redirect_map`` can redirect the frame to a different device. > +Some devices (like virtual ethernet links) support running a second XDP > +program after the redirect. However, the final consumer doesn't have > +access to the original hardware descriptor and can't access any of > +the original metadata. The same applies to XDP programs installed > +into devmaps and cpumaps. > + > +This means that for redirected packets only custom metadata is > +currently supported, which has to be prepared by the initial XDP program > +before redirect. If the frame is eventually passed to the kernel, the > +``skb`` created from such a frame won't have any hardware metadata populated > +in its ``skb``. If such a packet is later redirected into an ``XSK``, > +that will also only have access to the custom metadata. > + Good that this is documented, but I hope we can fix/improve this as future work. > +bpf_tail_call > +============= > + > +Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY`` > +is currently not supported. > + > +Example > +======= > + > +See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and > +``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of > +BPF program that handles XDP metadata. --Jesper
On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > > On 12/01/2023 01.32, Stanislav Fomichev wrote: > > Document all current use-cases and assumptions. > > > > Cc: John Fastabend <john.fastabend@gmail.com> > > Cc: David Ahern <dsahern@gmail.com> > > Cc: Martin KaFai Lau <martin.lau@linux.dev> > > Cc: Jakub Kicinski <kuba@kernel.org> > > Cc: Willem de Bruijn <willemb@google.com> > > Cc: Jesper Dangaard Brouer <brouer@redhat.com> > > Cc: Anatoly Burakov <anatoly.burakov@intel.com> > > Cc: Alexander Lobakin <alexandr.lobakin@intel.com> > > Cc: Magnus Karlsson <magnus.karlsson@gmail.com> > > Cc: Maryam Tahhan <mtahhan@redhat.com> > > Cc: xdp-hints@xdp-project.net > > Cc: netdev@vger.kernel.org > > Acked-by: David Vernet <void@manifault.com> > > Signed-off-by: Stanislav Fomichev <sdf@google.com> > > --- > > Documentation/networking/index.rst | 1 + > > Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++ > > 2 files changed, 109 insertions(+) > > create mode 100644 Documentation/networking/xdp-rx-metadata.rst > > > > diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst > > index 4f2d1f682a18..4ddcae33c336 100644 > > --- a/Documentation/networking/index.rst > > +++ b/Documentation/networking/index.rst > > @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics. > > xfrm_proc > > xfrm_sync > > xfrm_sysctl > > + xdp-rx-metadata > > > > .. only:: subproject and html > > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst > > new file mode 100644 > > index 000000000000..b6c8c77937c4 > > --- /dev/null > > +++ b/Documentation/networking/xdp-rx-metadata.rst > > @@ -0,0 +1,108 @@ > > +=============== > > +XDP RX Metadata > > +=============== > > + > > +This document describes how an eXpress Data Path (XDP) program can access > > +hardware metadata related to a packet using a set of helper functions, > > +and how it can pass that metadata on to other consumers. > > + > > +General Design > > +============== > > + > > +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame. > > +Every device driver that wishes to expose additional packet metadata can > > +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h`` > > +via ``XDP_METADATA_KFUNC_xxx``. > > + > > +Currently, the following kfuncs are supported. In the future, as more > > +metadata is supported, this set will grow: > > + > > +.. kernel-doc:: net/core/xdp.c > > + :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash > > + > > +An XDP program can use these kfuncs to read the metadata into stack > > +variables for its own consumption. Or, to pass the metadata on to other > > +consumers, an XDP program can store it into the metadata area carried > > +ahead of the packet. > > + > > +Not all kfuncs have to be implemented by the device driver; when not > > +implemented, the default ones that return ``-EOPNOTSUPP`` will be used. > > + > > +Within an XDP frame, the metadata layout is as follows:: > > Below diagram describes XDP buff (xdp_buff), but text says 'XDP frame'. > So XDP frame isn't referring literally to xdp_frame, which I find > slightly confusing. > It is likely because I think too much about the code and the different > objects, xdp_frame, xdp_buff, xdp_md (xdp ctx seen be bpf-prog). > > I tried to grep in the (recent added) bpf/xdp docs to see if there is a > definition of a XDP "packet" or "frame". Nothing popped up, except that > Documentation/bpf/map_cpumap.rst talks about raw ``xdp_frame`` objects. > > Perhaps we can improve this doc by calling out xdp_buff here, like: > > Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) > is as follows:: Sure, will amend! > > + > > + +----------+-----------------+------+ > > + | headroom | custom metadata | data | > > + +----------+-----------------+------+ > > + ^ ^ > > + | | > > + xdp_buff->data_meta xdp_buff->data > > + > > +An XDP program can store individual metadata items into this ``data_meta`` > > +area in whichever format it chooses. Later consumers of the metadata > > +will have to agree on the format by some out of band contract (like for > > +the AF_XDP use case, see below). > > + > > +AF_XDP > > +====== > > + > > +:doc:`af_xdp` use-case implies that there is a contract between the BPF > > +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and > > +the final consumer. Thus the BPF program manually allocates a fixed number of > > +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset > > +of kfuncs to populate it. The userspace ``XSK`` consumer computes > > +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata. > > +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and > > +``METADATA_SIZE`` is an application-specific constant. > > The main problem with AF_XDP and metadata is that, the AF_XDP descriptor > doesn't contain any info about the length METADATA_SIZE. > > The text does says this, but in a very convoluted way. > I think this challenge should be more clearly spelled out. > > (p.s. This was something that XDP-hints via BTF have a proposed solution > for) Any suggestions on how to clarify it better? I have two hints: 1. ``METADATA_SIZE`` is an application-specific constant 2. note missing ``data_meta`` pointer Do you prefer I also add a sentence where I spell it out more explicitly? Something like: Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and ``METADATA_SIZE`` is an application-specific constant (``AF_XDP`` receive descriptor does _not_ explicitly carry the size of the metadata). > > + > > +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer):: > > The "note" also hint to this issue. This seems like an explicit design choice of the AF_XDP? In theory, I don't see why we can't have a v2 receive descriptor format where we return the size of the metadata? > > + > > + +----------+-----------------+------+ > > + | headroom | custom metadata | data | > > + +----------+-----------------+------+ > > + ^ > > + | > > + rx_desc->address > > + > > +XDP_PASS > > +======== > > + > > +This is the path where the packets processed by the XDP program are passed > > +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff`` > > +contents. Currently, every driver has custom kernel code to parse > > +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb`` > > +conversion, and the XDP metadata is not used by the kernel when building > > +``skbs``. However, TC-BPF programs can access the XDP metadata area using > > +the ``data_meta`` pointer. > > + > > +In the future, we'd like to support a case where an XDP program > > +can override some of the metadata used for building ``skbs``. > > Happy this is mentioned as future work. As mentioned in a separate email, if you prefer to focus on that, feel free to drive it since I'm gonna look into the TX side first. > > + > > +bpf_redirect_map > > +================ > > + > > +``bpf_redirect_map`` can redirect the frame to a different device. > > +Some devices (like virtual ethernet links) support running a second XDP > > +program after the redirect. However, the final consumer doesn't have > > +access to the original hardware descriptor and can't access any of > > +the original metadata. The same applies to XDP programs installed > > +into devmaps and cpumaps. > > + > > +This means that for redirected packets only custom metadata is > > +currently supported, which has to be prepared by the initial XDP program > > +before redirect. If the frame is eventually passed to the kernel, the > > +``skb`` created from such a frame won't have any hardware metadata populated > > +in its ``skb``. If such a packet is later redirected into an ``XSK``, > > +that will also only have access to the custom metadata. > > + > > Good that this is documented, but I hope we can fix/improve this as > future work. Definitely! Hopefully documenting it here acts as a sort-of TODO which we can eventually address. Maybe even starting with a section here on how it is supposed to work :-) > > +bpf_tail_call > > +============= > > + > > +Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY`` > > +is currently not supported. > > + > > +Example > > +======= > > + > > +See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and > > +``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of > > +BPF program that handles XDP metadata. > > > --Jesper >
On 17/01/2023 21.33, Stanislav Fomichev wrote: > On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer > <jbrouer@redhat.com> wrote: >> >> On 12/01/2023 01.32, Stanislav Fomichev wrote: >>> Document all current use-cases and assumptions. >>> [...] >>> Acked-by: David Vernet <void@manifault.com> >>> Signed-off-by: Stanislav Fomichev <sdf@google.com> >>> --- >>> Documentation/networking/index.rst | 1 + >>> Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++ >>> 2 files changed, 109 insertions(+) >>> create mode 100644 Documentation/networking/xdp-rx-metadata.rst >>> >>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst >>> index 4f2d1f682a18..4ddcae33c336 100644 >>> --- a/Documentation/networking/index.rst >>> +++ b/Documentation/networking/index.rst [...cut...] >>> +AF_XDP >>> +====== >>> + >>> +:doc:`af_xdp` use-case implies that there is a contract between the BPF >>> +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and >>> +the final consumer. Thus the BPF program manually allocates a fixed number of >>> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset >>> +of kfuncs to populate it. The userspace ``XSK`` consumer computes >>> +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata. >>> +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and >>> +``METADATA_SIZE`` is an application-specific constant. >> >> The main problem with AF_XDP and metadata is that, the AF_XDP descriptor >> doesn't contain any info about the length METADATA_SIZE. >> >> The text does says this, but in a very convoluted way. >> I think this challenge should be more clearly spelled out. >> >> (p.s. This was something that XDP-hints via BTF have a proposed solution >> for) > > Any suggestions on how to clarify it better? I have two hints: > 1. ``METADATA_SIZE`` is an application-specific constant > 2. note missing ``data_meta`` pointer > > Do you prefer I also add a sentence where I spell it out more > explicitly? Something like: > > Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and > ``METADATA_SIZE`` is an application-specific constant (``AF_XDP`` > receive descriptor does _not_ explicitly carry the size of the > metadata). That addition works for me. (Later we can hopefully come up with a solution for this) >>> + >>> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer):: >> >> The "note" also hint to this issue. > > This seems like an explicit design choice of the AF_XDP? In theory, I > don't see why we can't have a v2 receive descriptor format where we > return the size of the metadata? (Cc. Magnus+Bjørn) Yes, it was a design choice from AF_XDP not to include the metadata length. The AF_XDP descriptor, see struct xdp_desc (below) from /include/uapi/linux/if_xdp.h. /* Rx/Tx descriptor */ struct xdp_desc { __u64 addr; __u32 len; __u32 options; }; Does contain a 'u32 options' field, that we could use. In previous discussions, the proposed solution (from Bjørn+Magnus) was to use some bits in the 'options' field to say metadata is present, and xsk_umem__get_data minus 4 (or 8) bytes contain a BTF_ID. The AF_XDP programmer can then get the metadata length by looking up the BTF_ID. >>> + >>> + +----------+-----------------+------+ >>> + | headroom | custom metadata | data | >>> + +----------+-----------------+------+ >>> + ^ >>> + | >>> + rx_desc->address >>> + >>> +XDP_PASS >>> +======== >>> + >>> +This is the path where the packets processed by the XDP program are passed >>> +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff`` >>> +contents. Currently, every driver has custom kernel code to parse >>> +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb`` >>> +conversion, and the XDP metadata is not used by the kernel when building >>> +``skbs``. However, TC-BPF programs can access the XDP metadata area using >>> +the ``data_meta`` pointer. >>> + >>> +In the future, we'd like to support a case where an XDP program >>> +can override some of the metadata used for building ``skbs``. >> >> Happy this is mentioned as future work. > > As mentioned in a separate email, if you prefer to focus on that, feel Yes, I'm going to work on PoC code that explore this area. > free to drive it since I'm gonna look into the TX side first. Happy to hear you are going to look into TX-side. Are your use case related to TX timestamping? --Jesper
On Wed, Jan 18, 2023 at 6:28 AM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > On 17/01/2023 21.33, Stanislav Fomichev wrote: > > On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer > > <jbrouer@redhat.com> wrote: > >> > >> On 12/01/2023 01.32, Stanislav Fomichev wrote: > >>> Document all current use-cases and assumptions. > >>> > [...] > >>> Acked-by: David Vernet <void@manifault.com> > >>> Signed-off-by: Stanislav Fomichev <sdf@google.com> > >>> --- > >>> Documentation/networking/index.rst | 1 + > >>> Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++ > >>> 2 files changed, 109 insertions(+) > >>> create mode 100644 Documentation/networking/xdp-rx-metadata.rst > >>> > >>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst > >>> index 4f2d1f682a18..4ddcae33c336 100644 > >>> --- a/Documentation/networking/index.rst > >>> +++ b/Documentation/networking/index.rst > [...cut...] > >>> +AF_XDP > >>> +====== > >>> + > >>> +:doc:`af_xdp` use-case implies that there is a contract between the BPF > >>> +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and > >>> +the final consumer. Thus the BPF program manually allocates a fixed number of > >>> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset > >>> +of kfuncs to populate it. The userspace ``XSK`` consumer computes > >>> +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata. > >>> +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and > >>> +``METADATA_SIZE`` is an application-specific constant. > >> > >> The main problem with AF_XDP and metadata is that, the AF_XDP descriptor > >> doesn't contain any info about the length METADATA_SIZE. > >> > >> The text does says this, but in a very convoluted way. > >> I think this challenge should be more clearly spelled out. > >> > >> (p.s. This was something that XDP-hints via BTF have a proposed solution > >> for) > > > > Any suggestions on how to clarify it better? I have two hints: > > 1. ``METADATA_SIZE`` is an application-specific constant > > 2. note missing ``data_meta`` pointer > > > > Do you prefer I also add a sentence where I spell it out more > > explicitly? Something like: > > > > Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and > > ``METADATA_SIZE`` is an application-specific constant (``AF_XDP`` > > receive descriptor does _not_ explicitly carry the size of the > > metadata). > > That addition works for me. > (Later we can hopefully come up with a solution for this) > > >>> + > >>> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer):: > >> > >> The "note" also hint to this issue. > > > > This seems like an explicit design choice of the AF_XDP? In theory, I > > don't see why we can't have a v2 receive descriptor format where we > > return the size of the metadata? > > (Cc. Magnus+Bjørn) > Yes, it was a design choice from AF_XDP not to include the metadata length. > > The AF_XDP descriptor, see struct xdp_desc (below) from > /include/uapi/linux/if_xdp.h. > > /* Rx/Tx descriptor */ > struct xdp_desc { > __u64 addr; > __u32 len; > __u32 options; > }; > > Does contain a 'u32 options' field, that we could use. > > In previous discussions, the proposed solution (from Bjørn+Magnus) was > to use some bits in the 'options' field to say metadata is present, and > xsk_umem__get_data minus 4 (or 8) bytes contain a BTF_ID. The AF_XDP > programmer can then get the metadata length by looking up the BTF_ID. Yeah, that's one way to do it. Instead of BTF_ID, we can just put the size of the metadata. But it wasn't needed for the use-cases described in this patchset. For redirect use-case, I agree, we might carry some extra information about the layout, up to you. > >>> + > >>> + +----------+-----------------+------+ > >>> + | headroom | custom metadata | data | > >>> + +----------+-----------------+------+ > >>> + ^ > >>> + | > >>> + rx_desc->address > >>> + > >>> +XDP_PASS > >>> +======== > >>> + > >>> +This is the path where the packets processed by the XDP program are passed > >>> +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff`` > >>> +contents. Currently, every driver has custom kernel code to parse > >>> +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb`` > >>> +conversion, and the XDP metadata is not used by the kernel when building > >>> +``skbs``. However, TC-BPF programs can access the XDP metadata area using > >>> +the ``data_meta`` pointer. > >>> + > >>> +In the future, we'd like to support a case where an XDP program > >>> +can override some of the metadata used for building ``skbs``. > >> > >> Happy this is mentioned as future work. > > > > As mentioned in a separate email, if you prefer to focus on that, feel > > Yes, I'm going to work on PoC code that explore this area. > > > free to drive it since I'm gonna look into the TX side first. > > Happy to hear you are going to look into TX-side. > Are your use case related to TX timestamping? Yes, I'll try to start with a tx timestamp. But looking (briefly) at the code, that seems like a more invasive addition. If we want to return that tx timstamp via the original umem, some completion mechanism has to be added (besides the existing one it). LMK if you have some pointers to the previous discussions. Or maybe Bjørn/Magnus had some plans/ideas about that? > --Jesper >
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 4f2d1f682a18..4ddcae33c336 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -120,6 +120,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics. xfrm_proc xfrm_sync xfrm_sysctl + xdp-rx-metadata .. only:: subproject and html diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst new file mode 100644 index 000000000000..b6c8c77937c4 --- /dev/null +++ b/Documentation/networking/xdp-rx-metadata.rst @@ -0,0 +1,108 @@ +=============== +XDP RX Metadata +=============== + +This document describes how an eXpress Data Path (XDP) program can access +hardware metadata related to a packet using a set of helper functions, +and how it can pass that metadata on to other consumers. + +General Design +============== + +XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame. +Every device driver that wishes to expose additional packet metadata can +implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h`` +via ``XDP_METADATA_KFUNC_xxx``. + +Currently, the following kfuncs are supported. In the future, as more +metadata is supported, this set will grow: + +.. kernel-doc:: net/core/xdp.c + :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash + +An XDP program can use these kfuncs to read the metadata into stack +variables for its own consumption. Or, to pass the metadata on to other +consumers, an XDP program can store it into the metadata area carried +ahead of the packet. + +Not all kfuncs have to be implemented by the device driver; when not +implemented, the default ones that return ``-EOPNOTSUPP`` will be used. + +Within an XDP frame, the metadata layout is as follows:: + + +----------+-----------------+------+ + | headroom | custom metadata | data | + +----------+-----------------+------+ + ^ ^ + | | + xdp_buff->data_meta xdp_buff->data + +An XDP program can store individual metadata items into this ``data_meta`` +area in whichever format it chooses. Later consumers of the metadata +will have to agree on the format by some out of band contract (like for +the AF_XDP use case, see below). + +AF_XDP +====== + +:doc:`af_xdp` use-case implies that there is a contract between the BPF +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and +the final consumer. Thus the BPF program manually allocates a fixed number of +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset +of kfuncs to populate it. The userspace ``XSK`` consumer computes +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata. +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and +``METADATA_SIZE`` is an application-specific constant. + +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer):: + + +----------+-----------------+------+ + | headroom | custom metadata | data | + +----------+-----------------+------+ + ^ + | + rx_desc->address + +XDP_PASS +======== + +This is the path where the packets processed by the XDP program are passed +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff`` +contents. Currently, every driver has custom kernel code to parse +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb`` +conversion, and the XDP metadata is not used by the kernel when building +``skbs``. However, TC-BPF programs can access the XDP metadata area using +the ``data_meta`` pointer. + +In the future, we'd like to support a case where an XDP program +can override some of the metadata used for building ``skbs``. + +bpf_redirect_map +================ + +``bpf_redirect_map`` can redirect the frame to a different device. +Some devices (like virtual ethernet links) support running a second XDP +program after the redirect. However, the final consumer doesn't have +access to the original hardware descriptor and can't access any of +the original metadata. The same applies to XDP programs installed +into devmaps and cpumaps. + +This means that for redirected packets only custom metadata is +currently supported, which has to be prepared by the initial XDP program +before redirect. If the frame is eventually passed to the kernel, the +``skb`` created from such a frame won't have any hardware metadata populated +in its ``skb``. If such a packet is later redirected into an ``XSK``, +that will also only have access to the custom metadata. + +bpf_tail_call +============= + +Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY`` +is currently not supported. + +Example +======= + +See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and +``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of +BPF program that handles XDP metadata.