diff mbox series

[bpf-next,v4,1/1] docs: BPF_MAP_TYPE_XSKMAP

Message ID 20221122103701.65867-1-mtahhan@redhat.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series [bpf-next,v4,1/1] docs: BPF_MAP_TYPE_XSKMAP | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 16 maintainers not CCed: sdf@google.com daniel@iogearbox.net corbet@lwn.net netdev@vger.kernel.org martin.lau@linux.dev ast@kernel.org kpsingh@kernel.org hawk@kernel.org kuba@kernel.org andrii@kernel.org haoluo@google.com davem@davemloft.net jolsa@kernel.org song@kernel.org john.fastabend@gmail.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary

Commit Message

Maryam Tahhan Nov. 22, 2022, 10:37 a.m. UTC
From: Maryam Tahhan <mtahhan@redhat.com>

Add documentation for BPF_MAP_TYPE_XSKMAP
including kernel version introduced, usage
and examples.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

---
v4:
- Add more details about AF_XDP sockets and where to find
  relevant info.
- Fixup typos.
- Remove ``c:function::`` block directives.
- Replace spaces with tabs in code blocks.

v3:
- Fixed duplicate function warnings from Sphinx >= 3.3.

v2:
- Fixed typos + incorrect return type references.
- Adjusted examples to use __u32 and fixed references to key_size.
- Changed `AF_XDP socket` references to XSK.
- Added note re map key and value size.
---
 Documentation/bpf/map_xskmap.rst | 192 +++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)
 create mode 100644 Documentation/bpf/map_xskmap.rst

Comments

David Vernet Nov. 22, 2022, 3:14 p.m. UTC | #1
On Tue, Nov 22, 2022 at 10:37:01AM +0000, mtahhan@redhat.com wrote:
> From: Maryam Tahhan <mtahhan@redhat.com>
> 
> Add documentation for BPF_MAP_TYPE_XSKMAP
> including kernel version introduced, usage
> and examples.
> 
> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> 
> ---
> v4:
> - Add more details about AF_XDP sockets and where to find
>   relevant info.
> - Fixup typos.
> - Remove ``c:function::`` block directives.
> - Replace spaces with tabs in code blocks.
> 
> v3:
> - Fixed duplicate function warnings from Sphinx >= 3.3.
> 
> v2:
> - Fixed typos + incorrect return type references.
> - Adjusted examples to use __u32 and fixed references to key_size.
> - Changed `AF_XDP socket` references to XSK.
> - Added note re map key and value size.
> ---
>  Documentation/bpf/map_xskmap.rst | 192 +++++++++++++++++++++++++++++++
>  1 file changed, 192 insertions(+)
>  create mode 100644 Documentation/bpf/map_xskmap.rst
> 
> diff --git a/Documentation/bpf/map_xskmap.rst b/Documentation/bpf/map_xskmap.rst
> new file mode 100644
> index 000000000000..990010952d2f
> --- /dev/null
> +++ b/Documentation/bpf/map_xskmap.rst
> @@ -0,0 +1,192 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +.. Copyright (C) 2022 Red Hat, Inc.
> +
> +===================
> +BPF_MAP_TYPE_XSKMAP
> +===================
> +
> +.. note::
> +   - ``BPF_MAP_TYPE_XSKMAP`` was introduced in kernel version 4.18
> +
> +The ``BPF_MAP_TYPE_XSKMAP`` is used as a backend map for XDP BPF helper
> +call ``bpf_redirect_map()`` and ``XDP_REDIRECT`` action, like 'devmap' and 'cpumap'.
> +This map type redirects raw XDP frames to `AF_XDP`_ sockets (XSKs), a new type of
> +address family in the Kernel that allows redirection of frames from a driver to

s/Kernel/kernel

> +userspace without having to traverse the full network stack. An AF_XDP socket

We make this mistake all over the BPF docs, but we might as well be
consistent / correct here:

s/userspace/user space

> +binds to a single netdev queue. A mapping of XSKs to queues is shown below:
> +
> +.. code-block:: none
> +
> +    +---------------------------------------------------+
> +    |     xsk A      |     xsk B       |      xsk C     |<---+ Userspace

s/Userspace/User space

> +    =========================================================|==========
> +    |    Queue 0     |     Queue 1     |     Queue 2    |    |  Kernel
> +    +---------------------------------------------------+    |
> +    |                  Netdev eth0                      |    |
> +    +---------------------------------------------------+    |
> +    |                            +=============+        |    |
> +    |                            | key |  xsk  |        |    |
> +    |  +---------+               +=============+        |    |
> +    |  |         |               |  0  | xsk A |        |    |
> +    |  |         |               +-------------+        |    |
> +    |  |         |               |  1  | xsk B |        |    |
> +    |  | BPF     |-- redirect -->+-------------+-------------+
> +    |  | prog    |               |  2  | xsk C |        |
> +    |  |         |               +-------------+        |
> +    |  |         |                                      |
> +    |  |         |                                      |
> +    |  +---------+                                      |
> +    |                                                   |
> +    +---------------------------------------------------+
> +
> +.. note::
> +    An AF_XDP socket that is bound to a certain <netdev/queue_id> will *only*
> +    accept XDP frames from that <netdev/queue_id>. If an XDP program tries to redirect
> +    from a <netdev/queue_id> other than what the socket is bound to, the frame will
> +    not be received on the socket.
> +
> +Typically an XSKMAP is created per netdev. This map contains an array of XSK File
> +Descriptors (FDs). The number of array elements is typically set or adjusted using
> +the ``max_entries`` map parameter. For AF_XDP ``max_entries`` is equal to the number
> +of queues supported by the netdev.
> +
> +.. note::
> +    Both the map key and map value size must be 4 bytes.
> +
> +Usage
> +=====
> +
> +Kernel BPF
> +----------
> +bpf_redirect_map()
> +^^^^^^^^^^^^^^^^^^
> +.. code-block:: c
> +
> +    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
> +
> +Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
> +For ``BPF_MAP_TYPE_XSKMAP`` this map contains references to XSK FDs
> +for sockets attached to a netdev's queues.
> +
> +.. note::
> +    If the map is empty at an index, the packet is dropped. This means that it is
> +    necessary to have an XDP program loaded with at least one XSK in the
> +    XSKMAP to be able to get any traffic to user space through the socket.
> +
> +bpf_map_lookup_elem()
> +^^^^^^^^^^^^^^^^^^^^^
> +.. code-block:: c
> +
> +    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
> +
> +XSK entry references of type ``struct xdp_sock *`` can be retrieved using the
> +``bpf_map_lookup_elem()`` helper.
> +
> +Userspace
> +---------

s/Userspace/User space

> +.. note::
> +    XSK entries can only be updated/deleted from user space and not from
> +    an BPF program. Trying to call these functions from a kernel BPF program will

s/an BPF/a BPF

> +    result in the program failing to load and a verifier warning.
> +
> +bpf_map_update_elem()
> +^^^^^^^^^^^^^^^^^^^^^
> +.. code-block:: c
> +
> +	int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
> +
> +XSK entries can be added or updated using the ``bpf_map_update_elem()``
> +helper. The ``key`` parameter is equal to the queue_id of the queue the XSK
> +is attaching to. And the ``value`` parameter is the FD value of that socket.
> +
> +Under the hood, the XSKMAP update function uses the XSK FD value to retrieve the
> +associated ``struct xdp_sock`` instance.
> +
> +The flags argument can be one of the following:
> +
> +- BPF_ANY: Create a new element or update an existing element.
> +- BPF_NOEXIST: Create a new element only if it did not exist.
> +- BPF_EXIST: Update an existing element.
> +
> +bpf_map_lookup_elem()
> +^^^^^^^^^^^^^^^^^^^^^
> +.. code-block:: c
> +
> +    int bpf_map_lookup_elem(int fd, const void *key, void *value)
> +
> +Returns ``struct xdp_sock *`` or negative error in case of failure.
> +
> +bpf_map_delete_elem()
> +^^^^^^^^^^^^^^^^^^^^^
> +.. code-block:: c
> +
> +    int bpf_map_delete_elem(int fd, const void *key)
> +
> +XSK entries can be deleted using the ``bpf_map_delete_elem()``
> +helper. This helper will return 0 on success, or negative error in case of
> +failure.
> +
> +.. note::
> +    When `libxdp`_ deletes an XSK it also removes the associated socket
> +    entry from the XSKMAP.
> +
> +Examples
> +========
> +Kernel
> +------
> +
> +The following code snippet shows how to declare a ``BPF_MAP_TYPE_XSKMAP`` called
> +``xsks_map`` and how to redirect packets to an XSK.
> +
> +.. code-block:: c
> +
> +	struct {
> +		__uint(type, BPF_MAP_TYPE_XSKMAP);
> +		__type(key, __u32);
> +		__type(value, __u32);
> +		__uint(max_entries, 64);
> +	} xsks_map SEC(".maps");
> +
> +
> +	SEC("xdp")
> +	int xsk_redir_prog(struct xdp_md *ctx)
> +	{
> +		__u32 index = ctx->rx_queue_index;
> +
> +		if (bpf_map_lookup_elem(&xsks_map, &index))
> +			return bpf_redirect_map(&xsks_map, index, 0);
> +		return XDP_PASS;
> +	}
> +
> +Userspace
> +---------

s/Userspace/User space

> +
> +The following code snippet shows how to update an XSKMAP with an XSK entry.
> +
> +.. code-block:: c
> +
> +	int update_xsks_map(struct bpf_map *xsks_map, int queue_id, int xsk_fd)
> +	{
> +		int ret;
> +
> +		ret = bpf_map_update_elem(bpf_map__fd(xsks_map), &queue_id, &xsk_fd, 0);
> +		if (ret < 0)
> +			fprintf(stderr, "Failed to update xsks_map: %s\n", strerror(errno));
> +
> +		return ret;
> +	}
> +
> +For an example on how create AF_XDP sockets, please see the AF_XDP-example and
> +AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
> +For a detailed explaination of the AF_XDP interface please see:
> +
> +- `libxdp-readme`_.
> +- `AF_XDP`_ Kernel documentation.
> +
> +.. note::
> +    The most comprehensive resource for using XSKMAPs and AF_XDP is `libxdp`_.
> +
> +.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
> +.. _AF_XDP: https://www.kernel.org/doc/html/latest/networking/af_xdp.html
> +.. _bpf-examples: https://github.com/xdp-project/bpf-examples
> +.. _libxdp-readme: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp#using-af_xdp-sockets
> -- 
> 2.34.1
>
diff mbox series

Patch

diff --git a/Documentation/bpf/map_xskmap.rst b/Documentation/bpf/map_xskmap.rst
new file mode 100644
index 000000000000..990010952d2f
--- /dev/null
+++ b/Documentation/bpf/map_xskmap.rst
@@ -0,0 +1,192 @@ 
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2022 Red Hat, Inc.
+
+===================
+BPF_MAP_TYPE_XSKMAP
+===================
+
+.. note::
+   - ``BPF_MAP_TYPE_XSKMAP`` was introduced in kernel version 4.18
+
+The ``BPF_MAP_TYPE_XSKMAP`` is used as a backend map for XDP BPF helper
+call ``bpf_redirect_map()`` and ``XDP_REDIRECT`` action, like 'devmap' and 'cpumap'.
+This map type redirects raw XDP frames to `AF_XDP`_ sockets (XSKs), a new type of
+address family in the Kernel that allows redirection of frames from a driver to
+userspace without having to traverse the full network stack. An AF_XDP socket
+binds to a single netdev queue. A mapping of XSKs to queues is shown below:
+
+.. code-block:: none
+
+    +---------------------------------------------------+
+    |     xsk A      |     xsk B       |      xsk C     |<---+ Userspace
+    =========================================================|==========
+    |    Queue 0     |     Queue 1     |     Queue 2    |    |  Kernel
+    +---------------------------------------------------+    |
+    |                  Netdev eth0                      |    |
+    +---------------------------------------------------+    |
+    |                            +=============+        |    |
+    |                            | key |  xsk  |        |    |
+    |  +---------+               +=============+        |    |
+    |  |         |               |  0  | xsk A |        |    |
+    |  |         |               +-------------+        |    |
+    |  |         |               |  1  | xsk B |        |    |
+    |  | BPF     |-- redirect -->+-------------+-------------+
+    |  | prog    |               |  2  | xsk C |        |
+    |  |         |               +-------------+        |
+    |  |         |                                      |
+    |  |         |                                      |
+    |  +---------+                                      |
+    |                                                   |
+    +---------------------------------------------------+
+
+.. note::
+    An AF_XDP socket that is bound to a certain <netdev/queue_id> will *only*
+    accept XDP frames from that <netdev/queue_id>. If an XDP program tries to redirect
+    from a <netdev/queue_id> other than what the socket is bound to, the frame will
+    not be received on the socket.
+
+Typically an XSKMAP is created per netdev. This map contains an array of XSK File
+Descriptors (FDs). The number of array elements is typically set or adjusted using
+the ``max_entries`` map parameter. For AF_XDP ``max_entries`` is equal to the number
+of queues supported by the netdev.
+
+.. note::
+    Both the map key and map value size must be 4 bytes.
+
+Usage
+=====
+
+Kernel BPF
+----------
+bpf_redirect_map()
+^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
+    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+
+Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
+For ``BPF_MAP_TYPE_XSKMAP`` this map contains references to XSK FDs
+for sockets attached to a netdev's queues.
+
+.. note::
+    If the map is empty at an index, the packet is dropped. This means that it is
+    necessary to have an XDP program loaded with at least one XSK in the
+    XSKMAP to be able to get any traffic to user space through the socket.
+
+bpf_map_lookup_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
+    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
+
+XSK entry references of type ``struct xdp_sock *`` can be retrieved using the
+``bpf_map_lookup_elem()`` helper.
+
+Userspace
+---------
+.. note::
+    XSK entries can only be updated/deleted from user space and not from
+    an BPF program. Trying to call these functions from a kernel BPF program will
+    result in the program failing to load and a verifier warning.
+
+bpf_map_update_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
+	int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
+
+XSK entries can be added or updated using the ``bpf_map_update_elem()``
+helper. The ``key`` parameter is equal to the queue_id of the queue the XSK
+is attaching to. And the ``value`` parameter is the FD value of that socket.
+
+Under the hood, the XSKMAP update function uses the XSK FD value to retrieve the
+associated ``struct xdp_sock`` instance.
+
+The flags argument can be one of the following:
+
+- BPF_ANY: Create a new element or update an existing element.
+- BPF_NOEXIST: Create a new element only if it did not exist.
+- BPF_EXIST: Update an existing element.
+
+bpf_map_lookup_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
+    int bpf_map_lookup_elem(int fd, const void *key, void *value)
+
+Returns ``struct xdp_sock *`` or negative error in case of failure.
+
+bpf_map_delete_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
+    int bpf_map_delete_elem(int fd, const void *key)
+
+XSK entries can be deleted using the ``bpf_map_delete_elem()``
+helper. This helper will return 0 on success, or negative error in case of
+failure.
+
+.. note::
+    When `libxdp`_ deletes an XSK it also removes the associated socket
+    entry from the XSKMAP.
+
+Examples
+========
+Kernel
+------
+
+The following code snippet shows how to declare a ``BPF_MAP_TYPE_XSKMAP`` called
+``xsks_map`` and how to redirect packets to an XSK.
+
+.. code-block:: c
+
+	struct {
+		__uint(type, BPF_MAP_TYPE_XSKMAP);
+		__type(key, __u32);
+		__type(value, __u32);
+		__uint(max_entries, 64);
+	} xsks_map SEC(".maps");
+
+
+	SEC("xdp")
+	int xsk_redir_prog(struct xdp_md *ctx)
+	{
+		__u32 index = ctx->rx_queue_index;
+
+		if (bpf_map_lookup_elem(&xsks_map, &index))
+			return bpf_redirect_map(&xsks_map, index, 0);
+		return XDP_PASS;
+	}
+
+Userspace
+---------
+
+The following code snippet shows how to update an XSKMAP with an XSK entry.
+
+.. code-block:: c
+
+	int update_xsks_map(struct bpf_map *xsks_map, int queue_id, int xsk_fd)
+	{
+		int ret;
+
+		ret = bpf_map_update_elem(bpf_map__fd(xsks_map), &queue_id, &xsk_fd, 0);
+		if (ret < 0)
+			fprintf(stderr, "Failed to update xsks_map: %s\n", strerror(errno));
+
+		return ret;
+	}
+
+For an example on how create AF_XDP sockets, please see the AF_XDP-example and
+AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
+For a detailed explaination of the AF_XDP interface please see:
+
+- `libxdp-readme`_.
+- `AF_XDP`_ Kernel documentation.
+
+.. note::
+    The most comprehensive resource for using XSKMAPs and AF_XDP is `libxdp`_.
+
+.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
+.. _AF_XDP: https://www.kernel.org/doc/html/latest/networking/af_xdp.html
+.. _bpf-examples: https://github.com/xdp-project/bpf-examples
+.. _libxdp-readme: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp#using-af_xdp-sockets