From patchwork Tue Nov 15 03:02:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043167 X-Patchwork-Delegate: bpf@iogearbox.net
Date: Mon, 14 Nov 2022 19:02:00 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-2-sdf@google.com> Subject: [PATCH bpf-next 01/11] bpf: Document XDP RX metadata From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Document all current use-cases and assumptions.
Signed-off-by: Stanislav Fomichev --- Documentation/bpf/xdp-rx-metadata.rst | 109 ++++++++++++++++++++++++++ 1 file changed, 109 insertions(+) create mode 100644 Documentation/bpf/xdp-rx-metadata.rst diff --git a/Documentation/bpf/xdp-rx-metadata.rst b/Documentation/bpf/xdp-rx-metadata.rst new file mode 100644 index 000000000000..5ddaaab8de31 --- /dev/null +++ b/Documentation/bpf/xdp-rx-metadata.rst @@ -0,0 +1,109 @@ +=============== +XDP RX Metadata +=============== + +XDP programs support creating and passing custom metadata via +``bpf_xdp_adjust_meta``. This metadata can be consumed by the following +entities: + +1. ``AF_XDP`` consumer. +2. Kernel core stack via ``XDP_PASS``. +3. Another device via ``bpf_redirect_map``. + +General Design +============== + +XDP has access to a set of kfuncs to manipulate the metadata. Every +device driver implements these kfuncs by generating BPF bytecode +to parse it out from the hardware descriptors. The set of kfuncs is +declared in ``include/net/xdp.h`` via ``XDP_METADATA_KFUNC_xxx``. + +Currently, the following kfuncs are supported. In the future, as more +metadata is supported, this set will grow: + +- ``bpf_xdp_metadata_rx_timestamp_supported`` returns true/false to + indicate whether the device supports RX timestamps in general +- ``bpf_xdp_metadata_rx_timestamp`` returns packet RX timestamp or 0 +- ``bpf_xdp_metadata_export_to_skb`` prepares metadata layout that + the kernel will be able to consume. See ``bpf_redirect_map`` section + below for more details. + +Within the XDP frame, the metadata layout is as follows:: + + +----------+------------------+-----------------+------+ + | headroom | xdp_skb_metadata | custom metadata | data | + +----------+------------------+-----------------+------+ + ^ ^ + | | + xdp_buff->data_meta xdp_buff->data + +Where ``xdp_skb_metadata`` is the metadata prepared by +``bpf_xdp_metadata_export_to_skb``. 
 And ``custom metadata`` +is prepared by the BPF program via calls to ``bpf_xdp_adjust_meta``. + +Note that ``bpf_xdp_metadata_export_to_skb`` doesn't adjust the +``xdp->data_meta`` pointer. To access the metadata generated +by ``bpf_xdp_metadata_export_to_skb`` use ``xdp_buff->skb_metadata``. + +AF_XDP +====== + +The ``AF_XDP`` use-case implies that there is a contract between the BPF program +that redirects XDP frames into the ``XSK`` and the final consumer. +Thus the BPF program manually allocates a fixed number of +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset +of kfuncs to populate it. The user-space ``XSK`` consumer looks +at ``xsk_umem__get_data() - METADATA_SIZE`` to locate its metadata. + +Here is the ``AF_XDP`` consumer layout (note the missing ``data_meta`` pointer):: + + +----------+------------------+-----------------+------+ + | headroom | xdp_skb_metadata | custom metadata | data | + +----------+------------------+-----------------+------+ + ^ + | + rx_desc->address + +XDP_PASS +======== + +This is the path where the packets processed by the XDP program are passed +into the kernel. The kernel creates ``skb`` out of the ``xdp_buff`` contents. +Currently, every driver has custom kernel code to parse the descriptors and +populate ``skb`` metadata when doing this ``xdp_buff->skb`` conversion. +In the future, we'd like to support a case where the XDP program can override +some of that metadata. + +The plan of record is to make this path similar to ``bpf_redirect_map`` +below where the program would call ``bpf_xdp_metadata_export_to_skb``, +override the metadata and return ``XDP_PASS``. Additional work in +the drivers will be required to enable this (for example, to skip +populating ``skb`` metadata from the descriptors when +``bpf_xdp_metadata_export_to_skb`` has been called). + +bpf_redirect_map +================ + +``bpf_redirect_map`` can redirect the frame to a different device. 
+In this case we don't know ahead of time whether that final consumer +will further redirect to an ``XSK`` or pass it to the kernel via ``XDP_PASS``. +Additionally, the final consumer doesn't have access to the original +hardware descriptor and can't access any of the original metadata. + +To support passing metadata via ``bpf_redirect_map``, there is a +``bpf_xdp_metadata_export_to_skb`` kfunc that populates a subset +of metadata into ``xdp_buff``. The layout is defined in +``struct xdp_skb_metadata``. + +Mixing custom metadata and xdp_skb_metadata +=========================================== + +For the cases of ``bpf_redirect_map``, where the final consumer isn't +known ahead of time, the program can store both custom metadata +and ``xdp_skb_metadata`` for kernel consumption. + +The current limitation is that the program cannot adjust ``data_meta`` (via +``bpf_xdp_adjust_meta``) after a call to ``bpf_xdp_metadata_export_to_skb``. +So it first has to prepare its custom metadata layout and only then, +optionally, store ``xdp_skb_metadata`` via a call to +``bpf_xdp_metadata_export_to_skb``. 
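For illustration only, the AF_XDP contract from the document above can be modelled in plain userspace C. This is a sketch under stated assumptions: ``struct custom_meta``, ``store_meta()`` and ``load_rx_timestamp()`` are hypothetical names that are not part of this series, and a byte array stands in for the UMEM frame. The producer writes a fixed-size struct immediately in front of the packet data (the area a BPF program would reserve via ``bpf_xdp_adjust_meta``), and the consumer reads it back at a fixed negative offset, mirroring the ``xsk_umem__get_data() - METADATA_SIZE`` lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout agreed on by the BPF program and the XSK consumer. */
struct custom_meta {
	uint64_t rx_timestamp;
};

/*
 * Producer model: place the metadata directly in front of the packet data.
 * "frame" is the start of the buffer (headroom included), "data" points at
 * the first packet byte.
 */
static void store_meta(uint8_t *frame, uint8_t *data, uint64_t rx_timestamp)
{
	struct custom_meta meta = { .rx_timestamp = rx_timestamp };

	/* There must be enough headroom to hold the metadata. */
	assert((size_t)(data - frame) >= sizeof(meta));
	memcpy(data - sizeof(meta), &meta, sizeof(meta));
}

/* Consumer model: the metadata sits at a fixed negative offset from data. */
static uint64_t load_rx_timestamp(const uint8_t *data)
{
	struct custom_meta meta;

	memcpy(&meta, data - sizeof(meta), sizeof(meta));
	return meta.rx_timestamp;
}
```

Both sides have to agree on ``sizeof(struct custom_meta)`` ahead of time, because no ``data_meta`` pointer is available to the consumer; that shared size plays the role of ``METADATA_SIZE``.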
From patchwork Tue Nov 15 03:02:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043159 X-Patchwork-Delegate: bpf@iogearbox.net
Date: Mon, 14 Nov 2022 19:02:01 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-3-sdf@google.com> Subject: [PATCH bpf-next 02/11] bpf: Introduce bpf_patch From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net A simple abstraction around a series of instructions that transparently handles resizing.
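As a rough userspace model of the "append with transparent resizing" idea, the following sketch may help. It assumes ``realloc()`` in place of the kernel allocator, and ``struct insn``, ``struct insn_buf``, ``insn_buf_append()`` and ``insn_buf_free()`` are hypothetical stand-ins, not the API added by this patch. The buffer starts at 16 slots, doubles when full, and latches the first allocation error so callers only need to check it once at the end.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct bpf_insn; only its size matters for this model. */
struct insn {
	uint64_t raw;
};

struct insn_buf {
	struct insn *insn;
	size_t capacity;	/* number of allocated slots */
	size_t len;		/* number of used slots */
	int err;		/* first error seen, 0 if none */
};

static void insn_buf_append(struct insn_buf *buf, struct insn insn)
{
	assert(buf != NULL);

	/* Once an error is latched, further appends are no-ops. */
	if (buf->err)
		return;

	if (buf->len == buf->capacity) {
		size_t cap = buf->capacity ? buf->capacity * 2 : 16;
		struct insn *arr = realloc(buf->insn, cap * sizeof(*arr));

		if (!arr) {
			/* The old buffer stays valid until insn_buf_free(). */
			buf->err = -ENOMEM;
			return;
		}
		buf->insn = arr;
		buf->capacity = cap;	/* record the size actually allocated */
	}

	buf->insn[buf->len++] = insn;
}

static void insn_buf_free(struct insn_buf *buf)
{
	free(buf->insn);
	buf->insn = NULL;
	buf->capacity = 0;
	buf->len = 0;
}
```

The capacity is updated only after the reallocation succeeds, so it always matches the allocated size. In this model the old buffer is left in place on failure and released by the single free routine, which is one possible design choice among several.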
Currently, we have insn_buf[16] in convert_ctx_accesses which might not be enough for xdp kfuncs. If we find this abstraction helpful, we might convert existing insn_buf[16] to it in the future. Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev --- include/linux/bpf_patch.h | 29 +++++++++++++++ kernel/bpf/Makefile | 2 +- kernel/bpf/bpf_patch.c | 77 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 107 insertions(+), 1 deletion(-) create mode 100644 include/linux/bpf_patch.h create mode 100644 kernel/bpf/bpf_patch.c diff --git a/include/linux/bpf_patch.h b/include/linux/bpf_patch.h new file mode 100644 index 000000000000..359c165ad68b --- /dev/null +++ b/include/linux/bpf_patch.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _LINUX_BPF_PATCH_H +#define _LINUX_BPF_PATCH_H 1 + +#include + +struct bpf_patch { + struct bpf_insn *insn; + size_t capacity; + size_t len; + int err; +}; + +void bpf_patch_free(struct bpf_patch *patch); +size_t bpf_patch_len(const struct bpf_patch *patch); +int bpf_patch_err(const struct bpf_patch *patch); +void __bpf_patch_append(struct bpf_patch *patch, struct bpf_insn insn); +struct bpf_insn *bpf_patch_data(const struct bpf_patch *patch); +void bpf_patch_resolve_jmp(struct bpf_patch *patch); +u32 bpf_patch_magles_registers(const struct bpf_patch *patch); + +#define bpf_patch_append(patch, ...) 
 ({ \ + struct bpf_insn insn[] = { __VA_ARGS__ }; \ + int i; \ + for (i = 0; i < ARRAY_SIZE(insn); i++) \ + __bpf_patch_append(patch, insn[i]); \ +}) + +#endif diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 3a12e6b400a2..5724f36292a5 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o obj-$(CONFIG_BPF_JIT) += trampoline.o -obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o +obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o bpf_patch.o obj-$(CONFIG_BPF_JIT) += dispatcher.o ifeq ($(CONFIG_NET),y) obj-$(CONFIG_BPF_SYSCALL) += devmap.o diff --git a/kernel/bpf/bpf_patch.c b/kernel/bpf/bpf_patch.c new file mode 100644 index 000000000000..eb768398fd8f --- /dev/null +++ b/kernel/bpf/bpf_patch.c @@ -0,0 +1,77 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include + +void bpf_patch_free(struct bpf_patch *patch) +{ + kfree(patch->insn); +} + +size_t bpf_patch_len(const struct bpf_patch *patch) +{ + return patch->len; +} + +int bpf_patch_err(const struct bpf_patch *patch) +{ + return patch->err; +} + +void __bpf_patch_append(struct bpf_patch *patch, struct bpf_insn insn) +{ + void *arr; + + if (patch->err) + return; + + if (patch->len + 1 > patch->capacity) { + if (!patch->capacity) + patch->capacity = 16; + else + patch->capacity *= 2; + + arr = krealloc_array(patch->insn, patch->capacity, sizeof(insn), GFP_KERNEL); + if (!arr) { + patch->err = -ENOMEM; + kfree(patch->insn); + patch->insn = NULL; /* avoid a double free in bpf_patch_free() */ + return; + } + + patch->insn = arr; + } + + patch->insn[patch->len++] = insn; +} +EXPORT_SYMBOL(__bpf_patch_append); + +struct bpf_insn *bpf_patch_data(const struct bpf_patch *patch) +{ + return patch->insn; +} + +void bpf_patch_resolve_jmp(struct bpf_patch *patch) +{ + int i; + + for (i = 0; i < patch->len; i++) { + if (BPF_CLASS(patch->insn[i].code) != BPF_JMP) + continue; + + if (patch->insn[i].off 
!= S16_MAX) + continue; + + patch->insn[i].off = patch->len - i - 1; + } +} + +u32 bpf_patch_magles_registers(const struct bpf_patch *patch) +{ + u32 mask = 0; + int i; + + for (i = 0; i < patch->len; i++) + mask |= 1 << patch->insn[i].dst_reg; + + return mask; +} From patchwork Tue Nov 15 03:02:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043161 X-Patchwork-Delegate: bpf@iogearbox.net
Date: Mon, 14 Nov 2022 19:02:02 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-4-sdf@google.com> Subject: [PATCH bpf-next 03/11] bpf: Support inlined/unrolled kfuncs for xdp metadata From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , 
xdp-hints@xdp-project.net, netdev@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Kfuncs have to be defined with KF_UNROLL for the verifier to attempt unrolling them. For now, only XDP programs can have their kfuncs unrolled, but we can extend this later on if more program types would like to use it. For XDP, we define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible metadata kfuncs. Not all devices have to implement them. If unrolling is not supported by the target device, the default implementation is used instead. The default implementation is unconditionally unrolled to 'return false/0/NULL' for now. Upon loading, if BPF_F_XDP_HAS_METADATA is passed via prog_flags, we treat prog_ifindex as the target device for kfunc unrolling. net_device_ops gains a new ndo_unroll_kfunc which does the actual dirty work per device. The kfunc unrolling itself largely follows the existing map_gen_lookup unrolling example, so there is nothing new here. 
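The fallback behaviour described above can be sketched as a small userspace model. Everything here is an assumption made for illustration: ``struct patch_model``, ``unroll_or_default()``, ``demo_dev_unroll()`` and the ``MODEL_*`` constants are hypothetical and only imitate the shape of the per-device hook plus the default 'return 0' replacement. The device hook gets the first chance to emit instructions, and if it emits nothing (or is absent), a single "r0 = 0" stand-in is emitted instead.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_INSNS	8
#define MODEL_MOV_R0_0	0	/* stands in for BPF_MOV64_IMM(BPF_REG_0, 0) */
#define MODEL_LOAD_TS	1	/* hypothetical device-specific instruction */

struct patch_model {
	int insns[MODEL_MAX_INSNS];
	size_t len;
};

/* Shape of a per-device unroll hook in this model. */
typedef void (*unroll_fn)(unsigned int func_id, struct patch_model *patch);

static void patch_model_append(struct patch_model *patch, int insn)
{
	assert(patch->len < MODEL_MAX_INSNS);
	patch->insns[patch->len++] = insn;
}

/* Hypothetical device hook that only knows how to unroll func_id 1. */
static void demo_dev_unroll(unsigned int func_id, struct patch_model *patch)
{
	if (func_id == 1)
		patch_model_append(patch, MODEL_LOAD_TS);
}

/*
 * Let the device emit its instructions first; if it emitted nothing
 * (no hook, or an unsupported kfunc), fall back to "return 0".
 */
static void unroll_or_default(unroll_fn dev_unroll, unsigned int func_id,
			      struct patch_model *patch)
{
	if (dev_unroll)
		dev_unroll(func_id, patch);
	if (patch->len == 0)
		patch_model_append(patch, MODEL_MOV_R0_0);
}
```

With this shape, a program that calls a metadata kfunc the device does not support still loads; the call simply collapses to a constant zero result.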
Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev --- Documentation/bpf/kfuncs.rst | 8 +++++ include/linux/bpf.h | 1 + include/linux/btf.h | 1 + include/linux/btf_ids.h | 4 +++ include/linux/netdevice.h | 5 +++ include/net/xdp.h | 24 +++++++++++++ include/uapi/linux/bpf.h | 5 +++ kernel/bpf/syscall.c | 28 ++++++++++++++- kernel/bpf/verifier.c | 65 ++++++++++++++++++++++++++++++++++ net/core/dev.c | 7 ++++ net/core/xdp.c | 39 ++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 5 +++ 12 files changed, 191 insertions(+), 1 deletion(-) diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 0f858156371d..1723de2720bb 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -169,6 +169,14 @@ rebooting or panicking. Due to this additional restrictions apply to these calls. At the moment they only require CAP_SYS_BOOT capability, but more can be added later. +2.4.8 KF_UNROLL flag +----------------------- + +The KF_UNROLL flag is used for kfuncs that the verifier can attempt to unroll. +Unrolling is currently implemented only for XDP programs' metadata kfuncs. +The main motivation behind unrolling is to remove function call overhead +and allow efficient inlined kfuncs to be generated. 
+ 2.5 Registering the kfuncs -------------------------- diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 798aec816970..bf8936522dd9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1240,6 +1240,7 @@ struct bpf_prog_aux { struct work_struct work; struct rcu_head rcu; }; + const struct net_device_ops *xdp_kfunc_ndo; }; struct bpf_prog { diff --git a/include/linux/btf.h b/include/linux/btf.h index d80345fa566b..950cca997a5a 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -51,6 +51,7 @@ #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ #define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ #define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ +#define KF_UNROLL (1 << 7) /* kfunc unrolling can be attempted */ /* * Return the name of the passed struct, if exists, or halt the build if for diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index c9744efd202f..eb448e9c79bb 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -195,6 +195,10 @@ asm( \ __BTF_ID_LIST(name, local) \ __BTF_SET8_START(name, local) +#define BTF_SET8_START_GLOBAL(name) \ +__BTF_ID_LIST(name, global) \ +__BTF_SET8_START(name, global) + #define BTF_SET8_END(name) \ asm( \ ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 02a2318da7c7..2096b4f00e4b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -73,6 +73,8 @@ struct udp_tunnel_info; struct udp_tunnel_nic_info; struct udp_tunnel_nic; struct bpf_prog; +struct bpf_insn; +struct bpf_patch; struct xdp_buff; void synchronize_net(void); @@ -1604,6 +1606,9 @@ struct net_device_ops { ktime_t (*ndo_get_tstamp)(struct net_device *dev, const struct skb_shared_hwtstamps *hwtstamps, bool cycles); + void (*ndo_unroll_kfunc)(const struct bpf_prog *prog, + u32 func_id, + struct bpf_patch *patch); }; /** diff --git a/include/net/xdp.h b/include/net/xdp.h index 
55dbc68bfffc..2a82a98f2f9f 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -7,6 +7,7 @@ #define __LINUX_NET_XDP_H__ #include /* skb_shared_info */ +#include /* btf_id_set8 */ /** * DOC: XDP RX-queue information @@ -409,4 +410,27 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE +#define XDP_METADATA_KFUNC_xxx \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \ + bpf_xdp_metadata_rx_timestamp_supported) \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \ + bpf_xdp_metadata_rx_timestamp) \ + +enum { +#define XDP_METADATA_KFUNC(name, str) name, +XDP_METADATA_KFUNC_xxx +#undef XDP_METADATA_KFUNC +MAX_XDP_METADATA_KFUNC, +}; + +#ifdef CONFIG_DEBUG_INFO_BTF +extern struct btf_id_set8 xdp_metadata_kfunc_ids; +static inline u32 xdp_metadata_kfunc_id(int id) +{ + return xdp_metadata_kfunc_ids.pairs[id].id; +} +#else +static inline u32 xdp_metadata_kfunc_id(int id) { return 0; } +#endif + #endif /* __LINUX_NET_XDP_H__ */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index fb4c911d2a03..b444b1118c4f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1156,6 +1156,11 @@ enum bpf_link_type { */ #define BPF_F_XDP_HAS_FRAGS (1U << 5) +/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded + * program becomes device-bound but can access its XDP metadata. + */ +#define BPF_F_XDP_HAS_METADATA (1U << 6) + /* link_create.kprobe_multi.flags used in LINK_CREATE command for * BPF_TRACE_KPROBE_MULTI attach type to create return probe. 
*/ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 85532d301124..597c41949910 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2426,6 +2426,20 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type) /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size +static int xdp_resolve_netdev(struct bpf_prog *prog, int ifindex) +{ + struct net *net = current->nsproxy->net_ns; + struct net_device *dev; + + for_each_netdev(net, dev) { + if (dev->ifindex == ifindex) { + prog->aux->xdp_kfunc_ndo = dev->netdev_ops; + return 0; + } + } + return -EINVAL; +} + static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) { enum bpf_prog_type type = attr->prog_type; @@ -2443,7 +2457,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) BPF_F_TEST_STATE_FREQ | BPF_F_SLEEPABLE | BPF_F_TEST_RND_HI32 | - BPF_F_XDP_HAS_FRAGS)) + BPF_F_XDP_HAS_FRAGS | + BPF_F_XDP_HAS_METADATA)) return -EINVAL; if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && @@ -2531,6 +2546,17 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE; prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS; + if (attr->prog_flags & BPF_F_XDP_HAS_METADATA) { + /* Reuse prog_ifindex to carry request to unroll + * metadata kfuncs. 
+ */ + prog->aux->offload_requested = false; + + err = xdp_resolve_netdev(prog, attr->prog_ifindex); + if (err < 0) + goto free_prog; + } + err = security_bpf_prog_alloc(prog->aux); if (err) goto free_prog; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 07c0259dfc1a..b657ed6eb277 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -14015,6 +14016,43 @@ static int fixup_call_args(struct bpf_verifier_env *env) return err; } +static int unroll_kfunc_call(struct bpf_verifier_env *env, + struct bpf_insn *insn, + struct bpf_patch *patch) +{ + enum bpf_prog_type prog_type; + struct bpf_prog_aux *aux; + struct btf *desc_btf; + u32 *kfunc_flags; + u32 func_id; + + desc_btf = find_kfunc_desc_btf(env, insn->off); + if (IS_ERR(desc_btf)) + return PTR_ERR(desc_btf); + + prog_type = resolve_prog_type(env->prog); + func_id = insn->imm; + + kfunc_flags = btf_kfunc_id_set_contains(desc_btf, prog_type, func_id); + if (!kfunc_flags) + return 0; + if (!(*kfunc_flags & KF_UNROLL)) + return 0; + if (prog_type != BPF_PROG_TYPE_XDP) + return 0; + + aux = env->prog->aux; + if (aux->xdp_kfunc_ndo && aux->xdp_kfunc_ndo->ndo_unroll_kfunc) + aux->xdp_kfunc_ndo->ndo_unroll_kfunc(env->prog, func_id, patch); + if (bpf_patch_len(patch) == 0) { + /* Default optimized kfunc implementation that + * returns NULL/0/false. 
+ */ + bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 0)); + } + return bpf_patch_err(patch); +} + static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn) { @@ -14178,6 +14216,33 @@ static int do_misc_fixups(struct bpf_verifier_env *env) if (insn->src_reg == BPF_PSEUDO_CALL) continue; if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { + struct bpf_patch patch = {}; + + if (bpf_prog_is_dev_bound(env->prog->aux)) { + verbose(env, "no metadata kfuncs offload\n"); + return -EINVAL; + } + + ret = unroll_kfunc_call(env, insn, &patch); + if (ret < 0) { + verbose(env, "failed to unroll kfunc with func_id=%d\n", insn->imm); + return ret; + } + cnt = bpf_patch_len(&patch); + if (cnt) { + new_prog = bpf_patch_insn_data(env, i + delta, + bpf_patch_data(&patch), + bpf_patch_len(&patch)); + bpf_patch_free(&patch); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = prog = new_prog; + insn = new_prog->insnsi + i + delta; + continue; + } + ret = fixup_kfunc_call(env, insn); if (ret) return ret; diff --git a/net/core/dev.c b/net/core/dev.c index 117e830cabb0..a2227f4f4a0b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9258,6 +9258,13 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack return -EOPNOTSUPP; } + if (new_prog && + new_prog->aux->xdp_kfunc_ndo && + new_prog->aux->xdp_kfunc_ndo != dev->netdev_ops) { + NL_SET_ERR_MSG(extack, "Cannot attach to a different target device"); + return -EINVAL; + } + err = dev_xdp_install(dev, mode, bpf_op, extack, flags, new_prog); if (err) return err; diff --git a/net/core/xdp.c b/net/core/xdp.c index 844c9d99dc0e..22f1e44700eb 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -4,6 +4,8 @@ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. 
 */ #include +#include +#include #include #include #include @@ -709,3 +711,40 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) return nxdpf; } + +/* Indicates whether a particular device supports rx_timestamp metadata. + * This is an optional helper to support marking some branches as + * "dead code" in the BPF programs. + */ +noinline int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) +{ + /* payload is ignored, see default case in unroll_kfunc_call */ + return false; +} + +/* Returns rx_timestamp metadata or 0 when the frame doesn't have it. + */ +noinline const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) +{ + /* payload is ignored, see default case in unroll_kfunc_call */ + return 0; +} + +#ifdef CONFIG_DEBUG_INFO_BTF +BTF_SET8_START_GLOBAL(xdp_metadata_kfunc_ids) +#define XDP_METADATA_KFUNC(name, str) BTF_ID_FLAGS(func, str, KF_RET_NULL | KF_UNROLL) +XDP_METADATA_KFUNC_xxx +#undef XDP_METADATA_KFUNC +BTF_SET8_END(xdp_metadata_kfunc_ids) + +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = { + .owner = THIS_MODULE, + .set = &xdp_metadata_kfunc_ids, +}; + +static int __init xdp_metadata_init(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); +} +late_initcall(xdp_metadata_init); +#endif diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index fb4c911d2a03..b444b1118c4f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1156,6 +1156,11 @@ enum bpf_link_type { */ #define BPF_F_XDP_HAS_FRAGS (1U << 5) +/* If BPF_F_XDP_HAS_METADATA is used in BPF_PROG_LOAD command, the loaded + * program becomes device-bound but can access its XDP metadata. + */ +#define BPF_F_XDP_HAS_METADATA (1U << 6) + /* link_create.kprobe_multi.flags used in LINK_CREATE command for * BPF_TRACE_KPROBE_MULTI attach type to create return probe. 
*/ From patchwork Tue Nov 15 03:02:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043168 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED7A8C43217 for ; Tue, 15 Nov 2022 03:04:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236982AbiKODED (ORCPT ); Mon, 14 Nov 2022 22:04:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237555AbiKODDN (ORCPT ); Mon, 14 Nov 2022 22:03:13 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A08810B6 for ; Mon, 14 Nov 2022 19:02:19 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id x11-20020a056a000bcb00b0056c6ec11eefso7090729pfu.14 for ; Mon, 14 Nov 2022 19:02:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=a5AIkCTUh38LdhgjcqmXMopWX95gOLvjqlIhAbQ6HyA=; b=ZwvO7zet084WHw2NNbL2nH/2JhepQmESb/xkbKDrZhFD7rXiVClnYJnNz2HPlkFqu9 Ry+yzF0yZ8oiV7/OJEQazrwgilIw6a7C3V6GeatC+nPUohEmxF7rcFqep8BVmsZ4zkdl qIG4GOhaL3WMrLDYlnxvKEIY5Ylaxa6JSxZOOzAiFjFY3/b1+tjCPqlMaoLb2+zbfmLh 3eXcY6OTSJHXwqYPdiE1zqOsrpaMF2Hpq9pgIVOlpSKWdIYvsKQB0/Z+svvx1rsMcFN1 AVfAGx4kLB4T3kV4jIyy3345Y0VqRwtwiSAEWOShlAMxnRJs8FuZfe7+g8Td1QxXybYd YbKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=a5AIkCTUh38LdhgjcqmXMopWX95gOLvjqlIhAbQ6HyA=; b=xKzc6RpfjyIT/5wZseRcbtcK3xu0hr4ndYe5kASB7as7aCQTDH39YhrM6uRyGTglSN E5yk1h0KlpXG+JFOFIoiIhrO0UlZ5Qb/MO9knbyZvburOiXKqfJxgruDPJhZ3PbjOOpZ 0ZxGSL1ynVEM6t6oGXjHFWsh5RTQ3VdLLqRujcFgGcM1R84dL+RCWyvd64/FqyzE5AwN t+KAcc2IQv0YeG4YXg8Sz8NiuKJ+uGeZ2CEl7TvlMGCfTrZr/C/1YrNwmNqosTFKWNYt sNMpmyBxnAFqE4vmyt5OOxSLFSMrhADoe987/J0tosrlGIUYNJvV7tuS+qXmdZslhg7n cEoQ== X-Gm-Message-State: ANoB5pkMBoteXGzqsmDkSKbPAs0YO6EFdXcIac4txZ7czisxjStnkQG+ jPGKgulhd+y8wOMMOVGwEyz7qaBbdOilQyGvho+G+OteIM9GkYz2YV/kPbHVAUb7Khj2eZN9NcO sF+ZU0Gr7vdyAcmlKk0Avrd/oYW2ZMnhqd5lUSI0ZnKBKrIqs7A== X-Google-Smtp-Source: AA0mqf5ShA3hCnfBoDvhrowJbJznEs4r+D7gQF9n9YYWRe0n8Vk+I8CK63wpuPvFahtMYEOmePixyTA= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a62:6283:0:b0:56e:989d:7410 with SMTP id w125-20020a626283000000b0056e989d7410mr16594601pfb.1.1668481339043; Mon, 14 Nov 2022 19:02:19 -0800 (PST) Date: Mon, 14 Nov 2022 19:02:03 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-5-sdf@google.com> Subject: [PATCH bpf-next 04/11] bpf: Implement hidden BPF_PUSH64 and BPF_POP64 instructions From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, Zi Shen Lim Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Implemented for: - x86_64 jit (tested) - arm64 jit (untested) Interpreter is not implemented because push/pop are currently used only with xdp kfunc and jit is required to use kfuncs. 
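For readers who want to poke at the encoding outside the kernel, here is a rough userspace sketch of how the push/pop opcodes in this patch are composed. It is not kernel code: `struct insn`, `push64()`, `pop64()` and `dump()` are hypothetical stand-ins for `struct bpf_insn`, `BPF_PUSH64`, `BPF_POP64` and the disassembler branch, and the numeric values are copied from this patch and the BPF UAPI headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Class bits as in the BPF UAPI; BPF_STACK is the kernel-internal mode
 * value proposed by this patch (it has no UAPI definition).
 */
#define BPF_LD		0x00
#define BPF_ST		0x02
#define BPF_STACK	0xe0

/* Hypothetical userspace stand-in mirroring the layout of struct bpf_insn. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Counterpart of BPF_PUSH64(SRC): only code and src_reg carry meaning. */
static struct insn push64(uint8_t src)
{
	struct insn i = { .code = BPF_ST | BPF_STACK, .src_reg = src };

	return i;
}

/* Counterpart of BPF_POP64(DST): only code and dst_reg carry meaning. */
static struct insn pop64(uint8_t dst)
{
	struct insn i = { .code = BPF_LD | BPF_STACK, .dst_reg = dst };

	return i;
}

/* Format one instruction in the style of the patched disassembler. */
static int dump(const struct insn *i, char *buf, size_t len)
{
	if (i->code == (BPF_ST | BPF_STACK))
		return snprintf(buf, len, "(%02x) push r%d", i->code, i->src_reg);
	if (i->code == (BPF_LD | BPF_STACK))
		return snprintf(buf, len, "(%02x) pop r%d", i->code, i->dst_reg);
	return snprintf(buf, len, "(%02x) ?", i->code);
}
```

Calling `dump()` on `push64(1)` and `pop64(1)` should yield opcodes e2 and e0, matching the disassembly quoted in this message; `off` and `imm` stay zero in both cases.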
Fundamentally:
  BPF_ST | BPF_STACK + src_reg == store into the stack
  BPF_LD | BPF_STACK + dst_reg == load from the stack
  off/imm are unused

Updated disasm code to properly dump these new instructions:

  31: (e2) push r1
  32: (79) r5 = *(u64 *)(r1 +56)
  33: (55) if r5 != 0x0 goto pc+2
  34: (b7) r0 = 0
  35: (05) goto pc+1
  36: (79) r0 = *(u64 *)(r5 +32)
  37: (e0) pop r1

Cc: Zi Shen Lim
Suggested-by: Alexei Starovoitov
Signed-off-by: Stanislav Fomichev
---
 arch/arm64/net/bpf_jit_comp.c |  8 ++++++++
 arch/x86/net/bpf_jit_comp.c   |  8 ++++++++
 include/linux/filter.h        | 23 +++++++++++++++++++++++
 kernel/bpf/disasm.c           |  6 ++++++
 4 files changed, 45 insertions(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 62f805f427b7..4c0e70e6572a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1185,6 +1185,14 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		 */
 		break;

+	/* kernel hidden stack operations */
+	case BPF_ST | BPF_STACK:
+		emit(A64_PUSH(src, src, A64_SP), ctx);
+		break;
+	case BPF_LD | BPF_STACK:
+		emit(A64_POP(dst, dst, A64_SP), ctx);
+		break;
+
 	/* ST: *(size *)(dst + off) = imm */
 	case BPF_ST | BPF_MEM | BPF_W:
 	case BPF_ST | BPF_MEM | BPF_H:
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index cec5195602bc..528bece87ca4 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1324,6 +1324,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 				EMIT_LFENCE();
 			break;

+			/* kernel hidden stack operations */
+		case BPF_ST | BPF_STACK:
+			EMIT1(add_1reg(0x50, src_reg)); /* pushq */
+			break;
+		case BPF_LD | BPF_STACK:
+			EMIT1(add_1reg(0x58, dst_reg)); /* popq */
+			break;
+
 			/* ST: *(u8*)(dst_reg + off) = imm */
 		case BPF_ST | BPF_MEM | BPF_B:
 			if (is_ereg(dst_reg))
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6e3aed..42c61ec8f895 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@
-76,6 +76,9 @@ struct ctl_table_header; */ #define BPF_NOSPEC 0xc0 +/* unused opcode for kernel hidden stack operations */ +#define BPF_STACK 0xe0 + /* As per nm, we expose JITed images as text (code) section for * kallsyms. That way, tools like perf can find it to match * addresses. @@ -402,6 +405,26 @@ static inline bool insn_is_zext(const struct bpf_insn *insn) .off = 0, \ .imm = 0 }) +/* Push SRC register value onto the stack */ + +#define BPF_PUSH64(SRC) \ + ((struct bpf_insn) { \ + .code = BPF_ST | BPF_STACK, \ + .dst_reg = 0, \ + .src_reg = SRC, \ + .off = 0, \ + .imm = 0 }) + +/* Pop stack value into DST register */ + +#define BPF_POP64(DST) \ + ((struct bpf_insn) { \ + .code = BPF_LD | BPF_STACK, \ + .dst_reg = DST, \ + .src_reg = 0, \ + .off = 0, \ + .imm = 0 }) + /* Internal classic blocks for direct assignment */ #define __BPF_STMT(CODE, K) \ diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c index 7b4afb7d96db..9cd22f3591de 100644 --- a/kernel/bpf/disasm.c +++ b/kernel/bpf/disasm.c @@ -214,6 +214,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs, insn->off, insn->imm); } else if (BPF_MODE(insn->code) == 0xc0 /* BPF_NOSPEC, no UAPI */) { verbose(cbs->private_data, "(%02x) nospec\n", insn->code); + } else if (BPF_MODE(insn->code) == 0xe0 /* BPF_STACK, no UAPI */) { + verbose(cbs->private_data, "(%02x) push r%d\n", + insn->code, insn->src_reg); } else { verbose(cbs->private_data, "BUG_st_%02x\n", insn->code); } @@ -254,6 +257,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs, insn->code, insn->dst_reg, __func_imm_name(cbs, insn, imm, tmp, sizeof(tmp))); + } else if (BPF_MODE(insn->code) == 0xe0 /* BPF_STACK, no UAPI */) { + verbose(cbs->private_data, "(%02x) pop r%d\n", + insn->code, insn->dst_reg); } else { verbose(cbs->private_data, "BUG_ld_%02x\n", insn->code); return; From patchwork Tue Nov 15 03:02:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav 
Fomichev X-Patchwork-Id: 13043160 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C249EC433FE for ; Tue, 15 Nov 2022 03:04:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231544AbiKODEF (ORCPT ); Mon, 14 Nov 2022 22:04:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237545AbiKODDN (ORCPT ); Mon, 14 Nov 2022 22:03:13 -0500 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9AF825CB for ; Mon, 14 Nov 2022 19:02:21 -0800 (PST) Received: by mail-pg1-x549.google.com with SMTP id r126-20020a632b84000000b004393806c06eso6750404pgr.4 for ; Mon, 14 Nov 2022 19:02:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=uak+AW+U5N4YbNn9buxZQp6UdtaFEKD9ZsMrNaxGG78=; b=IJf/QnwPGrSOWHjPA+9ALOSOh1NqTsi7DNF5o+Ei0MEPHu3z8rVErhbWDW95Xosh6Q uIBw9AgpUELCuBvp7NE8jOywhhD14OWbd78fEpD6KxeVYeOJoAmeyT0cCxCKVBRoR43c /6xdh7hMjmO+XCoGDf8FglKZERcU0WkoRUeHblmrOArJDbR8R6Ss6ko8vZDfltoFJS2M HIf4E/Rd05rwvEfBTnJTl9AYs3HcQrTnW7JJ8cVq+HBocflxY2Z1/iM46aeDi/Xb7xm4 eFMvBRDWXPWdAaEwFcbd4HsFoNUByfHnFSyu8YxGlhbGhjJXdZGHki+EooWK7TxPBQsf Q7Kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uak+AW+U5N4YbNn9buxZQp6UdtaFEKD9ZsMrNaxGG78=; b=SBH0Ll/kGwe9BmbpvtuemUBgGzehrfpl1K1sjvNc6Ura2191amoS13lWlF0NHSt7Nb 
qcaC0zsMWKTes6/nN4HPwCKvTgvu9uRrToqwfIOGtzGCSE4j7h9+QHfnF6+7cZedErm1 GaJUb4nnTtcOoHprWw2xrdVo4Fn2UkJbBj1nuyaBf5NGugFCjgPPwKzW+EjGp8RiGcTH PjcRMeDabrby8VsyrCmLJsrYOjyA6l4htYFHbkyzDUxy8qNZdpi6h3IG7Iqnp2aWWt6Z JJQg2s6BtpjGbFPW/IrNS2p18ikkIOyZLmI8o2q1tDKMdmuFRLa7Iukmcj5pKjkJT/xm 8qcg== X-Gm-Message-State: ANoB5pnRS3rSaZsOonEIsK//ZoYKhQJp7SiA5Pu2K7vrMWVrVG7x6YOw lLn5u7ObGT2YEl82EIPngA+0fF8= X-Google-Smtp-Source: AA0mqf7XuKdCJk8xpjbtBEno+Sz5is/P53QXTUKvG4x+k1O80lInCyELmD/UVxjXkUNmhMxob/eWiBA= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a17:902:e546:b0:186:c56d:4950 with SMTP id n6-20020a170902e54600b00186c56d4950mr2116464plf.69.1668481341128; Mon, 14 Nov 2022 19:02:21 -0800 (PST) Date: Mon, 14 Nov 2022 19:02:04 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-6-sdf@google.com> Subject: [PATCH bpf-next 05/11] veth: Support rx timestamp metadata for xdp From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net The goal is to enable end-to-end testing of the metadata for AF_XDP. Current rx_timestamp kfunc returns current time which should be enough to exercise this new functionality. 
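To make the intended behaviour easier to reason about, the following is a simplified userspace model of the unrolling step, not the kernel implementation. All names in it (`enum kfunc_id`, `struct patch`, `veth_like_unroll()`, `unroll_kfunc()`) are hypothetical, and it assumes that the verifier falls back to "r0 = 0" whenever the driver callback appends nothing; the exact fallback condition lives in unroll_kfunc_call() from an earlier patch in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel types in this series. */
enum kfunc_id { RX_TIMESTAMP_SUPPORTED, RX_TIMESTAMP };
enum op { OP_MOV_R0_IMM, OP_CALL_CLOCK };

struct insn {
	enum op op;
	int32_t imm;
};

struct patch {
	struct insn insns[4];
	size_t len;
};

/* Append one instruction, silently dropping it if the buffer is full. */
static void patch_append(struct patch *p, enum op op, int32_t imm)
{
	if (p->len < sizeof(p->insns) / sizeof(p->insns[0]))
		p->insns[p->len++] = (struct insn){ op, imm };
}

/* Models veth_unroll_kfunc(): "supported" becomes a constant 1 and the
 * timestamp becomes a call to a clock helper.
 */
static void veth_like_unroll(enum kfunc_id id, struct patch *p)
{
	if (id == RX_TIMESTAMP_SUPPORTED)
		patch_append(p, OP_MOV_R0_IMM, 1);
	else if (id == RX_TIMESTAMP)
		patch_append(p, OP_CALL_CLOCK, 0);
}

typedef void (*unroll_fn)(enum kfunc_id id, struct patch *p);

/* Models the verifier side. ASSUMPTION: when the driver has no callback or
 * emits nothing, the call site is replaced with "r0 = 0".
 */
static void unroll_kfunc(unroll_fn drv, enum kfunc_id id, struct patch *p)
{
	p->len = 0;
	if (drv)
		drv(id, p);
	if (p->len == 0)
		patch_append(p, OP_MOV_R0_IMM, 0);
}
```

With `veth_like_unroll` as the callback, both kfuncs collapse to a single instruction each; with no callback, the model produces the "not supported" constant instead.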
Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev --- drivers/net/veth.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 2a4592780141..c626580a2294 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #define DRV_NAME "veth" @@ -1659,6 +1660,18 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) } } +static void veth_unroll_kfunc(const struct bpf_prog *prog, u32 func_id, + struct bpf_patch *patch) +{ + if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) { + /* return true; */ + bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1)); + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) { + /* return ktime_get_mono_fast_ns(); */ + bpf_patch_append(patch, BPF_EMIT_CALL(ktime_get_mono_fast_ns)); + } +} + static const struct net_device_ops veth_netdev_ops = { .ndo_init = veth_dev_init, .ndo_open = veth_open, @@ -1678,6 +1691,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_bpf = veth_xdp, .ndo_xdp_xmit = veth_ndo_xdp_xmit, .ndo_get_peer_dev = veth_peer_dev, + .ndo_unroll_kfunc = veth_unroll_kfunc, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ From patchwork Tue Nov 15 03:02:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043162 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id 87648C43217 for ; Tue, 15 Nov 2022 03:04:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237423AbiKODEG (ORCPT ); Mon, 14 Nov 2022 22:04:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232366AbiKODDP (ORCPT ); Mon, 14 Nov 2022 22:03:15 -0500 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B3AA2679 for ; Mon, 14 Nov 2022 19:02:23 -0800 (PST) Received: by mail-pl1-x649.google.com with SMTP id k15-20020a170902c40f00b001887cd71fe6so10300670plk.5 for ; Mon, 14 Nov 2022 19:02:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=dTHRlkKLabXAtQPLHvb8Vrt1mLuPnrRmBOPa7WfDJFo=; b=PSbO1KyGXRAvZ8j1J3jSGk9bHxjCktcbSbe4QgM9nd39etPIqzG2GeYPRqdEEuLE/t vadXutMRJql9/LPtK4MbUWpmGbfCoPDCII+aWNzEHbzwae6S4WjHzLeLPHp3ofY/B4eQ rDYUQHIxpGADNHf1XYF3IH8/Vs63ANPNQgFED0ewym9kJiv7pklXNyQoey99CEgJv6aC g1YxFnlhewNbV4+q1XvQpY/msGlOAlT9t5W0DXeBIWpd8KdiHIbbyeP/nIQg242Zyqqx kzJBl1m7x+Apo5QSGpUnGOXrJG+VoudvBEMJrbfX+4JetLE8KLgvbndHmDjXFn1ipHp1 5YVA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dTHRlkKLabXAtQPLHvb8Vrt1mLuPnrRmBOPa7WfDJFo=; b=j2LqJLmGVUg/xzZ4XFSBfWxqI03uI36JFz9uvmQQKo3/6cF0tJo4hD+cpmvuoMu7YR 22T8kKOnk+nejqoo3rVyxx7DlZR4MWjHdcXJlz9QpE98BCD0786/lRZMzhpcmfAzL/Lp 1V5MMndHcLFQpw3FgLGm51RtVlqcTImO3SZcpz4hHHL4jwIaGD+ujDjApfFK4b4kE6QA VJZUWSmYimliAxI5Ru4nSqRS2+uLK3dIyXs43d6feFX3bbIp4etqQatVqEgJQfO0eP2v MPBdq2buJznzKPHc5TjDNDxhrDAvAT2S5JH3iQCfCn3DJAF0xJS+2278J+Dd2cZMhbcI 
8nZw== X-Gm-Message-State: ANoB5plwmjCcJv/2auk5GB961K6ctuxmoEa4Xs2WCTu6SGj/aqWgflUU 4ysNc6gB+aOLJzmM+9g3RHypclg= X-Google-Smtp-Source: AA0mqf7KV4otaR0D1CCR7hnm1sbTUQBk76wwUgieCkNqG3pbAo/Knjb6lkgclLoShn7O+/WeTPlAdW0= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a62:e919:0:b0:56d:3180:e885 with SMTP id j25-20020a62e919000000b0056d3180e885mr16605597pfh.82.1668481342776; Mon, 14 Nov 2022 19:02:22 -0800 (PST) Date: Mon, 14 Nov 2022 19:02:05 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-7-sdf@google.com> Subject: [PATCH bpf-next 06/11] xdp: Carry over xdp metadata into skb context From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Implement new bpf_xdp_metadata_export_to_skb kfunc which prepares compatible xdp metadata for kernel consumption. This kfunc should be called prior to bpf_redirect or when XDP_PASS'ing the frame into the kernel (note, the drivers have to be updated to enable consuming XDP_PASS'ed metadata). veth driver is amended to consume this metadata when converting to skb. Internally, XDP_FLAGS_HAS_SKB_METADATA flag is used to indicate whether the frame has skb metadata. The metadata is currently stored prior to xdp->data_meta. 
bpf_xdp_adjust_meta refuses to work after a call to bpf_xdp_metadata_export_to_skb (can lift this requirement later on if needed, we'd have to memmove xdp_skb_metadata). Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev --- drivers/net/veth.c | 10 +-- include/linux/skbuff.h | 4 + include/net/xdp.h | 17 ++++ include/uapi/linux/bpf.h | 7 ++ kernel/bpf/verifier.c | 15 ++++ net/core/filter.c | 40 +++++++++ net/core/skbuff.c | 20 +++++ net/core/xdp.c | 145 +++++++++++++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 7 ++ 9 files changed, 255 insertions(+), 10 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index c626580a2294..35349a232209 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -803,7 +803,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, void *orig_data, *orig_data_end; struct bpf_prog *xdp_prog; struct xdp_buff xdp; - u32 act, metalen; + u32 act; int off; skb_prepare_for_gro(skb); @@ -886,9 +886,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, skb->protocol = eth_type_trans(skb, rq->dev); - metalen = xdp.data - xdp.data_meta; - if (metalen) - skb_metadata_set(skb, metalen); + xdp_convert_skb_metadata(&xdp, skb); out: return skb; drop: @@ -1663,7 +1661,9 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) static void veth_unroll_kfunc(const struct bpf_prog *prog, u32 func_id, struct bpf_patch *patch) { - if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) { + if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_EXPORT_TO_SKB)) { + return xdp_metadata_export_to_skb(prog, patch); + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) { /* return true; */ 
bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1)); } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) { diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4e464a27adaf..be6a9559dbf1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4219,6 +4219,10 @@ static inline bool skb_metadata_differs(const struct sk_buff *skb_a, true : __skb_metadata_differs(skb_a, skb_b, len_a); } +struct xdp_skb_metadata; +bool skb_metadata_import_from_xdp(struct sk_buff *skb, + struct xdp_skb_metadata *meta); + static inline void skb_metadata_set(struct sk_buff *skb, u8 meta_len) { skb_shinfo(skb)->meta_len = meta_len; diff --git a/include/net/xdp.h b/include/net/xdp.h index 2a82a98f2f9f..547a6a0e99f8 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -73,6 +73,7 @@ enum xdp_buff_flags { XDP_FLAGS_FRAGS_PF_MEMALLOC = BIT(1), /* xdp paged memory is under * pressure */ + XDP_FLAGS_HAS_SKB_METADATA = BIT(2), /* xdp_skb_metadata */ }; struct xdp_buff { @@ -91,6 +92,11 @@ static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp) return !!(xdp->flags & XDP_FLAGS_HAS_FRAGS); } +static __always_inline bool xdp_buff_has_skb_metadata(struct xdp_buff *xdp) +{ + return !!(xdp->flags & XDP_FLAGS_HAS_SKB_METADATA); +} + static __always_inline void xdp_buff_set_frags_flag(struct xdp_buff *xdp) { xdp->flags |= XDP_FLAGS_HAS_FRAGS; @@ -306,6 +312,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp) return xdp_frame; } +bool xdp_convert_skb_metadata(struct xdp_buff *xdp, struct sk_buff *skb); + void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, struct xdp_buff *xdp); void xdp_return_frame(struct xdp_frame *xdpf); @@ -411,6 +419,8 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE #define XDP_METADATA_KFUNC_xxx \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_EXPORT_TO_SKB, \ + bpf_xdp_metadata_export_to_skb) \ 
XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED, \ bpf_xdp_metadata_rx_timestamp_supported) \ XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \ @@ -423,14 +433,21 @@ XDP_METADATA_KFUNC_xxx MAX_XDP_METADATA_KFUNC, }; +struct bpf_patch; + #ifdef CONFIG_DEBUG_INFO_BTF extern struct btf_id_set8 xdp_metadata_kfunc_ids; static inline u32 xdp_metadata_kfunc_id(int id) { return xdp_metadata_kfunc_ids.pairs[id].id; } +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch); #else static inline u32 xdp_metadata_kfunc_id(int id) { return 0; } +static void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) +{ + return 0; +} #endif #endif /* __LINUX_NET_XDP_H__ */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b444b1118c4f..71e3bc7ad839 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6116,6 +6116,12 @@ enum xdp_action { XDP_REDIRECT, }; +/* Subset of XDP metadata exported to skb context. 
+ */ +struct xdp_skb_metadata { + __u64 rx_timestamp; +}; + /* user accessible metadata for XDP packet hook * new fields must be added to the end of this structure */ @@ -6128,6 +6134,7 @@ struct xdp_md { __u32 rx_queue_index; /* rxq->queue_index */ __u32 egress_ifindex; /* txq->dev->ifindex */ + __bpf_md_ptr(struct xdp_skb_metadata *, skb_metadata); }; /* DEVMAP map-value layout diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b657ed6eb277..6879ad3a6026 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -14023,6 +14023,7 @@ static int unroll_kfunc_call(struct bpf_verifier_env *env, enum bpf_prog_type prog_type; struct bpf_prog_aux *aux; struct btf *desc_btf; + u32 allowed, mangled; u32 *kfunc_flags; u32 func_id; @@ -14050,6 +14051,20 @@ static int unroll_kfunc_call(struct bpf_verifier_env *env, */ bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 0)); } + + allowed = 1 << BPF_REG_0; + allowed |= 1 << BPF_REG_1; + allowed |= 1 << BPF_REG_2; + allowed |= 1 << BPF_REG_3; + allowed |= 1 << BPF_REG_4; + allowed |= 1 << BPF_REG_5; + mangled = bpf_patch_magles_registers(patch); + if (WARN_ON_ONCE(mangled & ~allowed)) { + bpf_patch_free(patch); + verbose(env, "bpf verifier is misconfigured\n"); + return -EINVAL; + } + return bpf_patch_err(patch); } diff --git a/net/core/filter.c b/net/core/filter.c index 6dd2baf5eeb2..2497144e4216 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4094,6 +4094,8 @@ BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset) return -EINVAL; if (unlikely(xdp_metalen_invalid(metalen))) return -EACCES; + if (unlikely(xdp_buff_has_skb_metadata(xdp))) + return -EACCES; xdp->data_meta = meta; @@ -8690,6 +8692,8 @@ static bool __is_valid_xdp_access(int off, int size) return true; } +BTF_ID_LIST_SINGLE(xdp_to_skb_metadata_btf_ids, struct, xdp_skb_metadata); + static bool xdp_is_valid_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, @@ -8722,6 +8726,18 @@ static bool 
xdp_is_valid_access(int off, int size,
 	case offsetof(struct xdp_md, data_end):
 		info->reg_type = PTR_TO_PACKET_END;
 		break;
+	case offsetof(struct xdp_md, skb_metadata):
+		info->btf = bpf_get_btf_vmlinux();
+		if (IS_ERR(info->btf))
+			return false;
+		if (!info->btf)
+			return false;
+
+		info->reg_type = PTR_TO_BTF_ID_OR_NULL;
+		info->btf_id = xdp_to_skb_metadata_btf_ids[0];
+		if (size == sizeof(__u64))
+			return true;
+		return false;
 	}

 	return __is_valid_xdp_access(off, size);
@@ -9814,6 +9830,30 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
 				      offsetof(struct net_device, ifindex));
 		break;
+	case offsetof(struct xdp_md, skb_metadata):
+		/* dst_reg = xdp_buff->flags; */
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, flags));
+		/* dst_reg &= XDP_FLAGS_HAS_SKB_METADATA; */
+		*insn++ = BPF_ALU64_IMM(BPF_AND, si->dst_reg,
+					XDP_FLAGS_HAS_SKB_METADATA);
+
+		/* if (dst_reg != 0) { */
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 3);
+		/* dst_reg = xdp_buff->data_meta; */
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, data_meta),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, data_meta));
+		/* dst_reg -= sizeof(struct xdp_skb_metadata); */
+		*insn++ = BPF_ALU64_IMM(BPF_SUB, si->dst_reg,
+					sizeof(struct xdp_skb_metadata));
+		*insn++ = BPF_JMP_A(1);
+		/* } else { */
+		/* return 0; */
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+		/* } */
+		break;
 	}

 	return insn - insn_buf;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 90d085290d49..0cc24ca20e4d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -72,6 +72,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -6675,3 +6676,22 @@ nodefer:	__kfree_skb(skb);
 	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
 		smp_call_function_single_async(cpu, &sd->defer_csd);
 }
+
+bool skb_metadata_import_from_xdp(struct sk_buff *skb,
+ struct xdp_skb_metadata *meta) +{ + /* Optional SKB info, currently missing: + * - HW checksum info (skb->ip_summed) + * - HW RX hash (skb_set_hash) + * - RX ring dev queue index (skb_record_rx_queue) + */ + + if (meta->rx_timestamp) { + *skb_hwtstamps(skb) = (struct skb_shared_hwtstamps){ + .hwtstamp = ns_to_ktime(meta->rx_timestamp), + }; + } + + return true; +} +EXPORT_SYMBOL(skb_metadata_import_from_xdp); diff --git a/net/core/xdp.c b/net/core/xdp.c index 22f1e44700eb..ede9b1b987d9 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -368,6 +368,22 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); +bool xdp_convert_skb_metadata(struct xdp_buff *xdp, struct sk_buff *skb) +{ + struct xdp_skb_metadata *meta; + u32 metalen; + + metalen = xdp->data - xdp->data_meta; + if (metalen) + skb_metadata_set(skb, metalen); + if (xdp_buff_has_skb_metadata(xdp)) { + meta = xdp->data_meta - sizeof(*meta); + return skb_metadata_import_from_xdp(skb, meta); + } + return false; +} +EXPORT_SYMBOL(xdp_convert_skb_metadata); + /* XDP RX runs under NAPI protection, and in different delivery error * scenarios (e.g. queue full), it is possible to return the xdp_frame * while still leveraging this protection. 
The @napi_direct boolean @@ -619,6 +635,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, { struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf); unsigned int headroom, frame_size; + struct xdp_skb_metadata *meta; void *hard_start; u8 nr_frags; @@ -653,11 +670,10 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, /* Essential SKB info: protocol and skb->dev */ skb->protocol = eth_type_trans(skb, dev); - /* Optional SKB info, currently missing: - * - HW checksum info (skb->ip_summed) - * - HW RX hash (skb_set_hash) - * - RX ring dev queue index (skb_record_rx_queue) - */ + if (xdpf->flags & XDP_FLAGS_HAS_SKB_METADATA) { + meta = xdpf->data - xdpf->metasize - sizeof(*meta); + skb_metadata_import_from_xdp(skb, meta); + } /* Until page_pool get SKB return path, release DMA here */ xdp_release_frame(xdpf); @@ -712,6 +728,14 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) return nxdpf; } +/* For the packets directed to the kernel, this kfunc exports XDP metadata + * into skb context. + */ +noinline int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) +{ + return 0; +} + /* Indicates whether particular device supports rx_timestamp metadata. * This is an optional helper to support marking some branches as * "dead code" in the BPF programs. @@ -736,15 +760,126 @@ BTF_SET8_START_GLOBAL(xdp_metadata_kfunc_ids) XDP_METADATA_KFUNC_xxx #undef XDP_METADATA_KFUNC BTF_SET8_END(xdp_metadata_kfunc_ids) +EXPORT_SYMBOL(xdp_metadata_kfunc_ids); static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = { .owner = THIS_MODULE, .set = &xdp_metadata_kfunc_ids, }; +/* Since we're not actually doing a call but instead rewriting + * in place, we can only afford to use R0-R5 scratch registers + * and hidden BPF_PUSH64/BPF_POP64 opcodes to spill to the stack. 
+ */
+void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch)
+{
+	u32 func_id;
+
+	/* The code below generates the following:
+	 *
+	 * int bpf_xdp_metadata_export_to_skb(struct xdp_md *ctx)
+	 * {
+	 *	struct xdp_skb_metadata *meta = ctx->data_meta - sizeof(*meta);
+	 *	int ret;
+	 *
+	 *	if (ctx->flags & XDP_FLAGS_HAS_SKB_METADATA)
+	 *		return -1;
+	 *
+	 *	if (meta < ctx->data_hard_start + sizeof(struct xdp_frame))
+	 *		return -1;
+	 *
+	 *	meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
+	 *	ctx->flags |= XDP_FLAGS_HAS_SKB_METADATA;
+	 *
+	 *	return 0;
+	 * }
+	 */
+
+	bpf_patch_append(patch,
+		BPF_MOV64_IMM(BPF_REG_0, -1),
+
+		/* r2 = ((struct xdp_buff *)r1)->flags; */
+		BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags),
+			    BPF_REG_2, BPF_REG_1,
+			    offsetof(struct xdp_buff, flags)),
+
+		/* r2 &= XDP_FLAGS_HAS_SKB_METADATA; */
+		BPF_ALU64_IMM(BPF_AND, BPF_REG_2, XDP_FLAGS_HAS_SKB_METADATA),
+
+		/* if (xdp_buff->flags & XDP_FLAGS_HAS_SKB_METADATA) return -1; */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, S16_MAX),
+
+		/* r2 = ((struct xdp_buff *)r1)->data_meta; */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
+			    offsetof(struct xdp_buff, data_meta)),
+		/* r2 -= sizeof(struct xdp_skb_metadata); */
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_2,
+			      sizeof(struct xdp_skb_metadata)),
+		/* r3 = ((struct xdp_buff *)r1)->data_hard_start; */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1,
+			    offsetof(struct xdp_buff, data_hard_start)),
+		/* r3 += sizeof(struct xdp_frame) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_3,
+			      sizeof(struct xdp_frame)),
+		/* if (data_meta-sizeof(struct xdp_skb_metadata) <
+		 *     data_hard_start+sizeof(struct xdp_frame)) return -1;
+		 */
+		BPF_JMP_REG(BPF_JLT, BPF_REG_2, BPF_REG_3, S16_MAX),
+
+		/* r2 = ((struct xdp_buff *)r1)->flags; */
+		BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags),
+			    BPF_REG_2, BPF_REG_1,
+			    offsetof(struct xdp_buff, flags)),
+
+		/* r2 |= XDP_FLAGS_HAS_SKB_METADATA; */
+		BPF_ALU64_IMM(BPF_OR, BPF_REG_2, XDP_FLAGS_HAS_SKB_METADATA),
+
+		/*
((struct xdp_buff *)r1)->flags = r2; */ + BPF_STX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, flags), + BPF_REG_1, BPF_REG_2, + offsetof(struct xdp_buff, flags)), + + /* push r1 */ + BPF_PUSH64(BPF_REG_1), + ); + + /* r0 = bpf_xdp_metadata_rx_timestamp(ctx); */ + func_id = xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP); + prog->aux->xdp_kfunc_ndo->ndo_unroll_kfunc(prog, func_id, patch); + + bpf_patch_append(patch, + /* pop r1 */ + BPF_POP64(BPF_REG_1), + + /* r2 = ((struct xdp_buff *)r1)->data_meta; */ + BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, + offsetof(struct xdp_buff, data_meta)), + /* r2 -= sizeof(struct xdp_skb_metadata); */ + BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, + sizeof(struct xdp_skb_metadata)), + + /* *((struct xdp_skb_metadata *)r2)->rx_timestamp = r0; */ + BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, + offsetof(struct xdp_skb_metadata, rx_timestamp)), + + /* return 0; */ + BPF_MOV64_IMM(BPF_REG_0, 0), + ); + + bpf_patch_resolve_jmp(patch); +} +EXPORT_SYMBOL(xdp_metadata_export_to_skb); + static int __init xdp_metadata_init(void) { return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); } late_initcall(xdp_metadata_init); +#else +struct btf_id_set8 xdp_metadata_kfunc_ids = {}; +EXPORT_SYMBOL(xdp_metadata_kfunc_ids); +void xdp_metadata_export_to_skb(const struct bpf_prog *prog, struct bpf_patch *patch) +{ +} +EXPORT_SYMBOL(xdp_metadata_export_to_skb); #endif diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b444b1118c4f..71e3bc7ad839 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -6116,6 +6116,12 @@ enum xdp_action { XDP_REDIRECT, }; +/* Subset of XDP metadata exported to skb context. 
+ */
+struct xdp_skb_metadata {
+	__u64 rx_timestamp;
+};
+
 /* user accessible metadata for XDP packet hook
  * new fields must be added to the end of this structure
  */
@@ -6128,6 +6134,7 @@ struct xdp_md {
 	__u32 rx_queue_index; /* rxq->queue_index */
 
 	__u32 egress_ifindex; /* txq->dev->ifindex */
+	__bpf_md_ptr(struct xdp_skb_metadata *, skb_metadata);
 };
 
 /* DEVMAP map-value layout

From patchwork Tue Nov 15 03:02:06 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stanislav Fomichev
X-Patchwork-Id: 13043163
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Mon, 14 Nov 2022 19:02:06 -0800
In-Reply-To: <20221115030210.3159213-1-sdf@google.com>
References: <20221115030210.3159213-1-sdf@google.com>
X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog
Message-ID: <20221115030210.3159213-8-sdf@google.com>
Subject: [PATCH bpf-next 07/11] selftests/bpf: Verify xdp_metadata xdp->af_xdp path
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski,
 Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
 Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
 xdp-hints@xdp-project.net, netdev@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

- create new netns
- create veth pair (veTX+veRX)
- setup AF_XDP socket for both interfaces
- attach bpf to veRX
- send packet via veTX
- verify the packet has expected metadata at veRX

Cc: John Fastabend
Cc: David Ahern
Cc: Martin KaFai Lau
Cc: Jakub Kicinski
Cc: Willem de Bruijn
Cc: Jesper Dangaard Brouer
Cc: Anatoly Burakov
Cc: Alexander Lobakin
Cc: Magnus Karlsson
Cc: Maryam Tahhan
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev
---
 tools/testing/selftests/bpf/Makefile               |   2 +-
 .../selftests/bpf/prog_tests/xdp_metadata.c        | 359 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_metadata.c             |  50 +++
 3 files changed, 410 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_metadata.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f3cd17026ee5..b645cf5a5021 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -523,7 +523,7 @@ TRUNNER_BPF_PROGS_DIR := progs
 TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \
 			 network_helpers.c testing_helpers.c \
 			 btf_helpers.c flow_dissector_load.h \
-			 cap_helpers.c
+			 cap_helpers.c xsk.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \
 		       $(OUTPUT)/liburandom_read.so \
 		       $(OUTPUT)/xdp_synproxy \
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
new file mode 100644
index 000000000000..c3321d8c7cd4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -0,0 +1,359 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include "xdp_metadata.skel.h"
+#include "xsk.h"
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define TX_NAME "veTX"
+#define RX_NAME "veRX"
+
+#define UDP_PAYLOAD_BYTES 4
+
+#define AF_XDP_SOURCE_PORT 1234
+#define AF_XDP_CONSUMER_PORT 8080
+
+#define UMEM_NUM 16
+#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM)
+#define METADATA_SIZE 8
+#define XDP_FLAGS XDP_FLAGS_DRV_MODE
+#define QUEUE_ID 0
+
+#define TX_ADDR "10.0.0.1"
+#define RX_ADDR "10.0.0.2"
+#define PREFIX_LEN "8"
+#define FAMILY AF_INET
+
+#define SYS(cmd) ({ \
+	if (!ASSERT_OK(system(cmd), (cmd))) \
+		goto out; \
+})
+
+struct xsk {
+	void *umem_area;
+	struct xsk_umem *umem;
+	struct xsk_ring_prod fill;
+	struct xsk_ring_cons comp;
+	struct xsk_ring_prod tx;
+	struct xsk_ring_cons rx;
+	struct xsk_socket *socket;
+};
+
+static int open_xsk(const char *ifname, struct xsk *xsk)
+{
+	int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
+	const struct xsk_socket_config socket_config = {
+		.rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD,
+		.xdp_flags = XDP_FLAGS,
+		.bind_flags = XDP_COPY,
+	};
+	const struct xsk_umem_config umem_config = {
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+		.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
+	};
+	__u32 idx;
+	u64 addr;
+	int ret;
+	int i;
+
+	xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0);
+	if (!ASSERT_NEQ(xsk->umem_area, MAP_FAILED, "mmap"))
+		return -1;
+
+	ret = xsk_umem__create(&xsk->umem,
+			       xsk->umem_area, UMEM_SIZE,
+			       &xsk->fill,
+			       &xsk->comp,
+			       &umem_config);
+	if (!ASSERT_OK(ret, "xsk_umem__create"))
+		return ret;
+
+	ret = xsk_socket__create(&xsk->socket, ifname, QUEUE_ID,
+				 xsk->umem,
+				 &xsk->rx,
+				 &xsk->tx,
+				 &socket_config);
+	if (!ASSERT_OK(ret, "xsk_socket__create"))
+		return ret;
+
+	/* First half of umem is for TX. This way address matches 1-to-1
+	 * to the completion queue index.
+	 */
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = i * UMEM_FRAME_SIZE;
+		printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr);
+	}
+
+	/* Second half of umem is for RX. */
+
+	ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx);
+	if (!ASSERT_EQ(UMEM_NUM / 2, ret, "xsk_ring_prod__reserve"))
+		return ret;
+	if (!ASSERT_EQ(idx, 0, "fill idx != 0"))
+		return -1;
+
+	for (i = 0; i < UMEM_NUM / 2; i++) {
+		addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE;
+		printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, i) = addr;
+	}
+	xsk_ring_prod__submit(&xsk->fill, ret);
+
+	return 0;
+}
+
+static void close_xsk(struct xsk *xsk)
+{
+	if (xsk->umem)
+		xsk_umem__delete(xsk->umem);
+	if (xsk->socket)
+		xsk_socket__delete(xsk->socket);
+	munmap(xsk->umem_area, UMEM_SIZE);
+}
+
+static void ip_csum(struct iphdr *iph)
+{
+	__u32 sum = 0;
+	__u16 *p;
+	int i;
+
+	iph->check = 0;
+	p = (void *)iph;
+	for (i = 0; i < sizeof(*iph) / sizeof(*p); i++)
+		sum += p[i];
+
+	while (sum >> 16)
+		sum = (sum & 0xffff) + (sum >> 16);
+
+	iph->check = ~sum;
+}
+
+static int generate_packet(struct xsk *xsk, __u16 dst_port)
+{
+	struct xdp_desc *tx_desc;
+	struct udphdr *udph;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	void *data;
+	__u32 idx;
+	int ret;
+
+	ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_prod__reserve"))
+		return -1;
+
+	tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx);
+	tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE;
+	printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr);
+	data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr);
+
+	eth = data;
+	iph = (void *)(eth + 1);
+	udph = (void *)(iph + 1);
+
+	memcpy(eth->h_dest, "\x00\x00\x00\x00\x00\x02", ETH_ALEN);
+	memcpy(eth->h_source, "\x00\x00\x00\x00\x00\x01", ETH_ALEN);
+	eth->h_proto =
+		htons(ETH_P_IP);
+
+	iph->version = 0x4;
+	iph->ihl = 0x5;
+	iph->tos = 0x9;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	iph->id = 0;
+	iph->frag_off = 0;
+	iph->ttl = 0;
+	iph->protocol = IPPROTO_UDP;
+	ASSERT_EQ(inet_pton(FAMILY, TX_ADDR, &iph->saddr), 1, "inet_pton(TX_ADDR)");
+	ASSERT_EQ(inet_pton(FAMILY, RX_ADDR, &iph->daddr), 1, "inet_pton(RX_ADDR)");
+	ip_csum(iph);
+
+	udph->source = htons(AF_XDP_SOURCE_PORT);
+	udph->dest = htons(dst_port);
+	udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES);
+	udph->check = 0;
+
+	memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES);
+
+	tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES;
+	xsk_ring_prod__submit(&xsk->tx, 1);
+
+	ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0);
+	if (!ASSERT_GE(ret, 0, "sendto"))
+		return ret;
+
+	return 0;
+}
+
+static void complete_tx(struct xsk *xsk)
+{
+	__u32 idx;
+	__u64 addr;
+
+	if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) {
+		addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx);
+
+		/* TX frames live in the first half of the umem and are reused
+		 * by index, so only the completion entry is released here.
+		 */
+		printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr);
+		xsk_ring_cons__release(&xsk->comp, 1);
+	}
+}
+
+static void refill_rx(struct xsk *xsk, __u64 addr)
+{
+	__u32 idx;
+
+	if (ASSERT_EQ(xsk_ring_prod__reserve(&xsk->fill, 1, &idx), 1, "xsk_ring_prod__reserve")) {
+		printf("%p: refill rx idx=%u addr=%llx\n", xsk, idx, addr);
+		*xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr;
+		xsk_ring_prod__submit(&xsk->fill, 1);
+	}
+}
+
+static int verify_xsk_metadata(struct xsk *xsk)
+{
+	const struct xdp_desc *rx_desc;
+	struct pollfd fds = {};
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	__u64 comp_addr;
+	void *data_meta;
+	void *data;
+	__u64 addr;
+	__u32 idx;
+	int ret;
+
+	ret = recvfrom(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, NULL);
+	if (!ASSERT_EQ(ret, 0, "recvfrom"))
+		return -1;
+
+	fds.fd =
+		xsk_socket__fd(xsk->socket);
+	fds.events = POLLIN;
+
+	ret = poll(&fds, 1, 1000);
+	if (!ASSERT_GT(ret, 0, "poll"))
+		return -1;
+
+	ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx);
+	if (!ASSERT_EQ(ret, 1, "xsk_ring_cons__peek"))
+		return -2;
+
+	rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx);
+	comp_addr = xsk_umem__extract_addr(rx_desc->addr);
+	addr = xsk_umem__add_offset_to_addr(rx_desc->addr);
+	printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n",
+	       xsk, idx, rx_desc->addr, addr, comp_addr);
+	data = xsk_umem__get_data(xsk->umem_area, addr);
+
+	/* Make sure we got the packet offset correctly. */
+
+	eth = data;
+	ASSERT_EQ(eth->h_proto, htons(ETH_P_IP), "eth->h_proto");
+	iph = (void *)(eth + 1);
+	ASSERT_EQ((int)iph->version, 4, "iph->version");
+
+	data_meta = data - METADATA_SIZE;
+
+	if (*(__u64 *)data_meta == 0)
+		return -1;
+
+	xsk_ring_cons__release(&xsk->rx, 1);
+	refill_rx(xsk, comp_addr);
+
+	return 0;
+}
+
+void test_xdp_metadata(void)
+{
+	struct xdp_metadata *bpf_obj = NULL;
+	struct nstoken *tok = NULL;
+	__u32 queue_id = QUEUE_ID;
+	struct bpf_program *prog;
+	struct xsk tx_xsk = {};
+	struct xsk rx_xsk = {};
+	int rx_ifindex;
+	int sock_fd;
+	int ret;
+
+	/* Setup new networking namespace, with a veth pair. */
+
+	SYS("ip netns add xdp_metadata");
+	tok = open_netns("xdp_metadata");
+	SYS("ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
+	    " type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1");
+	SYS("ip link set dev " TX_NAME " address 00:00:00:00:00:01");
+	SYS("ip link set dev " RX_NAME " address 00:00:00:00:00:02");
+	SYS("ip link set dev " TX_NAME " up");
+	SYS("ip link set dev " RX_NAME " up");
+	SYS("ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS("ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+
+	rx_ifindex = if_nametoindex(RX_NAME);
+
+	/* Setup separate AF_XDP for TX and RX interfaces.
+	 */
+
+	ret = open_xsk(TX_NAME, &tx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(TX_NAME)"))
+		goto out;
+
+	ret = open_xsk(RX_NAME, &rx_xsk);
+	if (!ASSERT_OK(ret, "open_xsk(RX_NAME)"))
+		goto out;
+
+	/* Attach BPF program to RX interface. */
+
+	bpf_obj = xdp_metadata__open();
+	if (!ASSERT_OK_PTR(bpf_obj, "open skeleton"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx");
+	bpf_program__set_ifindex(prog, rx_ifindex);
+	bpf_program__set_flags(prog, BPF_F_XDP_HAS_METADATA);
+
+	if (!ASSERT_OK(xdp_metadata__load(bpf_obj), "load skeleton"))
+		goto out;
+
+	ret = bpf_xdp_attach(rx_ifindex,
+			     bpf_program__fd(bpf_obj->progs.rx),
+			     XDP_FLAGS, NULL);
+	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
+		goto out;
+
+	sock_fd = xsk_socket__fd(rx_xsk.socket);
+	ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0);
+	if (!ASSERT_GE(ret, 0, "bpf_map_update_elem"))
+		goto out;
+
+	/* Send packet destined to RX AF_XDP socket. */
+	if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
+		       "generate AF_XDP_CONSUMER_PORT"))
+		goto out;
+
+	/* Verify AF_XDP RX packet has proper metadata.
+	 */
+	if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk), 0,
+		       "verify_xsk_metadata"))
+		goto out;
+
+	complete_tx(&tx_xsk);
+
+out:
+	close_xsk(&rx_xsk);
+	close_xsk(&tx_xsk);
+	if (bpf_obj)
+		xdp_metadata__destroy(bpf_obj);
+	system("ip netns del xdp_metadata");
+	if (tok)
+		close_netns(tok);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
new file mode 100644
index 000000000000..bdde17961ab6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 4);
+	__type(key, __u32);
+	__type(value, __u32);
+} xsk SEC(".maps");
+
+extern int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym;
+extern const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym;
+
+SEC("xdp")
+int rx(struct xdp_md *ctx)
+{
+	void *data, *data_meta;
+	int ret;
+
+	if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) {
+		__u64 rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
+
+		if (rx_timestamp) {
+			ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(rx_timestamp));
+			if (ret != 0)
+				return XDP_DROP;
+
+			data = (void *)(long)ctx->data;
+			data_meta = (void *)(long)ctx->data_meta;
+
+			if (data_meta + sizeof(rx_timestamp) > data)
+				return XDP_DROP;
+
+			*(__u64 *)data_meta = rx_timestamp;
+		}
+	}
+
+	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
+}
+
+char _license[] SEC("license") = "GPL";

From patchwork Tue Nov 15 03:02:07 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stanislav Fomichev
X-Patchwork-Id: 13043164
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Mon, 14 Nov 2022 19:02:07 -0800
In-Reply-To: <20221115030210.3159213-1-sdf@google.com>
References: <20221115030210.3159213-1-sdf@google.com>
X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog
Message-ID: <20221115030210.3159213-9-sdf@google.com>
Subject: [PATCH bpf-next 08/11] selftests/bpf: Verify xdp_metadata xdp->skb path
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski,
 Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
 Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
 xdp-hints@xdp-project.net, netdev@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

- divert 9081 UDP traffic to the kernel
- call bpf_xdp_metadata_export_to_skb for such packets
- the kernel should fill in hwtstamp
- verify that the received packet has non-zero hwtstamp

Cc: John Fastabend
Cc: David Ahern
Cc: Martin KaFai Lau
Cc: Jakub Kicinski
Cc: Willem de Bruijn
Cc: Jesper Dangaard Brouer
Cc: Anatoly Burakov
Cc: Alexander Lobakin
Cc: Magnus Karlsson
Cc: Maryam Tahhan
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev
---
 tools/testing/selftests/bpf/DENYLIST.s390x         |  1 +
 .../selftests/bpf/prog_tests/xdp_metadata.c        | 81 +++++++++++++++++++
 .../selftests/bpf/progs/xdp_metadata.c             | 64 +++++++++++++++
 3 files changed, 146 insertions(+)

diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index be4e3d47ea3e..2fa350c8ab42 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -78,4 +78,5 @@ xdp_adjust_tail # case-128 err 0 errno 28 retval 1 size
 xdp_bonding # failed to auto-attach program 'trace_on_entry': -524 (trampoline)
 xdp_bpf2bpf # failed to auto-attach program 'trace_on_entry': -524 (trampoline)
 xdp_do_redirect # prog_run_max_size unexpected error: -22 (errno 22)
+xdp_metadata # JIT does not support push/pop opcodes (jit)
 xdp_synproxy # JIT does not support calling kernel function (kfunc)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
index c3321d8c7cd4..b67a4dcfca6e 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -19,6 +19,7 @@
 
 #define AF_XDP_SOURCE_PORT 1234
 #define AF_XDP_CONSUMER_PORT 8080
+#define SOCKET_CONSUMER_PORT 9081
 
 #define UMEM_NUM 16
 #define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
@@ -275,6 +276,61 @@ static int verify_xsk_metadata(struct xsk *xsk)
 	return 0;
 }
 
+static void timestamping_enable(int fd, int val)
+{
+	int ret;
+
+	ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
+	ASSERT_OK(ret, "setsockopt(SO_TIMESTAMPING)");
+}
+
+static int verify_skb_metadata(int fd)
+{
+	char cmsg_buf[1024];
+	char packet_buf[128];
+
+	struct scm_timestamping *ts;
+	struct iovec packet_iov;
+	struct cmsghdr *cmsg;
+	struct msghdr hdr;
+	bool found_hwtstamp = false;
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_iov = &packet_iov;
+	hdr.msg_iovlen = 1;
+	packet_iov.iov_base = packet_buf;
+	packet_iov.iov_len = sizeof(packet_buf);
+
+	hdr.msg_control =
+		cmsg_buf;
+	hdr.msg_controllen = sizeof(cmsg_buf);
+
+	if (ASSERT_GE(recvmsg(fd, &hdr, 0), 0, "recvmsg")) {
+		for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL;
+		     cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
+
+			if (cmsg->cmsg_level != SOL_SOCKET)
+				continue;
+
+			switch (cmsg->cmsg_type) {
+			case SCM_TIMESTAMPING:
+				ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+				if (ts->ts[2].tv_sec || ts->ts[2].tv_nsec) {
+					found_hwtstamp = true;
+					break;
+				}
+				break;
+			default:
+				break;
+			}
+		}
+	}
+
+	if (!ASSERT_EQ(found_hwtstamp, true, "no hwtstamp!"))
+		return -1;
+
+	return 0;
+}
+
 void test_xdp_metadata(void)
 {
 	struct xdp_metadata *bpf_obj = NULL;
@@ -283,6 +339,7 @@ void test_xdp_metadata(void)
 	struct bpf_program *prog;
 	struct xsk tx_xsk = {};
 	struct xsk rx_xsk = {};
+	int rx_udp_fd = -1;
 	int rx_ifindex;
 	int sock_fd;
 	int ret;
@@ -299,6 +356,8 @@ void test_xdp_metadata(void)
 	SYS("ip link set dev " RX_NAME " up");
 	SYS("ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
 	SYS("ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+	SYS("sysctl -q net.ipv4.ip_forward=1");
+	SYS("sysctl -q net.ipv4.conf.all.accept_local=1");
 
 	rx_ifindex = if_nametoindex(RX_NAME);
 
@@ -312,6 +371,15 @@ void test_xdp_metadata(void)
 	if (!ASSERT_OK(ret, "open_xsk(RX_NAME)"))
 		goto out;
 
+	/* Setup UDP listener for RX interface. */
+
+	rx_udp_fd = start_server(FAMILY, SOCK_DGRAM, NULL, SOCKET_CONSUMER_PORT, 1000);
+	if (!ASSERT_GE(rx_udp_fd, 0, "start_server"))
+		goto out;
+	timestamping_enable(rx_udp_fd,
+			    SOF_TIMESTAMPING_SOFTWARE |
+			    SOF_TIMESTAMPING_RAW_HARDWARE);
+
 	/* Attach BPF program to RX interface. */
 
 	bpf_obj = xdp_metadata__open();
@@ -348,9 +416,22 @@ void test_xdp_metadata(void)
 
 	complete_tx(&tx_xsk);
 
+	/* Send packet destined to RX UDP socket. */
+	if (!ASSERT_GE(generate_packet(&tx_xsk, SOCKET_CONSUMER_PORT), 0,
+		       "generate SOCKET_CONSUMER_PORT"))
+		goto out;
+
+	/* Verify SKB RX packet has proper metadata.
+	 */
+	if (!ASSERT_GE(verify_skb_metadata(rx_udp_fd), 0,
+		       "verify_skb_metadata"))
+		goto out;
+
+	complete_tx(&tx_xsk);
+
 out:
 	close_xsk(&rx_xsk);
 	close_xsk(&tx_xsk);
+	close(rx_udp_fd);
 	if (bpf_obj)
 		xdp_metadata__destroy(bpf_obj);
 	system("ip netns del xdp_metadata");
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
index bdde17961ab6..805178f55743 100644
--- a/tools/testing/selftests/bpf/progs/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -17,15 +17,79 @@ struct {
 	__type(value, __u32);
 } xsk SEC(".maps");
 
+extern int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) __ksym;
 extern int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym;
 extern const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym;
 
 SEC("xdp")
 int rx(struct xdp_md *ctx)
 {
+	struct xdp_skb_metadata *skb_metadata;
 	void *data, *data_meta;
+	struct ethhdr *eth = NULL;
+	struct udphdr *udp = NULL;
+	struct iphdr *iph = NULL;
+	void *data_end;
 	int ret;
 
+	/* Exercise xdp -> skb metadata path by diverting some traffic
+	 * into the kernel (UDP destination port 9081).
+	 */
+
+	data = (void *)(long)ctx->data;
+	data_end = (void *)(long)ctx->data_end;
+	eth = data;
+	if (eth + 1 < data_end) {
+		if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+			iph = (void *)(eth + 1);
+			if (iph + 1 < data_end && iph->protocol == IPPROTO_UDP)
+				udp = (void *)(iph + 1);
+		}
+		if (udp && udp + 1 > data_end)
+			udp = NULL;
+	}
+	if (udp && udp->dest == bpf_htons(9081)) {
+		bpf_printk("exporting metadata to skb for UDP port 9081");
+
+		if (bpf_xdp_metadata_export_to_skb(ctx) < 0) {
+			bpf_printk("bpf_xdp_metadata_export_to_skb failed");
+			return XDP_DROP;
+		}
+
+		/* Make sure metadata can't be adjusted after a call
+		 * to bpf_xdp_metadata_export_to_skb().
+		 */
+
+		ret = bpf_xdp_adjust_meta(ctx, -4);
+		if (ret == 0) {
+			bpf_printk("bpf_xdp_adjust_meta -4 after bpf_xdp_metadata_export_to_skb succeeded");
+			return XDP_DROP;
+		}
+
+		/* Make sure calling bpf_xdp_metadata_export_to_skb()
+		 * second time is a no-op.
+		 */
+
+		if (bpf_xdp_metadata_export_to_skb(ctx) == 0) {
+			bpf_printk("bpf_xdp_metadata_export_to_skb succeeded 2nd time");
+			return XDP_DROP;
+		}
+
+		skb_metadata = ctx->skb_metadata;
+		if (!skb_metadata) {
+			bpf_printk("no ctx->skb_metadata");
+			return XDP_DROP;
+		}
+
+		if (!skb_metadata->rx_timestamp) {
+			bpf_printk("no skb_metadata->rx_timestamp");
+			return XDP_DROP;
+		}
+
+		/*return bpf_redirect(ifindex, BPF_F_INGRESS);*/
+		return XDP_PASS;
+	}
+
 	if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) {
 		__u64 rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx);
 

From patchwork Tue Nov 15 03:02:08 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stanislav Fomichev
X-Patchwork-Id: 13043166
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Mon, 14 Nov 2022 19:02:08 -0800
In-Reply-To: <20221115030210.3159213-1-sdf@google.com>
References: <20221115030210.3159213-1-sdf@google.com>
X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog
Message-ID: <20221115030210.3159213-10-sdf@google.com>
Subject: [PATCH bpf-next 09/11] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, David Ahern, Jakub Kicinski,
 Willem de Bruijn, Jesper Dangaard Brouer, Anatoly Burakov,
 Alexander Lobakin, Magnus Karlsson, Maryam Tahhan,
 xdp-hints@xdp-project.net, netdev@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

No functional changes. Boilerplate to allow stuffing more data after xdp_buff.

Cc: John Fastabend
Cc: David Ahern
Cc: Martin KaFai Lau
Cc: Jakub Kicinski
Cc: Willem de Bruijn
Cc: Jesper Dangaard Brouer
Cc: Anatoly Burakov
Cc: Alexander Lobakin
Cc: Magnus Karlsson
Cc: Maryam Tahhan
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 8f762fc170b3..467356633172 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -661,17 +661,21 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
 #define MLX4_CQE_STATUS_IP_ANY (MLX4_CQE_STATUS_IPV4)
 #endif
 
+struct mlx4_xdp_buff {
+	struct xdp_buff xdp;
+};
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int factor = priv->cqe_factor;
 	struct mlx4_en_rx_ring *ring;
+	struct mlx4_xdp_buff mxbuf;
 	struct bpf_prog *xdp_prog;
 	int cq_ring = cq->ring;
 	bool doorbell_pending;
 	bool xdp_redir_flush;
struct mlx4_cqe *cqe; - struct xdp_buff xdp; int polled = 0; int index; @@ -681,7 +685,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud ring = priv->rx_ring[cq_ring]; xdp_prog = rcu_dereference_bh(ring->xdp_prog); - xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq); + xdp_init_buff(&mxbuf.xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq); doorbell_pending = false; xdp_redir_flush = false; @@ -776,24 +780,24 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud priv->frag_info[0].frag_size, DMA_FROM_DEVICE); - xdp_prepare_buff(&xdp, va - frags[0].page_offset, + xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset, frags[0].page_offset, length, false); - orig_data = xdp.data; + orig_data = mxbuf.xdp.data; - act = bpf_prog_run_xdp(xdp_prog, &xdp); + act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp); - length = xdp.data_end - xdp.data; - if (xdp.data != orig_data) { - frags[0].page_offset = xdp.data - - xdp.data_hard_start; - va = xdp.data; + length = mxbuf.xdp.data_end - mxbuf.xdp.data; + if (mxbuf.xdp.data != orig_data) { + frags[0].page_offset = mxbuf.xdp.data - + mxbuf.xdp.data_hard_start; + va = mxbuf.xdp.data; } switch (act) { case XDP_PASS: break; case XDP_REDIRECT: - if (likely(!xdp_do_redirect(dev, &xdp, xdp_prog))) { + if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) { ring->xdp_redirect++; xdp_redir_flush = true; frags[0].page = NULL; From patchwork Tue Nov 15 03:02:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043165 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02CD8C43219 for ; Tue, 15 Nov 2022 03:04:16 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237668AbiKODEM (ORCPT ); Mon, 14 Nov 2022 22:04:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237673AbiKODDW (ORCPT ); Mon, 14 Nov 2022 22:03:22 -0500 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 351A9DE8F for ; Mon, 14 Nov 2022 19:02:30 -0800 (PST) Received: by mail-pl1-x64a.google.com with SMTP id q10-20020a170902f34a00b00186c5448b01so10321506ple.4 for ; Mon, 14 Nov 2022 19:02:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=NHpPVwYWmY+x6hcd+ABfU57UeOB/gPiQsU+3Z5UY1i0=; b=B9GEj8ItHuvei7OLJh6liahQyF41gxO9r0YmDRezNhhJW0B4LRqBdiIpdLtKSAKJ03 bck/3wtWk9S5kVmWsH1Y/X/cO55m8mCUy9roh3KrjHhfMgHjKHanRgEUDFk2hXS5o9HD T8fiPRw91w2DPpiCdHicSzCzSb3G90FEfXhpXam1H0h7gV4eGPTn+EOnXOEgX/GyCRnA MUzwph2rjV5NPKaRb6/QcBwQOGAYqRKmEh5sGsOaS4N884wBgVzWhF4znQ0gEyOvKWEn QbLQn9i+uKTBoSekSpaVAi9Ju+3QijVzSMDS3wQw1Wf3627H4hTfa4y79GUxuh6l2TJ9 dD0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NHpPVwYWmY+x6hcd+ABfU57UeOB/gPiQsU+3Z5UY1i0=; b=VflUhidn4Npw3I6dBysBwZBrANCiv8Sxn3i2U+bGRMgDNB98egpkdXBdAlJWlyOQMt yEc0hpWcVAb3LEzFTmQ/fQ1NFrJM7Ni8VAAhn4G3AYWlGZoEYu2F/q3s3lgNXMLvmAp6 3r8sbhfpP7TOq5XJNe0nL74YPe32stxfNlhv81JOZCHUwde+D1BEKTqNUFNmAEjHWpt3 Wka968JumtJlxE6HTCn+tFe7c3YpbDPrGrJggusTriH73QNnSdQz1vikPGldd55cLnLn qIyfImP77c81VrhzryPRj0QenoD0OK+huD7ZGL845bhsm5tr92f2aLDbtmyky/Bxo+Su sR3Q== X-Gm-Message-State: ANoB5pkYO1ASfw4w0OormTILUDbjAQkT17EX8SYY/W1ymATJ0ywqpfjC 
7C6N2dV1u/uw57a6N/rAwOJySs0= X-Google-Smtp-Source: AA0mqf6ORNAZumHiasQNZjxf2BgibiTbhO3jBp4ygnQ4Uev1N2egl3O8Td8sJSlKU+ljI8s3RHruyz4= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a17:90a:9a81:b0:218:499b:bee9 with SMTP id e1-20020a17090a9a8100b00218499bbee9mr62315pjp.171.1668481349692; Mon, 14 Nov 2022 19:02:29 -0800 (PST) Date: Mon, 14 Nov 2022 19:02:09 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-11-sdf@google.com> Subject: [PATCH bpf-next 10/11] mxl4: Support rx timestamp metadata for xdp From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, David Ahern , Jakub Kicinski , Willem de Bruijn , Jesper Dangaard Brouer , Anatoly Burakov , Alexander Lobakin , Magnus Karlsson , Maryam Tahhan , xdp-hints@xdp-project.net, netdev@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Support rx timestamp metadata. Also use xdp_skb metadata upon XDP_PASS when available (to avoid double work; but note, this supports rx_timestamp only for now). 
Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev --- .../net/ethernet/mellanox/mlx4/en_netdev.c | 2 + drivers/net/ethernet/mellanox/mlx4/en_rx.c | 42 ++++++++++++++++++- include/linux/mlx4/device.h | 7 ++++ 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 8800d3f1f55c..9489476bab8f 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2855,6 +2855,7 @@ static const struct net_device_ops mlx4_netdev_ops = { .ndo_features_check = mlx4_en_features_check, .ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate, .ndo_bpf = mlx4_xdp, + .ndo_unroll_kfunc = mlx4_unroll_kfunc, }; static const struct net_device_ops mlx4_netdev_ops_master = { @@ -2887,6 +2888,7 @@ static const struct net_device_ops mlx4_netdev_ops_master = { .ndo_features_check = mlx4_en_features_check, .ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate, .ndo_bpf = mlx4_xdp, + .ndo_unroll_kfunc = mlx4_unroll_kfunc, }; struct mlx4_en_bond { diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 467356633172..722a4d56e0b0 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -33,6 +33,7 @@ #include #include +#include #include #include #include @@ -663,8 +664,39 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va, struct mlx4_xdp_buff { struct xdp_buff xdp; + struct mlx4_cqe *cqe; + struct mlx4_en_dev *mdev; }; +u64 mxl4_xdp_rx_timestamp(struct mlx4_xdp_buff *ctx) +{ + unsigned int seq; + u64 timestamp; + u64 nsec; + + timestamp = mlx4_en_get_cqe_ts(ctx->cqe); + + do { + seq = 
read_seqbegin(&ctx->mdev->clock_lock); + nsec = timecounter_cyc2time(&ctx->mdev->clock, timestamp); + } while (read_seqretry(&ctx->mdev->clock_lock, seq)); + + return ns_to_ktime(nsec); +} + +void mlx4_unroll_kfunc(const struct bpf_prog *prog, u32 func_id, + struct bpf_patch *patch) +{ + if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_EXPORT_TO_SKB)) { + return xdp_metadata_export_to_skb(prog, patch); + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP_SUPPORTED)) { + /* return true; */ + bpf_patch_append(patch, BPF_MOV64_IMM(BPF_REG_0, 1)); + } else if (func_id == xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) { + bpf_patch_append(patch, BPF_EMIT_CALL(mxl4_xdp_rx_timestamp)); + } +} + int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget) { struct mlx4_en_priv *priv = netdev_priv(dev); @@ -781,8 +813,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud DMA_FROM_DEVICE); xdp_prepare_buff(&mxbuf.xdp, va - frags[0].page_offset, - frags[0].page_offset, length, false); + frags[0].page_offset, length, true); orig_data = mxbuf.xdp.data; + if (unlikely(ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL)) { + mxbuf.cqe = cqe; + mxbuf.mdev = priv->mdev; + } act = bpf_prog_run_xdp(xdp_prog, &mxbuf.xdp); @@ -835,6 +871,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud if (unlikely(!skb)) goto next; + if (xdp_convert_skb_metadata(&mxbuf.xdp, skb)) + goto skip_metadata; + if (unlikely(ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL)) { u64 timestamp = mlx4_en_get_cqe_ts(cqe); @@ -895,6 +934,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD), be16_to_cpu(cqe->sl_vid)); +skip_metadata: nr = mlx4_en_complete_rx_desc(priv, frags, skb, length); if (likely(nr)) { skb_shinfo(skb)->nr_frags = nr; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h 
index 6646634a0b9d..a0e4d490b2fb 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -1585,4 +1585,11 @@ static inline int mlx4_get_num_reserved_uar(struct mlx4_dev *dev) /* The first 128 UARs are used for EQ doorbells */ return (128 >> (PAGE_SHIFT - dev->uar_page_shift)); } + +struct bpf_prog; +struct bpf_insn; +struct bpf_patch; + +void mlx4_unroll_kfunc(const struct bpf_prog *prog, u32 func_id, + struct bpf_patch *patch); #endif /* MLX4_DEVICE_H */ From patchwork Tue Nov 15 03:02:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13043169 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23FD0C433FE for ; Tue, 15 Nov 2022 03:04:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237874AbiKODEM (ORCPT ); Mon, 14 Nov 2022 22:04:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46708 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237702AbiKODDW (ORCPT ); Mon, 14 Nov 2022 22:03:22 -0500 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A5B615731 for ; Mon, 14 Nov 2022 19:02:32 -0800 (PST) Received: by mail-pf1-x449.google.com with SMTP id bq9-20020a056a000e0900b00571802a2eaaso6460326pfb.22 for ; Mon, 14 Nov 2022 19:02:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=G+qAGOGJeiSRN/c9VTgDhqmHnVCRAYQw0nUJ4nC0eLc=; 
b=QYILD6UjxDDYBtywFRA4u5O6cZXK4U4oNiop/Bu2MOj3FUfblryFcTEXh4hK4azjEO lkLGEgkUtaoW66d5xcDme7F1IpZ1Y5L1/W/D0nYrCQYYbYZT3KQQAwZlPwUNUOWFh/tl 54aOj2R34n5JHBcwnN3sQ0CGAGRg1ZK2jPPIOkKQt82XmtfY9r6W9EKf/H3DhnJG5aE/ yAAqP9mhjfvRKbpdhrJ/XbMuhTFNHvBiWSeH88t/LYprccU3upeT3I4CtZ3yzf5fQ4vv cpWYuPCyTmrNvEqitWGXD8ikk5pzjIxuLrM3TpDR+qZ9KVn0aj+QBZ6POhQQB2ZniADY 35uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=G+qAGOGJeiSRN/c9VTgDhqmHnVCRAYQw0nUJ4nC0eLc=; b=K9UhB5TW3ViY63AazpRbdV6m1fb4G62iHtW8rhMVd+jNlZimpYtGOVZs9TUSqHTGTZ QqnIz536DKrt7fXN5Xj0/5ioBRnmoMbmJpFQ9TmGf45GCdF39+gJyalxSXYTJNdHdwry 2JxO5XpH2JqgLTSnUElVL1sySk0l5DHbzAt8RpFzHYIOtcJOMZE2K9IsxnBXYidIcSss wV0VzdzPK7O7kOFt91hvgiUTPwNp7/p3+Oy+2ryFt6tztwS3tOTOn0PGJW9AOLt/V06K gYuOcF+n0TXnHF5Cz9nceD5i4Q4crR4WZlxXkNzWzJdaZeRZrcZDwSIEGsO7n0mdm+ZA iPVg== X-Gm-Message-State: ANoB5pmYZgx1LTQn3yFI3CZas5dVR8izazZP/lttsuNndcWJLb6i5b1d GuFV9MBaFNatHl2JuzRq2FxytxHx/EBc91jYInselGaWo/PbsIqAXHES0lfmY3kgHotHnkGghMe diGT0mSzi2vHqXlq6qgtLjIlp8FLLTIZnrV/Uu2ZXLMair5GsMA== X-Google-Smtp-Source: AA0mqf52/firdP/8HuCdLe1fFEb6+300WlxL47Uyq+gvDuZiNP2+ZuoGm/5/uFAzEib7pf+bXDH9DiM= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a17:90a:9503:b0:20a:eab5:cf39 with SMTP id t3-20020a17090a950300b0020aeab5cf39mr74647pjo.1.1668481351496; Mon, 14 Nov 2022 19:02:31 -0800 (PST) Date: Mon, 14 Nov 2022 19:02:10 -0800 In-Reply-To: <20221115030210.3159213-1-sdf@google.com> Mime-Version: 1.0 References: <20221115030210.3159213-1-sdf@google.com> X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog Message-ID: <20221115030210.3159213-12-sdf@google.com> Subject: [PATCH bpf-next 11/11] selftests/bpf: Simple program to dump XDP RX metadata From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, 
andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net To be used for verification of driver implementations: $ xdp_hw_metadata On the other machine: $ echo -n xdp | nc -u -q1 9091 # for AF_XDP $ echo -n skb | nc -u -q1 9092 # for skb Sample output: # xdp xsk_ring_cons__peek: 1 0x19f9090: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp_supported: 1 rx_timestamp: 1667850075063948829 0x19f9090: complete idx=8 addr=8000 # skb found skb hwtstamp = 1668314052.854274681 Decoding: # xdp rx_timestamp=1667850075.063948829 $ date -d @1667850075 Mon Nov 7 11:41:15 AM PST 2022 $ date Mon Nov 7 11:42:05 AM PST 2022 # skb $ date -d @1668314052 Sat Nov 12 08:34:12 PM PST 2022 $ date Sat Nov 12 08:37:06 PM PST 2022 Signed-off-by: Stanislav Fomichev --- tools/testing/selftests/bpf/.gitignore | 1 + tools/testing/selftests/bpf/Makefile | 6 +- .../selftests/bpf/progs/xdp_hw_metadata.c | 99 +++++ tools/testing/selftests/bpf/xdp_hw_metadata.c | 404 ++++++++++++++++++ tools/testing/selftests/bpf/xdp_hw_metadata.h | 6 + 5 files changed, 515 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.c create mode 100644 tools/testing/selftests/bpf/xdp_hw_metadata.h diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index 07d2d0a8c5cb..01e3baeefd4f 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -46,3 +46,4 @@ test_cpp xskxceiver xdp_redirect_multi xdp_synproxy +xdp_hw_metadata diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index b645cf5a5021..74d6ed307157 100644 --- a/tools/testing/selftests/bpf/Makefile +++ 
b/tools/testing/selftests/bpf/Makefile @@ -83,7 +83,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ - xskxceiver xdp_redirect_multi xdp_synproxy veristat + xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file TEST_GEN_FILES += liburandom_read.so @@ -241,6 +241,9 @@ $(OUTPUT)/test_maps: $(TESTING_HELPERS) $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) $(OUTPUT)/xsk.o: $(BPFOBJ) $(OUTPUT)/xskxceiver: $(OUTPUT)/xsk.o +$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/xsk.o $(OUTPUT)/xdp_hw_metadata.skel.h +$(OUTPUT)/xdp_hw_metadata: $(OUTPUT)/network_helpers.o +$(OUTPUT)/xdp_hw_metadata: LDFLAGS += -static BPFTOOL ?= $(DEFAULT_BPFTOOL) $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ @@ -379,6 +382,7 @@ linked_maps.skel.h-deps := linked_maps1.bpf.o linked_maps2.bpf.o test_subskeleton.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o test_subskeleton.bpf.o test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.bpf.o test_subskeleton_lib.bpf.o test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o +xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps))) diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c new file mode 100644 index 000000000000..549ec3b1f3a0 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "xdp_hw_metadata.h" + +struct { + __uint(type, BPF_MAP_TYPE_XSKMAP); + __uint(max_entries, 256); + 
__type(key, __u32); + __type(value, __u32); +} xsk SEC(".maps"); + +extern int bpf_xdp_metadata_export_to_skb(const struct xdp_md *ctx) __ksym; +extern int bpf_xdp_metadata_rx_timestamp_supported(const struct xdp_md *ctx) __ksym; +extern const __u64 bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx) __ksym; + +SEC("xdp") +int rx(struct xdp_md *ctx) +{ + void *data, *data_meta, *data_end; + struct ipv6hdr *ip6h = NULL; + struct ethhdr *eth = NULL; + struct udphdr *udp = NULL; + struct xsk_metadata *meta; + struct iphdr *iph = NULL; + int ret; + + data = (void *)(long)ctx->data; + data_end = (void *)(long)ctx->data_end; + eth = data; + if (eth + 1 < data_end) { + if (eth->h_proto == bpf_htons(ETH_P_IP)) { + iph = (void *)(eth + 1); + if (iph + 1 < data_end && iph->protocol == IPPROTO_UDP) + udp = (void *)(iph + 1); + } + if (eth->h_proto == bpf_htons(ETH_P_IPV6)) { + ip6h = (void *)(eth + 1); + if (ip6h + 1 < data_end && ip6h->nexthdr == IPPROTO_UDP) + udp = (void *)(ip6h + 1); + } + if (udp && udp + 1 > data_end) + udp = NULL; + } + + if (!udp) + return XDP_PASS; + + if (udp->dest == bpf_htons(9092)) { + bpf_printk("forwarding UDP:9092 to socket listener"); + + if (!bpf_xdp_metadata_export_to_skb(ctx)) { + bpf_printk("bpf_xdp_metadata_export_to_skb failed"); + return XDP_DROP; + } + + return XDP_PASS; + } + + if (udp->dest != bpf_htons(9091)) + return XDP_PASS; + + bpf_printk("forwarding UDP:9091 to AF_XDP"); + + ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xsk_metadata)); + if (ret != 0) { + bpf_printk("bpf_xdp_adjust_meta returned %d", ret); + return XDP_PASS; + } + + data = (void *)(long)ctx->data; + data_meta = (void *)(long)ctx->data_meta; + meta = data_meta; + + if (meta + 1 > data) { + bpf_printk("bpf_xdp_adjust_meta doesn't appear to work"); + return XDP_PASS; + } + + + if (bpf_xdp_metadata_rx_timestamp_supported(ctx)) { + meta->rx_timestamp_supported = 1; + meta->rx_timestamp = bpf_xdp_metadata_rx_timestamp(ctx); + bpf_printk("populated 
rx_timestamp with %llu", meta->rx_timestamp); + } + + return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c new file mode 100644 index 000000000000..a043e9ef5691 --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -0,0 +1,404 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* Reference program for verifying XDP metadata on real HW. Functional test + * only, doesn't test the performance. + * + * RX: + * - UDP 9091 packets are diverted into AF_XDP + * - Metadata verified: + * - rx_timestamp + * + * TX: + * - TBD + */ + +#include +#include +#include "xdp_hw_metadata.skel.h" +#include "xsk.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "xdp_hw_metadata.h" + +#define UMEM_NUM 16 +#define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE +#define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM) +#define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE) + +struct xsk { + void *umem_area; + struct xsk_umem *umem; + struct xsk_ring_prod fill; + struct xsk_ring_cons comp; + struct xsk_ring_prod tx; + struct xsk_ring_cons rx; + struct xsk_socket *socket; +}; + +struct xdp_hw_metadata *bpf_obj; +struct xsk *rx_xsk; +const char *ifname; +int ifindex; +int rxq; + +void test__fail(void) { /* for network_helpers.c */ } + +static int open_xsk(const char *ifname, struct xsk *xsk, __u32 queue_id) +{ + int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE; + const struct xsk_socket_config socket_config = { + .rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD, + .xdp_flags = XDP_FLAGS, + .bind_flags = XDP_COPY, + }; + const struct xsk_umem_config umem_config = { + .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, + .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, + .frame_size
= XSK_UMEM__DEFAULT_FRAME_SIZE, + .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG, + }; + __u32 idx; + u64 addr; + int ret; + int i; + + xsk->umem_area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE, mmap_flags, -1, 0); + if (xsk->umem_area == MAP_FAILED) + return -ENOMEM; + + ret = xsk_umem__create(&xsk->umem, + xsk->umem_area, UMEM_SIZE, + &xsk->fill, + &xsk->comp, + &umem_config); + if (ret) + return ret; + + ret = xsk_socket__create(&xsk->socket, ifname, queue_id, + xsk->umem, + &xsk->rx, + &xsk->tx, + &socket_config); + if (ret) + return ret; + + /* First half of umem is for TX. This way address matches 1-to-1 + * to the completion queue index. + */ + + for (i = 0; i < UMEM_NUM / 2; i++) { + addr = i * UMEM_FRAME_SIZE; + printf("%p: tx_desc[%d] -> %lx\n", xsk, i, addr); + } + + /* Second half of umem is for RX. */ + + ret = xsk_ring_prod__reserve(&xsk->fill, UMEM_NUM / 2, &idx); + for (i = 0; i < UMEM_NUM / 2; i++) { + addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE; + printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr); + *xsk_ring_prod__fill_addr(&xsk->fill, i) = addr; + } + xsk_ring_prod__submit(&xsk->fill, ret); + + return 0; +} + +static void close_xsk(struct xsk *xsk) +{ + if (xsk->umem) + xsk_umem__delete(xsk->umem); + if (xsk->socket) + xsk_socket__delete(xsk->socket); + munmap(xsk->umem_area, UMEM_SIZE); +} + +static void refill_rx(struct xsk *xsk, __u64 addr) +{ + __u32 idx; + + if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) { + printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr); + *xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr; + xsk_ring_prod__submit(&xsk->fill, 1); + } +} + +static void verify_xdp_metadata(void *data) +{ + struct xsk_metadata *meta; + + meta = data - sizeof(*meta); + + printf("rx_timestamp_supported: %u\n", meta->rx_timestamp_supported); + printf("rx_timestamp: %llu\n", meta->rx_timestamp); +} + +static void verify_skb_metadata(int fd) +{ + char cmsg_buf[1024]; + char packet_buf[128]; + + struct scm_timestamping *ts; + struct iovec
packet_iov; + struct cmsghdr *cmsg; + struct msghdr hdr; + + memset(&hdr, 0, sizeof(hdr)); + hdr.msg_iov = &packet_iov; + hdr.msg_iovlen = 1; + packet_iov.iov_base = packet_buf; + packet_iov.iov_len = sizeof(packet_buf); + + hdr.msg_control = cmsg_buf; + hdr.msg_controllen = sizeof(cmsg_buf); + + if (recvmsg(fd, &hdr, 0) < 0) + error(-1, errno, "recvmsg"); + + for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL; + cmsg = CMSG_NXTHDR(&hdr, cmsg)) { + + if (cmsg->cmsg_level != SOL_SOCKET) + continue; + + switch (cmsg->cmsg_type) { + case SCM_TIMESTAMPING: + ts = (struct scm_timestamping *)CMSG_DATA(cmsg); + if (ts->ts[2].tv_sec || ts->ts[2].tv_nsec) { + printf("found skb hwtstamp = %lu.%09lu\n", + ts->ts[2].tv_sec, ts->ts[2].tv_nsec); + return; + } + break; + default: + break; + } + } + + printf("skb hwtstamp is not found!\n"); +} + +static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd) +{ + const struct xdp_desc *rx_desc; + struct pollfd fds[rxq + 1]; + __u64 comp_addr; + __u64 addr; + __u32 idx; + int ret; + int i; + + for (i = 0; i < rxq; i++) { + fds[i].fd = xsk_socket__fd(rx_xsk[i].socket); + fds[i].events = POLLIN; + fds[i].revents = 0; + } + + fds[rxq].fd = server_fd; + fds[rxq].events = POLLIN; + fds[rxq].revents = 0; + + while (true) { + errno = 0; + ret = poll(fds, rxq + 1, 1000); + printf("poll: %d (%d)\n", ret, errno); + if (ret < 0) + break; + if (ret == 0) + continue; + + if (fds[rxq].revents) + verify_skb_metadata(server_fd); + + for (i = 0; i < rxq; i++) { + if (fds[i].revents == 0) + continue; + + struct xsk *xsk = &rx_xsk[i]; + + ret = xsk_ring_cons__peek(&xsk->rx, 1, &idx); + printf("xsk_ring_cons__peek: %d\n", ret); + if (ret != 1) + continue; + + rx_desc = xsk_ring_cons__rx_desc(&xsk->rx, idx); + comp_addr = xsk_umem__extract_addr(rx_desc->addr); + addr = xsk_umem__add_offset_to_addr(rx_desc->addr); + printf("%p: rx_desc[%u]->addr=%llx addr=%llx comp_addr=%llx\n", + xsk, idx, rx_desc->addr, addr, comp_addr); +
verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr)); + xsk_ring_cons__release(&xsk->rx, 1); + refill_rx(xsk, comp_addr); + } + } + + return 0; +} + +struct ethtool_channels { + __u32 cmd; + __u32 max_rx; + __u32 max_tx; + __u32 max_other; + __u32 max_combined; + __u32 rx_count; + __u32 tx_count; + __u32 other_count; + __u32 combined_count; +}; + +#define ETHTOOL_GCHANNELS 0x0000003c /* Get no of channels */ + +static int rxq_num(const char *ifname) +{ + struct ethtool_channels ch = { + .cmd = ETHTOOL_GCHANNELS, + }; + + struct ifreq ifr = { + .ifr_data = (void *)&ch, + }; + strcpy(ifr.ifr_name, ifname); + int fd, ret; + + fd = socket(AF_UNIX, SOCK_DGRAM, 0); + if (fd < 0) + error(-1, errno, "socket"); + + ret = ioctl(fd, SIOCETHTOOL, &ifr); + if (ret < 0) + error(-1, errno, "ioctl"); + + close(fd); + + return ch.rx_count; +} + +static void cleanup(void) +{ + LIBBPF_OPTS(bpf_xdp_attach_opts, opts); + int ret; + int i; + + if (bpf_obj) { + opts.old_prog_fd = bpf_program__fd(bpf_obj->progs.rx); + if (opts.old_prog_fd >= 0) { + printf("detaching bpf program....\n"); + ret = bpf_xdp_detach(ifindex, XDP_FLAGS, &opts); + if (ret) + printf("failed to detach XDP program: %d\n", ret); + } + } + + for (i = 0; i < rxq; i++) + close_xsk(&rx_xsk[i]); + + if (bpf_obj) + xdp_hw_metadata__destroy(bpf_obj); +} + +static void handle_signal(int sig) +{ + /* interrupting poll() is all we need */ +} + +static void timestamping_enable(int fd, int val) +{ + int ret; + + ret = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); + if (ret < 0) + error(-1, errno, "setsockopt(SO_TIMESTAMPING)"); +} + +int main(int argc, char *argv[]) +{ + int server_fd = -1; + int ret; + int i; + + struct bpf_program *prog; + + if (argc != 2) { + fprintf(stderr, "pass device name\n"); + return -1; + } + + ifname = argv[1]; + ifindex = if_nametoindex(ifname); + rxq = rxq_num(ifname); + + printf("rxq: %d\n", rxq); + + rx_xsk = malloc(sizeof(struct xsk) * rxq); + if (!rx_xsk) + error(-1,
ENOMEM, "malloc"); + + for (i = 0; i < rxq; i++) { + printf("open_xsk(%s, %p, %d)\n", ifname, &rx_xsk[i], i); + ret = open_xsk(ifname, &rx_xsk[i], i); + if (ret) + error(-1, -ret, "open_xsk"); + + printf("xsk_socket__fd() -> %d\n", xsk_socket__fd(rx_xsk[i].socket)); + } + + printf("open bpf program...\n"); + bpf_obj = xdp_hw_metadata__open(); + if (libbpf_get_error(bpf_obj)) + error(-1, libbpf_get_error(bpf_obj), "xdp_hw_metadata__open"); + + prog = bpf_object__find_program_by_name(bpf_obj->obj, "rx"); + bpf_program__set_ifindex(prog, ifindex); + bpf_program__set_flags(prog, BPF_F_XDP_HAS_METADATA); + + printf("load bpf program...\n"); + ret = xdp_hw_metadata__load(bpf_obj); + if (ret) + error(-1, -ret, "xdp_hw_metadata__load"); + + printf("prepare skb endpoint...\n"); + server_fd = start_server(AF_INET6, SOCK_DGRAM, NULL, 9092, 1000); + if (server_fd < 0) + error(-1, errno, "start_server"); + timestamping_enable(server_fd, + SOF_TIMESTAMPING_SOFTWARE | + SOF_TIMESTAMPING_RAW_HARDWARE); + + printf("prepare xsk map...\n"); + for (i = 0; i < rxq; i++) { + int sock_fd = xsk_socket__fd(rx_xsk[i].socket); + __u32 queue_id = i; + + printf("map[%d] = %d\n", queue_id, sock_fd); + ret = bpf_map_update_elem(bpf_map__fd(bpf_obj->maps.xsk), &queue_id, &sock_fd, 0); + if (ret) + error(-1, -ret, "bpf_map_update_elem"); + } + + printf("attach bpf program...\n"); + ret = bpf_xdp_attach(ifindex, + bpf_program__fd(bpf_obj->progs.rx), + XDP_FLAGS, NULL); + if (ret) + error(-1, -ret, "bpf_xdp_attach"); + + signal(SIGINT, handle_signal); + ret = verify_metadata(rx_xsk, rxq, server_fd); + close(server_fd); + cleanup(); + if (ret) + error(-1, -ret, "verify_metadata"); +} diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.h b/tools/testing/selftests/bpf/xdp_hw_metadata.h new file mode 100644 index 000000000000..b4580015ee93 --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +struct xsk_metadata { + __u32 
rx_timestamp_supported:1; + __u64 rx_timestamp; +};
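
For readers following the series, here is a minimal userspace sketch (not part of the patch) of how a consumer could read the metadata that the BPF program places immediately before the packet data, and split the nanosecond timestamp the way the "Decoding" step of the commit message does. The struct `xsk_metadata_example` and the helpers `read_meta`, `ts_seconds` and `ts_nanos` are hypothetical names introduced only for illustration; the layout mirrors `struct xsk_metadata` above, with standard C types standing in for `__u32`/`__u64`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct xsk_metadata from xdp_hw_metadata.h. */
struct xsk_metadata_example {
	unsigned int rx_timestamp_supported:1;
	uint64_t rx_timestamp;
};

/* Assumption: timestamps are nanoseconds, as in the sample output. */
#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Copy out the metadata that sits directly in front of the packet data,
 * i.e. at data - sizeof(meta), as verify_xdp_metadata() does.
 */
static struct xsk_metadata_example read_meta(const void *data)
{
	struct xsk_metadata_example meta;

	memcpy(&meta, (const unsigned char *)data - sizeof(meta), sizeof(meta));
	return meta;
}

/* Whole seconds of a nanosecond timestamp. */
static uint64_t ts_seconds(uint64_t ns)
{
	return ns / EXAMPLE_NSEC_PER_SEC;
}

/* Sub-second remainder, in nanoseconds. */
static uint64_t ts_nanos(uint64_t ns)
{
	return ns % EXAMPLE_NSEC_PER_SEC;
}
```

With the sample value 1667850075063948829 from the commit message, `ts_seconds()` gives 1667850075 and `ts_nanos()` gives 63948829, which prints as 1667850075.063948829 when the nanoseconds are zero-padded to nine digits.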