From patchwork Tue Nov 16 07:37:35 2021
From: Ciara Loftus
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com, toke@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com, maciej.fijalkowski@intel.com, Ciara Loftus
Subject: [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue
Date: Tue, 16 Nov 2021 07:37:35 +0000
Message-Id: <20211116073742.7941-2-ciara.loftus@intel.com>
In-Reply-To: <20211116073742.7941-1-ciara.loftus@intel.com>

Storing a reference to the XDP socket in the netdev_rx_queue structure makes a single socket accessible without requiring a lookup in the XSKMAP. A future commit will introduce the XDP_REDIRECT_XSK action, which indicates that this stored reference should be used instead of performing the lookup. Since an rx ring is required for redirection, the reference is only stored if an rx ring is configured. When multiple sockets exist for a given context (netdev, qid), no reference is stored, because in that case we fall back to the default behavior of using the XSKMAP to redirect the packets.
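For orientation, the kind of XDP program this series ultimately enables looks roughly as follows. This is an illustrative sketch only (a C equivalent of the default program that patch 8/8 generates in libbpf): the map size is arbitrary and it assumes a bpf_redirect_xsk() declaration regenerated from the patched include/uapi/linux/bpf.h.

	// Illustrative sketch, not part of this patch: C equivalent of the
	// default program emitted by patch 8/8.
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct {
		__uint(type, BPF_MAP_TYPE_XSKMAP);
		__uint(max_entries, 64);	/* illustrative size */
		__type(key, __u32);
		__type(value, __u32);
	} xsks_map SEC(".maps");

	SEC("xdp_sock")
	int xdp_sock_prog(struct xdp_md *ctx)
	{
		/* Returns XDP_REDIRECT_XSK when exactly one socket with an rx
		 * ring is bound to this queue; otherwise behaves like
		 * bpf_redirect_map() on xsks_map.
		 */
		return bpf_redirect_xsk(ctx, &xsks_map, ctx->rx_queue_index, XDP_PASS);
	}

	char _license[] SEC("license") = "GPL";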
Signed-off-by: Ciara Loftus --- include/linux/netdevice.h | 2 ++ net/xdp/xsk.c | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3ec42495a43a..1ad2491f0391 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -736,6 +736,8 @@ struct netdev_rx_queue { struct net_device *dev; #ifdef CONFIG_XDP_SOCKETS struct xsk_buff_pool *pool; + struct xdp_sock *xsk; + refcount_t xsk_refcnt; #endif } ____cacheline_aligned_in_smp; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index f16074eb53c7..94ee524b9ca8 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -728,6 +728,30 @@ static void xsk_unbind_dev(struct xdp_sock *xs) /* Wait for driver to stop using the xdp socket. */ xp_del_xsk(xs->pool, xs); + if (xs->rx) { + if (refcount_read(&dev->_rx[xs->queue_id].xsk_refcnt) == 1) { + refcount_set(&dev->_rx[xs->queue_id].xsk_refcnt, 0); + WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL); + } else { + refcount_dec(&dev->_rx[xs->queue_id].xsk_refcnt); + /* If the refcnt returns to one again store the reference to the + * remaining socket in the netdev_rx_queue. + */ + if (refcount_read(&dev->_rx[xs->queue_id].xsk_refcnt) == 1) { + struct net *net = dev_net(dev); + struct xdp_sock *xsk; + struct sock *sk; + + mutex_lock(&net->xdp.lock); + sk = sk_head(&net->xdp.list); + xsk = xdp_sk(sk); + mutex_lock(&xsk->mutex); + WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, xsk); + mutex_unlock(&xsk->mutex); + mutex_unlock(&net->xdp.lock); + } + } + } xs->dev = NULL; synchronize_net(); dev_put(dev); @@ -972,6 +996,16 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) xs->queue_id = qid; xp_add_xsk(xs->pool, xs); + if (xs->rx) { + if (refcount_read(&dev->_rx[xs->queue_id].xsk_refcnt) == 0) { + WRITE_ONCE(dev->_rx[qid].xsk, xs); + refcount_set(&dev->_rx[qid].xsk_refcnt, 1); + } else { + refcount_inc(&dev->_rx[qid].xsk_refcnt); + WRITE_ONCE(dev->_rx[qid].xsk, NULL); + } + } + out_unlock: if (err) { dev_put(dev); From patchwork Tue Nov 16 07:37:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621663 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EFF7C433EF for ; Tue, 16 Nov 2021 07:38:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E940861BF6 for ; Tue, 16 Nov 2021 07:38:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230376AbhKPHlb (ORCPT ); Tue, 16 Nov 2021 02:41:31 -0500 Received: from mga11.intel.com ([192.55.52.93]:42900 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230232AbhKPHl1 (ORCPT ); Tue, 16 Nov 2021 02:41:27 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10169"; a="231099049" X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="231099049" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Nov 2021 23:38:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="671857311" Received: from silpixa00401086.ir.intel.com (HELO localhost.localdomain) ([10.55.129.110]) by orsmga005.jf.intel.com with 
From: Ciara Loftus
Subject: [RFC PATCH bpf-next 2/8] bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
Date: Tue, 16 Nov 2021 07:37:36 +0000
Message-Id: <20211116073742.7941-3-ciara.loftus@intel.com>

Add a new XDP redirect helper called bpf_redirect_xsk, which simply returns the new XDP_REDIRECT_XSK action if the xsk refcnt for the netdev_rx_queue is equal to one. Checking this value verifies that the AF_XDP socket Rx ring is configured and that there is exactly one xsk attached to the queue. XDP_REDIRECT_XSK indicates to the driver that the XSKMAP lookup can be skipped and the pointer to the socket to redirect to can instead be retrieved from the netdev_rx_queue on which the packet was received. If these conditions are not met, fall back to the behavior of xdp_redirect_map, which returns XDP_REDIRECT for a successful XSKMAP lookup.

Signed-off-by: Ciara Loftus
---
 include/uapi/linux/bpf.h | 13 +++++++++++++
 kernel/bpf/verifier.c    |  7 ++++++-
 net/core/filter.c        | 22 ++++++++++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6297eafdc40f..a33cc63c8e6f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4957,6 +4957,17 @@ union bpf_attr {
  *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
  *		**-EBUSY** if failed to try lock mmap_lock.
  *		**-EINVAL** for invalid **flags**.
+ *
+ * long bpf_redirect_xsk(void *ctx, struct bpf_map *map, u32 key, u64 flags)
+ *	Description
+ *		Redirect the packet to the XDP socket associated with the netdev queue if
+ *		the socket has an rx ring configured and is the only socket attached to the
+ *		queue. Fall back to bpf_redirect_map behavior if either condition is not met.
+ *	Return
+ *		**XDP_REDIRECT_XSK** if successful.
+ * + * **XDP_REDIRECT** if the fall back was successful, or the value of the + * two lower bits of the *flags* argument on error */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5140,6 +5151,7 @@ union bpf_attr { FN(skc_to_unix_sock), \ FN(kallsyms_lookup_name), \ FN(find_vma), \ + FN(redirect_xsk), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper @@ -5520,6 +5532,7 @@ enum xdp_action { XDP_PASS, XDP_TX, XDP_REDIRECT, + XDP_REDIRECT_XSK, }; /* user accessible metadata for XDP packet hook diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index d31a031ab377..59a973f43965 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5526,7 +5526,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, break; case BPF_MAP_TYPE_XSKMAP: if (func_id != BPF_FUNC_redirect_map && - func_id != BPF_FUNC_map_lookup_elem) + func_id != BPF_FUNC_map_lookup_elem && + func_id != BPF_FUNC_redirect_xsk) goto error; break; case BPF_MAP_TYPE_ARRAY_OF_MAPS: @@ -5629,6 +5630,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, map->map_type != BPF_MAP_TYPE_XSKMAP) goto error; break; + case BPF_FUNC_redirect_xsk: + if (map->map_type != BPF_MAP_TYPE_XSKMAP) + goto error; + break; case BPF_FUNC_sk_redirect_map: case BPF_FUNC_msg_redirect_map: case BPF_FUNC_sock_map_update: diff --git a/net/core/filter.c b/net/core/filter.c index 46f09a8fba20..4497ad046790 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4140,6 +4140,26 @@ static const struct bpf_func_proto bpf_xdp_redirect_map_proto = { .arg3_type = ARG_ANYTHING, }; +BPF_CALL_4(bpf_xdp_redirect_xsk, struct xdp_buff *, xdp, struct bpf_map *, map, + u32, ifindex, u64, flags) +{ +#ifdef CONFIG_XDP_SOCKETS + if (likely(refcount_read(&xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk_refcnt) == 1)) + return XDP_REDIRECT_XSK; +#endif + return map->ops->map_redirect(map, ifindex, flags); +} + +static const struct bpf_func_proto bpf_xdp_redirect_xsk_proto = { + .func = bpf_xdp_redirect_xsk, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_CONST_MAP_PTR, + .arg3_type = ARG_ANYTHING, + .arg4_type = ARG_ANYTHING, +}; + static unsigned long bpf_skb_copy(void *dst_buff, const void *skb, unsigned long off, unsigned long len) { @@ -7469,6 +7489,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_xdp_redirect_proto; case BPF_FUNC_redirect_map: return &bpf_xdp_redirect_map_proto; + case BPF_FUNC_redirect_xsk: + return &bpf_xdp_redirect_xsk_proto; case BPF_FUNC_xdp_adjust_tail: return &bpf_xdp_adjust_tail_proto; case BPF_FUNC_fib_lookup: From patchwork Tue Nov 16 07:37:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621665 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58F6BC433FE for ; Tue, 16 Nov 2021 07:38:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3CD2363214 for ; Tue, 16 Nov 2021 07:38:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230378AbhKPHlb (ORCPT ); Tue, 16 Nov 2021 02:41:31 -0500 Received: from mga11.intel.com ([192.55.52.93]:42900 "EHLO 
From: Ciara Loftus
Subject: [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
Date: Tue, 16 Nov 2021 07:37:37 +0000
Message-Id: <20211116073742.7941-4-ciara.loftus@intel.com>

Handle the XDP_REDIRECT_XSK action on the SKB path by retrieving the socket reference from the netdev_rx_queue struct and immediately calling the xsk_generic_rcv function. Also, prepare for supporting this action in drivers by exposing the xsk_rcv and xsk_flush functions so they can be called directly from driver code.
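In condensed form, the new SKB-path handling added to do_xdp_generic() looks like this (abridged from the diff below; the surrounding function provides skb, xdp and err, and out_redir is the existing error path):

	case XDP_REDIRECT_XSK:
		/* the socket stored per queue by patch 1/8 when exactly one is bound */
		struct xdp_sock *xs =
			READ_ONCE(skb->dev->_rx[xdp.rxq->queue_index].xsk);

		err = xsk_generic_rcv(xs, &xdp);	/* copy into the socket Rx ring */
		if (err)
			goto out_redir;
		consume_skb(skb);
		break;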
Signed-off-by: Ciara Loftus --- include/net/xdp_sock_drv.h | 21 +++++++++++++++++++++ net/core/dev.c | 14 ++++++++++++++ net/core/filter.c | 4 ++++ net/xdp/xsk.c | 6 ++++-- 4 files changed, 43 insertions(+), 2 deletions(-) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 443d45951564..e923f5d1adb6 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -22,6 +22,8 @@ void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool); void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool); void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool); bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool); +int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); +void xsk_flush(struct xdp_sock *xs); static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) { @@ -130,6 +132,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, xp_dma_sync_for_device(pool, dma, size); } +static inline struct xdp_sock *xsk_get_redirect_xsk(struct netdev_rx_queue *q) +{ + return READ_ONCE(q->xsk); +} + #else static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) @@ -179,6 +186,15 @@ static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) return false; } +static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + return 0; +} + +static inline void xsk_flush(struct xdp_sock *xs) +{ +} + static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) { return 0; @@ -264,6 +280,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, { } +static inline struct xdp_sock *xsk_get_redirect_xsk(struct netdev_rx_queue *q) +{ + return NULL; +} + #endif /* CONFIG_XDP_SOCKETS */ #endif /* _LINUX_XDP_SOCK_DRV_H */ diff --git a/net/core/dev.c b/net/core/dev.c index edeb811c454e..9b38b50f1f97 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -106,6 +106,7 @@ #include #include #include +#include #include #include #include @@ -4771,6 +4772,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, * kfree_skb in response to actions it cannot handle/XDP_DROP). */ switch (act) { + case XDP_REDIRECT_XSK: case XDP_REDIRECT: case XDP_TX: __skb_push(skb, mac_len); @@ -4819,6 +4821,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb, act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog); switch (act) { + case XDP_REDIRECT_XSK: case XDP_REDIRECT: case XDP_TX: case XDP_PASS: @@ -4875,6 +4878,17 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) act = netif_receive_generic_xdp(skb, &xdp, xdp_prog); if (act != XDP_PASS) { switch (act) { +#ifdef CONFIG_XDP_SOCKETS + case XDP_REDIRECT_XSK: + struct xdp_sock *xs = + READ_ONCE(skb->dev->_rx[xdp.rxq->queue_index].xsk); + + err = xsk_generic_rcv(xs, &xdp); + if (err) + goto out_redir; + consume_skb(skb); + break; +#endif case XDP_REDIRECT: err = xdp_do_generic_redirect(skb->dev, skb, &xdp, xdp_prog); diff --git a/net/core/filter.c b/net/core/filter.c index 4497ad046790..c65262722c64 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -8203,7 +8203,11 @@ static bool xdp_is_valid_access(int off, int size, void bpf_warn_invalid_xdp_action(u32 act) { +#ifdef CONFIG_XDP_SOCKETS + const u32 act_max = XDP_REDIRECT_XSK; +#else const u32 act_max = XDP_REDIRECT; +#endif WARN_ONCE(1, "%s XDP return value %u, expect packet loss!\n", act > act_max ? 
"Illegal" : "Driver unsupported", diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 94ee524b9ca8..ce004f5fae64 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -226,12 +226,13 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp) return 0; } -static void xsk_flush(struct xdp_sock *xs) +void xsk_flush(struct xdp_sock *xs) { xskq_prod_submit(xs->rx); __xskq_cons_release(xs->pool->fq); sock_def_readable(&xs->sk); } +EXPORT_SYMBOL(xsk_flush); int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { @@ -247,7 +248,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) return err; } -static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { int err; u32 len; @@ -266,6 +267,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) xdp_return_buff(xdp); return err; } +EXPORT_SYMBOL(xsk_rcv); int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp) { From patchwork Tue Nov 16 07:37:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621667 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70F52C433F5 for ; Tue, 16 Nov 2021 07:38:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5A0346320D for ; Tue, 16 Nov 2021 07:38:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230406AbhKPHle (ORCPT ); Tue, 16 Nov 2021 02:41:34 -0500 Received: from mga11.intel.com ([192.55.52.93]:42900 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230385AbhKPHle (ORCPT ); Tue, 16 Nov 2021 02:41:34 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10169"; a="231099064" X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="231099064" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Nov 2021 23:38:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="671857339" Received: from silpixa00401086.ir.intel.com (HELO localhost.localdomain) ([10.55.129.110]) by orsmga005.jf.intel.com with ESMTP; 15 Nov 2021 23:38:34 -0800 From: Ciara Loftus To: netdev@vger.kernel.org, bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com, toke@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com, maciej.fijalkowski@intel.com, Ciara Loftus Subject: [RFC PATCH bpf-next 4/8] i40e: handle the XDP_REDIRECT_XSK action Date: Tue, 16 Nov 2021 07:37:38 +0000 Message-Id: <20211116073742.7941-5-ciara.loftus@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211116073742.7941-1-ciara.loftus@intel.com> References: <20211116073742.7941-1-ciara.loftus@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC If the BPF program returns XDP_REDIRECT_XSK, obtain the pointer to the socket from the netdev_rx_queue struct and call the newly exposed xsk_rcv function to push the XDP descriptor to the Rx ring. Then use xsk_flush to flush the socket. 
Signed-off-by: Ciara Loftus --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 13 +++++++++++- .../ethernet/intel/i40e/i40e_txrx_common.h | 1 + drivers/net/ethernet/intel/i40e/i40e_xsk.c | 21 +++++++++++++------ 3 files changed, 28 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 10a83e5385c7..b6a883a8d088 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -4,6 +4,7 @@ #include #include #include +#include #include "i40e.h" #include "i40e_trace.h" #include "i40e_prototype.h" @@ -2296,6 +2297,7 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp) int err, result = I40E_XDP_PASS; struct i40e_ring *xdp_ring; struct bpf_prog *xdp_prog; + struct xdp_sock *xs; u32 act; xdp_prog = READ_ONCE(rx_ring->xdp_prog); @@ -2315,6 +2317,12 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp) if (result == I40E_XDP_CONSUMED) goto out_failure; break; + case XDP_REDIRECT_XSK: + xs = xsk_get_redirect_xsk(&rx_ring->netdev->_rx[xdp->rxq->queue_index]); + err = xsk_rcv(xs, xdp); + if (err) + goto out_failure; + return I40E_XDP_REDIR_XSK; case XDP_REDIRECT: err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); if (err) @@ -2401,6 +2409,9 @@ void i40e_update_rx_stats(struct i40e_ring *rx_ring, **/ void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res) { + if (xdp_res & I40E_XDP_REDIR_XSK) + xsk_flush(xsk_get_redirect_xsk(&rx_ring->netdev->_rx[rx_ring->queue_index])); + if (xdp_res & I40E_XDP_REDIR) xdp_do_flush_map(); @@ -2516,7 +2527,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget) } if (xdp_res) { - if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) { + if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR | I40E_XDP_REDIR_XSK)) { xdp_xmit |= xdp_res; i40e_rx_buffer_flip(rx_ring, rx_buffer, size); } else { diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h index 19da3b22160f..17e521a71201 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h @@ -20,6 +20,7 @@ void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val); #define I40E_XDP_CONSUMED BIT(0) #define I40E_XDP_TX BIT(1) #define I40E_XDP_REDIR BIT(2) +#define I40E_XDP_REDIR_XSK BIT(3) /* * build_ctob - Builds the Tx descriptor (cmd, offset and type) qword diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index ea06e957393e..31b794672ea5 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -144,13 +144,14 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool, * @rx_ring: Rx ring * @xdp: xdp_buff used as input to the XDP program * - * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR} + * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK} **/ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) { int err, result = I40E_XDP_PASS; struct i40e_ring *xdp_ring; struct bpf_prog *xdp_prog; + struct xdp_sock *xs; u32 act; /* NB! 
xdp_prog will always be !NULL, due to the fact that @@ -159,14 +160,21 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) xdp_prog = READ_ONCE(rx_ring->xdp_prog); act = bpf_prog_run_xdp(xdp_prog, xdp); - if (likely(act == XDP_REDIRECT)) { - err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + if (likely(act == XDP_REDIRECT_XSK)) { + xs = xsk_get_redirect_xsk(&rx_ring->netdev->_rx[xdp->rxq->queue_index]); + err = xsk_rcv(xs, xdp); if (err) goto out_failure; - return I40E_XDP_REDIR; + return I40E_XDP_REDIR_XSK; } switch (act) { + case XDP_REDIRECT: + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + if (err) + goto out_failure; + result = I40E_XDP_REDIR; + break; case XDP_PASS: break; case XDP_TX: @@ -275,7 +283,8 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring, *rx_packets = 1; *rx_bytes = size; - if (likely(xdp_res == I40E_XDP_REDIR) || xdp_res == I40E_XDP_TX) + if (likely(xdp_res == I40E_XDP_REDIR_XSK) || xdp_res == I40E_XDP_REDIR || + xdp_res == I40E_XDP_TX) return; if (xdp_res == I40E_XDP_CONSUMED) { @@ -371,7 +380,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) &rx_bytes, size, xdp_res); total_rx_packets += rx_packets; total_rx_bytes += rx_bytes; - xdp_xmit |= xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR); + xdp_xmit |= xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR | I40E_XDP_REDIR_XSK); next_to_clean = (next_to_clean + 1) & count_mask; } From patchwork Tue Nov 16 07:37:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621669 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 256E2C433F5 for ; Tue, 16 Nov 2021 07:38:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0BF296320D for ; Tue, 16 Nov 2021 07:38:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230399AbhKPHlj (ORCPT ); Tue, 16 Nov 2021 02:41:39 -0500 Received: from mga11.intel.com ([192.55.52.93]:42900 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230356AbhKPHlh (ORCPT ); Tue, 16 Nov 2021 02:41:37 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10169"; a="231099075" X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="231099075" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Nov 2021 23:38:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="671857354" Received: from silpixa00401086.ir.intel.com (HELO localhost.localdomain) ([10.55.129.110]) by orsmga005.jf.intel.com with ESMTP; 15 Nov 2021 23:38:37 -0800 From: Ciara Loftus To: netdev@vger.kernel.org, bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com, toke@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com, maciej.fijalkowski@intel.com, Ciara Loftus Subject: [RFC PATCH bpf-next 5/8] xsk: implement a batched version of xsk_rcv Date: Tue, 16 Nov 2021 07:37:39 +0000 Message-Id: <20211116073742.7941-6-ciara.loftus@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: 
<20211116073742.7941-1-ciara.loftus@intel.com> References: <20211116073742.7941-1-ciara.loftus@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Introduce a batched version of xsk_rcv called xsk_rcv_batch which takes an array of xdp_buffs and pushes them to the Rx ring. Also introduce a batched version of xsk_buff_dma_sync_for_cpu. Signed-off-by: Ciara Loftus --- include/net/xdp_sock_drv.h | 28 ++++++++++++++++++++++++++++ include/net/xsk_buff_pool.h | 22 ++++++++++++++++++++++ net/xdp/xsk.c | 29 +++++++++++++++++++++++++++++ net/xdp/xsk_queue.h | 31 +++++++++++++++++++++++++++++++ 4 files changed, 110 insertions(+) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index e923f5d1adb6..0b352d7a34af 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -23,6 +23,7 @@ void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool); void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool); bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool); int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); +int xsk_rcv_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size); void xsk_flush(struct xdp_sock *xs); static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) @@ -125,6 +126,22 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_bu xp_dma_sync_for_cpu(xskb); } +static inline void xsk_buff_dma_sync_for_cpu_batch(struct xdp_buff **bufs, + struct xsk_buff_pool *pool, + int batch_size) +{ + struct xdp_buff_xsk *xskb; + int i; + + if (!pool->dma_need_sync) + return; + + for (i = 0; i < batch_size; i++) { + xskb = container_of(*(bufs + i), struct xdp_buff_xsk, xdp); + xp_dma_sync_for_cpu(xskb); + } +} + static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, size_t size) @@ -191,6 +208,11 @@ static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) return 0; } +static inline int xsk_rcv_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size) +{ + return 0; +} + static inline void xsk_flush(struct xdp_sock *xs) { } @@ -274,6 +296,12 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_bu { } +static inline void xsk_buff_dma_sync_for_cpu_batch(struct xdp_buff **bufs, + struct xsk_buff_pool *pool, + int batch_size) +{ +} + static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, size_t size) diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index ddeefc4a1040..f6d76c7eaf6b 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -214,6 +214,28 @@ static inline void xp_release(struct xdp_buff_xsk *xskb) xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb; } +/* Release a batch of xdp_buffs back to an xdp_buff_pool. + * The batch of buffs must all come from the same xdp_buff_pool. This way + * it is safe to push the batch to the top of the free_heads stack, because + * at least the same amount will have been popped from the stack earlier in + * the datapath. 
+ */ +static inline void xp_release_batch(struct xdp_buff **bufs, int batch_size) +{ + struct xdp_buff_xsk *xskb = container_of(*bufs, struct xdp_buff_xsk, xdp); + struct xsk_buff_pool *pool = xskb->pool; + u32 tail = pool->free_heads_cnt; + u32 i; + + if (pool->unaligned) { + for (i = 0; i < batch_size; i++) { + xskb = container_of(*(bufs + i), struct xdp_buff_xsk, xdp); + pool->free_heads[tail + i] = xskb; + } + pool->free_heads_cnt += batch_size; + } +} + static inline u64 xp_get_handle(struct xdp_buff_xsk *xskb) { u64 offset = xskb->xdp.data - xskb->xdp.data_hard_start; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index ce004f5fae64..22d00173a96f 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -151,6 +151,20 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) return 0; } +static int __xsk_rcv_zc_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size) +{ + int err; + + err = xskq_prod_reserve_desc_batch(xs->rx, bufs, batch_size); + if (err) { + xs->rx_queue_full++; + return -1; + } + + xp_release_batch(bufs, batch_size); + return 0; +} + static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) { void *from_buf, *to_buf; @@ -269,6 +283,21 @@ int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) } EXPORT_SYMBOL(xsk_rcv); +int xsk_rcv_batch(struct xdp_sock *xs, struct xdp_buff **bufs, int batch_size) +{ + int err; + + err = xsk_rcv_check(xs, *bufs); + if (err) + return err; + + if ((*bufs)->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL) + return -1; + + return __xsk_rcv_zc_batch(xs, bufs, batch_size); +} +EXPORT_SYMBOL(xsk_rcv_batch); + int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp) { struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list); diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index e9aa2c236356..3be9f4a01d77 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -338,6 +338,11 @@ static inline bool xskq_prod_is_full(struct xsk_queue *q) return xskq_prod_nb_free(q, 1) ? false : true; } +static inline bool xskq_prod_is_full_n(struct xsk_queue *q, u32 n) +{ + return xskq_prod_nb_free(q, n) ? 
false : true; +} + static inline void xskq_prod_cancel(struct xsk_queue *q) { q->cached_prod--; @@ -399,6 +404,32 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q, return 0; } +static inline int xskq_prod_reserve_desc_batch(struct xsk_queue *q, struct xdp_buff **bufs, + int batch_size) +{ + struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; + struct xdp_buff_xsk *xskb; + u64 addr; + u32 len; + u32 i; + + if (xskq_prod_is_full_n(q, batch_size)) + return -ENOSPC; + + /* A, matches D */ + for (i = 0; i < batch_size; i++) { + len = (*(bufs + i))->data_end - (*(bufs + i))->data; + xskb = container_of(*(bufs + i), struct xdp_buff_xsk, xdp); + addr = xp_get_handle(xskb); + ring->desc[(q->cached_prod + i) & q->ring_mask].addr = addr; + ring->desc[(q->cached_prod + i) & q->ring_mask].len = len; + } + + q->cached_prod += batch_size; + + return 0; +} + static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx) { smp_store_release(&q->ring->producer, idx); /* B, matches C */ From patchwork Tue Nov 16 07:37:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621671 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE3F6C4332F for ; Tue, 16 Nov 2021 07:38:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AFDBC61101 for ; Tue, 16 Nov 2021 07:38:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230414AbhKPHlm (ORCPT ); Tue, 16 Nov 2021 02:41:42 -0500 Received: from mga11.intel.com ([192.55.52.93]:42900 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230403AbhKPHlk (ORCPT ); Tue, 16 Nov 2021 02:41:40 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10169"; a="231099083" X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="231099083" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Nov 2021 23:38:43 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="671857370" Received: from silpixa00401086.ir.intel.com (HELO localhost.localdomain) ([10.55.129.110]) by orsmga005.jf.intel.com with ESMTP; 15 Nov 2021 23:38:40 -0800 From: Ciara Loftus To: netdev@vger.kernel.org, bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com, toke@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com, maciej.fijalkowski@intel.com, Ciara Loftus , Cristian Dumitrescu Subject: [RFC PATCH bpf-next 6/8] i40e: isolate descriptor processing in separate function Date: Tue, 16 Nov 2021 07:37:40 +0000 Message-Id: <20211116073742.7941-7-ciara.loftus@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211116073742.7941-1-ciara.loftus@intel.com> References: <20211116073742.7941-1-ciara.loftus@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC To prepare for batched processing, first isolate descriptor processing in a separate function to make it easier to introduce the batched interfaces. 
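For reference, a condensed view of the resulting call structure (abridged from the diff below, bodies and locals elided): the per-descriptor loop moves into the new helper, while the NAPI-level wrapper keeps the surrounding work, and patch 7/8 can later slot a batched loop in ahead of the scalar call.

	int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
	{
		/* per-descriptor Rx loop, formerly inline here */
		i40e_clean_rx_desc_zc(rx_ring, &total_rx_packets,
				      &total_rx_bytes, &xdp_xmit, budget);

		/* buffer re-allocation, statistics and need_wakeup handling
		 * remain in this wrapper
		 */
		...
	}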
Signed-off-by: Cristian Dumitrescu Signed-off-by: Ciara Loftus --- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 51 +++++++++++++++------- 1 file changed, 36 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 31b794672ea5..c994b4d9c38a 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -323,28 +323,23 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring, WARN_ON_ONCE(1); } -/** - * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring - * @rx_ring: Rx ring - * @budget: NAPI budget - * - * Returns amount of work completed - **/ -int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) +static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring, + unsigned int *stat_rx_packets, + unsigned int *stat_rx_bytes, + unsigned int *xmit, + int budget) { - unsigned int total_rx_bytes = 0, total_rx_packets = 0; - u16 cleaned_count = I40E_DESC_UNUSED(rx_ring); + unsigned int total_rx_packets = *stat_rx_packets, total_rx_bytes = *stat_rx_bytes; u16 next_to_clean = rx_ring->next_to_clean; u16 count_mask = rx_ring->count - 1; - unsigned int xdp_res, xdp_xmit = 0; - bool failure = false; + unsigned int xdp_xmit = *xmit; while (likely(total_rx_packets < (unsigned int)budget)) { union i40e_rx_desc *rx_desc; + unsigned int size, xdp_res; unsigned int rx_packets; unsigned int rx_bytes; struct xdp_buff *bi; - unsigned int size; u64 qword; rx_desc = I40E_RX_DESC(rx_ring, next_to_clean); @@ -385,7 +380,33 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) } rx_ring->next_to_clean = next_to_clean; - cleaned_count = (next_to_clean - rx_ring->next_to_use - 1) & count_mask; + *stat_rx_packets = total_rx_packets; + *stat_rx_bytes = total_rx_bytes; + *xmit = xdp_xmit; +} + +/** + * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring + * @rx_ring: Rx ring + * @budget: NAPI budget + * + * Returns amount of work completed + **/ +int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) +{ + unsigned int total_rx_bytes = 0, total_rx_packets = 0; + u16 count_mask = rx_ring->count - 1; + unsigned int xdp_xmit = 0; + bool failure = false; + u16 cleaned_count; + + i40e_clean_rx_desc_zc(rx_ring, + &total_rx_packets, + &total_rx_bytes, + &xdp_xmit, + budget); + + cleaned_count = (rx_ring->next_to_clean - rx_ring->next_to_use - 1) & count_mask; if (cleaned_count >= I40E_RX_BUFFER_WRITE) failure = !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count); @@ -394,7 +415,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets); if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { - if (failure || next_to_clean == rx_ring->next_to_use) + if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); From patchwork Tue Nov 16 07:37:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621673 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0725FC433EF for ; Tue, 16 Nov 2021 07:38:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
From: Ciara Loftus
Cc: Cristian Dumitrescu
Subject: [RFC PATCH bpf-next 7/8] i40e: introduce batched XDP rx descriptor processing
Date: Tue, 16 Nov 2021 07:37:41 +0000
Message-Id: <20211116073742.7941-8-ciara.loftus@intel.com>

Introduce batched processing of XDP frames in the i40e driver. The batch size is fixed at 64. First, the driver performs a lookahead in the rx ring to determine whether there are 64 contiguous descriptors available to be processed. If so, and if the action returned from the BPF program run for each of the 64 descriptors is XDP_REDIRECT_XSK, the new xsk_rcv_batch API is used to push the batch to the XDP socket Rx ring. Logic to fall back to scalar processing is included for situations where batch processing is not possible, e.g. not enough descriptors, a ring wrap, or different actions returned from the BPF program.
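In outline (condensed from the diff below), the NAPI handler first consumes whatever the lookahead qualifies in batches of 64, then lets the existing scalar path handle the remainder; within a batch, a single xsk_rcv_batch() call replaces 64 individual Rx-ring reservations when every verdict is XDP_REDIRECT_XSK:

	batch_budget = i40_rx_ring_lookahead(rx_ring, budget);	/* multiple of 64, written back, no ring wrap */

	for (i = 0; i < batch_budget; i += I40E_DESCS_PER_BATCH)
		i40e_clean_rx_desc_zc_batch(rx_ring, xdp_prog,
					    &total_rx_packets, &total_rx_bytes,
					    &xdp_xmit);

	/* descriptors not covered by the batch budget take the scalar path */
	i40e_clean_rx_desc_zc(rx_ring, xdp_prog, &total_rx_packets,
			      &total_rx_bytes, &xdp_xmit,
			      budget - total_rx_packets);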
Signed-off-by: Ciara Loftus Signed-off-by: Cristian Dumitrescu --- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 219 +++++++++++++++++++-- 1 file changed, 200 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index c994b4d9c38a..a578bb7b3b99 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -10,6 +10,9 @@ #include "i40e_txrx_common.h" #include "i40e_xsk.h" +#define I40E_DESCS_PER_BATCH 64 +#define I40E_XSK_BATCH_MASK ~(I40E_DESCS_PER_BATCH - 1) + int i40e_alloc_rx_bi_zc(struct i40e_ring *rx_ring) { unsigned long sz = sizeof(*rx_ring->rx_bi_zc) * rx_ring->count; @@ -139,26 +142,12 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool, i40e_xsk_pool_disable(vsi, qid); } -/** - * i40e_run_xdp_zc - Executes an XDP program on an xdp_buff - * @rx_ring: Rx ring - * @xdp: xdp_buff used as input to the XDP program - * - * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK} - **/ -static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) +static int i40e_handle_xdp_action(struct i40e_ring *rx_ring, struct xdp_buff *xdp, + struct bpf_prog *xdp_prog, u32 act) { int err, result = I40E_XDP_PASS; struct i40e_ring *xdp_ring; - struct bpf_prog *xdp_prog; struct xdp_sock *xs; - u32 act; - - /* NB! xdp_prog will always be !NULL, due to the fact that - * this path is enabled by setting an XDP program. - */ - xdp_prog = READ_ONCE(rx_ring->xdp_prog); - act = bpf_prog_run_xdp(xdp_prog, xdp); if (likely(act == XDP_REDIRECT_XSK)) { xs = xsk_get_redirect_xsk(&rx_ring->netdev->_rx[xdp->rxq->queue_index]); @@ -197,6 +186,21 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) return result; } +/** + * i40e_run_xdp_zc - Executes an XDP program on an xdp_buff + * @rx_ring: Rx ring + * @xdp: xdp_buff used as input to the XDP program + * + * Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK} + **/ +static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) +{ + u32 act = bpf_prog_run_xdp(xdp_prog, xdp); + + return i40e_handle_xdp_action(rx_ring, xdp, xdp_prog, act); +} + bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count) { u16 ntu = rx_ring->next_to_use; @@ -218,6 +222,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count) dma = xsk_buff_xdp_get_dma(*xdp); rx_desc->read.pkt_addr = cpu_to_le64(dma); rx_desc->read.hdr_addr = 0; + rx_desc->wb.qword1.status_error_len = 0; rx_desc++; xdp++; @@ -324,6 +329,7 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring, } static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring, + struct bpf_prog *xdp_prog, unsigned int *stat_rx_packets, unsigned int *stat_rx_bytes, unsigned int *xmit, @@ -370,7 +376,7 @@ static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring, xsk_buff_set_size(bi, size); xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool); - xdp_res = i40e_run_xdp_zc(rx_ring, bi); + xdp_res = i40e_run_xdp_zc(rx_ring, bi, xdp_prog); i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets, &rx_bytes, size, xdp_res); total_rx_packets += rx_packets; @@ -385,6 +391,172 @@ static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring, *xmit = xdp_xmit; } +/** + * i40_rx_ring_lookahead - check for new descriptors in the rx ring + * @rx_ring: Rx ring + * @budget: NAPI budget + * + * Returns the number of available descriptors in contiguous memory 
ie. + * without a ring wrap. + * + **/ +static inline unsigned int i40_rx_ring_lookahead(struct i40e_ring *rx_ring, + unsigned int budget) +{ + u32 used = (rx_ring->next_to_clean - rx_ring->next_to_use - 1) & (rx_ring->count - 1); + union i40e_rx_desc *rx_desc0 = (union i40e_rx_desc *)rx_ring->desc, *rx_desc; + u32 next_to_clean = rx_ring->next_to_clean; + u32 potential = rx_ring->count - used; + u16 count_mask = rx_ring->count - 1; + unsigned int size; + u64 qword; + + budget &= I40E_XSK_BATCH_MASK; + + while (budget) { + if (budget > potential) + goto next; + rx_desc = rx_desc0 + ((next_to_clean + budget - 1) & count_mask); + qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len); + dma_rmb(); + + size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >> + I40E_RXD_QW1_LENGTH_PBUF_SHIFT; + if (size && ((next_to_clean + budget) <= count_mask)) + return budget; + +next: + budget >>= 1; + budget &= I40E_XSK_BATCH_MASK; + } + + return 0; +} + +/** + * i40e_run_xdp_zc_batch - Executes an XDP program on an array of xdp_buffs + * @rx_ring: Rx ring + * @bufs: array of xdp_buffs used as input to the XDP program + * @res: array of ints with result for each buf if an error occurs or slow path taken. + * + * Returns zero if all xdp_buffs successfully took the fast path (XDP_REDIRECT_XSK). + * Otherwise returns -1 and sets individual results for each buf in the array *res. + * Individual results are one of I40E_XDP_{PASS, CONSUMED, TX, REDIR, REDIR_XSK} + **/ +static int i40e_run_xdp_zc_batch(struct i40e_ring *rx_ring, struct xdp_buff **bufs, + struct bpf_prog *xdp_prog, int *res) +{ + u32 last_act = XDP_REDIRECT_XSK; + int runs = 0, ret = 0, err, i; + + while ((runs < I40E_DESCS_PER_BATCH) && (last_act == XDP_REDIRECT_XSK)) + last_act = bpf_prog_run_xdp(xdp_prog, *(bufs + runs++)); + + if (likely(runs == I40E_DESCS_PER_BATCH)) { + struct xdp_sock *xs = + xsk_get_redirect_xsk(&rx_ring->netdev->_rx[(*bufs)->rxq->queue_index]); + + err = xsk_rcv_batch(xs, bufs, I40E_DESCS_PER_BATCH); + if (unlikely(err)) { + ret = -1; + for (i = 0; i < I40E_DESCS_PER_BATCH; i++) + *(res + i) = I40E_XDP_PASS; + } + } else { + /* Handle the result of each program run individually */ + u32 act; + + ret = -1; + for (i = 0; i < I40E_DESCS_PER_BATCH; i++) { + struct xdp_buff *xdp = *(bufs + i); + + /* The result of the first runs-2 programs was XDP_REDIRECT_XSK. + * The result of the subsequent program run was last_act. + * Any remaining bufs have not yet had the program executed, so + * execute it now. 
+ */ + + if (i < runs - 2) + act = XDP_REDIRECT_XSK; + else if (i == runs - 1) + act = last_act; + else + act = bpf_prog_run_xdp(xdp_prog, xdp); + + *(res + i) = i40e_handle_xdp_action(rx_ring, xdp, xdp_prog, act); + } + } + + return ret; +} + +static inline void i40e_clean_rx_desc_zc_batch(struct i40e_ring *rx_ring, + struct bpf_prog *xdp_prog, + unsigned int *total_rx_packets, + unsigned int *total_rx_bytes, + unsigned int *xdp_xmit) +{ + u16 next_to_clean = rx_ring->next_to_clean; + unsigned int xdp_res[I40E_DESCS_PER_BATCH]; + unsigned int size[I40E_DESCS_PER_BATCH]; + unsigned int rx_packets, rx_bytes = 0; + union i40e_rx_desc *rx_desc; + struct xdp_buff **bufs; + int j, ret; + u64 qword; + + rx_desc = I40E_RX_DESC(rx_ring, next_to_clean); + + prefetch(rx_desc + I40E_DESCS_PER_BATCH); + + for (j = 0; j < I40E_DESCS_PER_BATCH; j++) { + qword = le64_to_cpu((rx_desc + j)->wb.qword1.status_error_len); + size[j] = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >> + I40E_RXD_QW1_LENGTH_PBUF_SHIFT; + } + + /* This memory barrier is needed to keep us from reading + * any other fields out of the rx_descs until we have + * verified the descriptors have been written back. + */ + dma_rmb(); + + bufs = i40e_rx_bi(rx_ring, next_to_clean); + + for (j = 0; j < I40E_DESCS_PER_BATCH; j++) + xsk_buff_set_size(*(bufs + j), size[j]); + + xsk_buff_dma_sync_for_cpu_batch(bufs, rx_ring->xsk_pool, I40E_DESCS_PER_BATCH); + + ret = i40e_run_xdp_zc_batch(rx_ring, bufs, xdp_prog, xdp_res); + + if (unlikely(ret)) { + unsigned int err_rx_packets = 0, err_rx_bytes = 0; + + rx_packets = 0; + rx_bytes = 0; + + for (j = 0; j < I40E_DESCS_PER_BATCH; j++) { + i40e_handle_xdp_result_zc(rx_ring, *(bufs + j), rx_desc + j, + &err_rx_packets, &err_rx_bytes, size[j], + xdp_res[j]); + *xdp_xmit |= (xdp_res[j] & (I40E_XDP_TX | I40E_XDP_REDIR | + I40E_XDP_REDIR_XSK)); + rx_packets += err_rx_packets; + rx_bytes += err_rx_bytes; + } + } else { + rx_packets = I40E_DESCS_PER_BATCH; + for (j = 0; j < I40E_DESCS_PER_BATCH; j++) + rx_bytes += size[j]; + *xdp_xmit |= I40E_XDP_REDIR_XSK; + } + + rx_ring->next_to_clean += I40E_DESCS_PER_BATCH; + *total_rx_packets += rx_packets; + *total_rx_bytes += rx_bytes; +} + /** * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring * @rx_ring: Rx ring @@ -394,17 +566,26 @@ static inline void i40e_clean_rx_desc_zc(struct i40e_ring *rx_ring, **/ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) { + int batch_budget = i40_rx_ring_lookahead(rx_ring, (unsigned int)budget); + struct bpf_prog *xdp_prog = READ_ONCE(rx_ring->xdp_prog); unsigned int total_rx_bytes = 0, total_rx_packets = 0; u16 count_mask = rx_ring->count - 1; unsigned int xdp_xmit = 0; bool failure = false; u16 cleaned_count; + int i; + + for (i = 0; i < batch_budget; i += I40E_DESCS_PER_BATCH) + i40e_clean_rx_desc_zc_batch(rx_ring, xdp_prog, + &total_rx_packets, + &total_rx_bytes, + &xdp_xmit); - i40e_clean_rx_desc_zc(rx_ring, + i40e_clean_rx_desc_zc(rx_ring, xdp_prog, &total_rx_packets, &total_rx_bytes, &xdp_xmit, - budget); + (unsigned int)budget - total_rx_packets); cleaned_count = (rx_ring->next_to_clean - rx_ring->next_to_use - 1) & count_mask; From patchwork Tue Nov 16 07:37:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ciara Loftus X-Patchwork-Id: 12621675 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CF61C433EF for ; Tue, 16 Nov 2021 07:39:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1786561101 for ; Tue, 16 Nov 2021 07:39:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230494AbhKPHlx (ORCPT ); Tue, 16 Nov 2021 02:41:53 -0500 Received: from mga11.intel.com ([192.55.52.93]:42935 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230482AbhKPHls (ORCPT ); Tue, 16 Nov 2021 02:41:48 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10169"; a="231099116" X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="231099116" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Nov 2021 23:38:50 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,238,1631602800"; d="scan'208";a="671857396" Received: from silpixa00401086.ir.intel.com (HELO localhost.localdomain) ([10.55.129.110]) by orsmga005.jf.intel.com with ESMTP; 15 Nov 2021 23:38:47 -0800 From: Ciara Loftus To: netdev@vger.kernel.org, bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com, toke@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com, jonathan.lemon@gmail.com, maciej.fijalkowski@intel.com, Ciara Loftus Subject: [RFC PATCH bpf-next 8/8] libbpf: use bpf_redirect_xsk in the default program Date: Tue, 16 Nov 2021 07:37:42 +0000 Message-Id: <20211116073742.7941-9-ciara.loftus@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211116073742.7941-1-ciara.loftus@intel.com> References: <20211116073742.7941-1-ciara.loftus@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC NOTE: This will be committed to libxdp, not libbpf as the xsk support in that library has been deprecated. It is only here to serve as an example of what will be added into libxdp. Use the new bpf_redirect_xsk helper in the default program if the kernel supports it. Signed-off-by: Ciara Loftus --- tools/include/uapi/linux/bpf.h | 13 +++++++++ tools/lib/bpf/xsk.c | 50 ++++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 6297eafdc40f..a33cc63c8e6f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4957,6 +4957,17 @@ union bpf_attr { * **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*. * **-EBUSY** if failed to try lock mmap_lock. * **-EINVAL** for invalid **flags**. + * + * long bpf_redirect_xsk(void *ctx, struct bpf_map *map, u32 key, u64 flags) + * Description + * Redirect the packet to the XDP socket associated with the netdev queue if + * the socket has an rx ring configured and is the only socket attached to the + * queue. Fall back to bpf_redirect_map behavior if either condition is not met. + * Return + * **XDP_REDIRECT_XSK** if successful. 
+ * + * **XDP_REDIRECT** if the fall back was successful, or the value of the + * two lower bits of the *flags* argument on error */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5140,6 +5151,7 @@ union bpf_attr { FN(skc_to_unix_sock), \ FN(kallsyms_lookup_name), \ FN(find_vma), \ + FN(redirect_xsk), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper @@ -5520,6 +5532,7 @@ enum xdp_action { XDP_PASS, XDP_TX, XDP_REDIRECT, + XDP_REDIRECT_XSK, }; /* user accessible metadata for XDP packet hook diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index fdb22f5405c9..ec66d4206af0 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -50,6 +50,7 @@ enum xsk_prog { XSK_PROG_FALLBACK, XSK_PROG_REDIRECT_FLAGS, + XSK_PROG_REDIRECT_XSK_FLAGS, }; struct xsk_umem { @@ -374,7 +375,15 @@ static enum xsk_prog get_xsk_prog(void) BPF_EMIT_CALL(BPF_FUNC_redirect_map), BPF_EXIT_INSN(), }; - int prog_fd, map_fd, ret, insn_cnt = ARRAY_SIZE(insns); + struct bpf_insn insns_xsk[] = { + BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1), + BPF_LD_MAP_FD(BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_3, 0), + BPF_MOV64_IMM(BPF_REG_4, XDP_PASS), + BPF_EMIT_CALL(BPF_FUNC_redirect_xsk), + BPF_EXIT_INSN(), + }; + int prog_fd, map_fd, ret; memset(&map_attr, 0, sizeof(map_attr)); map_attr.map_type = BPF_MAP_TYPE_XSKMAP; @@ -386,9 +395,25 @@ static enum xsk_prog get_xsk_prog(void) if (map_fd < 0) return detected; + insns_xsk[1].imm = map_fd; + + prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns_xsk, ARRAY_SIZE(insns_xsk), + NULL); + if (prog_fd < 0) + goto prog_redirect; + + ret = bpf_prog_test_run(prog_fd, 0, &data_in, 1, &data_out, &size_out, &retval, &duration); + if (!ret && retval == XDP_PASS) { + detected = XSK_PROG_REDIRECT_XSK_FLAGS; + close(map_fd); + close(prog_fd); + return detected; + } + +prog_redirect: insns[0].imm = map_fd; - prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, insn_cnt, NULL); + prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, NULL, "GPL", insns, ARRAY_SIZE(insns), NULL); if (prog_fd < 0) { close(map_fd); return detected; @@ -483,10 +508,29 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) BPF_EMIT_CALL(BPF_FUNC_redirect_map), BPF_EXIT_INSN(), }; + + /* This is the post-5.13 kernel C-program: + * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx) + * { + * return bpf_redirect_xsk(ctx, &xsks_map, ctx->rx_queue_index, XDP_PASS); + * } + */ + struct bpf_insn prog_redirect_xsk_flags[] = { + /* r3 = *(u32 *)(r1 + 16) */ + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, 16), + /* r2 = xskmap[] */ + BPF_LD_MAP_FD(BPF_REG_2, ctx->xsks_map_fd), + /* r4 = XDP_PASS */ + BPF_MOV64_IMM(BPF_REG_4, 2), + /* call bpf_redirect_xsk */ + BPF_EMIT_CALL(BPF_FUNC_redirect_xsk), + BPF_EXIT_INSN(), + }; size_t insns_cnt[] = {sizeof(prog) / sizeof(struct bpf_insn), sizeof(prog_redirect_flags) / sizeof(struct bpf_insn), + sizeof(prog_redirect_xsk_flags) / sizeof(struct bpf_insn), }; - struct bpf_insn *progs[] = {prog, prog_redirect_flags}; + struct bpf_insn *progs[] = {prog, prog_redirect_flags, prog_redirect_xsk_flags}; enum xsk_prog option = get_xsk_prog(); LIBBPF_OPTS(bpf_prog_load_opts, opts, .log_buf = log_buf,