From patchwork Thu May 18 18:05:25 2023
X-Patchwork-Submitter: Maciej Fijalkowski
X-Patchwork-Id: 13247218
X-Patchwork-Delegate: bpf@iogearbox.net
From: Maciej Fijalkowski
To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org
Subject: [PATCH bpf-next 01/21] xsk: prepare 'options' in xdp_desc for multi-buffer use
Date: Thu, 18 May 2023 20:05:25 +0200
Message-Id: <20230518180545.159100-2-maciej.fijalkowski@intel.com>
In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com>

From: Tirthendu Sarkar

Use the 'options' field in xdp_desc as a packet continuity marker. Since
the 'options' field was unused until now and was expected to be set to 0,
the EOP descriptor keeps it at 0 while non-EOP descriptors set it to 1.
This ensures that legacy applications continue to work without any change
for single-buffer packets.

Add helper functions and extend xskq_prod_reserve_desc() to use the
'options' field.
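For illustration, this is how a consumer might test the new bit once it
lands in the uapi header (a minimal sketch; the descriptor is assumed to
come from whatever Rx ring accessor the application already uses):

#include <stdbool.h>
#include <linux/if_xdp.h>

/* Sketch only: EOP descriptors keep 'options' at 0, so applications that
 * never opt in to multi-buffer keep seeing exactly what they saw before.
 */
static bool xsk_desc_is_eop(const struct xdp_desc *desc)
{
	return !(desc->options & XDP_PKT_CONTD);
}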
Signed-off-by: Tirthendu Sarkar --- include/uapi/linux/if_xdp.h | 16 ++++++++++++++++ net/xdp/xsk.c | 8 ++++---- net/xdp/xsk_queue.h | 12 +++++++++--- 3 files changed, 29 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index a78a8096f4ce..4acc3a9430f3 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -108,4 +108,20 @@ struct xdp_desc { /* UMEM descriptor is __u64 */ +/* Flag indicating that the packet continues with the buffer pointed out by the + * next frame in the ring. The end of the packet is signalled by setting this + * bit to zero. For single buffer packets, every descriptor has 'options' set + * to 0 and this maintains backward compatibility. + */ +#define XDP_PKT_CONTD (1 << 0) + +/* Maximum number of descriptors supported as frags for a packet. So the total + * number of descriptors supported for a packet is XSK_DESC_MAX_FRAGS + 1. The + * max frags supported by skb is 16 for page sizes greater than 4K and 17 or + * more for 4K or lesser page sizes. XSK_DESC_MAX_FRAGS is set as the minimum + * value of 16 so that xsk applications see the same behavior for all + * architectures. + */ +#define XSK_DESC_MAX_FRAGS 16 + #endif /* _LINUX_IF_XDP_H */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index cc1e7f15fa73..99f90a0d04ae 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -135,14 +135,14 @@ int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, return 0; } -static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) +static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, u32 flags) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); u64 addr; int err; addr = xp_get_handle(xskb); - err = xskq_prod_reserve_desc(xs->rx, addr, len); + err = xskq_prod_reserve_desc(xs->rx, addr, len, flags); if (err) { xs->rx_queue_full++; return err; @@ -189,7 +189,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) } xsk_copy_xdp(xsk_xdp, xdp, len); - err = __xsk_rcv_zc(xs, xsk_xdp, len); + err = __xsk_rcv_zc(xs, xsk_xdp, len, 0); if (err) { xsk_buff_free(xsk_xdp); return err; @@ -259,7 +259,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) { len = xdp->data_end - xdp->data; - return __xsk_rcv_zc(xs, xdp, len); + return __xsk_rcv_zc(xs, xdp, len, 0); } err = __xsk_rcv(xs, xdp); diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 6d40a77fccbe..ad81b19e6fdf 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -130,6 +130,11 @@ static inline bool xskq_cons_read_addr_unchecked(struct xsk_queue *q, u64 *addr) return false; } +static inline bool xp_unused_options_set(u16 options) +{ + return options & ~XDP_PKT_CONTD; +} + static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { @@ -141,7 +146,7 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, if (desc->addr >= pool->addrs_cnt) return false; - if (desc->options) + if (xp_unused_options_set(desc->options)) return false; return true; } @@ -158,7 +163,7 @@ static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, xp_desc_crosses_non_contig_pg(pool, addr, desc->len)) return false; - if (desc->options) + if (xp_unused_options_set(desc->options)) return false; return true; } @@ -360,7 +365,7 @@ static inline void xskq_prod_write_addr_batch(struct xsk_queue *q, struct xdp_de } static inline int xskq_prod_reserve_desc(struct 
xsk_queue *q, - u64 addr, u32 len) + u64 addr, u32 len, u32 flags) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx; @@ -372,6 +377,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q, idx = q->cached_prod++ & q->ring_mask; ring->desc[idx].addr = addr; ring->desc[idx].len = len; + ring->desc[idx].options = flags; return 0; } From patchwork Thu May 18 18:05:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247219 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B64CD28C19; Thu, 18 May 2023 18:06:01 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62B65C2; Thu, 18 May 2023 11:06:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433160; x=1715969160; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JNw2+5ey9ocL32lJ1PKr+CnICVACGjPFwW5GuE1o/fA=; b=JF7fBUlrmrprvh7GpZHx6SmgVRClI3KHK6NdUt8hUgVBOyunrZmebA7n IJnn7DwWqX/klF4shZYa088yXujiRhT4H2BXqAU8JjkDJjKjs364CT8QC gmOpNPcHu8JbBBTmLYW0oCF9fIhzsG8GsF/wfbtHaJQ9C9JjvaATzByIr FrwAaU4sOHrAMZkmYwczOjFrVpunsquhEtYaCHPQJ1JgBjkQrO+m3Oixv 9Z4C9DPsh0fmt86AyTJEpKdXdZ7v2N0NNliZd7ijmSDYCoyZZdHy7fa2x WO+8mc4yz5Gyhd/56wkmrY+ksfpRxFxH/m3tK5nRsI5cLQW2KtdNa1QDI g==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984709" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984709" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780116" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780116" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:05:56 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 02/21] xsk: introduce XSK_USE_SG bind flag for xsk socket Date: Thu, 18 May 2023 20:05:26 +0200 Message-Id: <20230518180545.159100-3-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar As of now xsk core drops any xdp_buff with data size greater than the xsk frame_size as set by the af_xdp application. 
With multi-buffer support introduced in the next patch xsk core can now split those buffers into multiple descriptors provided the af_xdp application can handle them. Such capability of the application needs to be independent of the xdp_prog's frag support capability since there are cases where even a single xdp_buffer may need to be split into multiple descriptors owing to a smaller xsk frame size. For e.g., with NIC rx_buffer size set to 4kB, a 3kB packet will constitute of a single buffer and so will be sent as such to AF_XDP layer irrespective of 'xdp.frags' capability of the XDP program. Now if the xsk frame size is set to 2kB by the AF_XDP application, then the packet will need to be split into 2 descriptors if AF_XDP application can handle multi-buffer, else it needs to be dropped. Applications can now advertise their frag handling capability to xsk core so that xsk core can decide if it should drop or split xdp_buffs that exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag for the xdp socket. Signed-off-by: Tirthendu Sarkar --- include/net/xdp_sock.h | 1 + include/uapi/linux/if_xdp.h | 6 ++++++ net/xdp/xsk.c | 5 +++-- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index e96a1151ec75..36b0411a0d1b 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -52,6 +52,7 @@ struct xdp_sock { struct xsk_buff_pool *pool; u16 queue_id; bool zc; + bool sg; enum { XSK_READY = 0, XSK_BOUND, diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 4acc3a9430f3..870c18f96365 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -25,6 +25,12 @@ * application. */ #define XDP_USE_NEED_WAKEUP (1 << 3) +/* By setting this option, userspace application indicates that it can + * handle multiple descriptors per packet thus enabling AF_XDP to split + * multi-buffer XDP frames into multiple Rx descriptors. Without this set + * such frames will be dropped. + */ +#define XDP_USE_SG (1 << 4) /* Flags for xsk_umem_config flags */ #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 99f90a0d04ae..62d49a81d5f6 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -896,7 +896,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) flags = sxdp->sxdp_flags; if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY | - XDP_USE_NEED_WAKEUP)) + XDP_USE_NEED_WAKEUP | XDP_USE_SG)) return -EINVAL; rtnl_lock(); @@ -924,7 +924,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) struct socket *sock; if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) || - (flags & XDP_USE_NEED_WAKEUP)) { + (flags & XDP_USE_NEED_WAKEUP) || (flags & XDP_USE_SG)) { /* Cannot specify flags for shared sockets. 
*/ err = -EINVAL; goto out_unlock; @@ -1023,6 +1023,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) xs->dev = dev; xs->zc = xs->umem->zc; + xs->sg = !!(flags & XDP_USE_SG); xs->queue_id = qid; xp_add_xsk(xs->pool, xs); From patchwork Thu May 18 18:05:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247220 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 167F62910D; Thu, 18 May 2023 18:06:04 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14AB1C2; Thu, 18 May 2023 11:06:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433163; x=1715969163; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hc2V3wDiez2J3NRYgENpxhICpfuC7i21mYjxvDrm1YU=; b=fOGqBGtLfZcFZsOiIFg6U/CdiHw9Hy+klYZlW1wOnkUojh6IoXFHUZLV xuWHXHuZidjKkS+EJXzu9IxZGf29C8/Xscika71D+CfiIyrkeDZMiKLvy WHJRHJN88i0ag+m6+mCQPfFoK9Uc7kNMkrpdBmpPpGiPPSywqU9rgB49j U6e23tQc+XtT6FR77mF5MuwgVgSqxF6cQo2BOLmBfxKo9nnwjrQ1W75Po d12gFwg14cqsRCsZmQM9vDHsWJBTEk3ln5dQxqupsZ482hwFiuyKV4x/+ I3dAuY3uX+2C0/ZSMkTC84f3FUjsDzvZLKGLzO9QHLlWs4ld8BvmmlvSE A==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984724" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984724" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780138" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780138" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:00 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 03/21] xsk: prepare both copy and zero-copy modes to co-exist Date: Thu, 18 May 2023 20:05:27 +0200 Message-Id: <20230518180545.159100-4-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Currently, __xsk_rcv_zc() is a function that is responsible for producing AF_XDP Rx descriptors. It is used by both copy and zero-copy mode. Both of these modes are going to differ when multi-buffer support is going to be added. ZC will work on a chain of xdp_buff_xsk structs whereas copy-mode is going to utilize skb_shared_info contents. 
This means that ZC-specific changes would affect the copy mode. Let's modify __xsk_rcv_zc() to work directly on xdp_buff_xsk so the callsites have to retrieve this from xdp_buff. Also, introduce xsk_rcv_zc() which will carry all the needed later changes for supporting multi-buffer on ZC side that do not apply to copy mode. Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 62d49a81d5f6..3a68988dd06f 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -135,9 +135,9 @@ int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, return 0; } -static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, u32 flags) +static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff_xsk *xskb, u32 len, + u32 flags) { - struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); u64 addr; int err; @@ -152,6 +152,13 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, u32 return 0; } +static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) +{ + struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + + return __xsk_rcv_zc(xs, xskb, len, 0); +} + static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) { void *from_buf, *to_buf; @@ -172,6 +179,7 @@ static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { + struct xdp_buff_xsk *xskb; struct xdp_buff *xsk_xdp; int err; u32 len; @@ -189,7 +197,8 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) } xsk_copy_xdp(xsk_xdp, xdp, len); - err = __xsk_rcv_zc(xs, xsk_xdp, len, 0); + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); + err = __xsk_rcv_zc(xs, xskb, len, 0); if (err) { xsk_buff_free(xsk_xdp); return err; @@ -259,7 +268,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) { len = xdp->data_end - xdp->data; - return __xsk_rcv_zc(xs, xdp, len, 0); + return xsk_rcv_zc(xs, xdp, len); } err = __xsk_rcv(xs, xdp); From patchwork Thu May 18 18:05:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247221 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 308E227735; Thu, 18 May 2023 18:06:07 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1B58F9; Thu, 18 May 2023 11:06:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433165; x=1715969165; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wLt5cFhr6dUJTNj++uS4GYd0eezKpS66tJZvL1PiiTc=; b=WaGxHpUpys99+dOaztyk2se9//oM3iDvxa/a/5rQhts06U2g4EsLSUib 8kY++AtcNzl/AGUJJB8CTrJ9EaBYMK30ACmzzR1xRyZJhYzVZlJrSNKAQ vpXGk7DWi5IJzteLqcEXrs59BR46ROTOuuHItzU/Z6BP+az4DuPS183FZ E1XaV2Q/sgIP4ywgpBA4D++SxuJi0hFd6iIhYvwJCNT0lU5ReubGrejl0 q74imIIiyp2mFEWgnl/T6zw3NyNZlG8oPwsQytSRmH79KKNtCOWdPIf6p MK5UkOx5nCOTfVOzGAsB4Rllt585XZyk6bOTa9nJYAF6XofESBOEuYtVI A==; X-IronPort-AV: 
From: Maciej Fijalkowski
To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org
Subject: [PATCH bpf-next 04/21] xsk: move xdp_buff's data length check to xsk_rcv_check
Date: Thu, 18 May 2023 20:05:28 +0200
Message-Id: <20230518180545.159100-5-maciej.fijalkowski@intel.com>
In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com>

From: Tirthendu Sarkar

If the data in an xdp_buff exceeds the xsk frame length, the packet needs
to be dropped. This check is currently done in __xsk_rcv(). Move this
logic to xsk_rcv_check() so that such an xdp_buff is only dropped if the
application does not support multi-buffer (absence of the XDP_USE_SG bind
flag). This applies to all cases: copy mode, zero-copy mode as well as
skb mode.
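For context on the userspace side this check interacts with, below is a
minimal, illustrative bind sketch (ifindex and queue_id are placeholders);
without XDP_USE_SG in sxdp_flags, oversized buffers keep being dropped as
before:

#include <linux/if_xdp.h>
#include <sys/socket.h>

/* Illustrative only: bind an AF_XDP socket with the new XDP_USE_SG flag
 * so that multi-buffer frames are split into descriptors instead of
 * being dropped by the length check described above.
 */
static int bind_xsk_sg(int xsk_fd, unsigned int ifindex, __u32 queue_id)
{
	struct sockaddr_xdp sxdp = {
		.sxdp_family = AF_XDP,
		.sxdp_ifindex = ifindex,
		.sxdp_queue_id = queue_id,
		.sxdp_flags = XDP_USE_SG,
	};

	return bind(xsk_fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}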
Signed-off-by: Tirthendu Sarkar --- net/xdp/xsk.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 3a68988dd06f..22eeb7f6ac05 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -177,18 +177,11 @@ static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) memcpy(to_buf, from_buf, len + metalen); } -static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { struct xdp_buff_xsk *xskb; struct xdp_buff *xsk_xdp; int err; - u32 len; - - len = xdp->data_end - xdp->data; - if (len > xsk_pool_get_rx_frame_size(xs->pool)) { - xs->rx_dropped++; - return -ENOSPC; - } xsk_xdp = xsk_buff_alloc(xs->pool); if (!xsk_xdp) { @@ -224,7 +217,7 @@ static bool xsk_is_bound(struct xdp_sock *xs) return false; } -static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp) +static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { if (!xsk_is_bound(xs)) return -ENXIO; @@ -232,6 +225,11 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp) if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) return -EINVAL; + if (len > xsk_pool_get_rx_frame_size(xs->pool)) { + xs->rx_dropped++; + return -ENOSPC; + } + sk_mark_napi_id_once_xdp(&xs->sk, xdp); return 0; } @@ -245,12 +243,13 @@ static void xsk_flush(struct xdp_sock *xs) int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { + u32 len = xdp_get_buff_len(xdp); int err; spin_lock_bh(&xs->rx_lock); - err = xsk_rcv_check(xs, xdp); + err = xsk_rcv_check(xs, xdp, len); if (!err) { - err = __xsk_rcv(xs, xdp); + err = __xsk_rcv(xs, xdp, len); xsk_flush(xs); } spin_unlock_bh(&xs->rx_lock); @@ -259,10 +258,10 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { + u32 len = xdp_get_buff_len(xdp); int err; - u32 len; - err = xsk_rcv_check(xs, xdp); + err = xsk_rcv_check(xs, xdp, len); if (err) return err; @@ -271,7 +270,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) return xsk_rcv_zc(xs, xdp, len); } - err = __xsk_rcv(xs, xdp); + err = __xsk_rcv(xs, xdp, len); if (!err) xdp_return_buff(xdp); return err; From patchwork Thu May 18 18:05:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247222 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 448F227738; Thu, 18 May 2023 18:06:09 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2CE4C2; Thu, 18 May 2023 11:06:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433167; x=1715969167; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AgOkuCfKSTj9lFMElAs5wBXrjFCBRca7aHpZzOaNtN8=; b=bCl8aksYrb+3ujUzDMaAjdeovMfmA5el2XgClaJnXkDGAZqMlu7EsZZE KoAEduFlEXfMDJEkjDrUvfVB2qosEx/FVm2LbNnWwEF3rAnnLRbONiIz4 0ioz4ydheT2UBUz+9K+S6VW1k1jY2FDlBm7xjWsktE9190n9S6qXsyT1O zUxKSPiNiQHkzb48+5DXFSnvWJ3Uh4uJboU3/BPwXY+MO8kLPa32i6aox 
From: Maciej Fijalkowski
To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org
Subject: [PATCH bpf-next 05/21] xsk: add support for AF_XDP multi-buffer on Rx path
Date: Thu, 18 May 2023 20:05:29 +0200
Message-Id: <20230518180545.159100-6-maciej.fijalkowski@intel.com>
In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com>

From: Tirthendu Sarkar

Add multi-buffer support for AF_XDP by extending the XDP multi-buffer
support to be reflected in user-space when a packet is redirected to an
AF_XDP socket.

In the XDP implementation, the NIC driver builds the xdp_buff from the
first frag of the packet and adds any subsequent frags in the skb_shinfo
area of the xdp_buff. In AF_XDP core, XDP buffers are allocated from
xdp_sock's pool and data is copied from the driver's xdp_buff and frags.

Once an allocated XDP buffer is full and there is still data to be
copied, the 'XDP_PKT_CONTD' flag in the 'options' field of the
corresponding xdp ring descriptor is set and passed to the application.
When the application sees this flag set, it knows there is pending data
for this packet that will be carried in the following descriptors. If
there is no more data to be copied, the flag in the 'options' field is
cleared for that descriptor, signalling EOP to the application.

If the application reads a batch of descriptors, using for example the
libxdp interfaces, it is not guaranteed that the batch will end with a
full packet. It might end in the middle of a packet and the rest of the
frames of that packet will arrive at the beginning of the next batch.
AF_XDP ensures that only a complete packet (along with all its frags) is
sent to the application.
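To make the batching caveat concrete, here is a minimal userspace sketch
(assuming the libxdp ring helpers; handle_frag()/handle_eop() are
hypothetical application callbacks):

#include <linux/if_xdp.h>
#include <xdp/xsk.h>	/* libxdp ring helpers; older setups use <bpf/xsk.h> */

/* Hypothetical application callbacks. */
void handle_frag(__u64 addr, __u32 len);
void handle_eop(void);

/* Sketch of a consumer loop that reassembles multi-buffer packets. A
 * batch may end mid-packet; the remaining frags simply show up at the
 * start of the next batch.
 */
static void rx_batch(struct xsk_ring_cons *rx)
{
	__u32 idx, i, rcvd;

	rcvd = xsk_ring_cons__peek(rx, 64, &idx);
	for (i = 0; i < rcvd; i++) {
		const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rx, idx + i);

		handle_frag(desc->addr, desc->len);
		if (!(desc->options & XDP_PKT_CONTD))
			handle_eop();	/* EOP: packet is complete */
	}
	xsk_ring_cons__release(rx, rcvd);
}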
Signed-off-by: Tirthendu Sarkar --- net/core/filter.c | 7 +-- net/xdp/xsk.c | 110 ++++++++++++++++++++++++++++++++++++---------- 2 files changed, 88 insertions(+), 29 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 451b0ec7f242..7c91be766fe2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4344,13 +4344,8 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); enum bpf_map_type map_type = ri->map_type; - if (map_type == BPF_MAP_TYPE_XSKMAP) { - /* XDP_REDIRECT is not supported AF_XDP yet. */ - if (unlikely(xdp_buff_has_frags(xdp))) - return -EOPNOTSUPP; - + if (map_type == BPF_MAP_TYPE_XSKMAP) return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); - } return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), xdp_prog); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 22eeb7f6ac05..86d8b23ae0a7 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -159,43 +159,107 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) return __xsk_rcv_zc(xs, xskb, len, 0); } -static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) +static void *xsk_copy_xdp_start(struct xdp_buff *from) { - void *from_buf, *to_buf; - u32 metalen; + if (unlikely(xdp_data_meta_unsupported(from))) + return from->data; + else + return from->data_meta; +} - if (unlikely(xdp_data_meta_unsupported(from))) { - from_buf = from->data; - to_buf = to->data; - metalen = 0; - } else { - from_buf = from->data_meta; - metalen = from->data - from->data_meta; - to_buf = to->data - metalen; - } +static u32 xsk_copy_xdp(void *to, void **from, u32 to_len, + u32 *from_len, skb_frag_t **frag, u32 rem) +{ + u32 copied = 0; + + while (1) { + u32 copy_len = min_t(u32, *from_len, to_len); + + memcpy(to, *from, copy_len); + copied += copy_len; + if (rem == copied) + return copied; + + if (*from_len == copy_len) { + *from = skb_frag_address(*frag); + *from_len = skb_frag_size((*frag)++); + } else { + *from += copy_len; + *from_len -= copy_len; + } + if (to_len == copy_len) + return copied; - memcpy(to_buf, from_buf, len + metalen); + to_len -= copy_len; + to += copy_len; + } } static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { + u32 frame_size = xsk_pool_get_rx_frame_size(xs->pool); + void *copy_from = xsk_copy_xdp_start(xdp), *copy_to; + u32 from_len, meta_len, rem, num_desc; struct xdp_buff_xsk *xskb; struct xdp_buff *xsk_xdp; - int err; + skb_frag_t *frag; + + from_len = xdp->data_end - copy_from; + meta_len = xdp->data - copy_from; + rem = len + meta_len; + + if (len <= frame_size && !xdp_buff_has_frags(xdp)) { + int err; - xsk_xdp = xsk_buff_alloc(xs->pool); - if (!xsk_xdp) { + xsk_xdp = xsk_buff_alloc(xs->pool); + if (!xsk_xdp) { + xs->rx_dropped++; + return -ENOMEM; + } + memcpy(xsk_xdp->data - meta_len, copy_from, rem); + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); + err = __xsk_rcv_zc(xs, xskb, len, 0); + if (err) { + xsk_buff_free(xsk_xdp); + return err; + } + + return 0; + } + + num_desc = (len - 1) / frame_size + 1; + + if (!xsk_buff_can_alloc(xs->pool, num_desc)) { xs->rx_dropped++; return -ENOMEM; } + if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) { + xs->rx_queue_full++; + return -ENOBUFS; + } - xsk_copy_xdp(xsk_xdp, xdp, len); - xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); - err = __xsk_rcv_zc(xs, xskb, len, 0); - if (err) { - xsk_buff_free(xsk_xdp); - return err; + if (xdp_buff_has_frags(xdp)) { + struct skb_shared_info 
*sinfo; + + sinfo = xdp_get_shared_info_from_buff(xdp); + frag = &sinfo->frags[0]; } + + do { + u32 to_len = frame_size + meta_len; + u32 copied; + + xsk_xdp = xsk_buff_alloc(xs->pool); + copy_to = xsk_xdp->data - meta_len; + + copied = xsk_copy_xdp(copy_to, ©_from, to_len, &from_len, &frag, rem); + rem -= copied; + + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); + __xsk_rcv_zc(xs, xskb, copied - meta_len, rem ? XDP_PKT_CONTD : 0); + meta_len = 0; + } while (rem); + return 0; } @@ -225,7 +289,7 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) return -EINVAL; - if (len > xsk_pool_get_rx_frame_size(xs->pool)) { + if (len > xsk_pool_get_rx_frame_size(xs->pool) && !xs->sg) { xs->rx_dropped++; return -ENOSPC; } From patchwork Thu May 18 18:05:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247223 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 941B3449B1; Thu, 18 May 2023 18:06:11 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2ADCBC2; Thu, 18 May 2023 11:06:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433170; x=1715969170; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tHcLMTkuK8RFIe6q3fYVgEI06b/qhxmSTSet7FSe1rQ=; b=GLkIFbdAgBIABhwHxUQsD7uBhttLi5coaZefnWFUxgZVMmY3uLyz2vjA iufztd9FoNrjRKPNQA8hqVX4wmlrSPZDMkfwT7WL6HErwmBwacZITG2n9 yCRsLDX6TI2aFZCRrdT+MlnWra7imK0piypWFF2QqYJSUQ+T6YaUUjo1I IFMvl3JKz/m23f+Pr7LEGI1L4S6FHD1VGnwjfOTdxetHlKZ05iv7PiSwl 81yT7RCCw2p3bF7gnyDtcKgVP5Zcqltz+5rtc0c3nf5hEa9qbJQJtKFcM N8Nc5fDMSIclERopQC4t+04Q50q3qx7fsRerr3Vlzy1SFkX+IioDLUkyv A==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984798" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984798" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780228" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780228" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:07 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 06/21] xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path Date: Thu, 18 May 2023 20:05:30 +0200 Message-Id: <20230518180545.159100-7-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, 
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar In Tx path, xsk core reserves space for each desc to be transmitted in the completion queue and it's address contained in it is stored in the skb destructor arg. After successful transmission the skb destructor submits the addr marking completion. To handle multiple descriptors per packet, now along with reserving space for each descriptor, the corresponding address is also stored in completion queue. The number of pending descriptors are stored in skb destructor arg and is used by the skb destructor to update completions. Introduce 'skb' in xdp_sock to store a partially built packet when __xsk_generic_xmit() must return before it sees the EOP descriptor for the current packet so that packet building can resume in next call of __xsk_generic_xmit(). Helper functions are introduced to set and get the pending descriptors in the skb destructor arg. Also, wrappers are introduced for storing descriptor addresses, submitting and cancelling (for unsuccessful transmissions) the number of completions. Signed-off-by: Tirthendu Sarkar --- include/net/xdp_sock.h | 6 ++++ net/xdp/xsk.c | 74 ++++++++++++++++++++++++++++++------------ net/xdp/xsk_queue.h | 19 ++++------- 3 files changed, 67 insertions(+), 32 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 36b0411a0d1b..1617af380162 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -68,6 +68,12 @@ struct xdp_sock { u64 rx_dropped; u64 rx_queue_full; + /* When __xsk_generic_xmit() must return before it sees the EOP descriptor for the current + * packet, the partially built skb is saved here so that packet building can resume in next + * call of __xsk_generic_xmit(). + */ + struct sk_buff *skb; + struct list_head map_list; /* Protects map_list */ spinlock_t map_list_lock; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 86d8b23ae0a7..29bda8452e2c 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -480,19 +480,65 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags) return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags); } -static void xsk_destruct_skb(struct sk_buff *skb) +static int xsk_cq_reserve_addr_locked(struct xdp_sock *xs, u64 addr) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&xs->pool->cq_lock, flags); + ret = xskq_prod_reserve_addr(xs->pool->cq, addr); + spin_unlock_irqrestore(&xs->pool->cq_lock, flags); + + return ret; +} + +static void xsk_cq_submit_locked(struct xdp_sock *xs, u32 n) +{ + unsigned long flags; + + spin_lock_irqsave(&xs->pool->cq_lock, flags); + xskq_prod_submit_n(xs->pool->cq, n); + spin_unlock_irqrestore(&xs->pool->cq_lock, flags); +} + +static void xsk_cq_cancel_locked(struct xdp_sock *xs, u32 n) { - u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg; - struct xdp_sock *xs = xdp_sk(skb->sk); unsigned long flags; spin_lock_irqsave(&xs->pool->cq_lock, flags); - xskq_prod_submit_addr(xs->pool->cq, addr); + xskq_prod_cancel_n(xs->pool->cq, n); spin_unlock_irqrestore(&xs->pool->cq_lock, flags); +} + +static u32 xsk_get_num_desc(struct sk_buff *skb) +{ + return skb ? 
(long)skb_shinfo(skb)->destructor_arg : 0; +} +static void xsk_destruct_skb(struct sk_buff *skb) +{ + xsk_cq_submit_locked(xdp_sk(skb->sk), xsk_get_num_desc(skb)); sock_wfree(skb); } +static void xsk_set_destructor_arg(struct sk_buff *skb) +{ + long num = xsk_get_num_desc(xdp_sk(skb->sk)->skb) + 1; + + skb_shinfo(skb)->destructor_arg = (void *)num; +} + +static void xsk_consume_skb(struct sk_buff *skb) +{ + struct xdp_sock *xs = xdp_sk(skb->sk); + + skb->destructor = sock_wfree; + xsk_cq_cancel_locked(xs, xsk_get_num_desc(skb)); + /* Free skb without triggering the perf drop trace */ + consume_skb(skb); + xs->skb = NULL; +} + static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, struct xdp_desc *desc) { @@ -578,8 +624,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb->dev = dev; skb->priority = xs->sk.sk_priority; skb->mark = xs->sk.sk_mark; - skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; skb->destructor = xsk_destruct_skb; + xsk_set_destructor_arg(skb); return skb; } @@ -591,7 +637,6 @@ static int __xsk_generic_xmit(struct sock *sk) bool sent_frame = false; struct xdp_desc desc; struct sk_buff *skb; - unsigned long flags; int err = 0; mutex_lock(&xs->mutex); @@ -616,31 +661,20 @@ static int __xsk_generic_xmit(struct sock *sk) * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ - spin_lock_irqsave(&xs->pool->cq_lock, flags); - if (xskq_prod_reserve(xs->pool->cq)) { - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); + if (xsk_cq_reserve_addr_locked(xs, desc.addr)) goto out; - } - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); skb = xsk_build_skb(xs, &desc); if (IS_ERR(skb)) { err = PTR_ERR(skb); - spin_lock_irqsave(&xs->pool->cq_lock, flags); - xskq_prod_cancel(xs->pool->cq); - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); + xsk_cq_cancel_locked(xs, 1); goto out; } err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */ - skb->destructor = sock_wfree; - spin_lock_irqsave(&xs->pool->cq_lock, flags); - xskq_prod_cancel(xs->pool->cq); - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); - /* Free skb without triggering the perf drop trace */ - consume_skb(skb); + xsk_consume_skb(skb); err = -EAGAIN; goto out; } diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index ad81b19e6fdf..4190f43ce0b0 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -297,6 +297,11 @@ static inline void xskq_cons_release(struct xsk_queue *q) q->cached_cons++; } +static inline void xskq_cons_cancel_n(struct xsk_queue *q, u32 cnt) +{ + q->cached_cons -= cnt; +} + static inline u32 xskq_cons_present_entries(struct xsk_queue *q) { /* No barriers needed since data is not accessed */ @@ -324,9 +329,9 @@ static inline bool xskq_prod_is_full(struct xsk_queue *q) return xskq_prod_nb_free(q, 1) ? 
false : true; } -static inline void xskq_prod_cancel(struct xsk_queue *q) +static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt) { - q->cached_prod--; + q->cached_prod -= cnt; } static inline int xskq_prod_reserve(struct xsk_queue *q) @@ -392,16 +397,6 @@ static inline void xskq_prod_submit(struct xsk_queue *q) __xskq_prod_submit(q, q->cached_prod); } -static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr) -{ - struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; - u32 idx = q->ring->producer; - - ring->desc[idx++ & q->ring_mask] = addr; - - __xskq_prod_submit(q, idx); -} - static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries) { __xskq_prod_submit(q, q->ring->producer + nb_entries); From patchwork Thu May 18 18:05:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247224 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B204449B1; Thu, 18 May 2023 18:06:13 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 121A6F9; Thu, 18 May 2023 11:06:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433172; x=1715969172; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wMV2V2E1sFrBlyO+yBc8Em/Okxc0+BCvYpMSJeYiI1E=; b=F38MvkiJveQ9m6STgjlFecEBRKzj/Yva2vOgiUD8Tpj6GPwCyre0hdQu PbJq1QHA1DRhGQhyGfWec3GLrh3hyNiAq21g2ykCGuQ+dGvWTjtcCcztZ UBqbH1P8TeFcosgfWodAVHPfmunVDIZZFv2wR3ySQqSBB7wE593wpfGuD IcJXpszKWc14lMCCDcXW5P1X/Rpqh9udizySxNUF4PaGiNL8FkznDsKMz DQ6G2TXbl7R39fnjn0PPL4rDDF8nMi1WNTsdG0Dis+/xbmVcsEu36mToq QlTufUWlqYy7euB7tcgPV+/7/N1mqIo0Mc06jGaFSkqpws6mFnALbpeQW A==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984820" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984820" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780250" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780250" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:09 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 07/21] xsk: allow core/drivers to test EOP bit Date: Thu, 18 May 2023 20:05:31 +0200 Message-Id: <20230518180545.159100-8-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no 
version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Drivers are used to check for EOP bit whereas AF_XDP operates on inverted logic - user space indicates that current frag is not the last one and packet continues. For AF_XDP core needs, add xp_mb_desc() that will simply test XDP_PKT_CONTD from xdp_desc::options, but in order to preserve drivers default behavior, introduce an interface for ZC drivers that will negate xp_mb_desc() result and therefore make it easier to test EOP bit from during production of HW Tx descriptors. Signed-off-by: Maciej Fijalkowski --- include/net/xdp_sock_drv.h | 10 ++++++++++ include/net/xsk_buff_pool.h | 5 +++++ 2 files changed, 15 insertions(+) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 9c0d860609ba..a3e18b958d93 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -89,6 +89,11 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) return xp_alloc(pool); } +static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +{ + return !xp_mb_desc(desc); +} + /* Returns as many entries as possible up to max. 0 <= N <= max. */ static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max) { @@ -241,6 +246,11 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) return NULL; } +static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +{ + return false; +} + static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max) { return 0; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index a8d7b8a3688a..4dcca163e076 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -184,6 +184,11 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool, !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK); } +static inline bool xp_mb_desc(struct xdp_desc *desc) +{ + return desc->options & XDP_PKT_CONTD; +} + static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr) { return addr & pool->chunk_mask; From patchwork Thu May 18 18:05:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247225 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1C07E27709; Thu, 18 May 2023 18:06:16 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59FB4C2; Thu, 18 May 2023 11:06:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433174; x=1715969174; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yVBdPR2N9CVJrZ231YOi6wbjOIifqnd2Ce+fVsYjncs=; b=SlQXYf1UngNOn9GTcHXmijx09tdrdxQkdvvRQlceJQBxeT64AI8AOwrx S3vHAM/dYmFDJagrtSYFfnP/JTICnWGlFyDwOHN02E/sj3qt3h+r4v8GK RyQpCmtwZNJjlRq+IrDp5bpiwC58FHWQCK+wbz2EyxXgDsReq+vnBDxjk IrUdRwZ9+x1djkexUDDqLEBF9rB+tbjPQ/DBGs/XHDh4GQAdRzmXqCNJh RmvcZ+Bksn0ZDdLIJU0+DT1zXjCWfHT8KiI4Zr9b9WQ94b6Q7spIEHiUL bxCbcKCoPV0QU5PA1sTIcx3UiC5kcLzZ/rYkilm9zSUIfLfHhXQUfIqjH w==; X-IronPort-AV: 
From: Maciej Fijalkowski
To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org
Subject: [PATCH bpf-next 08/21] xsk: add support for AF_XDP multi-buffer on Tx path
Date: Thu, 18 May 2023 20:05:32 +0200
Message-Id: <20230518180545.159100-9-maciej.fijalkowski@intel.com>
In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com>

From: Tirthendu Sarkar

For transmitting an AF_XDP packet, allocate an skb while processing the
first desc and copy data to it. The 'XDP_PKT_CONTD' flag in the 'options'
field of the desc indicates the EOP status of the packet. If the current
desc is not EOP, store the skb, release the current desc and go on to
read the next descs. Allocate a page for each subsequent desc, copy data
to it and add it as a frag in the skb stored in xsk. On processing EOP,
transmit the skb with frags.

Addresses contained in descs have already been queued in the consumer
queue and the skb destructor updates the completion count. On transmit
failure, cancel the releases, clear the descs from the completion queue
and consume the skb for retrying packet transmission.

For any invalid descriptor (invalid length/address/options) in the middle
of a packet, all pending descriptors will be dropped by xsk core along
with the invalid one, and the next descriptor is treated as the start of
a new packet. The maximum number of supported frames for a packet is
MAX_SKB_FRAGS + 1; if it is exceeded, all descriptors accumulated so far
are dropped.
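From the application's point of view, a multi-buffer Tx submission could
look like the sketch below (assuming the libxdp ring helpers; addrs[] and
lens[] describe UMEM chunks already filled by the application):

#include <linux/if_xdp.h>
#include <xdp/xsk.h>	/* libxdp ring helpers; older setups use <bpf/xsk.h> */

/* Sketch: submit one packet split across nb_frags descriptors. All but
 * the last descriptor carry XDP_PKT_CONTD; the last one has options == 0
 * and marks EOP.
 */
static int tx_multi_buf(struct xsk_ring_prod *tx, const __u64 *addrs,
			const __u32 *lens, __u32 nb_frags)
{
	__u32 idx, i;

	if (xsk_ring_prod__reserve(tx, nb_frags, &idx) != nb_frags)
		return -1;	/* ring full, retry later */

	for (i = 0; i < nb_frags; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = addrs[i];
		desc->len = lens[i];
		desc->options = (i == nb_frags - 1) ? 0 : XDP_PKT_CONTD;
	}
	xsk_ring_prod__submit(tx, nb_frags);
	return 0;
}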
Signed-off-by: Tirthendu Sarkar --- net/xdp/xsk.c | 117 +++++++++++++++++++++++++++++++++----------- net/xdp/xsk_queue.h | 13 +++-- 2 files changed, 97 insertions(+), 33 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 29bda8452e2c..3df635bb2a57 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -393,7 +393,8 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) rcu_read_lock(); list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { if (!xskq_cons_peek_desc(xs->tx, desc, pool)) { - xs->tx->queue_empty_descs++; + if (xskq_has_descs(xs->tx)) + xskq_cons_release(xs->tx); continue; } @@ -539,24 +540,32 @@ static void xsk_consume_skb(struct sk_buff *skb) xs->skb = NULL; } +static void xsk_drop_skb(struct sk_buff *skb) +{ + xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb); + xsk_consume_skb(skb); +} + static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, struct xdp_desc *desc) { struct xsk_buff_pool *pool = xs->pool; u32 hr, len, ts, offset, copy, copied; - struct sk_buff *skb; + struct sk_buff *skb = xs->skb; struct page *page; void *buffer; int err, i; u64 addr; - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); + if (!skb) { + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); - skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); - if (unlikely(!skb)) - return ERR_PTR(err); + skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); - skb_reserve(skb, hr); + skb_reserve(skb, hr); + } addr = desc->addr; len = desc->len; @@ -566,7 +575,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, offset = offset_in_page(buffer); addr = buffer - pool->addrs; - for (copied = 0, i = 0; copied < len; i++) { + for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { + if (unlikely(i >= MAX_SKB_FRAGS)) + return ERR_PTR(-EFAULT); + page = pool->umem->pgs[addr >> PAGE_SHIFT]; get_page(page); @@ -591,33 +603,56 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, struct xdp_desc *desc) { struct net_device *dev = xs->dev; - struct sk_buff *skb; + struct sk_buff *skb = xs->skb; + int err; if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { skb = xsk_build_skb_zerocopy(xs, desc); - if (IS_ERR(skb)) - return skb; + if (IS_ERR(skb)) { + err = PTR_ERR(skb); + goto free_err; + } } else { u32 hr, tr, len; void *buffer; - int err; - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); - tr = dev->needed_tailroom; + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); len = desc->len; - skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); - if (unlikely(!skb)) - return ERR_PTR(err); + if (!skb) { + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); + tr = dev->needed_tailroom; + skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); + if (unlikely(!skb)) + goto free_err; - skb_reserve(skb, hr); - skb_put(skb, len); + skb_reserve(skb, hr); + skb_put(skb, len); - buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); - err = skb_store_bits(skb, 0, buffer, len); - if (unlikely(err)) { - kfree_skb(skb); - return ERR_PTR(err); + err = skb_store_bits(skb, 0, buffer, len); + if (unlikely(err)) + goto free_err; + } else { + int nr_frags = skb_shinfo(skb)->nr_frags; + struct page *page; + u8 *vaddr; + + if (unlikely(nr_frags >= XSK_DESC_MAX_FRAGS)) { + err = -EFAULT; + goto free_err; + } + + page = alloc_page(xs->sk.sk_allocation); + if (unlikely(!page)) { + err = -EAGAIN; + goto free_err; + } + + vaddr = kmap_local_page(page); + memcpy(vaddr, 
buffer, len); + kunmap_local(vaddr); + + skb_add_rx_frag(skb, nr_frags, page, 0, len, 0); } } @@ -628,6 +663,17 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, xsk_set_destructor_arg(skb); return skb; + +free_err: + if (err == -EAGAIN) { + xsk_cq_cancel_locked(xs, 1); + } else { + xsk_set_destructor_arg(skb); + xsk_drop_skb(skb); + xskq_cons_release(xs->tx); + } + + return ERR_PTR(err); } static int __xsk_generic_xmit(struct sock *sk) @@ -667,30 +713,45 @@ static int __xsk_generic_xmit(struct sock *sk) skb = xsk_build_skb(xs, &desc); if (IS_ERR(skb)) { err = PTR_ERR(skb); - xsk_cq_cancel_locked(xs, 1); - goto out; + if (err == -EAGAIN) + goto out; + err = 0; + continue; + } + + xskq_cons_release(xs->tx); + + if (xp_mb_desc(&desc)) { + xs->skb = skb; + continue; } err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */ + xskq_cons_cancel_n(xs->tx, xsk_get_num_desc(skb)); xsk_consume_skb(skb); err = -EAGAIN; goto out; } - xskq_cons_release(xs->tx); /* Ignore NET_XMIT_CN as packet might have been sent */ if (err == NET_XMIT_DROP) { /* SKB completed but not sent */ err = -EBUSY; + xs->skb = NULL; goto out; } sent_frame = true; + xs->skb = NULL; } - xs->tx->queue_empty_descs++; + if (xskq_has_descs(xs->tx)) { + if (xs->skb) + xsk_drop_skb(xs->skb); + xskq_cons_release(xs->tx); + } out: if (sent_frame) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 4190f43ce0b0..2d2af9fc2744 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -175,6 +175,11 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool, xp_aligned_validate_desc(pool, desc); } +static inline bool xskq_has_descs(struct xsk_queue *q) +{ + return q->cached_cons != q->cached_prod; +} + static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d, struct xsk_buff_pool *pool) @@ -190,17 +195,15 @@ static inline bool xskq_cons_read_desc(struct xsk_queue *q, struct xdp_desc *desc, struct xsk_buff_pool *pool) { - while (q->cached_cons != q->cached_prod) { + if (q->cached_cons != q->cached_prod) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx = q->cached_cons & q->ring_mask; *desc = ring->desc[idx]; - if (xskq_cons_is_valid_desc(q, desc, pool)) - return true; - - q->cached_cons++; + return xskq_cons_is_valid_desc(q, desc, pool); } + q->queue_empty_descs++; return false; } From patchwork Thu May 18 18:05:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247226 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0CF44134A3; Thu, 18 May 2023 18:06:18 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 00054C2; Thu, 18 May 2023 11:06:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433177; x=1715969177; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EsUcyGLX/1e3p/Xfu/7Y1Zlz4JefwYwv/uRem1HNiiE=; b=SN9lovzri7w6hRKyRKq83fnpVZnEQU07Tulct4aoOusHDJi4M40PVPAf SpIMUBAx7nMpwNCB8leoV5spFMKjZeWztUDKdi6cfwZDUX6kSiI1S1hX3 +T1E7G/B0C9uN1+/428OcVzEsQmdXLa9vnZf+Fk+96jBabq3KTSKAn6mV 
+8VQ3QnHsVOMnCmV1l0FAEL6H8eAauEeajovfeJ5s57fb0WdHPYAOzzMt pGpQ8RR75LLhcUNfYrvb9Vwo6EMqXhpMD+meDzyD9XktAfdzSRzg4KRV9 9jRyn8hVPcIh0KxUumm49c4TYdblCZcExHeTewkQzPPrlGeRYUfNZ6AnM g==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984862" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984862" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780320" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780320" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:14 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 09/21] xsk: discard zero length descriptors in Tx path Date: Thu, 18 May 2023 20:05:33 +0200 Message-Id: <20230518180545.159100-10-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar Descriptors with zero length are not supported by many NICs. To preserve uniform behavior, discard any zero length desc as an invalid desc.
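To picture the rule this patch enforces, here is a small stand-alone C sketch of the aligned-mode check with the zero-length test added; the struct and function names below are illustrative only and are not the kernel's:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for struct xdp_desc. */
struct example_desc {
        uint64_t addr;
        uint32_t len;
        uint32_t options;
};

/* A zero-length descriptor is rejected up front, just like one whose
 * payload would cross a chunk boundary or whose address lies outside
 * the umem (chunk_size is assumed to be a power of two here).
 */
static bool example_aligned_desc_valid(const struct example_desc *d,
                                       uint64_t chunk_size,
                                       uint64_t addrs_cnt)
{
        uint64_t offset = d->addr & (chunk_size - 1);

        if (!d->len)
                return false;
        if (offset + d->len > chunk_size)
                return false;
        return d->addr < addrs_cnt;
}

The unaligned-mode path gets the same early return on a zero length, as the hunks below show.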
Signed-off-by: Tirthendu Sarkar Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk_queue.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 2d2af9fc2744..ab0d13d6d90e 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -140,6 +140,9 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, { u64 offset = desc->addr & (pool->chunk_size - 1); + if (!desc->len) + return false; + if (offset + desc->len > pool->chunk_size) return false; @@ -156,6 +159,9 @@ static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, { u64 addr = xp_unaligned_add_offset_to_addr(desc->addr); + if (!desc->len) + return false; + if (desc->len > pool->chunk_size) return false; From patchwork Thu May 18 18:05:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247227 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF83B134A3; Thu, 18 May 2023 18:06:22 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D928F9; Thu, 18 May 2023 11:06:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433181; x=1715969181; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Yf1SvWPjJp8YHvQYZtcKlU4iSGePPMdo2rMOgNXSfQ8=; b=jRCXw/rl6Lo+ossZl1exzmtPrsW7PDMSRXbyUsrNYA7z4+lCbN8o+DeU d+LBThF5jhkpH0WtPx0trzurM3pIVZAthL7NeNp/+oghpn5jJec7C+LPU gUMjVTohrX/A0BbN+T0f+4o/kW+1Yd2paTJ66h3OHEceFa528wN6f4BNG brHAotxsNdjMFnrb6i2GjQGNSbCGf3q9tkxFLtSuLIoa0caj3fSDGQePP CqI/Jjea5/Q41EQWD1y3gLN+M3Cdy+a4iWgkLXQA9/coQZikB/B5O9SJz U6Dk0sfqZUzyGHrDjhRA1cvtfDzhcB6NV40wpaSjkGVOerByFzcZ7OYxR g==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984900" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984900" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780336" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780336" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:16 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 10/21] xsk: support mbuf on ZC RX Date: Thu, 18 May 2023 20:05:34 +0200 Message-Id: <20230518180545.159100-11-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Extend xdp_buff_xsk with two new list_head members - xskb_list and xskb_list_node. This means that each xdp_buff_xsk can be a node in the chain of these structs and also each xskb can carry the list by itself. This is needed so ZC drivers can add frags as xskb nodes which will make it possible to handle it both when producing AF_XDP Rx descriptors as well as freeing/recycling all the frags that a single frame carries. Speaking of latter, update xsk_buff_free() to take care of list nodes. For the former (adding as frags), introduce xsk_buff_add_frag() for ZC drivers usage that is going to be used to add a frag to the first xskb. xsk_buff_get_frag() will be utilized by XDP_TX and, on contrary, will return xdp_buff. One of the previous patches added a wrapper for ZC Rx so implement xskb list walk and production of Rx descriptors there. Signed-off-by: Maciej Fijalkowski --- include/net/xdp_sock_drv.h | 45 +++++++++++++++++++++++++++++++++++++ include/net/xsk_buff_pool.h | 2 ++ net/xdp/xsk.c | 24 +++++++++++++++++++- net/xdp/xsk_buff_pool.c | 2 ++ 4 files changed, 72 insertions(+), 1 deletion(-) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index a3e18b958d93..1dbbe2b9c57e 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -108,10 +108,46 @@ static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) static inline void xsk_buff_free(struct xdp_buff *xdp) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + struct xdp_buff_xsk *pos, *tmp; + if (likely(!xdp_buff_has_frags(xdp))) + goto out; + + list_for_each_entry_safe(pos, tmp, &xskb->xskb_list, xskb_list_node) { + list_del(&pos->xskb_list_node); + xp_free(pos); + } + + xdp_get_shared_info_from_buff(xdp)->nr_frags = 0; +out: xp_free(xskb); } +static inline void xsk_buff_add_frag(struct xdp_buff *first, + struct xdp_buff *xdp) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff_xsk *frag = container_of(xdp, struct xdp_buff_xsk, xdp); + + list_add_tail(&frag->xskb_list_node, &xskb->xskb_list); +} + +static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff *ret = NULL; + struct xdp_buff_xsk *frag; + + frag = list_first_entry_or_null(&xskb->xskb_list, struct xdp_buff_xsk, + xskb_list_node); + if (frag) { + list_del(&frag->xskb_list_node); + ret = &frag->xdp; + } + + return ret; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM; @@ -265,6 +301,15 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) { } +static inline void xsk_buff_add_frag(struct xdp_buff *first, + struct xdp_buff *xdp) +{ +} + +static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +{ +} + static inline void xsk_buff_discard(struct xdp_buff *xdp) { } diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 4dcca163e076..8a430163512c 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -29,6 +29,8 @@ struct xdp_buff_xsk { struct xsk_buff_pool *pool; u64 orig_addr; struct list_head free_list_node; + struct list_head xskb_list_node; + struct list_head xskb_list; }; #define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb)) diff --git 
a/net/xdp/xsk.c b/net/xdp/xsk.c index 3df635bb2a57..fa2034c1721b 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -155,8 +155,30 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff_xsk *xskb, u32 len, static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + u32 frags = xdp_buff_has_frags(xdp); + struct xdp_buff_xsk *pos, *tmp; + u32 contd = 0; + int err; + + if (frags) + contd = XDP_PKT_CONTD; - return __xsk_rcv_zc(xs, xskb, len, 0); + err = __xsk_rcv_zc(xs, xskb, len, contd); + if (err || likely(!frags)) + goto out; + + list_for_each_entry_safe(pos, tmp, &xskb->xskb_list, xskb_list_node) { + if (list_is_singular(&xskb->xskb_list)) + contd = 0; + len = pos->xdp.data_end - pos->xdp.data; + err = __xsk_rcv_zc(xs, pos, len, contd); + if (err) + return err; + list_del(&pos->xskb_list_node); + } + +out: + return err; } static void *xsk_copy_xdp_start(struct xdp_buff *from) diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 26f6d304451e..0a9f8ea68de3 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -99,6 +99,8 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, xskb->pool = pool; xskb->xdp.frame_sz = umem->chunk_size - umem->headroom; INIT_LIST_HEAD(&xskb->free_list_node); + INIT_LIST_HEAD(&xskb->xskb_list_node); + INIT_LIST_HEAD(&xskb->xskb_list); if (pool->unaligned) pool->free_heads[i] = xskb; else From patchwork Thu May 18 18:05:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247228 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 469E7134A3; Thu, 18 May 2023 18:06:54 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8919E8F; Thu, 18 May 2023 11:06:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433212; x=1715969212; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aToO0/P/ZU91nnEI/iiSVVTAbt6SZF0tokSZtJLAf3Q=; b=UGtvVgmnRLL2kq83yUJzjZXPSdlr3jkAfhUcTsDzJ//bvtgSmexKTado x9kXp3a2C9QtgIMU0Dine+u4BAkbJAGsMLk0Qlricz9ujsfVLEICYipWs SSLsTczPhaBUjgTZ5ULG+b2CW3g7L5EMUA58Ot9M1t+k9/hIiw7PA+5y0 fzddErpK4cyVpLChkknWFYAhP/QqEruSUvzboQ2wXgvsCQsjdmFvgaUQo nGWWJUHMIN11cMcG8CPtDLjXl3GvQ1qwzRCTuFd98tMVdGhiUM9PUPSjU EdQe0Su4fa4sWxUc2tMYT/ZWkFTcgVncrF7p8aSsiVRxHusAQH2wWI74/ w==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984954" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984954" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780462" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780462" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:18 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, 
maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 11/21] ice: xsk: add RX multi-buffer support Date: Thu, 18 May 2023 20:05:35 +0200 Message-Id: <20230518180545.159100-12-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net This support is strongly inspired by work that introduced multi-buffer support to regular Rx data path in ice. There are some differences, though. When adding a frag, besides adding it to skb_shared_info, use also fresh xsk_buff_add_frag() helper. Reason for doing both things is that we can not rule out the fact that AF_XDP pipeline could use XDP program that needs to access frame fragments. Without them being in skb_shared_info it will not be possible. Another difference is that XDP_PASS has to allocate a new pages for each frags and copy contents from memory backed by xsk_buff_pool. chain_len that is used for programming HW Rx descriptors no longer has to be limited to 1 when xsk_pool is present - remove this restriction. Signed-off-by: Maciej Fijalkowski --- drivers/net/ethernet/intel/ice/ice_base.c | 9 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 136 ++++++++++++++++------ 2 files changed, 102 insertions(+), 43 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 1911d644dfa8..b790e3d6be6f 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -366,7 +366,6 @@ static unsigned int ice_rx_offset(struct ice_rx_ring *rx_ring) */ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) { - int chain_len = ICE_MAX_CHAINED_RX_BUFS; struct ice_vsi *vsi = ring->vsi; u32 rxdid = ICE_RXDID_FLEX_NIC; struct ice_rlan_ctx rlan_ctx; @@ -430,17 +429,11 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) */ rlan_ctx.showiv = 0; - /* For AF_XDP ZC, we disallow packets to span on - * multiple buffers, thus letting us skip that - * handling in the fast-path. - */ - if (ring->xsk_pool) - chain_len = 1; /* Max packet size for this queue - must not be set to a larger value * than 5 x DBUF */ rlan_ctx.rxmax = min_t(u32, vsi->max_frame, - chain_len * ring->rx_buf_len); + ICE_MAX_CHAINED_RX_BUFS * ring->rx_buf_len); /* Rx queue threshold in units of 64 */ rlan_ctx.lrxqthresh = 1; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index d1e489da7363..920cf2b16836 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -548,19 +548,6 @@ bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count) return __ice_alloc_rx_bufs_zc(rx_ring, leftover); } -/** - * ice_bump_ntc - Bump the next_to_clean counter of an Rx ring - * @rx_ring: Rx ring - */ -static void ice_bump_ntc(struct ice_rx_ring *rx_ring) -{ - int ntc = rx_ring->next_to_clean + 1; - - ntc = (ntc < rx_ring->count) ? 
ntc : 0; - rx_ring->next_to_clean = ntc; - prefetch(ICE_RX_DESC(rx_ring, ntc)); -} - /** * ice_construct_skb_zc - Create an sk_buff from zero-copy buffer * @rx_ring: Rx ring @@ -575,8 +562,14 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) { unsigned int totalsize = xdp->data_end - xdp->data_meta; unsigned int metasize = xdp->data - xdp->data_meta; + struct skb_shared_info *sinfo = NULL; struct sk_buff *skb; + u32 nr_frags = 0; + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } net_prefetch(xdp->data_meta); skb = __napi_alloc_skb(&rx_ring->q_vector->napi, totalsize, @@ -592,6 +585,29 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) __skb_pull(skb, metasize); } + if (likely(!xdp_buff_has_frags(xdp))) + goto out; + + for (int i = 0; i < nr_frags; i++) { + struct skb_shared_info *skinfo = skb_shinfo(skb); + skb_frag_t *frag = &sinfo->frags[i]; + struct page *page; + void *addr; + + page = dev_alloc_page(); + if (!page) { + dev_kfree_skb(skb); + return NULL; + } + addr = page_to_virt(page); + + memcpy(addr, skb_frag_page(frag), skb_frag_size(frag)); + + __skb_fill_page_desc_noacc(skinfo, skinfo->nr_frags++, + addr, 0, skb_frag_size(frag)); + } + +out: xsk_buff_free(xdp); return skb; } @@ -755,6 +771,34 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, return result; } +static int +ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first, + struct xdp_buff *xdp, const unsigned int size) +{ + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first); + + if (!size) + return 0; + + if (!xdp_buff_has_frags(first)) { + sinfo->nr_frags = 0; + sinfo->xdp_frags_size = 0; + xdp_buff_set_frags_flag(first); + } + + if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) { + xsk_buff_free(first); + return -ENOMEM; + } + + __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, + virt_to_page(xdp->data_hard_start), 0, size); + sinfo->xdp_frags_size += size; + xsk_buff_add_frag(first, xdp); + + return 0; +} + /** * ice_clean_rx_irq_zc - consumes packets from the hardware ring * @rx_ring: AF_XDP Rx ring @@ -765,9 +809,14 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) { unsigned int total_rx_bytes = 0, total_rx_packets = 0; + struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool; + u32 ntc = rx_ring->next_to_clean; + u32 ntu = rx_ring->next_to_use; + struct xdp_buff *first = NULL; struct ice_tx_ring *xdp_ring; unsigned int xdp_xmit = 0; struct bpf_prog *xdp_prog; + u32 cnt = rx_ring->count; bool failure = false; int entries_to_alloc; @@ -777,6 +826,9 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) xdp_prog = READ_ONCE(rx_ring->xdp_prog); xdp_ring = rx_ring->xdp_ring; + if (ntc != rx_ring->first_desc) + first = *ice_xdp_buf(rx_ring, rx_ring->first_desc); + while (likely(total_rx_packets < (unsigned int)budget)) { union ice_32b_rx_flex_desc *rx_desc; unsigned int size, xdp_res = 0; @@ -786,7 +838,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) u16 vlan_tag = 0; u16 rx_ptype; - rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean); + rx_desc = ICE_RX_DESC(rx_ring, ntc); stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S); if (!ice_test_staterr(rx_desc->wb.status_error0, stat_err_bits)) @@ -798,51 +850,61 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) */ dma_rmb(); - if (unlikely(rx_ring->next_to_clean == 
rx_ring->next_to_use)) + if (unlikely(ntc == ntu)) break; - xdp = *ice_xdp_buf(rx_ring, rx_ring->next_to_clean); + xdp = *ice_xdp_buf(rx_ring, ntc); size = le16_to_cpu(rx_desc->wb.pkt_len) & ICE_RX_FLX_DESC_PKT_LEN_M; - if (!size) { - xdp->data = NULL; - xdp->data_end = NULL; - xdp->data_hard_start = NULL; - xdp->data_meta = NULL; - goto construct_skb; - } xsk_buff_set_size(xdp, size); - xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool); + xsk_buff_dma_sync_for_cpu(xdp, xsk_pool); + + if (!first) { + first = xdp; + xdp_buff_clear_frags_flag(first); + } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) { + break; + } + + if (++ntc == cnt) + ntc = 0; + + if (ice_is_non_eop(rx_ring, rx_desc)) + continue; - xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring); + xdp_res = ice_run_xdp_zc(rx_ring, first, xdp_prog, xdp_ring); if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) { xdp_xmit |= xdp_res; } else if (xdp_res == ICE_XDP_EXIT) { failure = true; + first = NULL; + rx_ring->first_desc = ntc; break; } else if (xdp_res == ICE_XDP_CONSUMED) { - xsk_buff_free(xdp); + xsk_buff_free(first); } else if (xdp_res == ICE_XDP_PASS) { goto construct_skb; } - total_rx_bytes += size; + total_rx_bytes += xdp_get_buff_len(first); total_rx_packets++; - ice_bump_ntc(rx_ring); + first = NULL; + rx_ring->first_desc = ntc; continue; construct_skb: /* XDP_PASS path */ - skb = ice_construct_skb_zc(rx_ring, xdp); + skb = ice_construct_skb_zc(rx_ring, first); if (!skb) { rx_ring->ring_stats->rx_stats.alloc_buf_failed++; break; } - ice_bump_ntc(rx_ring); + first = NULL; + rx_ring->first_desc = ntc; if (eth_skb_pad(skb)) { skb = NULL; @@ -861,18 +923,22 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) ice_receive_skb(rx_ring, skb, vlan_tag); } - entries_to_alloc = ICE_DESC_UNUSED(rx_ring); + rx_ring->next_to_clean = ntc; + entries_to_alloc = ICE_RX_DESC_UNUSED(rx_ring); if (entries_to_alloc > ICE_RING_QUARTER(rx_ring)) failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc); ice_finalize_xdp_rx(xdp_ring, xdp_xmit, 0); ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes); - if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { - if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool); + if (xsk_uses_need_wakeup(xsk_pool)) { + /* ntu could have changed when allocating entries above, so + * use rx_ring value instead of stack based one + */ + if (failure || ntc == rx_ring->next_to_use) + xsk_set_rx_need_wakeup(xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); + xsk_clear_rx_need_wakeup(xsk_pool); return (int)total_rx_packets; } From patchwork Thu May 18 18:05:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247229 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7ACD4134A3; Thu, 18 May 2023 18:06:55 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70B9A8F; Thu, 18 May 2023 11:06:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433214; x=1715969214; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=tgDv88hQgZCzP3VrYKShZSuN3qxZaUyFZJ4hl9hdzEo=; b=ZZH5i0cBmBX7wHrufJqKJHphQTEZcPdagF4U8WsEC/+lDllozSxFSvHx b89/Se3nTj82VktNVdXJQ7N1Z1G/3lzhCQxmVG43r8+6IgevSH2srzfSr 8dPvlXigngfQkJpLrA/XJefEDDMsuheKaL8/IFt/78LGvbqOimBS6WS/g HRwtyiLES+JVIWGj35zs3P7vjtMixP25IQuHjXt+mKpCnCRIk7wZd0/p5 vgr/nlxNMaXcf0dqOdbvK/pdovO/PxyBFtftV2fu2TCupRSbuT0qMWaRL aNmycoyfR9XAqbrmNjlzDxYFr0vdyy4bMRVZ54xUC0pJAbu5ICGcnz54U w==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984990" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984990" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780479" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780479" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:21 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 12/21] xsk: support ZC Tx multi-buffer in batch API Date: Thu, 18 May 2023 20:05:36 +0200 Message-Id: <20230518180545.159100-13-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Modify xskq_cons_read_desc_batch() in a way that each processed descriptor will be checked if it is an EOP one or not and act accordingly to that. Change the behavior of mentioned function to break the processing when stumbling upon invalid descriptor instead of skipping it. Furthermore, let us give only full packets down to ZC driver. With these two assumptions ZC drivers will not have to take care of an intermediate state of incomplete frames, which will simplify its implementations a lot. Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk_queue.h | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index ab0d13d6d90e..a8f11a41609a 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -48,6 +48,11 @@ struct xsk_queue { size_t ring_vmalloc_size; }; +struct parsed_desc { + u32 mb; + u32 valid; +}; + /* The structure of the shared state of the rings are a simple * circular buffer, as outlined in * Documentation/core-api/circular-buffers.rst. 
For the Rx and @@ -218,30 +223,48 @@ static inline void xskq_cons_release_n(struct xsk_queue *q, u32 cnt) q->cached_cons += cnt; } -static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool, - u32 max) +static inline void parse_desc(struct xsk_queue *q, struct xsk_buff_pool *pool, + struct xdp_desc *desc, struct parsed_desc *parsed) +{ + parsed->valid = xskq_cons_is_valid_desc(q, desc, pool); + parsed->mb = xp_mb_desc(desc); +} + +static inline +u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool, + u32 max) { u32 cached_cons = q->cached_cons, nb_entries = 0; struct xdp_desc *descs = pool->tx_descs; + u32 total_descs = 0, nr_frags = 0; + /* track first entry, if stumble upon *any* invalid descriptor, rewind + * current packet that consists of frags and stop the processing + */ while (cached_cons != q->cached_prod && nb_entries < max) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx = cached_cons & q->ring_mask; + struct parsed_desc parsed; descs[nb_entries] = ring->desc[idx]; - if (unlikely(!xskq_cons_is_valid_desc(q, &descs[nb_entries], pool))) { - /* Skip the entry */ - cached_cons++; - continue; - } + cached_cons++; + parse_desc(q, pool, &descs[nb_entries], &parsed); + if (unlikely(!parsed.valid)) + break; nb_entries++; - cached_cons++; + if (likely(!parsed.mb)) { + total_descs += (nr_frags + 1); + nr_frags = 0; + } else { + nr_frags++; + } } + cached_cons -= nr_frags; /* Release valid plus any invalid entries */ xskq_cons_release_n(q, cached_cons - q->cached_cons); - return nb_entries; + return total_descs; } /* Functions for consumers */ From patchwork Thu May 18 18:05:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247230 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1367D28C1C; Thu, 18 May 2023 18:06:56 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F346F3; Thu, 18 May 2023 11:06:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433215; x=1715969215; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=IFz+u9AmH8GLo5eeLDHIHY4bJ+Z3eR8A8KVZ2XTbC2I=; b=Bs5Mhc433uvChoWUaVaDGAn2lIvriF3YcvOovZCuTcID6fRiLKO3qziv +5Yo44zcQSfcdE8IfBSavYd0p3adVV193Bk+bS+kkD0hLp4gsFvIIH9nM MTsZ6ARdyV7/WlMlvwfSf/yLPrqv1uctG1wEqV9kS5V8xWx9byOMQ/HBO eBOLXH7v1BQAUmF5EWIS1zg3LmpDPGYiPhzrzOUmEqNGW9POlaIPbh2Xo bgAL/+2PVFhWwJ5wTlefg3GTZG74IiqgB4FNBfi7IlQKV1O9seucaYbFY viqk3fAnyvUbZTFSH326YGT/Ss8+g5D1qLvbUdK7BgI3gxgW39GObh9C/ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350984995" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350984995" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780483" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780483" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:23 -0700 From: Maciej 
Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 13/21] xsk: report ZC multi-buffer capability via xdp_features Date: Thu, 18 May 2023 20:05:37 +0200 Message-Id: <20230518180545.159100-14-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Introduce new xdp_feature NETDEV_XDP_ACT_NDO_ZC_SG that will be used to find out if user space that wants to do ZC multi-buffer will be able to do so against underlying ZC driver. Signed-off-by: Maciej Fijalkowski --- include/uapi/linux/netdev.h | 4 ++-- net/xdp/xsk_buff_pool.c | 6 ++++++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 639524b59930..bfca07224f7b 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -33,8 +33,8 @@ enum netdev_xdp_act { NETDEV_XDP_ACT_HW_OFFLOAD = 16, NETDEV_XDP_ACT_RX_SG = 32, NETDEV_XDP_ACT_NDO_XMIT_SG = 64, - - NETDEV_XDP_ACT_MASK = 127, + NETDEV_XDP_ACT_NDO_ZC_SG = 128, + NETDEV_XDP_ACT_MASK = 255, }; enum { diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 0a9f8ea68de3..43cca5fa90cf 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -189,6 +189,12 @@ int xp_assign_dev(struct xsk_buff_pool *pool, goto err_unreg_pool; } + if (!(netdev->xdp_features & NETDEV_XDP_ACT_NDO_ZC_SG) && + flags & XDP_USE_SG) { + err = -EOPNOTSUPP; + goto err_unreg_pool; + } + bpf.command = XDP_SETUP_XSK_POOL; bpf.xsk.pool = pool; bpf.xsk.queue_id = queue_id; From patchwork Thu May 18 18:05:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247231 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC22536D82; Thu, 18 May 2023 18:06:56 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65AAEC2; Thu, 18 May 2023 11:06:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433215; x=1715969215; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FWkD4MZ6l0DBIbd9/NoNNUfwP1vFzcm+WNjctI/EyMQ=; b=BIYePgU9rJiq7sJazXAjDp6fnbELBehYhI3g/MBnFovsPE9Wv1xc0i2t gSjT3wqEmr7AzkuANijS9ttb4MNUtVwYjDB0rgp7+BDilQhZVw8EPlWAr VoJdBrauQNi4U1uH1ACsixWsZM/Cy+HDaVbNf7pECBPTAvcOXLonQAolv 7G37a3MFJUB3t6aMKVA8g1dNX9nfYtBGHvRRhd8FFhpbneJGO2j5Fe1Gl R8M6vNBVe1Z1qWlhREE/oaLlT9C1d4O11RC9jNxXg7dneS0tHfloXNZN9 
i2fbkMi4tf5MYBs4DTCOKNp1I2qVjOwLlXcqLMvnQEWpme/y6Og/orxyc A==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985008" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985008" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780489" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780489" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:26 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 14/21] ice: xsk: Tx multi-buffer support Date: Thu, 18 May 2023 20:05:38 +0200 Message-Id: <20230518180545.159100-15-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Most of this patch is about actually supporting the XDP_TX action. Pure Tx ZC support only amounts to checking for XDP_PKT_CONTD in the options field and, based on that, setting the EOP bit on the Tx HW descriptor. It is that simple because xsk_tx_peek_release_desc_batch() makes sure that the last produced descriptor is an EOP one. Report via xdp_features that this driver is now capable of consuming multi-buffer packets on both Rx and Tx sides.
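As a rough, stand-alone illustration of the EOP handling described above (every name below is made up for the example; the driver itself derives this from xsk_is_eop_desc() and ICE_TX_DESC_CMD_EOP):

#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_PKT_CONTD (1 << 0)      /* stands in for XDP_PKT_CONTD */

struct example_desc {
        uint64_t addr;
        uint32_t len;
        uint32_t options;
};

/* A descriptor ends its packet when the continuity flag is clear. */
static bool example_is_eop(const struct example_desc *d)
{
        return !(d->options & EXAMPLE_PKT_CONTD);
}

/* Only the last descriptor of a frame gets the EOP command bit;
 * eop_cmd stands in for the hardware's EOP command value.
 */
static uint64_t example_tx_cmd(const struct example_desc *d, uint64_t eop_cmd)
{
        return example_is_eop(d) ? eop_cmd : 0;
}

Because the batch API only hands full packets to the driver, the frame's final descriptor is always present in the batch and can safely carry the EOP bit.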
Signed-off-by: Maciej Fijalkowski --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 83 ++++++++++++++++------- 2 files changed, 61 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index a1f7c8edc22f..bd16c9de1153 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3385,7 +3385,7 @@ static void ice_set_ops(struct ice_vsi *vsi) netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY | - NETDEV_XDP_ACT_RX_SG; + NETDEV_XDP_ACT_RX_SG | NETDEV_XDP_ACT_NDO_ZC_SG; } /** diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 920cf2b16836..d8bf774b5f6d 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -616,7 +616,7 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) * ice_clean_xdp_irq_zc - produce AF_XDP descriptors to CQ * @xdp_ring: XDP Tx ring */ -static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) +static u32 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) { u16 ntc = xdp_ring->next_to_clean; struct ice_tx_desc *tx_desc; @@ -638,7 +638,7 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) } if (!completed_frames) - return; + return 0; if (likely(!xdp_ring->xdp_tx_active)) { xsk_frames = completed_frames; @@ -668,6 +668,8 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) xdp_ring->next_to_clean -= cnt; if (xsk_frames) xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames); + + return completed_frames; } /** @@ -685,37 +687,72 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring) { + struct skb_shared_info *sinfo = NULL; u32 size = xdp->data_end - xdp->data; u32 ntu = xdp_ring->next_to_use; struct ice_tx_desc *tx_desc; struct ice_tx_buf *tx_buf; - dma_addr_t dma; + struct xdp_buff *head; + u32 nr_frags = 0; + u32 free_space; + u32 frag = 0; - if (ICE_DESC_UNUSED(xdp_ring) < ICE_RING_QUARTER(xdp_ring)) { - ice_clean_xdp_irq_zc(xdp_ring); - if (!ICE_DESC_UNUSED(xdp_ring)) { - xdp_ring->ring_stats->tx_stats.tx_busy++; - return ICE_XDP_CONSUMED; - } - } + free_space = ICE_DESC_UNUSED(xdp_ring); + if (free_space < ICE_RING_QUARTER(xdp_ring)) + free_space += ice_clean_xdp_irq_zc(xdp_ring); - dma = xsk_buff_xdp_get_dma(xdp); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size); + if (unlikely(!free_space)) + goto busy; + + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + if (free_space < nr_frags + 1) + goto busy; + } - tx_buf = &xdp_ring->tx_buf[ntu]; - tx_buf->xdp = xdp; - tx_buf->type = ICE_TX_BUF_XSK_TX; tx_desc = ICE_TX_DESC(xdp_ring, ntu); - tx_desc->buf_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, - 0, size, 0); - xdp_ring->xdp_tx_active++; + tx_buf = &xdp_ring->tx_buf[ntu]; + head = xdp; + + for (;;) { + dma_addr_t dma; + + dma = xsk_buff_xdp_get_dma(xdp); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size); + + tx_buf->xdp = xdp; + tx_buf->type = ICE_TX_BUF_XSK_TX; + tx_desc->buf_addr = cpu_to_le64(dma); + tx_desc->cmd_type_offset_bsz = ice_build_ctob(0, 0, size, 0); + /* account for each xdp_buff from xsk_buff_pool */ + xdp_ring->xdp_tx_active++; + + if (++ntu == 
xdp_ring->count) + ntu = 0; + + if (frag == nr_frags) + break; + + tx_desc = ICE_TX_DESC(xdp_ring, ntu); + tx_buf = &xdp_ring->tx_buf[ntu]; + + xdp = xsk_buff_get_frag(head); + size = skb_frag_size(&sinfo->frags[frag]); + frag++; + } - if (++ntu == xdp_ring->count) - ntu = 0; xdp_ring->next_to_use = ntu; + /* update last descriptor from a frame with EOP */ + tx_desc->cmd_type_offset_bsz |= + cpu_to_le64(ICE_TX_DESC_CMD_EOP << ICE_TXD_QW1_CMD_S); return ICE_XDP_TX; + +busy: + xdp_ring->ring_stats->tx_stats.tx_busy++; + + return ICE_XDP_CONSUMED; } /** @@ -963,7 +1000,7 @@ static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc, tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++); tx_desc->buf_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, + tx_desc->cmd_type_offset_bsz = ice_build_ctob(xsk_is_eop_desc(desc), 0, desc->len, 0); *total_bytes += desc->len; @@ -990,7 +1027,7 @@ static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *de tx_desc = ICE_TX_DESC(xdp_ring, ntu++); tx_desc->buf_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, + tx_desc->cmd_type_offset_bsz = ice_build_ctob(xsk_is_eop_desc(&descs[i]), 0, descs[i].len, 0); *total_bytes += descs[i].len; From patchwork Thu May 18 18:05:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247232 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 60DFF36D82; Thu, 18 May 2023 18:06:58 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D6A78F; Thu, 18 May 2023 11:06:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433216; x=1715969216; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JLRI9w7J8tNxgPMgG5VTS8fXd8CXzLu49u53Tfuumb0=; b=ASooOm+DvsGSMhr5wjj8Hdv4Z3niO8mTDVX5MKhRWe9F0NCZMIUh30U4 jT57wWwLhMFTK2t2jGKISd1FFtCf0OAZSaAheSTDpckt3orVgCtpNITKw jHgPbvHhI6wtQyJ6dRh6L1gTEdTQSXdtDC7+jRqMHbwvGHZu/BjzqQSRF +sGyLthfFWxbbJ1VrA4UDsV8Au24UW4DzC9G5vd9g065eWj86DKr6fsbV 7mNknIXU+I6oBaNp8oDhYVPhvvBcxaC3VJSYo3J/FWmeMNR9B68qWPQk1 nhsz8Masnz2MjJ9sOMXCU9oqqSHmUywVoPtVR5Z5JOnDM+je4EA4LkAFX g==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985025" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985025" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780517" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780517" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:28 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 15/21] selftests/xsk: transmit and receive multi-buffer packets Date: Thu, 18 May 2023 20:05:39 
+0200 Message-Id: <20230518180545.159100-16-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add the ability to send and receive packets that are larger than the size of a umem frame, using the AF_XDP/XDP multi-buffer support. There are three pieces of code that need to be changed to achieve this: the Rx path, the Tx path, and the validation logic. Both the Rx and Tx paths could previously only deal with a single fragment per packet. The Tx path is extended with a new function called pkt_nb_frags() that can be used to retrieve the number of fragments a packet will consume. We then create that many fragments in a loop, filling the first N-1 of them to the max size limit to use the buffer space efficiently, and the Nth one with whatever data is left. This goes on until we have filled in at most BATCH_SIZE worth of descriptors and fragments. If we detect that the next packet would push the number of fragments sent past BATCH_SIZE, we do not send this packet and finish the batch. This packet is instead sent in the next iteration of BATCH_SIZE fragments. For Rx, we loop over all fragments we receive as usual, but for every descriptor that we receive we call a new validation function called is_frag_valid() to validate the consistency of this fragment. The code then checks if the packet continues in the next frame. If so, it continues with the next fragment and performs the same validation. Once we have received the last fragment of the packet, we also call the function is_pkt_valid() to validate the packet as a whole. If we get to the end of the batch and we are not at the end of the current packet, we back out the partial packet and end the loop. Once we get into the receive loop next time, we start over from the beginning of that packet. This is so the code becomes simpler at the cost of some performance. The validation function is_frag_valid() checks that the sequence and packet numbers are correct at the start and end of each fragment.
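A condensed, stand-alone sketch of the Rx-side loop described above, assuming a simplified descriptor type (the real test walks the AF_XDP Rx ring and calls is_frag_valid() per fragment and is_pkt_valid() per packet):

#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PKT_CONTD (1 << 0)      /* stands in for XDP_PKT_CONTD */

struct example_desc {
        uint64_t addr;
        uint32_t len;
        uint32_t options;
};

/* Accumulate fragment lengths until a descriptor without the
 * continuity flag marks the end of the packet; report how many
 * complete packets the batch contained and the last packet's length.
 */
static size_t example_count_packets(const struct example_desc *descs,
                                    size_t rcvd, uint32_t *last_pkt_len)
{
        size_t i, pkts = 0;
        uint32_t pkt_len = 0;

        for (i = 0; i < rcvd; i++) {
                pkt_len += descs[i].len;
                if (descs[i].options & EXAMPLE_PKT_CONTD)
                        continue;       /* more fragments of this packet follow */
                *last_pkt_len = pkt_len;        /* a whole packet is now assembled */
                pkts++;
                pkt_len = 0;
        }
        return pkts;
}

If the loop ends while a packet is still incomplete, the batch stopped mid-packet; the selftest handles that by cancelling those fragments and re-reading them from the start of the packet on the next pass.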
Signed-off-by: Magnus Karlsson --- tools/include/uapi/linux/if_xdp.h | 3 + tools/testing/selftests/bpf/xskxceiver.c | 167 ++++++++++++++++++----- tools/testing/selftests/bpf/xskxceiver.h | 3 +- 3 files changed, 139 insertions(+), 34 deletions(-) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index a78a8096f4ce..80245f5b4dd7 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -106,6 +106,9 @@ struct xdp_desc { __u32 options; }; +/* Flag indicating packet constitutes of multiple buffers*/ +#define XDP_PKT_CONTD (1 << 0) + /* UMEM descriptor is __u64 */ #endif /* _LINUX_IF_XDP_H */ diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 218d7f694e5c..5e29e8850488 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -533,6 +533,11 @@ static struct pkt_stream *__pkt_stream_alloc(u32 nb_pkts) return pkt_stream; } +static bool pkt_continues(const struct xdp_desc *desc) +{ + return desc->options & XDP_PKT_CONTD; +} + static u32 ceil_u32(u32 a, u32 b) { return (a + b - 1) / b; @@ -549,7 +554,7 @@ static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt, int offset, u32 { pkt->offset = offset; pkt->len = len; - if (len > umem->frame_size - XDP_PACKET_HEADROOM - MIN_PKT_SIZE * 2 - umem->frame_headroom) + if (len > MAX_ETH_JUMBO_SIZE) pkt->valid = false; else pkt->valid = true; @@ -637,6 +642,11 @@ static u64 pkt_get_addr(struct pkt *pkt, struct xsk_umem_info *umem) return pkt->offset + umem_alloc_buffer(umem); } +static void pkt_stream_cancel(struct pkt_stream *pkt_stream) +{ + pkt_stream->current_pkt_nb--; +} + static void pkt_generate(struct ifobject *ifobject, u64 addr, u32 len, u32 pkt_nb, u32 bytes_written) { @@ -765,43 +775,81 @@ static bool is_metadata_correct(struct pkt *pkt, void *buffer, u64 addr) return true; } -static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) +static bool is_frag_valid(struct xsk_umem_info *umem, u64 addr, u32 len, u32 expected_pkt_nb, + u32 bytes_processed) { - void *data = xsk_umem__get_data(buffer, addr); - u32 seqnum, pkt_data; + u32 seqnum, pkt_nb, *pkt_data, words_to_end, expected_seqnum; + void *data = xsk_umem__get_data(umem->buffer, addr); - if (!pkt) { - ksft_print_msg("[%s] too many packets received\n", __func__); - goto error; + addr -= umem->base_addr; + + if (addr >= umem->num_frames * umem->frame_size || + addr + len > umem->num_frames * umem->frame_size) { + ksft_print_msg("Frag invalid addr: %llx len: %u\n", addr, len); + return false; + } + if (!umem->unaligned_mode && addr % umem->frame_size + len > umem->frame_size) { + ksft_print_msg("Frag crosses frame boundary addr: %llx len: %u\n", addr, len); + return false; } - if (len < MIN_PKT_SIZE || pkt->len < MIN_PKT_SIZE) { - /* Do not try to verify packets that are smaller than minimum size. 
*/ - return true; + pkt_data = data; + if (!bytes_processed) { + pkt_data += PKT_HDR_SIZE / sizeof(*pkt_data); + len -= PKT_HDR_SIZE; + } else { + bytes_processed -= PKT_HDR_SIZE; } - if (pkt->len != len) { - ksft_print_msg("[%s] expected length [%d], got length [%d]\n", - __func__, pkt->len, len); + expected_seqnum = bytes_processed / sizeof(*pkt_data); + seqnum = ntohl(*pkt_data) & 0xffff; + pkt_nb = ntohl(*pkt_data) >> 16; + + if (expected_pkt_nb != pkt_nb) { + ksft_print_msg("[%s] expected pkt_nb [%u], got pkt_nb [%u]\n", + __func__, expected_pkt_nb, pkt_nb); + goto error; + } + if (expected_seqnum != seqnum) { + ksft_print_msg("[%s] expected seqnum at start [%u], got seqnum [%u]\n", + __func__, expected_seqnum, seqnum); goto error; } - pkt_data = ntohl(*((u32 *)(data + PKT_HDR_SIZE))); - seqnum = pkt_data >> 16; - - if (pkt->pkt_nb != seqnum) { - ksft_print_msg("[%s] expected seqnum [%d], got seqnum [%d]\n", - __func__, pkt->pkt_nb, seqnum); + words_to_end = len / sizeof(*pkt_data) - 1; + pkt_data += words_to_end; + seqnum = ntohl(*pkt_data) & 0xffff; + expected_seqnum += words_to_end; + if (expected_seqnum != seqnum) { + ksft_print_msg("[%s] expected seqnum at end [%u], got seqnum [%u]\n", + __func__, expected_seqnum, seqnum); goto error; } return true; error: - pkt_dump(data, len, true); + pkt_dump(data, len, !bytes_processed); return false; } +static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) +{ + if (!pkt) { + ksft_print_msg("[%s] too many packets received\n", __func__); + return false; + } + + if (pkt->len != len) { + ksft_print_msg("[%s] expected packet length [%d], got length [%d]\n", + __func__, pkt->len, len); + pkt_dump(xsk_umem__get_data(buffer, addr), len, true); + return false; + } + + return true; +} + static void kick_tx(struct xsk_socket_info *xsk) { int ret; @@ -854,8 +902,8 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) { struct timeval tv_end, tv_now, tv_timeout = {THREAD_TMOUT, 0}; struct pkt_stream *pkt_stream = test->ifobj_rx->pkt_stream; - u32 idx_rx = 0, idx_fq = 0, rcvd, i, pkts_sent = 0; struct xsk_socket_info *xsk = test->ifobj_rx->xsk; + u32 idx_rx = 0, idx_fq = 0, rcvd, pkts_sent = 0; struct ifobject *ifobj = test->ifobj_rx; struct xsk_umem_info *umem = xsk->umem; struct pkt *pkt; @@ -868,6 +916,9 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) pkt = pkt_stream_get_next_rx_pkt(pkt_stream, &pkts_sent); while (pkt) { + u32 frags_processed = 0, nb_frags = 0, pkt_len = 0; + u64 first_addr; + ret = gettimeofday(&tv_now, NULL); if (ret) exit_with_error(errno); @@ -913,27 +964,53 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) } } - for (i = 0; i < rcvd; i++) { + while (frags_processed < rcvd) { const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); u64 addr = desc->addr, orig; orig = xsk_umem__extract_addr(addr); addr = xsk_umem__add_offset_to_addr(addr); - if (!is_pkt_valid(pkt, umem->buffer, addr, desc->len) || + if (!is_frag_valid(umem, addr, desc->len, pkt->pkt_nb, pkt_len) || !is_offset_correct(umem, pkt, addr) || (ifobj->use_metadata && !is_metadata_correct(pkt, umem->buffer, addr))) return TEST_FAILURE; + if (!nb_frags++) + first_addr = addr; + frags_processed++; + pkt_len += desc->len; if (ifobj->use_fill_ring) *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = orig; + + if (pkt_continues(desc)) + continue; + + /* The complete packet has been received */ + if (!is_pkt_valid(pkt, umem->buffer, first_addr, pkt_len) || + !is_offset_correct(umem, 
pkt, addr)) + return TEST_FAILURE; + pkt = pkt_stream_get_next_rx_pkt(pkt_stream, &pkts_sent); + nb_frags = 0; + pkt_len = 0; + } + + if (nb_frags) { + /* In the middle of a packet. Start over from beginning of packet. */ + idx_rx -= nb_frags; + xsk_ring_cons__cancel(&xsk->rx, nb_frags); + if (ifobj->use_fill_ring) { + idx_fq -= nb_frags; + xsk_ring_prod__cancel(&umem->fq, nb_frags); + } + frags_processed -= nb_frags; } if (ifobj->use_fill_ring) - xsk_ring_prod__submit(&umem->fq, rcvd); + xsk_ring_prod__submit(&umem->fq, frags_processed); if (ifobj->release_rx) - xsk_ring_cons__release(&xsk->rx, rcvd); + xsk_ring_cons__release(&xsk->rx, frags_processed); pthread_mutex_lock(&pacing_mutex); pkts_in_flight -= pkts_sent; @@ -946,13 +1023,14 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeout) { + u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len; + struct pkt_stream *pkt_stream = ifobject->pkt_stream; struct xsk_socket_info *xsk = ifobject->xsk; struct xsk_umem_info *umem = ifobject->umem; - u32 i, idx = 0, valid_pkts = 0, buffer_len; bool use_poll = ifobject->use_poll; int ret; - buffer_len = pkt_get_buffer_len(umem, ifobject->pkt_stream->max_pkt_len); + buffer_len = pkt_get_buffer_len(umem, pkt_stream->max_pkt_len); /* pkts_in_flight might be negative if many invalid packets are sent */ if (pkts_in_flight >= (int)((umem_size(umem) - BATCH_SIZE * buffer_len) / buffer_len)) { kick_tx(xsk); @@ -983,17 +1061,40 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo } for (i = 0; i < BATCH_SIZE; i++) { - struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); - struct pkt *pkt = pkt_stream_get_next_tx_pkt(ifobject->pkt_stream); + struct pkt *pkt = pkt_stream_get_next_tx_pkt(pkt_stream); + u32 nb_frags, bytes_written = 0; if (!pkt) break; - tx_desc->addr = pkt_get_addr(pkt, umem); - tx_desc->len = pkt->len; + nb_frags = pkt_nb_frags(umem->frame_size, pkt); + if (nb_frags > BATCH_SIZE - i) { + pkt_stream_cancel(pkt_stream); + xsk_ring_prod__cancel(&xsk->tx, BATCH_SIZE - i); + break; + } + if (pkt->valid) { valid_pkts++; - pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, 0); + valid_frags += nb_frags; + } + + while (nb_frags--) { + struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); + + tx_desc->addr = pkt_get_addr(pkt, ifobject->umem); + if (nb_frags) { + tx_desc->len = umem->frame_size; + tx_desc->options = XDP_PKT_CONTD; + i++; + } else { + tx_desc->len = pkt->len - bytes_written; + tx_desc->options = 0; + } + if (pkt->valid) + pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, + bytes_written); + bytes_written += tx_desc->len; } } @@ -1002,7 +1103,7 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo pthread_mutex_unlock(&pacing_mutex); xsk_ring_prod__submit(&xsk->tx, i); - xsk->outstanding_tx += valid_pkts; + xsk->outstanding_tx += valid_frags; if (use_poll) { ret = poll(fds, 1, POLL_TMOUT); diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index aaf27e067640..310b48ad8a3a 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -38,6 +38,7 @@ #define MAX_TEARDOWN_ITER 10 #define PKT_HDR_SIZE (sizeof(struct ethhdr) + 2) /* Just to align the data in the packet */ #define MIN_PKT_SIZE 64 +#define MAX_ETH_JUMBO_SIZE 9000 #define USLEEP_MAX 10000 #define SOCK_RECONF_CTR 10 
#define BATCH_SIZE 64 @@ -47,7 +48,7 @@ #define DEFAULT_UMEM_BUFFERS (DEFAULT_PKT_CNT / 4) #define RX_FULL_RXQSIZE 32 #define UMEM_HEADROOM_TEST_SIZE 128 -#define XSK_UMEM__INVALID_FRAME_SIZE (XSK_UMEM__DEFAULT_FRAME_SIZE + 1) +#define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1) #define HUGEPAGE_SIZE (2 * 1024 * 1024) #define PKT_DUMP_NB_TO_PRINT 16 From patchwork Thu May 18 18:05:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247233 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9AD936DA1; Thu, 18 May 2023 18:06:58 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4BF4C2; Thu, 18 May 2023 11:06:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433216; x=1715969216; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dyJuqKl3nsAXIo+LOXkEkawWZEpwHYGvCubRthLAmzY=; b=V/R7uYAQe8HNR0FkXnekuvD5knJ+MGXyQhuB352V3veIk+wG2b8LWOTx Wb7iqzK7a7sa1OUIZxIuRDL8+f+nSmBjAG7nRdDmsWW6VAT68eoTSWAyW F7NmK27NWhHHSjwWL9Hqwm6c2J7wrp3Hy0P1+NBMaCWRdsUck+A2V7D1x chxUVXY/lykHlGpdg4l5UpD7uhCCJKhFgzSjxZwKVRvgNDE+dBf28cusN bjt0t9n+mk4g62rTWFEbJKQzzrdZ9XwvC6YDF34CXM+GvguH6bLdN5QA1 RfNxoL9D1lebaeGuyz+e317c8mZQae4ShJZ5Ckm14ultEV1XEa0qhaG55 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985043" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985043" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780549" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780549" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:31 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 16/21] selftests/xsk: add basic multi-buffer test Date: Thu, 18 May 2023 20:05:40 +0200 Message-Id: <20230518180545.159100-17-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add the first basic multi-buffer test that sends a stream of 9K packets and validates that they are received at the other end. 
In order to enable sending and receiving multi-buffer packets, code that sets the MTU is introduced as well as modifications to the XDP programs so that they signal that they are multi-buffer enabled. Signed-off-by: Magnus Karlsson --- tools/include/uapi/linux/if_xdp.h | 6 + .../selftests/bpf/progs/xsk_xdp_progs.c | 4 +- tools/testing/selftests/bpf/xsk.c | 135 ++++++++++++++++++ tools/testing/selftests/bpf/xsk.h | 2 + tools/testing/selftests/bpf/xskxceiver.c | 58 ++++++++ tools/testing/selftests/bpf/xskxceiver.h | 5 + 6 files changed, 208 insertions(+), 2 deletions(-) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 80245f5b4dd7..73a47da885dc 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -25,6 +25,12 @@ * application. */ #define XDP_USE_NEED_WAKEUP (1 << 3) +/* By setting this option, userspace application indicates that it can + * handle multiple descriptors per packet thus enabling xsk core to split + * multi-buffer XDP frames into multiple Rx descriptors. Without this set + * such frames will be dropped by xsk. + */ +#define XDP_USE_SG (1 << 4) /* Flags for xsk_umem_config flags */ #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) diff --git a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c index a630c95c7471..ac76e7363776 100644 --- a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c +++ b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c @@ -15,12 +15,12 @@ struct { static unsigned int idx; int count = 0; -SEC("xdp") int xsk_def_prog(struct xdp_md *xdp) +SEC("xdp.frags") int xsk_def_prog(struct xdp_md *xdp) { return bpf_redirect_map(&xsk, 0, XDP_DROP); } -SEC("xdp") int xsk_xdp_drop(struct xdp_md *xdp) +SEC("xdp.frags") int xsk_xdp_drop(struct xdp_md *xdp) { /* Drop every other packet */ if (idx++ % 2) diff --git a/tools/testing/selftests/bpf/xsk.c b/tools/testing/selftests/bpf/xsk.c index 687d83e707f8..006946fa1bc0 100644 --- a/tools/testing/selftests/bpf/xsk.c +++ b/tools/testing/selftests/bpf/xsk.c @@ -18,10 +18,13 @@ #include #include #include +#include #include #include #include #include +#include +#include #include #include #include @@ -81,6 +84,12 @@ struct xsk_socket { int fd; }; +struct nl_mtu_req { + struct nlmsghdr nh; + struct ifinfomsg msg; + char buf[512]; +}; + int xsk_umem__fd(const struct xsk_umem *umem) { return umem ? 
umem->fd : -EINVAL; @@ -286,6 +295,132 @@ bool xsk_is_in_mode(u32 ifindex, int mode) return false; } +/* Lifted from netlink.c in tools/lib/bpf */ +static int netlink_recvmsg(int sock, struct msghdr *mhdr, int flags) +{ + int len; + + do { + len = recvmsg(sock, mhdr, flags); + } while (len < 0 && (errno == EINTR || errno == EAGAIN)); + + if (len < 0) + return -errno; + return len; +} + +/* Lifted from netlink.c in tools/lib/bpf */ +static int alloc_iov(struct iovec *iov, int len) +{ + void *nbuf; + + nbuf = realloc(iov->iov_base, len); + if (!nbuf) + return -ENOMEM; + + iov->iov_base = nbuf; + iov->iov_len = len; + return 0; +} + +/* Original version lifted from netlink.c in tools/lib/bpf */ +static int netlink_recv(int sock) +{ + struct iovec iov = {}; + struct msghdr mhdr = { + .msg_iov = &iov, + .msg_iovlen = 1, + }; + bool multipart = true; + struct nlmsgerr *err; + struct nlmsghdr *nh; + int len, ret; + + ret = alloc_iov(&iov, 4096); + if (ret) + goto done; + + while (multipart) { + multipart = false; + len = netlink_recvmsg(sock, &mhdr, MSG_PEEK | MSG_TRUNC); + if (len < 0) { + ret = len; + goto done; + } + + if (len > iov.iov_len) { + ret = alloc_iov(&iov, len); + if (ret) + goto done; + } + + len = netlink_recvmsg(sock, &mhdr, 0); + if (len < 0) { + ret = len; + goto done; + } + + if (len == 0) + break; + + for (nh = (struct nlmsghdr *)iov.iov_base; NLMSG_OK(nh, len); + nh = NLMSG_NEXT(nh, len)) { + if (nh->nlmsg_flags & NLM_F_MULTI) + multipart = true; + switch (nh->nlmsg_type) { + case NLMSG_ERROR: + err = (struct nlmsgerr *)NLMSG_DATA(nh); + if (!err->error) + continue; + ret = err->error; + goto done; + case NLMSG_DONE: + ret = 0; + goto done; + default: + break; + } + } + } + ret = 0; +done: + free(iov.iov_base); + return ret; +} + +int xsk_set_mtu(int ifindex, int mtu) +{ + struct nl_mtu_req req; + struct rtattr *rta; + int fd, ret; + + fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE); + if (fd < 0) + return fd; + + memset(&req, 0, sizeof(req)); + req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; + req.nh.nlmsg_type = RTM_NEWLINK; + req.msg.ifi_family = AF_UNSPEC; + req.msg.ifi_index = ifindex; + rta = (struct rtattr *)(((char *)&req) + NLMSG_ALIGN(req.nh.nlmsg_len)); + rta->rta_type = IFLA_MTU; + rta->rta_len = RTA_LENGTH(sizeof(unsigned int)); + req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_LENGTH(sizeof(mtu)); + memcpy(RTA_DATA(rta), &mtu, sizeof(mtu)); + + ret = send(fd, &req, req.nh.nlmsg_len, 0); + if (ret < 0) { + close(fd); + return errno; + } + + ret = netlink_recv(fd); + close(fd); + return ret; +} + int xsk_attach_xdp_program(struct bpf_program *prog, int ifindex, u32 xdp_flags) { int prog_fd; diff --git a/tools/testing/selftests/bpf/xsk.h b/tools/testing/selftests/bpf/xsk.h index 8da8d557768b..d93200fdaa8d 100644 --- a/tools/testing/selftests/bpf/xsk.h +++ b/tools/testing/selftests/bpf/xsk.h @@ -239,6 +239,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr, int xsk_umem__delete(struct xsk_umem *umem); void xsk_socket__delete(struct xsk_socket *xsk); +int xsk_set_mtu(int ifindex, int mtu); + #ifdef __cplusplus } /* extern "C" */ #endif diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 5e29e8850488..41b8450561c9 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -49,6 +49,7 @@ * h. 
tests for invalid and corner case Tx descriptors so that the correct ones * are discarded and let through, respectively. * i. 2K frame size tests + * j. If multi-buffer is supported, send 9k packets divided into 3 frames * * Total tests: 12 * @@ -77,6 +78,7 @@ #include #include #include +#include #include #include #include @@ -253,6 +255,8 @@ static int __xsk_configure_socket(struct xsk_socket_info *xsk, struct xsk_umem_i cfg.bind_flags = ifobject->bind_flags; if (shared) cfg.bind_flags |= XDP_SHARED_UMEM; + if (ifobject->pkt_stream && ifobject->mtu > MAX_ETH_PKT_SIZE) + cfg.bind_flags |= XDP_USE_SG; txr = ifobject->tx_on ? &xsk->tx : NULL; rxr = ifobject->rx_on ? &xsk->rx : NULL; @@ -415,6 +419,7 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx, test->total_steps = 1; test->nb_sockets = 1; test->fail = false; + test->mtu = MAX_ETH_PKT_SIZE; test->xdp_prog_rx = ifobj_rx->xdp_progs->progs.xsk_def_prog; test->xskmap_rx = ifobj_rx->xdp_progs->maps.xsk; test->xdp_prog_tx = ifobj_tx->xdp_progs->progs.xsk_def_prog; @@ -468,6 +473,26 @@ static void test_spec_set_xdp_prog(struct test_spec *test, struct bpf_program *x test->xskmap_tx = xskmap_tx; } +static int test_spec_set_mtu(struct test_spec *test, int mtu) +{ + int err; + + if (test->ifobj_rx->mtu != mtu) { + err = xsk_set_mtu(test->ifobj_rx->ifindex, mtu); + if (err) + return err; + test->ifobj_rx->mtu = mtu; + } + if (test->ifobj_tx->mtu != mtu) { + err = xsk_set_mtu(test->ifobj_tx->ifindex, mtu); + if (err) + return err; + test->ifobj_tx->mtu = mtu; + } + + return 0; +} + static void pkt_stream_reset(struct pkt_stream *pkt_stream) { if (pkt_stream) @@ -1516,6 +1541,18 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i struct ifobject *ifobj2) { pthread_t t0, t1; + int err; + + if (test->mtu > MAX_ETH_PKT_SIZE && (!ifobj1->multi_buff_supp || + (ifobj2 && !ifobj2->multi_buff_supp))) { + ksft_test_result_skip("Multi buffer not supported.\n"); + return TEST_SKIP; + } + err = test_spec_set_mtu(test, test->mtu); + if (err) { + ksft_print_msg("Error, could not set mtu.\n"); + exit_with_error(err); + } if (ifobj2) { if (pthread_barrier_init(&barr, NULL, 2)) @@ -1725,6 +1762,15 @@ static int testapp_single_pkt(struct test_spec *test) return testapp_validate_traffic(test); } +static int testapp_multi_buffer(struct test_spec *test) +{ + test_spec_set_name(test, "RUN_TO_COMPLETION_9K_PACKETS"); + test->mtu = MAX_ETH_JUMBO_SIZE; + pkt_stream_replace(test, DEFAULT_PKT_CNT, MAX_ETH_JUMBO_SIZE); + + return testapp_validate_traffic(test); +} + static int testapp_invalid_desc(struct test_spec *test) { struct xsk_umem_info *umem = test->ifobj_tx->umem; @@ -1858,6 +1904,7 @@ static bool hugepages_present(void) static void init_iface(struct ifobject *ifobj, const char *dst_mac, const char *src_mac, thread_func_t func_ptr) { + LIBBPF_OPTS(bpf_xdp_query_opts, query_opts); int err; memcpy(ifobj->dst_mac, dst_mac, ETH_ALEN); @@ -1873,6 +1920,14 @@ static void init_iface(struct ifobject *ifobj, const char *dst_mac, const char * if (hugepages_present()) ifobj->unaligned_supp = true; + + err = bpf_xdp_query(ifobj->ifindex, XDP_FLAGS_DRV_MODE, &query_opts); + if (err) { + ksft_print_msg("Error querrying XDP capabilities\n"); + exit_with_error(-err); + } + if (query_opts.feature_flags & NETDEV_XDP_ACT_RX_SG) + ifobj->multi_buff_supp = true; } static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_type type) @@ -1905,6 +1960,9 @@ static void run_pkt_test(struct test_spec *test, enum 
test_mode mode, enum test_ test_spec_set_name(test, "RUN_TO_COMPLETION"); ret = testapp_validate_traffic(test); break; + case TEST_TYPE_RUN_TO_COMPLETION_MB: + ret = testapp_multi_buffer(test); + break; case TEST_TYPE_RUN_TO_COMPLETION_SINGLE_PKT: test_spec_set_name(test, "RUN_TO_COMPLETION_SINGLE_PKT"); ret = testapp_single_pkt(test); diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index 310b48ad8a3a..c2091b1512f5 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -38,6 +38,7 @@ #define MAX_TEARDOWN_ITER 10 #define PKT_HDR_SIZE (sizeof(struct ethhdr) + 2) /* Just to align the data in the packet */ #define MIN_PKT_SIZE 64 +#define MAX_ETH_PKT_SIZE 1518 #define MAX_ETH_JUMBO_SIZE 9000 #define USLEEP_MAX 10000 #define SOCK_RECONF_CTR 10 @@ -84,6 +85,7 @@ enum test_type { TEST_TYPE_BPF_RES, TEST_TYPE_XDP_DROP_HALF, TEST_TYPE_XDP_METADATA_COUNT, + TEST_TYPE_RUN_TO_COMPLETION_MB, TEST_TYPE_MAX }; @@ -142,6 +144,7 @@ struct ifobject { struct bpf_program *xdp_prog; enum test_mode mode; int ifindex; + int mtu; u32 bind_flags; bool tx_on; bool rx_on; @@ -152,6 +155,7 @@ struct ifobject { bool shared_umem; bool use_metadata; bool unaligned_supp; + bool multi_buff_supp; u8 dst_mac[ETH_ALEN]; u8 src_mac[ETH_ALEN]; }; @@ -165,6 +169,7 @@ struct test_spec { struct bpf_program *xdp_prog_tx; struct bpf_map *xskmap_rx; struct bpf_map *xskmap_tx; + int mtu; u16 total_steps; u16 current_step; u16 nb_sockets; From patchwork Thu May 18 18:05:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247234 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA08C36DA4; Thu, 18 May 2023 18:06:58 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC1B2103; Thu, 18 May 2023 11:06:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433217; x=1715969217; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3TxgoTBi95GacODPpEJ45HOBrxC2HnZ2ypZTBFsFTuI=; b=CjZKJOMpaEDCXI/E1/Ao2LG/03BrsmGEI+RkbDUkGiLfwiIVIeeIzVT/ pzdlnMVXZpxlo20AzJFbMtx3bAJDbw7t79cz6og9lg4SGmV9dGvWNRDXs 6UWD44X1i6nfAOQaHUoPakOUf0V7kmbOUSaFdPoNc8jcpkMr7c+n+jh86 IvU7+Ti5Jz2mvWuFQvsf45RvV8PJpmB6rtjFg+WnSyksjS/mk3Xd0/wjL 0fZ2mZ7st7TRwIJSDSuf0cJzUWyAA/o/pFQwt6pn/HJ+T/ubywiQZKo2j 7fHUZdIYVMPbau/R7v+szejgStiK3HaFZhAyH/kb6Gq/qkmGE/a4VX6RX Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985053" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985053" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780570" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780570" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:34 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, 
magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 17/21] selftests/xsk: add unaligned mode test for multi-buffer Date: Thu, 18 May 2023 20:05:41 +0200 Message-Id: <20230518180545.159100-18-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add a test for multi-buffer AF_XDP when using unaligned mode. The test sends 4096 9K-buffers. Signed-off-by: Magnus Karlsson --- tools/testing/selftests/bpf/xskxceiver.c | 15 +++++++++++++++ tools/testing/selftests/bpf/xskxceiver.h | 1 + 2 files changed, 16 insertions(+) diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 41b8450561c9..8ad98eda90ad 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -50,6 +50,8 @@ * are discarded and let through, respectively. * i. 2K frame size tests * j. If multi-buffer is supported, send 9k packets divided into 3 frames + * k. If multi-buffer and huge pages are supported, send 9k packets in a single frame + * using unaligned mode * * Total tests: 12 * @@ -1754,6 +1756,16 @@ static int testapp_unaligned(struct test_spec *test) return testapp_validate_traffic(test); } +static int testapp_unaligned_mb(struct test_spec *test) +{ + test_spec_set_name(test, "UNALIGNED_MODE_9K"); + test->mtu = MAX_ETH_JUMBO_SIZE; + test->ifobj_tx->umem->unaligned_mode = true; + test->ifobj_rx->umem->unaligned_mode = true; + pkt_stream_replace(test, DEFAULT_PKT_CNT, MAX_ETH_JUMBO_SIZE); + return testapp_validate_traffic(test); +} + static int testapp_single_pkt(struct test_spec *test) { struct pkt pkts[] = {{0, MIN_PKT_SIZE, 0, true}}; @@ -2028,6 +2040,9 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ case TEST_TYPE_UNALIGNED: ret = testapp_unaligned(test); break; + case TEST_TYPE_UNALIGNED_MB: + ret = testapp_unaligned_mb(test); + break; case TEST_TYPE_HEADROOM: ret = testapp_headroom(test); break; diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index c2091b1512f5..42bde17e8d17 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -86,6 +86,7 @@ enum test_type { TEST_TYPE_XDP_DROP_HALF, TEST_TYPE_XDP_METADATA_COUNT, TEST_TYPE_RUN_TO_COMPLETION_MB, + TEST_TYPE_UNALIGNED_MB, TEST_TYPE_MAX }; From patchwork Thu May 18 18:05:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247236 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D6C53A2E3; Thu, 18 May 2023 18:07:00 
+0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D36910E; Thu, 18 May 2023 11:06:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433218; x=1715969218; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=avcr4BRH7HPjOKHbQAZh4dMCLxdDtBanegdAPLkL72w=; b=C+0fVn9qb+14djF5ZylvnESPamPkpxsxwRMw/cOOebje/iyjLoIAmQmj ZgY7GR3xQc8RpZroU+70xJUovCuI0cXhep/NMOh4AmA+WCWSLb+eoGhDM bFXtIqkish5N4QQTv8Vbbem+poxTuCPLjo2NN4FjoOCzs/kPiHpGo65mB o9CowzVdkZNrYKWYhA9NmW8rhn5kn9BEn9/6LT36u/aYtPr0XhT4qMXO6 QMdFI+zn3vuWqtCX+AhKSXX7Wh8s+mrL19vuoKfraO7OzfyNddhphNnwO J3vo0Hr48AfRwU+NDCDDpufRqFEuhPbVLwwJOytPhQZHYj16/oJst6Ofa A==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985061" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985061" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780597" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780597" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:36 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 18/21] selftests/xsk: add invalid descriptor test for multi-buffer Date: Thu, 18 May 2023 20:05:42 +0200 Message-Id: <20230518180545.159100-19-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add a test that produces lots of nasty descriptors testing the corner cases of the descriptor validation. Some of these descriptors are valid and some are not as indicated by the valid flag. For a description of all the test combinations, please see the code. To stress the API, we need to be able to generate combinations of descriptors that make little sense. A new verbatim mode is introduced for the packet_stream to accomplish this. In this mode, all packets in the packet_stream are sent as is. We do not try to chop them up into frames that are of the right size that we know are going to work as we would normally do. The packets are just written into the Tx ring even if we know they make no sense. 
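As a rough illustration of what verbatim mode boils down to on the Tx side, the sketch below copies an array of raw descriptors straight into the ring without any splitting or sanity checking. The raw_desc type, the helper name and its calling convention are invented for this note; the selftest drives the same idea through its pkt_stream instead.

#include <linux/if_xdp.h>
#include "xsk.h"	/* selftests' local xsk helper header */

struct raw_desc {
	__u64 addr;
	__u32 len;
	__u16 options;
};

/* Illustrative sketch: write nb descriptors to the Tx ring exactly as given,
 * valid or not, so the kernel's descriptor validation gets exercised.
 */
static __u32 tx_verbatim(struct xsk_ring_prod *tx, const struct raw_desc *desc, __u32 nb)
{
	__u32 idx, i;

	if (xsk_ring_prod__reserve(tx, nb, &idx) != nb)
		return 0;

	for (i = 0; i < nb; i++) {
		struct xdp_desc *d = xsk_ring_prod__tx_desc(tx, idx + i);

		d->addr = desc[i].addr;		/* no splitting, no checks */
		d->len = desc[i].len;
		d->options = desc[i].options;	/* e.g. XDP_PKT_CONTD or a bogus bit */
	}
	xsk_ring_prod__submit(tx, nb);
	return nb;
}

Because nothing is validated on the application side, descriptors with a zero length, an out-of-range address or an unknown options bit reach the kernel exactly as written, which is what these tests need.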
Signed-off-by: Magnus Karlsson Signed-off-by: Maciej Fijalkowski # adjusted valid flags for frags --- tools/testing/selftests/bpf/xskxceiver.c | 185 ++++++++++++++++++----- tools/testing/selftests/bpf/xskxceiver.h | 7 + 2 files changed, 151 insertions(+), 41 deletions(-) diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 8ad98eda90ad..1e1e9422ca0a 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -52,8 +52,8 @@ * j. If multi-buffer is supported, send 9k packets divided into 3 frames * k. If multi-buffer and huge pages are supported, send 9k packets in a single frame * using unaligned mode - * - * Total tests: 12 + * l. If multi-buffer is supported, try various nasty combinations of descriptors to + * check if they pass the validation or not * * Flow: * ----- @@ -76,7 +76,6 @@ #include #include #include -#include #include #include #include @@ -95,7 +94,6 @@ #include #include #include -#include #include #include "xsk_xdp_progs.skel.h" @@ -560,9 +558,9 @@ static struct pkt_stream *__pkt_stream_alloc(u32 nb_pkts) return pkt_stream; } -static bool pkt_continues(const struct xdp_desc *desc) +static bool pkt_continues(u32 options) { - return desc->options & XDP_PKT_CONTD; + return options & XDP_PKT_CONTD; } static u32 ceil_u32(u32 a, u32 b) @@ -570,11 +568,32 @@ static u32 ceil_u32(u32 a, u32 b) return (a + b - 1) / b; } -static u32 pkt_nb_frags(u32 frame_size, struct pkt *pkt) +static u32 pkt_nb_frags(u32 frame_size, struct pkt_stream *pkt_stream, struct pkt *pkt) { - if (!pkt || !pkt->valid) + u32 nb_frags = 1, next_frag; + + if (!pkt) return 1; - return ceil_u32(pkt->len, frame_size); + + if (!pkt_stream->verbatim) { + if (!pkt->valid || !pkt->len) + return 1; + return ceil_u32(pkt->len, frame_size); + } + + /* Search for the end of the packet in verbatim mode */ + if (!pkt_continues(pkt->options)) + return nb_frags; + + next_frag = pkt_stream->current_pkt_nb; + pkt++; + while (next_frag++ < pkt_stream->nb_pkts) { + nb_frags++; + if (!pkt_continues(pkt->options) || !pkt->valid) + break; + pkt++; + } + return nb_frags; } static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt, int offset, u32 len) @@ -694,34 +713,59 @@ static void pkt_generate(struct ifobject *ifobject, u64 addr, u32 len, u32 pkt_n write_payload(data, pkt_nb, bytes_written, len); } -static void __pkt_stream_generate_custom(struct ifobject *ifobj, - struct pkt *pkts, u32 nb_pkts) +static struct pkt_stream *__pkt_stream_generate_custom(struct ifobject *ifobj, struct pkt *frames, + u32 nb_frames, bool verbatim) { + u32 i, len = 0, pkt_nb = 0, payload = 0; struct pkt_stream *pkt_stream; - u32 i; - pkt_stream = __pkt_stream_alloc(nb_pkts); + pkt_stream = __pkt_stream_alloc(nb_frames); if (!pkt_stream) exit_with_error(ENOMEM); - for (i = 0; i < nb_pkts; i++) { - struct pkt *pkt = &pkt_stream->pkts[i]; + for (i = 0; i < nb_frames; i++) { + struct pkt *pkt = &pkt_stream->pkts[pkt_nb]; + struct pkt *frame = &frames[i]; - pkt->offset = pkts[i].offset; - pkt->len = pkts[i].len; - pkt->pkt_nb = i; - pkt->valid = pkts[i].valid; - if (pkt->len > pkt_stream->max_pkt_len) + pkt->offset = frame->offset; + if (verbatim) { + *pkt = *frame; + pkt->pkt_nb = payload; + if (!frame->valid || !pkt_continues(frame->options)) + payload++; + } else { + if (frame->valid) + len += frame->len; + if (frame->valid && pkt_continues(frame->options)) + continue; + + pkt->pkt_nb = pkt_nb; + pkt->len = len; + pkt->valid = frame->valid; + pkt->options = 
0; + + len = 0; + } + + if (pkt->valid && pkt->len > pkt_stream->max_pkt_len) pkt_stream->max_pkt_len = pkt->len; + pkt_nb++; } - ifobj->pkt_stream = pkt_stream; + pkt_stream->nb_pkts = pkt_nb; + pkt_stream->verbatim = verbatim; + return pkt_stream; } static void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts, u32 nb_pkts) { - __pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts); - __pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts); + struct pkt_stream *pkt_stream; + + pkt_stream = __pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts, true); + test->ifobj_tx->pkt_stream = pkt_stream; + + pkt_stream = __pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts, false); + test->ifobj_rx->pkt_stream = pkt_stream; } static void pkt_print_data(u32 *data, u32 cnt) @@ -862,11 +906,6 @@ static bool is_frag_valid(struct xsk_umem_info *umem, u64 addr, u32 len, u32 exp static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) { - if (!pkt) { - ksft_print_msg("[%s] too many packets received\n", __func__); - return false; - } - if (pkt->len != len) { ksft_print_msg("[%s] expected packet length [%d], got length [%d]\n", __func__, pkt->len, len); @@ -966,7 +1005,6 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) ksft_print_msg("ERROR: [%s] Poll timed out\n", __func__); return TEST_FAILURE; - } if (!(fds->revents & POLLIN)) @@ -998,6 +1036,12 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) orig = xsk_umem__extract_addr(addr); addr = xsk_umem__add_offset_to_addr(addr); + if (!pkt) { + ksft_print_msg("[%s] received too many packets addr: %lx len %u\n", + __func__, addr, desc->len); + return TEST_FAILURE; + } + if (!is_frag_valid(umem, addr, desc->len, pkt->pkt_nb, pkt_len) || !is_offset_correct(umem, pkt, addr) || (ifobj->use_metadata && !is_metadata_correct(pkt, umem->buffer, addr))) @@ -1010,7 +1054,7 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) if (ifobj->use_fill_ring) *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = orig; - if (pkt_continues(desc)) + if (pkt_continues(desc->options)) continue; /* The complete packet has been received */ @@ -1089,31 +1133,29 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo for (i = 0; i < BATCH_SIZE; i++) { struct pkt *pkt = pkt_stream_get_next_tx_pkt(pkt_stream); - u32 nb_frags, bytes_written = 0; + u32 nb_frags_left, nb_frags, bytes_written = 0; if (!pkt) break; - nb_frags = pkt_nb_frags(umem->frame_size, pkt); + nb_frags = pkt_nb_frags(umem->frame_size, pkt_stream, pkt); if (nb_frags > BATCH_SIZE - i) { pkt_stream_cancel(pkt_stream); xsk_ring_prod__cancel(&xsk->tx, BATCH_SIZE - i); break; } + nb_frags_left = nb_frags; - if (pkt->valid) { - valid_pkts++; - valid_frags += nb_frags; - } - - while (nb_frags--) { + while (nb_frags_left--) { struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); tx_desc->addr = pkt_get_addr(pkt, ifobject->umem); - if (nb_frags) { + if (pkt_stream->verbatim) { + tx_desc->len = pkt->len; + tx_desc->options = pkt->options; + } else if (nb_frags_left) { tx_desc->len = umem->frame_size; tx_desc->options = XDP_PKT_CONTD; - i++; } else { tx_desc->len = pkt->len - bytes_written; tx_desc->options = 0; @@ -1122,6 +1164,17 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, bytes_written); bytes_written += tx_desc->len; + + if (nb_frags_left) { + i++; + if (pkt_stream->verbatim) 
+ pkt = pkt_stream_get_next_tx_pkt(pkt_stream); + } + } + + if (pkt && pkt->valid) { + valid_pkts++; + valid_frags += nb_frags; } } @@ -1350,7 +1403,7 @@ static void xsk_populate_fill_ring(struct xsk_umem_info *umem, struct pkt_stream u64 addr; u32 i; - for (i = 0; i < pkt_nb_frags(rx_frame_size, pkt); i++) { + for (i = 0; i < pkt_nb_frags(rx_frame_size, pkt_stream, pkt); i++) { if (!pkt) { if (!fill_up) break; @@ -1783,6 +1836,46 @@ static int testapp_multi_buffer(struct test_spec *test) return testapp_validate_traffic(test); } +static int testapp_invalid_desc_mb(struct test_spec *test) +{ + struct xsk_umem_info *umem = test->ifobj_tx->umem; + u64 umem_size = umem->num_frames * umem->frame_size; + struct pkt pkts[] = { + /* Valid packet for synch to start with */ + {0, MIN_PKT_SIZE, 0, true, 0}, + /* Zero frame len is not legal */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, 0, 0, false, 0}, + /* Invalid address in the second frame */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Invalid len in the middle */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Invalid options in the middle */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION}, + /* Transmit 2 frags, receive 3 */ + {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, XDP_PKT_CONTD}, + {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, 0}, + /* Middle frame crosses chunk boundary with small length */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {-MIN_PKT_SIZE / 2, MIN_PKT_SIZE, 0, false, 0}, + /* Valid packet for synch so that something is received */ + {0, MIN_PKT_SIZE, 0, true, 0}}; + + if (umem->unaligned_mode) { + /* Crossing a chunk boundary allowed */ + pkts[12].valid = true; + pkts[13].valid = true; + } + + test->mtu = MAX_ETH_JUMBO_SIZE; + pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts)); + return testapp_validate_traffic(test); +} + static int testapp_invalid_desc(struct test_spec *test) { struct xsk_umem_info *umem = test->ifobj_tx->umem; @@ -2037,6 +2130,16 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ ret = testapp_invalid_desc(test); break; } + case TEST_TYPE_ALIGNED_INV_DESC_MB: + test_spec_set_name(test, "ALIGNED_INV_DESC_MULTI_BUFF"); + ret = testapp_invalid_desc_mb(test); + break; + case TEST_TYPE_UNALIGNED_INV_DESC_MB: + test_spec_set_name(test, "UNALIGNED_INV_DESC_MULTI_BUFF"); + test->ifobj_tx->umem->unaligned_mode = true; + test->ifobj_rx->umem->unaligned_mode = true; + ret = testapp_invalid_desc_mb(test); + break; case TEST_TYPE_UNALIGNED: ret = testapp_unaligned(test); break; diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index 42bde17e8d17..de781357ea15 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -50,6 +50,9 @@ #define RX_FULL_RXQSIZE 32 #define UMEM_HEADROOM_TEST_SIZE 128 #define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1) +#define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024) +#define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024) +#define XSK_DESC__INVALID_OPTION (0xffff) #define HUGEPAGE_SIZE (2 * 1024 * 1024) #define PKT_DUMP_NB_TO_PRINT 16 @@ -87,6 +90,8 @@ enum test_type { TEST_TYPE_XDP_METADATA_COUNT, TEST_TYPE_RUN_TO_COMPLETION_MB, 
TEST_TYPE_UNALIGNED_MB, + TEST_TYPE_ALIGNED_INV_DESC_MB, + TEST_TYPE_UNALIGNED_INV_DESC_MB, TEST_TYPE_MAX }; @@ -119,6 +124,7 @@ struct pkt { u32 len; u32 pkt_nb; bool valid; + u16 options; }; struct pkt_stream { @@ -126,6 +132,7 @@ struct pkt_stream { u32 current_pkt_nb; struct pkt *pkts; u32 max_pkt_len; + bool verbatim; }; struct ifobject; From patchwork Thu May 18 18:05:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247235 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB0213A2E3; Thu, 18 May 2023 18:06:59 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C50A123; Thu, 18 May 2023 11:06:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433218; x=1715969218; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=inFRyTq2EqX2z3gvLMkdtmAE3+Gn4olztkSAHDiSgV8=; b=H6MbiufFWZkwpEb8hUd+pMnf6+064kGqFV0RE/6NwO7qzQp4RDjTd3TP V3K6jVFT3ImFVhYcOjaKDEJJ9WL/XzsJ38yfdwZu0ugbq7h0zCXKMFgSw 01U9ty+GitDvsh32oPToXfWASX1c+pm4Cn3f3clPRqLw9jo/uJEKSZ4Ui OG+LPSXbYXJL8tSplGTF1OF8zcTOLF0n1kl+52scQcfq53JYjJPo+YNIh Qhkotl0EOYuM+6JJxd+irkhNN6eX6QngI/mCnvtfQK5IYYxRTI1P/svXK vvu7PW6eskxPC/qeuRbuKzImGgKosLMsDXJ1CvSZrooq+guLj/v9gPcBp w==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985067" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985067" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780611" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780611" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:39 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 19/21] selftests/xsk: add metadata copy test for multi-buff Date: Thu, 18 May 2023 20:05:43 +0200 Message-Id: <20230518180545.159100-20-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Enable the already existing metadata copy test to also run in multi-buffer mode with 9K packets. 
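For context, the pattern this test relies on looks roughly like the sketch below: an XDP program declared in an "xdp.frags" section reserves a small metadata area in front of the packet, fills it, and redirects to the AF_XDP socket. The map name, metadata layout and marker value here are assumptions for illustration and do not match the selftest's xsk_xdp_populate_metadata program line for line.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} xsk_map SEC(".maps");

SEC("xdp.frags")
int populate_meta(struct xdp_md *xdp)
{
	__u32 *meta;
	void *data;

	/* Reserve 4 bytes of metadata in front of the first buffer. */
	if (bpf_xdp_adjust_meta(xdp, -(int)sizeof(__u32)))
		return XDP_DROP;

	data = (void *)(long)xdp->data;
	meta = (void *)(long)xdp->data_meta;
	if ((void *)(meta + 1) > data)
		return XDP_DROP;

	*meta = 0xdeadbeef;	/* arbitrary marker the Rx side can check */
	return bpf_redirect_map(&xsk_map, 0, XDP_DROP);
}

char _license[] SEC("license") = "GPL";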
Signed-off-by: Magnus Karlsson --- tools/testing/selftests/bpf/progs/xsk_xdp_progs.c | 2 +- tools/testing/selftests/bpf/xskxceiver.c | 7 ++++++- tools/testing/selftests/bpf/xskxceiver.h | 1 + 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c index ac76e7363776..24369f242853 100644 --- a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c +++ b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c @@ -29,7 +29,7 @@ SEC("xdp.frags") int xsk_xdp_drop(struct xdp_md *xdp) return bpf_redirect_map(&xsk, 0, XDP_DROP); } -SEC("xdp") int xsk_xdp_populate_metadata(struct xdp_md *xdp) +SEC("xdp.frags") int xsk_xdp_populate_metadata(struct xdp_md *xdp) { void *data, *data_meta; struct xdp_info *meta; diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 1e1e9422ca0a..658e978053d4 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -1942,7 +1942,6 @@ static int testapp_xdp_metadata_count(struct test_spec *test) int count = 0; int key = 0; - test_spec_set_name(test, "XDP_METADATA_COUNT"); test_spec_set_xdp_prog(test, skel_rx->progs.xsk_xdp_populate_metadata, skel_tx->progs.xsk_xdp_populate_metadata, skel_rx->maps.xsk, skel_tx->maps.xsk); @@ -2153,6 +2152,12 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ ret = testapp_xdp_drop(test); break; case TEST_TYPE_XDP_METADATA_COUNT: + test_spec_set_name(test, "XDP_METADATA_COUNT"); + ret = testapp_xdp_metadata_count(test); + break; + case TEST_TYPE_XDP_METADATA_COUNT_MB: + test_spec_set_name(test, "XDP_METADATA_COUNT_MULTI_BUFF"); + test->mtu = MAX_ETH_JUMBO_SIZE; ret = testapp_xdp_metadata_count(test); break; default: diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index de781357ea15..786c2dfab7a6 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -88,6 +88,7 @@ enum test_type { TEST_TYPE_BPF_RES, TEST_TYPE_XDP_DROP_HALF, TEST_TYPE_XDP_METADATA_COUNT, + TEST_TYPE_XDP_METADATA_COUNT_MB, TEST_TYPE_RUN_TO_COMPLETION_MB, TEST_TYPE_UNALIGNED_MB, TEST_TYPE_ALIGNED_INV_DESC_MB, From patchwork Thu May 18 18:05:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247237 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 850B33A2E3; Thu, 18 May 2023 18:07:00 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 126B5C2; Thu, 18 May 2023 11:06:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433219; x=1715969219; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OQprRLEOf6fJ8kEz1z775CxA9iaUNUzWvTpvflDdk7g=; b=ZIHgr6VyN8KzYFj2018n9L2GCJ+lDXWD0cRLb5kyCgf21peuPLGCjz51 01AGg9uD1DsCMrb/j5bLF5/AzorMGDXNMllv2LflyarYC4ED+I9zRg20w zzeb4QZq+M+WFqEE33qNsWZB1LYCWIEasl1m2+GaB4Pzz4duZpyM5hpf3 EPFL+IXjo8enz4mX2s8/xkuwjAGKz6gC66FIIqoXswgY7jWguXdCFfY+B iu0811dfR36TF7JOvRCoG3CnRQ5CJRfzPrFId0nvRI3BkPj6n69edM/0x 
csdR0gcNcajjSl6P/6t2/2S+ztOWIp3aknofq6cc9lthfLtpH4uUUDbDc g==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985081" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985081" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780626" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780626" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:41 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 20/21] selftests/xsk: add test for too many frags Date: Thu, 18 May 2023 20:05:44 +0200 Message-Id: <20230518180545.159100-21-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add a test for testing that a packet consisting of more than 17 frags is discarded. This is only valid for SKB and DRV mode since in zero-copy mode, this limit is up to the HW and what it supports. Signed-off-by: Magnus Karlsson --- tools/testing/selftests/bpf/xskxceiver.c | 37 ++++++++++++++++++++++++ tools/testing/selftests/bpf/xskxceiver.h | 2 ++ 2 files changed, 39 insertions(+) diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 658e978053d4..430ff0cec24c 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -1975,6 +1975,40 @@ static int testapp_poll_rxq_tmout(struct test_spec *test) return testapp_validate_traffic_single_thread(test, test->ifobj_rx); } +static int testapp_too_many_frags(struct test_spec *test) +{ + struct pkt pkts[XSK_DESC__MAX_FRAGS + 3] = {}; + u32 i; + + test_spec_set_name(test, "TOO_MANY_FRAGS"); + if (test->mode == TEST_MODE_ZC) { + /* Limit is up to driver for zero-copy mode so not testable. */ + ksft_test_result_skip("Cannot be run for zero-copy mode.\n"); + return TEST_SKIP; + } + + test->mtu = MAX_ETH_JUMBO_SIZE; + + /* Valid packet for synch */ + pkts[0].len = MIN_PKT_SIZE; + pkts[0].valid = true; + + /* Do not signal end-of-packet in the 17th frag. This is not legal. 
*/ + for (i = 1; i < XSK_DESC__MAX_FRAGS + 2; i++) { + pkts[i].len = MIN_PKT_SIZE; + pkts[i].options = XDP_PKT_CONTD; + pkts[i].valid = true; + } + pkts[XSK_DESC__MAX_FRAGS + 1].valid = false; + + /* Valid packet for synch */ + pkts[XSK_DESC__MAX_FRAGS + 2].len = MIN_PKT_SIZE; + pkts[XSK_DESC__MAX_FRAGS + 2].valid = true; + + pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts)); + return testapp_validate_traffic(test); +} + static int xsk_load_xdp_programs(struct ifobject *ifobj) { ifobj->xdp_progs = xsk_xdp_progs__open_and_load(); @@ -2160,6 +2194,9 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ test->mtu = MAX_ETH_JUMBO_SIZE; ret = testapp_xdp_metadata_count(test); break; + case TEST_TYPE_TOO_MANY_FRAGS: + ret = testapp_too_many_frags(test); + break; default: break; } diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index 786c2dfab7a6..bb9d36a7e544 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -53,6 +53,7 @@ #define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024) #define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024) #define XSK_DESC__INVALID_OPTION (0xffff) +#define XSK_DESC__MAX_FRAGS 17 #define HUGEPAGE_SIZE (2 * 1024 * 1024) #define PKT_DUMP_NB_TO_PRINT 16 @@ -93,6 +94,7 @@ enum test_type { TEST_TYPE_UNALIGNED_MB, TEST_TYPE_ALIGNED_INV_DESC_MB, TEST_TYPE_UNALIGNED_INV_DESC_MB, + TEST_TYPE_TOO_MANY_FRAGS, TEST_TYPE_MAX }; From patchwork Thu May 18 18:05:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13247238 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B50D3A2E3; Thu, 18 May 2023 18:07:02 +0000 (UTC) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E58F28F; Thu, 18 May 2023 11:07:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684433220; x=1715969220; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=btMEqfPJdz24ajU0aJZF0IeGeFF/1NrmUw/U4+YuNPo=; b=QCUre7+j6TmljQLGX+HY0kXcvw6etSqLKndw5GuJPNk8KxUI0u1Yi/mg ZKaVWIZoDeagYWnxRinpYrYIdniENy3nmWeMAFkXRV31/+o8fdeyZu0fN YucIx3o3W2ar1X7MFwMYAqgX7837bzNCPX4pevW+Wy7P0xXTlsKcrRusY DZ7l4V37cKFHrqUhxNc/8knnjMZFnE2u0C1tr6CNtLlkb+m8krry4hEvF qonUXqGxdoieKHFjQ53HP9m5UpBQ22wL06tKlONBCkqHNkK0uari+/nIR GvdODAd7vZqWQ8Df1iAd8a9mqmd0Bkb9xkZE0ta0fPFK7ADjFktV0wwTc A==; X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="350985094" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="350985094" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 May 2023 11:06:45 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10714"; a="948780640" X-IronPort-AV: E=Sophos;i="6.00,174,1681196400"; d="scan'208";a="948780640" Received: from boxer.igk.intel.com ([10.102.20.173]) by fmsmga006.fm.intel.com with ESMTP; 18 May 2023 11:06:43 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, tirthendu.sarkar@intel.com, 
maciej.fijalkowski@intel.com, bjorn@kernel.org Subject: [PATCH bpf-next 21/21] selftests/xsk: reset NIC settings to default after running test suite Date: Thu, 18 May 2023 20:05:45 +0200 Message-Id: <20230518180545.159100-22-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230518180545.159100-1-maciej.fijalkowski@intel.com> References: <20230518180545.159100-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Currently, when running ZC test suite, after finishing first run of test suite and then switching to busy-poll tests within xskxceiver, such errors are observed: libbpf: Kernel error message: ice: MTU is too large for linear frames and XDP prog does not support frags 1..26 libbpf: Kernel error message: Native and generic XDP can't be active at the same time Error attaching XDP program not ok 1 [xskxceiver.c:xsk_reattach_xdp:1568]: ERROR: 17/"File exists" this is because test suite ends with 9k MTU and native xdp program being loaded. Busy-poll tests start non-multi-buffer tests for generic mode. To fix this, let us introduce bash function that will reset NIC settings to default (e.g. 1500 MTU and no xdp progs loaded) so that test suite can continue without interrupts. It also means that after busy-poll tests NIC will have those default settings, whereas right now it is left with 9k MTU and xdp prog loaded in native mode. Signed-off-by: Maciej Fijalkowski --- tools/testing/selftests/bpf/test_xsk.sh | 5 +++++ tools/testing/selftests/bpf/xsk_prereqs.sh | 7 +++++++ 2 files changed, 12 insertions(+) diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh index c2ad50f26b63..2aa5a3445056 100755 --- a/tools/testing/selftests/bpf/test_xsk.sh +++ b/tools/testing/selftests/bpf/test_xsk.sh @@ -171,7 +171,10 @@ exec_xskxceiver if [ -z $ETH ]; then cleanup_exit ${VETH0} ${VETH1} +else + cleanup_iface ${ETH} ${MTU} fi + TEST_NAME="XSK_SELFTESTS_${VETH0}_BUSY_POLL" busy_poll=1 @@ -184,6 +187,8 @@ exec_xskxceiver if [ -z $ETH ]; then cleanup_exit ${VETH0} ${VETH1} +else + cleanup_iface ${ETH} ${MTU} fi failures=0 diff --git a/tools/testing/selftests/bpf/xsk_prereqs.sh b/tools/testing/selftests/bpf/xsk_prereqs.sh index ae697a10a056..29175682c44d 100755 --- a/tools/testing/selftests/bpf/xsk_prereqs.sh +++ b/tools/testing/selftests/bpf/xsk_prereqs.sh @@ -53,6 +53,13 @@ test_exit() exit 1 } +cleanup_iface() +{ + ip link set $1 mtu $2 + ip link set $1 xdp off + ip link set $1 xdpgeneric off +} + clear_configs() { [ $(ip link show $1 &>/dev/null; echo $?;) == 0 ] &&