From patchwork Mon May 29 15:50:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258745 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 936789441; Mon, 29 May 2023 15:50:33 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6C14C7; Mon, 29 May 2023 08:50:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375431; x=1716911431; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8MrgUlsqrXu8BqAavZRNYO9xvrhE39bWFbxfxDuf9rg=; b=KotlbB3d9UKnB9qIrJNCmfmc6wJIEaW1geE3LI+Qqu3a/delFU+6yxp0 EZH81CZtDYInCHO3fARQwUqVPz+9X4qhQOyx2TLVzSK5n6d2bYe1L+ZeF CAiBVEN9Hhdo+WEHDQHjd0rlDKj4RuwNdxw/x6eB9aQ7l41bVB8S8ON6J SRhsBhrJSut6p0FgOgBwzvBciNYsAtMZobYHhDWM+6JpjZxwNhUhfz5vT RlyEi/Liu9G1xc8wdKXhIK199/K3k2jq/7HJe9v/yBKIr+ZjpS8axSP04 qLOQh3srnKqvUHsD16yfeHrcp2zr1+dL6gB+5UCJOG/wAgbZ8+P4Com3i w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228951" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228951" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880440990" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880440990" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:29 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 01/22] xsk: prepare 'options' in xdp_desc for multi-buffer use Date: Mon, 29 May 2023 17:50:03 +0200 Message-Id: <20230529155024.222213-2-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar Use the 'options' field in xdp_desc as a packet continuity marker. Since 'options' field was unused till now and was expected to be set to 0, the 'eop' descriptor will have it set to 0, while the non-eop descriptors will have to set it to 1. This ensures legacy applications continue to work without needing any change for single-buffer packets. Add helper functions and extend xskq_prod_reserve_desc() to use the 'options' field. Signed-off-by: Tirthendu Sarkar --- include/uapi/linux/if_xdp.h | 7 +++++++ net/xdp/xsk.c | 8 ++++---- net/xdp/xsk_queue.h | 12 +++++++++--- 3 files changed, 20 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index a78a8096f4ce..434f313dc26c 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -108,4 +108,11 @@ struct xdp_desc { /* UMEM descriptor is __u64 */ +/* Flag indicating that the packet continues with the buffer pointed out by the + * next frame in the ring. The end of the packet is signalled by setting this + * bit to zero. For single buffer packets, every descriptor has 'options' set + * to 0 and this maintains backward compatibility. + */ +#define XDP_PKT_CONTD (1 << 0) + #endif /* _LINUX_IF_XDP_H */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index cc1e7f15fa73..99f90a0d04ae 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -135,14 +135,14 @@ int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, return 0; } -static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) +static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, u32 flags) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); u64 addr; int err; addr = xp_get_handle(xskb); - err = xskq_prod_reserve_desc(xs->rx, addr, len); + err = xskq_prod_reserve_desc(xs->rx, addr, len, flags); if (err) { xs->rx_queue_full++; return err; @@ -189,7 +189,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) } xsk_copy_xdp(xsk_xdp, xdp, len); - err = __xsk_rcv_zc(xs, xsk_xdp, len); + err = __xsk_rcv_zc(xs, xsk_xdp, len, 0); if (err) { xsk_buff_free(xsk_xdp); return err; @@ -259,7 +259,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) { len = xdp->data_end - xdp->data; - return __xsk_rcv_zc(xs, xdp, len); + return __xsk_rcv_zc(xs, xdp, len, 0); } err = __xsk_rcv(xs, xdp); diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 6d40a77fccbe..ad81b19e6fdf 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -130,6 +130,11 @@ static inline bool xskq_cons_read_addr_unchecked(struct xsk_queue *q, u64 *addr) return false; } +static inline bool xp_unused_options_set(u16 options) +{ + return options & ~XDP_PKT_CONTD; +} + static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { @@ -141,7 +146,7 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, if (desc->addr >= pool->addrs_cnt) return false; - if (desc->options) + if (xp_unused_options_set(desc->options)) return false; return true; } @@ -158,7 +163,7 @@ static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, xp_desc_crosses_non_contig_pg(pool, addr, desc->len)) return false; - if (desc->options) + if (xp_unused_options_set(desc->options)) return false; return true; } @@ -360,7 +365,7 @@ static inline void xskq_prod_write_addr_batch(struct xsk_queue *q, struct xdp_de } static inline int xskq_prod_reserve_desc(struct xsk_queue *q, - u64 addr, u32 len) + u64 addr, u32 len, u32 flags) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx; @@ -372,6 +377,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q, idx = q->cached_prod++ & q->ring_mask; ring->desc[idx].addr = addr; ring->desc[idx].len = len; + ring->desc[idx].options = flags; return 0; } From patchwork Mon May 29 15:50:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258746 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1030B9441; Mon, 29 May 2023 15:50:36 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D03D7C4; Mon, 29 May 2023 08:50:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375434; x=1716911434; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2khGcsO2LyLYWzDrt8bA/RMai+QHX0D+6HdnFcQ6kcY=; b=Xl4YjTqiuImMBDNbPT7LxqL70w4hzKFuPpLl1/bD132XTPP1TGHszF+Y BZBBaTegAqGqzykC/TFkNfRVyDY6etjvwDmOctYIUWOtnuTkfTTCVgxzY jzcpf9HDLDlE+arMSUO8lhOms4kI79nDCyCNwUdF1DBIj3qX9628j6hc2 0mTLl7vWXemny2bE/V8OrLGS7n4C4fnofGN0ihKfHGHdQmH00jZAohaWM zDkS4j+p1ThAQmQIl+L/V8JfIJdj6n1jovM0XhQAZ9KJHNQUUEy3ws5zw 5hKLo2MK793/wWAmTvDERWVhRhR/VDEfk5p8f4qWlIniJqyq9NJU4HdlZ w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228961" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228961" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441001" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441001" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:31 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 02/22] xsk: introduce XSK_USE_SG bind flag for xsk socket Date: Mon, 29 May 2023 17:50:04 +0200 Message-Id: <20230529155024.222213-3-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar As of now xsk core drops any xdp_buff with data size greater than the xsk frame_size as set by the af_xdp application. With multi-buffer support introduced in the next patch xsk core can now split those buffers into multiple descriptors provided the af_xdp application can handle them. Such capability of the application needs to be independent of the xdp_prog's frag support capability since there are cases where even a single xdp_buffer may need to be split into multiple descriptors owing to a smaller xsk frame size. For e.g., with NIC rx_buffer size set to 4kB, a 3kB packet will constitute of a single buffer and so will be sent as such to AF_XDP layer irrespective of 'xdp.frags' capability of the XDP program. Now if the xsk frame size is set to 2kB by the AF_XDP application, then the packet will need to be split into 2 descriptors if AF_XDP application can handle multi-buffer, else it needs to be dropped. Applications can now advertise their frag handling capability to xsk core so that xsk core can decide if it should drop or split xdp_buffs that exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag for the xdp socket. Signed-off-by: Tirthendu Sarkar --- include/net/xdp_sock.h | 1 + include/uapi/linux/if_xdp.h | 6 ++++++ net/xdp/xsk.c | 5 +++-- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index e96a1151ec75..36b0411a0d1b 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -52,6 +52,7 @@ struct xdp_sock { struct xsk_buff_pool *pool; u16 queue_id; bool zc; + bool sg; enum { XSK_READY = 0, XSK_BOUND, diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 434f313dc26c..8d48863472b9 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -25,6 +25,12 @@ * application. */ #define XDP_USE_NEED_WAKEUP (1 << 3) +/* By setting this option, userspace application indicates that it can + * handle multiple descriptors per packet thus enabling AF_XDP to split + * multi-buffer XDP frames into multiple Rx descriptors. Without this set + * such frames will be dropped. + */ +#define XDP_USE_SG (1 << 4) /* Flags for xsk_umem_config flags */ #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 99f90a0d04ae..62d49a81d5f6 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -896,7 +896,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) flags = sxdp->sxdp_flags; if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY | - XDP_USE_NEED_WAKEUP)) + XDP_USE_NEED_WAKEUP | XDP_USE_SG)) return -EINVAL; rtnl_lock(); @@ -924,7 +924,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) struct socket *sock; if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) || - (flags & XDP_USE_NEED_WAKEUP)) { + (flags & XDP_USE_NEED_WAKEUP) || (flags & XDP_USE_SG)) { /* Cannot specify flags for shared sockets. */ err = -EINVAL; goto out_unlock; @@ -1023,6 +1023,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) xs->dev = dev; xs->zc = xs->umem->zc; + xs->sg = !!(flags & XDP_USE_SG); xs->queue_id = qid; xp_add_xsk(xs->pool, xs); From patchwork Mon May 29 15:50:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258747 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2364AA94C; Mon, 29 May 2023 15:50:38 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBF61D2; Mon, 29 May 2023 08:50:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375437; x=1716911437; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EkI+u7NjR1z5LPEwmE3rD1pz9rJqcDwtVBQ+0mhO1zE=; b=GMrM6KESsBpmKGiYjcxzyRYBBhSXi/UHnysaZO/79Y2Bd17fQP8acPh+ 8QCOdrhi0PO+ccDsKWBFI5E671tUT4JgrJ8VXYB8cHh2y5Nkp+cuHR12M L9NY5BAj9RXZn+bftnAordGqHmieoXk463F8Q3DP1YCG02tvoE55swG2r dTNPXBYgZr1cL0w4Xkq296KJJYhsV/z20PCtedsmlUUWJB3NouvWH/TnP 8iPrOIwqIOfcmGdXA65zAM+Q2cFFwGA68v9WJJ7pvr4l0QzQGAC/hIKL4 ftxcZkpKZzj5s7kkoHOOI6eVTlPeJG2350cn4yQ65JRlzRMPeqqoMVIhI g==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228975" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228975" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441019" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441019" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:34 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 03/22] xsk: prepare both copy and zero-copy modes to co-exist Date: Mon, 29 May 2023 17:50:05 +0200 Message-Id: <20230529155024.222213-4-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Currently, __xsk_rcv_zc() is a function that is responsible for producing AF_XDP Rx descriptors. It is used by both copy and zero-copy mode. Both of these modes are going to differ when multi-buffer support is going to be added. ZC will work on a chain of xdp_buff_xsk structs whereas copy-mode is going to utilize skb_shared_info contents. This means that ZC-specific changes would affect the copy mode. Let's modify __xsk_rcv_zc() to work directly on xdp_buff_xsk so the callsites have to retrieve this from xdp_buff. Also, introduce xsk_rcv_zc() which will carry all the needed later changes for supporting multi-buffer on ZC side that do not apply to copy mode. Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 62d49a81d5f6..3a68988dd06f 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -135,9 +135,9 @@ int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, return 0; } -static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, u32 flags) +static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff_xsk *xskb, u32 len, + u32 flags) { - struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); u64 addr; int err; @@ -152,6 +152,13 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, u32 return 0; } +static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) +{ + struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + + return __xsk_rcv_zc(xs, xskb, len, 0); +} + static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) { void *from_buf, *to_buf; @@ -172,6 +179,7 @@ static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { + struct xdp_buff_xsk *xskb; struct xdp_buff *xsk_xdp; int err; u32 len; @@ -189,7 +197,8 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) } xsk_copy_xdp(xsk_xdp, xdp, len); - err = __xsk_rcv_zc(xs, xsk_xdp, len, 0); + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); + err = __xsk_rcv_zc(xs, xskb, len, 0); if (err) { xsk_buff_free(xsk_xdp); return err; @@ -259,7 +268,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) { len = xdp->data_end - xdp->data; - return __xsk_rcv_zc(xs, xdp, len, 0); + return xsk_rcv_zc(xs, xdp, len); } err = __xsk_rcv(xs, xdp); From patchwork Mon May 29 15:50:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258748 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 39FC2BE49; Mon, 29 May 2023 15:50:41 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 776ACA3; Mon, 29 May 2023 08:50:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375439; x=1716911439; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wWYwHByZ0dmDML1FJxTgrszkLnxHYaT7kKhvasZIciM=; b=GR0n6+8XRPCqqN76GNU2jqmfv9WFu+y3bLO0SSYFxSTw8ADhme3EfBq3 x0AbMuf3cmnGIL0LxQmOU+cAKVjy9jCdx1KlWITMZT6dR82qTTspWMkaZ GPslOPEjCYV5pffBYabEXrvcinvAeFwc8JKOA9Vh9AkBuoGFwaRqfmAZJ ihwJKCvmcptafzb+kbgZEYZ+uvAoJ0vhpaQgpJS9pDrk2b+qlfnXXMOZD AECMjMNMFW1ZMsUm6cj/TpeIHCF1PLeGAvbTpB/3aOuqOM2IG89HHuVCi KfUj61h0ysnqaxZAy+nRpTRy/CrOORssh5n9EOfLjS5KIJP/I+uEcsfm3 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228979" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228979" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441030" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441030" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:36 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 04/22] xsk: move xdp_buff's data length check to xsk_rcv_check Date: Mon, 29 May 2023 17:50:06 +0200 Message-Id: <20230529155024.222213-5-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar If the data in xdp_buff exceeds the xsk frame length, the packet needs to be dropped. This check is currently being done in __xsk_rcv(). Move the described logic to xsk_rcv_check() so that such a xdp_buff will only be dropped if the application does not support multi-buffer (absence of XDP_USE_SG bind flag). This is applicable for all cases: copy mode, zero copy mode as well as skb mode. Signed-off-by: Tirthendu Sarkar --- net/xdp/xsk.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 3a68988dd06f..22eeb7f6ac05 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -177,18 +177,11 @@ static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) memcpy(to_buf, from_buf, len + metalen); } -static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { struct xdp_buff_xsk *xskb; struct xdp_buff *xsk_xdp; int err; - u32 len; - - len = xdp->data_end - xdp->data; - if (len > xsk_pool_get_rx_frame_size(xs->pool)) { - xs->rx_dropped++; - return -ENOSPC; - } xsk_xdp = xsk_buff_alloc(xs->pool); if (!xsk_xdp) { @@ -224,7 +217,7 @@ static bool xsk_is_bound(struct xdp_sock *xs) return false; } -static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp) +static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { if (!xsk_is_bound(xs)) return -ENXIO; @@ -232,6 +225,11 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp) if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) return -EINVAL; + if (len > xsk_pool_get_rx_frame_size(xs->pool)) { + xs->rx_dropped++; + return -ENOSPC; + } + sk_mark_napi_id_once_xdp(&xs->sk, xdp); return 0; } @@ -245,12 +243,13 @@ static void xsk_flush(struct xdp_sock *xs) int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { + u32 len = xdp_get_buff_len(xdp); int err; spin_lock_bh(&xs->rx_lock); - err = xsk_rcv_check(xs, xdp); + err = xsk_rcv_check(xs, xdp, len); if (!err) { - err = __xsk_rcv(xs, xdp); + err = __xsk_rcv(xs, xdp, len); xsk_flush(xs); } spin_unlock_bh(&xs->rx_lock); @@ -259,10 +258,10 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { + u32 len = xdp_get_buff_len(xdp); int err; - u32 len; - err = xsk_rcv_check(xs, xdp); + err = xsk_rcv_check(xs, xdp, len); if (err) return err; @@ -271,7 +270,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) return xsk_rcv_zc(xs, xdp, len); } - err = __xsk_rcv(xs, xdp); + err = __xsk_rcv(xs, xdp, len); if (!err) xdp_return_buff(xdp); return err; From patchwork Mon May 29 15:50:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258749 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE2908F6D; Mon, 29 May 2023 15:50:45 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 733EFCD; Mon, 29 May 2023 08:50:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375442; x=1716911442; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zLAZqLN2UDvBKg6EgoSlabnz1XZ+p1r2rPgwqy2WG+Q=; b=HHWQ72Ybbvm3BjJx3JMP4RrM0yNibnMdMThJnmiFjkirvDT33YftCQN4 YehWzqPgHSuNSDBHwA3+k9dYU4V6vNDpRxA8Xx0ctCZPQc0x70eJGIxBK 1OpnHB6IKKLjIAhJPKU3Pd1yYhjZ4IKxDYQMQSp6CvJL4+BRLnMyqagvJ NZQCz+apHKyMudFBIkhNvFppZNsgbCmoLr1trXsyEZ304RuejG19iyXXH 0lTmSsrKEZBqJwRbhxEhfHmFbloTezyXK7U0giZonOHRKsdqe4Ld2uD1c Yzq1q5rDZoDMcO5Gj59ZnlO3xv6d5/jVNeAKP7kXJc8+Wn+MrTjn7hZeL g==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228983" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228983" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441040" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441040" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:39 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 05/22] xsk: add support for AF_XDP multi-buffer on Rx path Date: Mon, 29 May 2023 17:50:07 +0200 Message-Id: <20230529155024.222213-6-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar Add multi-buffer support for AF_XDP by extending the XDP multi-buffer support to be reflected in user-space when a packet is redirected to an AF_XDP socket. In the XDP implementation, the NIC driver builds the xdp_buff from the first frag of the packet and adds any subsequent frags in the skb_shinfo area of the xdp_buff. In AF_XDP core, XDP buffers are allocated from xdp_sock's pool and data is copied from the driver's xdp_buff and frags. Once an allocated XDP buffer is full and there is still data to be copied, the 'XDP_PKT_CONTD' flag in'options' field of the corresponding xdp ring descriptor is set and passed to the application. When application sees the aforementioned flag set it knows there is pending data for this packet that will be carried in the following descriptors. If there is no more data to be copied, the flag in 'options' field is cleared for that descriptor signalling EOP to the application. If application reads a batch of descriptors using for example the libxdp interfaces, it is not guaranteed that the batch will end with a full packet. It might end in the middle of a packet and the rest of the frames of that packet will arrive at the beginning of the next batch. AF_XDP ensures that only a complete packet (along with all its frags) is sent to application. Signed-off-by: Tirthendu Sarkar --- net/core/filter.c | 7 +-- net/xdp/xsk.c | 110 ++++++++++++++++++++++++++++++++++++---------- 2 files changed, 88 insertions(+), 29 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 968139f4a1ac..07c2ad8f3df9 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4344,13 +4344,8 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); enum bpf_map_type map_type = ri->map_type; - if (map_type == BPF_MAP_TYPE_XSKMAP) { - /* XDP_REDIRECT is not supported AF_XDP yet. */ - if (unlikely(xdp_buff_has_frags(xdp))) - return -EOPNOTSUPP; - + if (map_type == BPF_MAP_TYPE_XSKMAP) return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); - } return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), xdp_prog); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 22eeb7f6ac05..86d8b23ae0a7 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -159,43 +159,107 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) return __xsk_rcv_zc(xs, xskb, len, 0); } -static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) +static void *xsk_copy_xdp_start(struct xdp_buff *from) { - void *from_buf, *to_buf; - u32 metalen; + if (unlikely(xdp_data_meta_unsupported(from))) + return from->data; + else + return from->data_meta; +} - if (unlikely(xdp_data_meta_unsupported(from))) { - from_buf = from->data; - to_buf = to->data; - metalen = 0; - } else { - from_buf = from->data_meta; - metalen = from->data - from->data_meta; - to_buf = to->data - metalen; - } +static u32 xsk_copy_xdp(void *to, void **from, u32 to_len, + u32 *from_len, skb_frag_t **frag, u32 rem) +{ + u32 copied = 0; + + while (1) { + u32 copy_len = min_t(u32, *from_len, to_len); + + memcpy(to, *from, copy_len); + copied += copy_len; + if (rem == copied) + return copied; + + if (*from_len == copy_len) { + *from = skb_frag_address(*frag); + *from_len = skb_frag_size((*frag)++); + } else { + *from += copy_len; + *from_len -= copy_len; + } + if (to_len == copy_len) + return copied; - memcpy(to_buf, from_buf, len + metalen); + to_len -= copy_len; + to += copy_len; + } } static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { + u32 frame_size = xsk_pool_get_rx_frame_size(xs->pool); + void *copy_from = xsk_copy_xdp_start(xdp), *copy_to; + u32 from_len, meta_len, rem, num_desc; struct xdp_buff_xsk *xskb; struct xdp_buff *xsk_xdp; - int err; + skb_frag_t *frag; + + from_len = xdp->data_end - copy_from; + meta_len = xdp->data - copy_from; + rem = len + meta_len; + + if (len <= frame_size && !xdp_buff_has_frags(xdp)) { + int err; - xsk_xdp = xsk_buff_alloc(xs->pool); - if (!xsk_xdp) { + xsk_xdp = xsk_buff_alloc(xs->pool); + if (!xsk_xdp) { + xs->rx_dropped++; + return -ENOMEM; + } + memcpy(xsk_xdp->data - meta_len, copy_from, rem); + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); + err = __xsk_rcv_zc(xs, xskb, len, 0); + if (err) { + xsk_buff_free(xsk_xdp); + return err; + } + + return 0; + } + + num_desc = (len - 1) / frame_size + 1; + + if (!xsk_buff_can_alloc(xs->pool, num_desc)) { xs->rx_dropped++; return -ENOMEM; } + if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) { + xs->rx_queue_full++; + return -ENOBUFS; + } - xsk_copy_xdp(xsk_xdp, xdp, len); - xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); - err = __xsk_rcv_zc(xs, xskb, len, 0); - if (err) { - xsk_buff_free(xsk_xdp); - return err; + if (xdp_buff_has_frags(xdp)) { + struct skb_shared_info *sinfo; + + sinfo = xdp_get_shared_info_from_buff(xdp); + frag = &sinfo->frags[0]; } + + do { + u32 to_len = frame_size + meta_len; + u32 copied; + + xsk_xdp = xsk_buff_alloc(xs->pool); + copy_to = xsk_xdp->data - meta_len; + + copied = xsk_copy_xdp(copy_to, ©_from, to_len, &from_len, &frag, rem); + rem -= copied; + + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); + __xsk_rcv_zc(xs, xskb, copied - meta_len, rem ? XDP_PKT_CONTD : 0); + meta_len = 0; + } while (rem); + return 0; } @@ -225,7 +289,7 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) return -EINVAL; - if (len > xsk_pool_get_rx_frame_size(xs->pool)) { + if (len > xsk_pool_get_rx_frame_size(xs->pool) && !xs->sg) { xs->rx_dropped++; return -ENOSPC; } From patchwork Mon May 29 15:50:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258750 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6F12D30B; Mon, 29 May 2023 15:50:50 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B33CFD2; Mon, 29 May 2023 08:50:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375444; x=1716911444; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x8FFWEkD8XqU85QMPz3N9wzhwSqbpk8WiQpXyV3StMM=; b=Gde4iqKCUPcYDt1VFzfyF9Ju6JOLBjsgVixxojEbdTI/ma04djDG7c2X zyUPpLcyT3CvUasR7gUnkrHqVwXPtoRtGqoRFJRXl3bk7HKjOmL6ThPVm D0oALIkvgUsQuTaEzUoPnDWXz9Kf381lRfWYeeFMJ4h+c+yo00Ij6BsHN nIoJdLryX7Jp0W9vODmA7CJy7HDbPyJInVZdj3qBSjw3mQCgcHuGek557 U4CVDxqYd/MgcJz3z0HpqNEwEayO/xxHyz3B/jtWYFKTaAkWNQWollI4Q a0z7tKU2FJsxTCH8OtYMmsPJoHn0Z8iyp7zibh7w6Qmo9am3FluSfZcwF w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228986" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228986" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441054" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441054" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:41 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 06/22] xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path Date: Mon, 29 May 2023 17:50:08 +0200 Message-Id: <20230529155024.222213-7-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar In Tx path, xsk core reserves space for each desc to be transmitted in the completion queue and it's address contained in it is stored in the skb destructor arg. After successful transmission the skb destructor submits the addr marking completion. To handle multiple descriptors per packet, now along with reserving space for each descriptor, the corresponding address is also stored in completion queue. The number of pending descriptors are stored in skb destructor arg and is used by the skb destructor to update completions. Introduce 'skb' in xdp_sock to store a partially built packet when __xsk_generic_xmit() must return before it sees the EOP descriptor for the current packet so that packet building can resume in next call of __xsk_generic_xmit(). Helper functions are introduced to set and get the pending descriptors in the skb destructor arg. Also, wrappers are introduced for storing descriptor addresses, submitting and cancelling (for unsuccessful transmissions) the number of completions. Signed-off-by: Tirthendu Sarkar --- include/net/xdp_sock.h | 6 ++++ net/xdp/xsk.c | 74 ++++++++++++++++++++++++++++++------------ net/xdp/xsk_queue.h | 19 ++++------- 3 files changed, 67 insertions(+), 32 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 36b0411a0d1b..1617af380162 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -68,6 +68,12 @@ struct xdp_sock { u64 rx_dropped; u64 rx_queue_full; + /* When __xsk_generic_xmit() must return before it sees the EOP descriptor for the current + * packet, the partially built skb is saved here so that packet building can resume in next + * call of __xsk_generic_xmit(). + */ + struct sk_buff *skb; + struct list_head map_list; /* Protects map_list */ spinlock_t map_list_lock; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 86d8b23ae0a7..29bda8452e2c 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -480,19 +480,65 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags) return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags); } -static void xsk_destruct_skb(struct sk_buff *skb) +static int xsk_cq_reserve_addr_locked(struct xdp_sock *xs, u64 addr) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&xs->pool->cq_lock, flags); + ret = xskq_prod_reserve_addr(xs->pool->cq, addr); + spin_unlock_irqrestore(&xs->pool->cq_lock, flags); + + return ret; +} + +static void xsk_cq_submit_locked(struct xdp_sock *xs, u32 n) +{ + unsigned long flags; + + spin_lock_irqsave(&xs->pool->cq_lock, flags); + xskq_prod_submit_n(xs->pool->cq, n); + spin_unlock_irqrestore(&xs->pool->cq_lock, flags); +} + +static void xsk_cq_cancel_locked(struct xdp_sock *xs, u32 n) { - u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg; - struct xdp_sock *xs = xdp_sk(skb->sk); unsigned long flags; spin_lock_irqsave(&xs->pool->cq_lock, flags); - xskq_prod_submit_addr(xs->pool->cq, addr); + xskq_prod_cancel_n(xs->pool->cq, n); spin_unlock_irqrestore(&xs->pool->cq_lock, flags); +} + +static u32 xsk_get_num_desc(struct sk_buff *skb) +{ + return skb ? (long)skb_shinfo(skb)->destructor_arg : 0; +} +static void xsk_destruct_skb(struct sk_buff *skb) +{ + xsk_cq_submit_locked(xdp_sk(skb->sk), xsk_get_num_desc(skb)); sock_wfree(skb); } +static void xsk_set_destructor_arg(struct sk_buff *skb) +{ + long num = xsk_get_num_desc(xdp_sk(skb->sk)->skb) + 1; + + skb_shinfo(skb)->destructor_arg = (void *)num; +} + +static void xsk_consume_skb(struct sk_buff *skb) +{ + struct xdp_sock *xs = xdp_sk(skb->sk); + + skb->destructor = sock_wfree; + xsk_cq_cancel_locked(xs, xsk_get_num_desc(skb)); + /* Free skb without triggering the perf drop trace */ + consume_skb(skb); + xs->skb = NULL; +} + static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, struct xdp_desc *desc) { @@ -578,8 +624,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb->dev = dev; skb->priority = xs->sk.sk_priority; skb->mark = xs->sk.sk_mark; - skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; skb->destructor = xsk_destruct_skb; + xsk_set_destructor_arg(skb); return skb; } @@ -591,7 +637,6 @@ static int __xsk_generic_xmit(struct sock *sk) bool sent_frame = false; struct xdp_desc desc; struct sk_buff *skb; - unsigned long flags; int err = 0; mutex_lock(&xs->mutex); @@ -616,31 +661,20 @@ static int __xsk_generic_xmit(struct sock *sk) * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ - spin_lock_irqsave(&xs->pool->cq_lock, flags); - if (xskq_prod_reserve(xs->pool->cq)) { - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); + if (xsk_cq_reserve_addr_locked(xs, desc.addr)) goto out; - } - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); skb = xsk_build_skb(xs, &desc); if (IS_ERR(skb)) { err = PTR_ERR(skb); - spin_lock_irqsave(&xs->pool->cq_lock, flags); - xskq_prod_cancel(xs->pool->cq); - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); + xsk_cq_cancel_locked(xs, 1); goto out; } err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */ - skb->destructor = sock_wfree; - spin_lock_irqsave(&xs->pool->cq_lock, flags); - xskq_prod_cancel(xs->pool->cq); - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); - /* Free skb without triggering the perf drop trace */ - consume_skb(skb); + xsk_consume_skb(skb); err = -EAGAIN; goto out; } diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index ad81b19e6fdf..4190f43ce0b0 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -297,6 +297,11 @@ static inline void xskq_cons_release(struct xsk_queue *q) q->cached_cons++; } +static inline void xskq_cons_cancel_n(struct xsk_queue *q, u32 cnt) +{ + q->cached_cons -= cnt; +} + static inline u32 xskq_cons_present_entries(struct xsk_queue *q) { /* No barriers needed since data is not accessed */ @@ -324,9 +329,9 @@ static inline bool xskq_prod_is_full(struct xsk_queue *q) return xskq_prod_nb_free(q, 1) ? false : true; } -static inline void xskq_prod_cancel(struct xsk_queue *q) +static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt) { - q->cached_prod--; + q->cached_prod -= cnt; } static inline int xskq_prod_reserve(struct xsk_queue *q) @@ -392,16 +397,6 @@ static inline void xskq_prod_submit(struct xsk_queue *q) __xskq_prod_submit(q, q->cached_prod); } -static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr) -{ - struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; - u32 idx = q->ring->producer; - - ring->desc[idx++ & q->ring_mask] = addr; - - __xskq_prod_submit(q, idx); -} - static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries) { __xskq_prod_submit(q, q->ring->producer + nb_entries); From patchwork Mon May 29 15:50:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258751 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 921ECDDC9; Mon, 29 May 2023 15:50:51 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 436B3C7; Mon, 29 May 2023 08:50:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375446; x=1716911446; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=22zRMCMKH8lVdmtmiFLlBektSQdQh4pS5QkST5/Qiqc=; b=l8luJdOL8n6XS18IOq25lBjfOlEhSLZFmyCJrm+3q8UKnZi13GL5YifO RAqNg8mJ4vuCvjMyCrD9emC87l3fH5P+EmIQYg5o7zP34FYjNnnQWdfAe uUmp8mW7zUlk1XSKnI8PEG6RLIXdLtm/t8rTsotnQAs+UBo3J0oQBhu08 8m24AO52PRNswn1LqsKTTEyMHFv7o0RPFDuKvh8c/JRs6DqwHrY1v50ZD H+C/Al85MNOT6pe55vjx0CoScv9i3HOlLRknqhKiU9z6QAjxYGl8tSAdb HJc6KDf/oNmTlZOWhsj9Etrhur4OOYtSxzIyj8bZ1BtY2IZqZ8r9vAwCC w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344228993" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344228993" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441066" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441066" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:43 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 07/22] xsk: allow core/drivers to test EOP bit Date: Mon, 29 May 2023 17:50:09 +0200 Message-Id: <20230529155024.222213-8-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Drivers are used to check for EOP bit whereas AF_XDP operates on inverted logic - user space indicates that current frag is not the last one and packet continues. For AF_XDP core needs, add xp_mb_desc() that will simply test XDP_PKT_CONTD from xdp_desc::options, but in order to preserve drivers default behavior, introduce an interface for ZC drivers that will negate xp_mb_desc() result and therefore make it easier to test EOP bit from during production of HW Tx descriptors. Signed-off-by: Maciej Fijalkowski --- include/net/xdp_sock_drv.h | 10 ++++++++++ include/net/xsk_buff_pool.h | 5 +++++ 2 files changed, 15 insertions(+) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 9c0d860609ba..a3e18b958d93 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -89,6 +89,11 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) return xp_alloc(pool); } +static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +{ + return !xp_mb_desc(desc); +} + /* Returns as many entries as possible up to max. 0 <= N <= max. */ static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max) { @@ -241,6 +246,11 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) return NULL; } +static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +{ + return false; +} + static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max) { return 0; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index a8d7b8a3688a..4dcca163e076 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -184,6 +184,11 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool, !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK); } +static inline bool xp_mb_desc(struct xdp_desc *desc) +{ + return desc->options & XDP_PKT_CONTD; +} + static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr) { return addr & pool->chunk_mask; From patchwork Mon May 29 15:50:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258752 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C705ADDC9; Mon, 29 May 2023 15:50:52 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2F7CE3; Mon, 29 May 2023 08:50:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375448; x=1716911448; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YYxcnih+l1E/STFjh2FHeIQ3ot4Ey6eEd/svhJaX89k=; b=Erdg5zZN9TIY5G5csWtFbjPJh3TyLmoW+sr7861Etdvb9qwlcowhVJq8 2db5lrc/vkz8W4UwGOclGega9s35qKXAkszK/bL4+yMnhRNUw5+hrwlAr w64A8/5CKCbIFb6BI4e+qdVZfRsqu3awm5Cwe72InEMBZjuYmwoyfEilC HA9pggS6bibY2vZfqktIF0taADkaeW5mREeFg9qhaZLKt/zBHcVx01HR1 jRkTNnDcwwivzJsA99YxWJzEs/UxM0X/gsuB/uxvSlDQ2iCU8JN1VwdFM LtEtOmpzx0pnw3YwtJ033MYGjgkDsRMILPCHgkx/4UCq4cnYjSTmrcoLu Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229002" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229002" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441073" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441073" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:46 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 08/22] xsk: add support for AF_XDP multi-buffer on Tx path Date: Mon, 29 May 2023 17:50:10 +0200 Message-Id: <20230529155024.222213-9-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar For transmitting an AF_XDP packet, allocate skb while processing the first desc and copy data to it. The 'XDP_PKT_CONTD' flag in 'options' field of the desc indicates the EOP status of the packet. If the current desc is not EOP, store the skb, release the current desc and go on to read the next descs. Allocate a page for each subsequent desc, copy data to it and add it as a frag in the skb stored in xsk. On processing EOP, transmit the skb with frags. Addresses contained in descs have been already queued in consumer queue and skb destructor updated the completion count. On transmit failure cancel the releases, clear the descs from the completion queue and consume the skb for retrying packet transmission. For any invalid descriptor (invalid length/address/options) in the middle of a packet, all pending descriptors will be dropped by xsk core along with the invalid one and the next descriptor is treated as the start of a new packet. Maximum supported frames for a packet is MAX_SKB_FRAGS + 1. If it is exceeded, all descriptors accumulated so far are dropped. Signed-off-by: Tirthendu Sarkar --- net/xdp/xsk.c | 117 +++++++++++++++++++++++++++++++++----------- net/xdp/xsk_queue.h | 13 +++-- 2 files changed, 97 insertions(+), 33 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 29bda8452e2c..bc8dbd9f5312 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -393,7 +393,8 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) rcu_read_lock(); list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { if (!xskq_cons_peek_desc(xs->tx, desc, pool)) { - xs->tx->queue_empty_descs++; + if (xskq_has_descs(xs->tx)) + xskq_cons_release(xs->tx); continue; } @@ -539,24 +540,32 @@ static void xsk_consume_skb(struct sk_buff *skb) xs->skb = NULL; } +static void xsk_drop_skb(struct sk_buff *skb) +{ + xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb); + xsk_consume_skb(skb); +} + static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, struct xdp_desc *desc) { struct xsk_buff_pool *pool = xs->pool; u32 hr, len, ts, offset, copy, copied; - struct sk_buff *skb; + struct sk_buff *skb = xs->skb; struct page *page; void *buffer; int err, i; u64 addr; - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); + if (!skb) { + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); - skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); - if (unlikely(!skb)) - return ERR_PTR(err); + skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); - skb_reserve(skb, hr); + skb_reserve(skb, hr); + } addr = desc->addr; len = desc->len; @@ -566,7 +575,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, offset = offset_in_page(buffer); addr = buffer - pool->addrs; - for (copied = 0, i = 0; copied < len; i++) { + for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { + if (unlikely(i >= MAX_SKB_FRAGS)) + return ERR_PTR(-EFAULT); + page = pool->umem->pgs[addr >> PAGE_SHIFT]; get_page(page); @@ -591,33 +603,56 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, struct xdp_desc *desc) { struct net_device *dev = xs->dev; - struct sk_buff *skb; + struct sk_buff *skb = xs->skb; + int err; if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { skb = xsk_build_skb_zerocopy(xs, desc); - if (IS_ERR(skb)) - return skb; + if (IS_ERR(skb)) { + err = PTR_ERR(skb); + goto free_err; + } } else { u32 hr, tr, len; void *buffer; - int err; - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); - tr = dev->needed_tailroom; + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); len = desc->len; - skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); - if (unlikely(!skb)) - return ERR_PTR(err); + if (!skb) { + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); + tr = dev->needed_tailroom; + skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); + if (unlikely(!skb)) + goto free_err; - skb_reserve(skb, hr); - skb_put(skb, len); + skb_reserve(skb, hr); + skb_put(skb, len); - buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); - err = skb_store_bits(skb, 0, buffer, len); - if (unlikely(err)) { - kfree_skb(skb); - return ERR_PTR(err); + err = skb_store_bits(skb, 0, buffer, len); + if (unlikely(err)) + goto free_err; + } else { + int nr_frags = skb_shinfo(skb)->nr_frags; + struct page *page; + u8 *vaddr; + + if (unlikely(nr_frags >= MAX_SKB_FRAGS)) { + err = -EFAULT; + goto free_err; + } + + page = alloc_page(xs->sk.sk_allocation); + if (unlikely(!page)) { + err = -EAGAIN; + goto free_err; + } + + vaddr = kmap_local_page(page); + memcpy(vaddr, buffer, len); + kunmap_local(vaddr); + + skb_add_rx_frag(skb, nr_frags, page, 0, len, 0); } } @@ -628,6 +663,17 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, xsk_set_destructor_arg(skb); return skb; + +free_err: + if (err == -EAGAIN) { + xsk_cq_cancel_locked(xs, 1); + } else { + xsk_set_destructor_arg(skb); + xsk_drop_skb(skb); + xskq_cons_release(xs->tx); + } + + return ERR_PTR(err); } static int __xsk_generic_xmit(struct sock *sk) @@ -667,30 +713,45 @@ static int __xsk_generic_xmit(struct sock *sk) skb = xsk_build_skb(xs, &desc); if (IS_ERR(skb)) { err = PTR_ERR(skb); - xsk_cq_cancel_locked(xs, 1); - goto out; + if (err == -EAGAIN) + goto out; + err = 0; + continue; + } + + xskq_cons_release(xs->tx); + + if (xp_mb_desc(&desc)) { + xs->skb = skb; + continue; } err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */ + xskq_cons_cancel_n(xs->tx, xsk_get_num_desc(skb)); xsk_consume_skb(skb); err = -EAGAIN; goto out; } - xskq_cons_release(xs->tx); /* Ignore NET_XMIT_CN as packet might have been sent */ if (err == NET_XMIT_DROP) { /* SKB completed but not sent */ err = -EBUSY; + xs->skb = NULL; goto out; } sent_frame = true; + xs->skb = NULL; } - xs->tx->queue_empty_descs++; + if (xskq_has_descs(xs->tx)) { + if (xs->skb) + xsk_drop_skb(xs->skb); + xskq_cons_release(xs->tx); + } out: if (sent_frame) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 4190f43ce0b0..2d2af9fc2744 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -175,6 +175,11 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool, xp_aligned_validate_desc(pool, desc); } +static inline bool xskq_has_descs(struct xsk_queue *q) +{ + return q->cached_cons != q->cached_prod; +} + static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d, struct xsk_buff_pool *pool) @@ -190,17 +195,15 @@ static inline bool xskq_cons_read_desc(struct xsk_queue *q, struct xdp_desc *desc, struct xsk_buff_pool *pool) { - while (q->cached_cons != q->cached_prod) { + if (q->cached_cons != q->cached_prod) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx = q->cached_cons & q->ring_mask; *desc = ring->desc[idx]; - if (xskq_cons_is_valid_desc(q, desc, pool)) - return true; - - q->cached_cons++; + return xskq_cons_is_valid_desc(q, desc, pool); } + q->queue_empty_descs++; return false; } From patchwork Mon May 29 15:50:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258753 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16C88107A0; Mon, 29 May 2023 15:50:54 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8F99C4; Mon, 29 May 2023 08:50:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375450; x=1716911450; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AilTsEvxp/AWGPKL5B7pI6JA0eOe2QWyzInjE4LTUdk=; b=CNFTVkVHXZSZmaikHkS/e8Y9qG/7qDrOauGyr6NxNQaGjOPNx0muhOIe hqAD1THvh7Fs3bj+MmVtv2oCbvfUlTSAn5oaoP3Z26nF8k2jRwFpdHaeq 83SRBdrCQD2nYzKaQ8j1AQ/28B0CwoLBXhf94nhqaXYEtV+GOlQjF5ivi ZWnBFbL5N4MFc20QWSi1a1n480aZr15ibFTTAknLGGLxjlzMO7qjZ0NOQ pBn7syLDe2khqNeNltpGtJvrOdfmjxIw3Sd9a9K06vfgP36+YhzkOB/5l THVqXXdthKeUhwiMqTrjs7ev5ESErmr0KqKkKt/yP/MZNROmuulLNsvAL w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229009" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229009" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441088" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441088" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:48 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 09/22] xsk: discard zero length descriptors in Tx path Date: Mon, 29 May 2023 17:50:11 +0200 Message-Id: <20230529155024.222213-10-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar Descriptors with zero length are not supported by many NICs. To preserve uniform behavior discard any zero length desc as invvalid desc. Signed-off-by: Tirthendu Sarkar Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk_queue.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 2d2af9fc2744..ab0d13d6d90e 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -140,6 +140,9 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, { u64 offset = desc->addr & (pool->chunk_size - 1); + if (!desc->len) + return false; + if (offset + desc->len > pool->chunk_size) return false; @@ -156,6 +159,9 @@ static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, { u64 addr = xp_unaligned_add_offset_to_addr(desc->addr); + if (!desc->len) + return false; + if (desc->len > pool->chunk_size) return false; From patchwork Mon May 29 15:50:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258754 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3336107A0; Mon, 29 May 2023 15:50:54 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F627DE; Mon, 29 May 2023 08:50:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375453; x=1716911453; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TfVoz87qpEFgK5ifXVQFDV2Kicze8w77gtAMOYlBW/Y=; b=gzlTejf3PgV+Kh8whqj/cQvPUoC+HFExeJtL3EvJNx02LUHanFg5Xt4X YXkcaN9+6qAKdH6TjMu1IifEGDb+4cRPdhVmJwR5PeIE6fqTtQJghip4Y 36q+1BfLA8Y38bfZcda/v4g2oC7JTPZKq4An9x8wYE3wouF31TZbyEyhm ds7YY9kJN9thwUCbCcCr863u31fTj+ZfkVHUAvlNKneMGjVo2aMYUoG6G 8M9FamNsGLpAljrc++u8ORh9fcamfLN2+D6TRHOQyR4nslJPEX2hSrMrC jNooVUNjcZUCbcop2eOF6WU4t21HT3GAzhiFxWDpOhO2r+jidnNIKy+PB g==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229018" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229018" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441099" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441099" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:50 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 10/22] xsk: support mbuf on ZC RX Date: Mon, 29 May 2023 17:50:12 +0200 Message-Id: <20230529155024.222213-11-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Extend xdp_buff_xsk with two new list_head members - xskb_list and xskb_list_node. This means that each xdp_buff_xsk can be a node in the chain of these structs and also each xskb can carry the list by itself. This is needed so ZC drivers can add frags as xskb nodes which will make it possible to handle it both when producing AF_XDP Rx descriptors as well as freeing/recycling all the frags that a single frame carries. Speaking of latter, update xsk_buff_free() to take care of list nodes. For the former (adding as frags), introduce xsk_buff_add_frag() for ZC drivers usage that is going to be used to add a frag to the first xskb. xsk_buff_get_frag() will be utilized by XDP_TX and, on contrary, will return xdp_buff. One of the previous patches added a wrapper for ZC Rx so implement xskb list walk and production of Rx descriptors there. Signed-off-by: Maciej Fijalkowski --- include/net/xdp_sock_drv.h | 46 +++++++++++++++++++++++++++++++++++++ include/net/xsk_buff_pool.h | 2 ++ net/xdp/xsk.c | 24 ++++++++++++++++++- net/xdp/xsk_buff_pool.c | 2 ++ 4 files changed, 73 insertions(+), 1 deletion(-) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index a3e18b958d93..36984c7b1393 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -108,10 +108,46 @@ static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) static inline void xsk_buff_free(struct xdp_buff *xdp) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + struct xdp_buff_xsk *pos, *tmp; + if (likely(!xdp_buff_has_frags(xdp))) + goto out; + + list_for_each_entry_safe(pos, tmp, &xskb->xskb_list, xskb_list_node) { + list_del(&pos->xskb_list_node); + xp_free(pos); + } + + xdp_get_shared_info_from_buff(xdp)->nr_frags = 0; +out: xp_free(xskb); } +static inline void xsk_buff_add_frag(struct xdp_buff *first, + struct xdp_buff *xdp) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff_xsk *frag = container_of(xdp, struct xdp_buff_xsk, xdp); + + list_add_tail(&frag->xskb_list_node, &xskb->xskb_list); +} + +static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff *ret = NULL; + struct xdp_buff_xsk *frag; + + frag = list_first_entry_or_null(&xskb->xskb_list, struct xdp_buff_xsk, + xskb_list_node); + if (frag) { + list_del(&frag->xskb_list_node); + ret = &frag->xdp; + } + + return ret; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM; @@ -265,6 +301,16 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) { } +static inline void xsk_buff_add_frag(struct xdp_buff *first, + struct xdp_buff *xdp) +{ +} + +static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +{ + return NULL; +} + static inline void xsk_buff_discard(struct xdp_buff *xdp) { } diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 4dcca163e076..8a430163512c 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -29,6 +29,8 @@ struct xdp_buff_xsk { struct xsk_buff_pool *pool; u64 orig_addr; struct list_head free_list_node; + struct list_head xskb_list_node; + struct list_head xskb_list; }; #define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb)) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index bc8dbd9f5312..b519fe4456bc 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -155,8 +155,30 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff_xsk *xskb, u32 len, static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + u32 frags = xdp_buff_has_frags(xdp); + struct xdp_buff_xsk *pos, *tmp; + u32 contd = 0; + int err; + + if (frags) + contd = XDP_PKT_CONTD; - return __xsk_rcv_zc(xs, xskb, len, 0); + err = __xsk_rcv_zc(xs, xskb, len, contd); + if (err || likely(!frags)) + goto out; + + list_for_each_entry_safe(pos, tmp, &xskb->xskb_list, xskb_list_node) { + if (list_is_singular(&xskb->xskb_list)) + contd = 0; + len = pos->xdp.data_end - pos->xdp.data; + err = __xsk_rcv_zc(xs, pos, len, contd); + if (err) + return err; + list_del(&pos->xskb_list_node); + } + +out: + return err; } static void *xsk_copy_xdp_start(struct xdp_buff *from) diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 26f6d304451e..0a9f8ea68de3 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -99,6 +99,8 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, xskb->pool = pool; xskb->xdp.frame_sz = umem->chunk_size - umem->headroom; INIT_LIST_HEAD(&xskb->free_list_node); + INIT_LIST_HEAD(&xskb->xskb_list_node); + INIT_LIST_HEAD(&xskb->xskb_list); if (pool->unaligned) pool->free_heads[i] = xskb; else From patchwork Mon May 29 15:50:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258755 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4954E10970; Mon, 29 May 2023 15:50:58 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC507C4; Mon, 29 May 2023 08:50:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375455; x=1716911455; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gZdqfj3szF9Jdrd5PqPZMFlJ2HmIAd2YRz3B2wZHImA=; b=Aal4mFjcnPfZjSOtIalDsPyH8RtDN33jPvYPvvUA0J+u091P//12fTWG 0lscsteCa9BVnWwizqEdWNaWjgdKkO8wEWBEx+UA5sKXD9HF4UbejBjAT T7/E+CmugJVTVqajAE3v8kDq58LN2ZdaEJosBNOtkFiSdhsjBMLF5ilxy FwkqiTfR9kCnZZ6U3XJ27SWL/2lJyzHGhleWmTIjEoNiCmL1d7PmYj09I BaVrY22Gaaz5AUf4GUc+EtmPFFvtQxgmWf+YSUV7ig+1Kebcj/SVcc+W9 E2gktou3sUoL8E9vjMjygQy05RW3s9TWrLafcloqOrBXFQr6DhRJpN8wv w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229025" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229025" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441119" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441119" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:53 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 11/22] ice: xsk: add RX multi-buffer support Date: Mon, 29 May 2023 17:50:13 +0200 Message-Id: <20230529155024.222213-12-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net This support is strongly inspired by work that introduced multi-buffer support to regular Rx data path in ice. There are some differences, though. When adding a frag, besides adding it to skb_shared_info, use also fresh xsk_buff_add_frag() helper. Reason for doing both things is that we can not rule out the fact that AF_XDP pipeline could use XDP program that needs to access frame fragments. Without them being in skb_shared_info it will not be possible. Another difference is that XDP_PASS has to allocate a new pages for each frags and copy contents from memory backed by xsk_buff_pool. chain_len that is used for programming HW Rx descriptors no longer has to be limited to 1 when xsk_pool is present - remove this restriction. Signed-off-by: Maciej Fijalkowski --- drivers/net/ethernet/intel/ice/ice_base.c | 9 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 136 ++++++++++++++++------ 2 files changed, 102 insertions(+), 43 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 1911d644dfa8..b790e3d6be6f 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -366,7 +366,6 @@ static unsigned int ice_rx_offset(struct ice_rx_ring *rx_ring) */ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) { - int chain_len = ICE_MAX_CHAINED_RX_BUFS; struct ice_vsi *vsi = ring->vsi; u32 rxdid = ICE_RXDID_FLEX_NIC; struct ice_rlan_ctx rlan_ctx; @@ -430,17 +429,11 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring) */ rlan_ctx.showiv = 0; - /* For AF_XDP ZC, we disallow packets to span on - * multiple buffers, thus letting us skip that - * handling in the fast-path. - */ - if (ring->xsk_pool) - chain_len = 1; /* Max packet size for this queue - must not be set to a larger value * than 5 x DBUF */ rlan_ctx.rxmax = min_t(u32, vsi->max_frame, - chain_len * ring->rx_buf_len); + ICE_MAX_CHAINED_RX_BUFS * ring->rx_buf_len); /* Rx queue threshold in units of 64 */ rlan_ctx.lrxqthresh = 1; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index d1e489da7363..920cf2b16836 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -548,19 +548,6 @@ bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count) return __ice_alloc_rx_bufs_zc(rx_ring, leftover); } -/** - * ice_bump_ntc - Bump the next_to_clean counter of an Rx ring - * @rx_ring: Rx ring - */ -static void ice_bump_ntc(struct ice_rx_ring *rx_ring) -{ - int ntc = rx_ring->next_to_clean + 1; - - ntc = (ntc < rx_ring->count) ? ntc : 0; - rx_ring->next_to_clean = ntc; - prefetch(ICE_RX_DESC(rx_ring, ntc)); -} - /** * ice_construct_skb_zc - Create an sk_buff from zero-copy buffer * @rx_ring: Rx ring @@ -575,8 +562,14 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) { unsigned int totalsize = xdp->data_end - xdp->data_meta; unsigned int metasize = xdp->data - xdp->data_meta; + struct skb_shared_info *sinfo = NULL; struct sk_buff *skb; + u32 nr_frags = 0; + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } net_prefetch(xdp->data_meta); skb = __napi_alloc_skb(&rx_ring->q_vector->napi, totalsize, @@ -592,6 +585,29 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) __skb_pull(skb, metasize); } + if (likely(!xdp_buff_has_frags(xdp))) + goto out; + + for (int i = 0; i < nr_frags; i++) { + struct skb_shared_info *skinfo = skb_shinfo(skb); + skb_frag_t *frag = &sinfo->frags[i]; + struct page *page; + void *addr; + + page = dev_alloc_page(); + if (!page) { + dev_kfree_skb(skb); + return NULL; + } + addr = page_to_virt(page); + + memcpy(addr, skb_frag_page(frag), skb_frag_size(frag)); + + __skb_fill_page_desc_noacc(skinfo, skinfo->nr_frags++, + addr, 0, skb_frag_size(frag)); + } + +out: xsk_buff_free(xdp); return skb; } @@ -755,6 +771,34 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, return result; } +static int +ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first, + struct xdp_buff *xdp, const unsigned int size) +{ + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first); + + if (!size) + return 0; + + if (!xdp_buff_has_frags(first)) { + sinfo->nr_frags = 0; + sinfo->xdp_frags_size = 0; + xdp_buff_set_frags_flag(first); + } + + if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) { + xsk_buff_free(first); + return -ENOMEM; + } + + __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, + virt_to_page(xdp->data_hard_start), 0, size); + sinfo->xdp_frags_size += size; + xsk_buff_add_frag(first, xdp); + + return 0; +} + /** * ice_clean_rx_irq_zc - consumes packets from the hardware ring * @rx_ring: AF_XDP Rx ring @@ -765,9 +809,14 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) { unsigned int total_rx_bytes = 0, total_rx_packets = 0; + struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool; + u32 ntc = rx_ring->next_to_clean; + u32 ntu = rx_ring->next_to_use; + struct xdp_buff *first = NULL; struct ice_tx_ring *xdp_ring; unsigned int xdp_xmit = 0; struct bpf_prog *xdp_prog; + u32 cnt = rx_ring->count; bool failure = false; int entries_to_alloc; @@ -777,6 +826,9 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) xdp_prog = READ_ONCE(rx_ring->xdp_prog); xdp_ring = rx_ring->xdp_ring; + if (ntc != rx_ring->first_desc) + first = *ice_xdp_buf(rx_ring, rx_ring->first_desc); + while (likely(total_rx_packets < (unsigned int)budget)) { union ice_32b_rx_flex_desc *rx_desc; unsigned int size, xdp_res = 0; @@ -786,7 +838,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) u16 vlan_tag = 0; u16 rx_ptype; - rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean); + rx_desc = ICE_RX_DESC(rx_ring, ntc); stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S); if (!ice_test_staterr(rx_desc->wb.status_error0, stat_err_bits)) @@ -798,51 +850,61 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) */ dma_rmb(); - if (unlikely(rx_ring->next_to_clean == rx_ring->next_to_use)) + if (unlikely(ntc == ntu)) break; - xdp = *ice_xdp_buf(rx_ring, rx_ring->next_to_clean); + xdp = *ice_xdp_buf(rx_ring, ntc); size = le16_to_cpu(rx_desc->wb.pkt_len) & ICE_RX_FLX_DESC_PKT_LEN_M; - if (!size) { - xdp->data = NULL; - xdp->data_end = NULL; - xdp->data_hard_start = NULL; - xdp->data_meta = NULL; - goto construct_skb; - } xsk_buff_set_size(xdp, size); - xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool); + xsk_buff_dma_sync_for_cpu(xdp, xsk_pool); + + if (!first) { + first = xdp; + xdp_buff_clear_frags_flag(first); + } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) { + break; + } + + if (++ntc == cnt) + ntc = 0; + + if (ice_is_non_eop(rx_ring, rx_desc)) + continue; - xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring); + xdp_res = ice_run_xdp_zc(rx_ring, first, xdp_prog, xdp_ring); if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) { xdp_xmit |= xdp_res; } else if (xdp_res == ICE_XDP_EXIT) { failure = true; + first = NULL; + rx_ring->first_desc = ntc; break; } else if (xdp_res == ICE_XDP_CONSUMED) { - xsk_buff_free(xdp); + xsk_buff_free(first); } else if (xdp_res == ICE_XDP_PASS) { goto construct_skb; } - total_rx_bytes += size; + total_rx_bytes += xdp_get_buff_len(first); total_rx_packets++; - ice_bump_ntc(rx_ring); + first = NULL; + rx_ring->first_desc = ntc; continue; construct_skb: /* XDP_PASS path */ - skb = ice_construct_skb_zc(rx_ring, xdp); + skb = ice_construct_skb_zc(rx_ring, first); if (!skb) { rx_ring->ring_stats->rx_stats.alloc_buf_failed++; break; } - ice_bump_ntc(rx_ring); + first = NULL; + rx_ring->first_desc = ntc; if (eth_skb_pad(skb)) { skb = NULL; @@ -861,18 +923,22 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget) ice_receive_skb(rx_ring, skb, vlan_tag); } - entries_to_alloc = ICE_DESC_UNUSED(rx_ring); + rx_ring->next_to_clean = ntc; + entries_to_alloc = ICE_RX_DESC_UNUSED(rx_ring); if (entries_to_alloc > ICE_RING_QUARTER(rx_ring)) failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc); ice_finalize_xdp_rx(xdp_ring, xdp_xmit, 0); ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes); - if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { - if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool); + if (xsk_uses_need_wakeup(xsk_pool)) { + /* ntu could have changed when allocating entries above, so + * use rx_ring value instead of stack based one + */ + if (failure || ntc == rx_ring->next_to_use) + xsk_set_rx_need_wakeup(xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); + xsk_clear_rx_need_wakeup(xsk_pool); return (int)total_rx_packets; } From patchwork Mon May 29 15:50:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258756 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 740D68F62; Mon, 29 May 2023 15:51:00 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 300ABDB; Mon, 29 May 2023 08:50:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375458; x=1716911458; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TexfAAXuZtQ3n+9KFI8j2ot5gwJO7QF5sLKTHpPEBuI=; b=CH0iNVbqUnZIutDYe7j+o/JDW9OfLi31+SAMO1XTtobpzpizdY9Y4IM3 SBRTVbYSgeTFRuz6zkxpF6oY532jvLNmVuBxRWvfD80/tav6mgX2y7mkO OwKeiaJ/KWeQ1yxUwP0HPIAjSsk/NCnkzJ3CGuGc3Rif0vbXKlEdIxZjC BuI3biU+LYU8OpSKOco2Cbswhr4zmpQsx7IQSKmyO97E0grC1ALWNZDjE XLqdxxpF2JMmD0wLThGprOElHUio2dQ7iv7X81BPfVFyKynAwwE+ciMyk jo3TmBiZYwV56Kni61pwWLZaXegVDVp7meQ7etoKLrDtYuL77UYxkQITG A==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229028" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229028" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:50:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441138" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441138" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:55 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 12/22] xsk: support ZC Tx multi-buffer in batch API Date: Mon, 29 May 2023 17:50:14 +0200 Message-Id: <20230529155024.222213-13-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Modify xskq_cons_read_desc_batch() in a way that each processed descriptor will be checked if it is an EOP one or not and act accordingly to that. Change the behavior of mentioned function to break the processing when stumbling upon invalid descriptor instead of skipping it. Furthermore, let us give only full packets down to ZC driver. With these two assumptions ZC drivers will not have to take care of an intermediate state of incomplete frames, which will simplify its implementations a lot. Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk_queue.h | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index ab0d13d6d90e..a8f11a41609a 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -48,6 +48,11 @@ struct xsk_queue { size_t ring_vmalloc_size; }; +struct parsed_desc { + u32 mb; + u32 valid; +}; + /* The structure of the shared state of the rings are a simple * circular buffer, as outlined in * Documentation/core-api/circular-buffers.rst. For the Rx and @@ -218,30 +223,48 @@ static inline void xskq_cons_release_n(struct xsk_queue *q, u32 cnt) q->cached_cons += cnt; } -static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool, - u32 max) +static inline void parse_desc(struct xsk_queue *q, struct xsk_buff_pool *pool, + struct xdp_desc *desc, struct parsed_desc *parsed) +{ + parsed->valid = xskq_cons_is_valid_desc(q, desc, pool); + parsed->mb = xp_mb_desc(desc); +} + +static inline +u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool, + u32 max) { u32 cached_cons = q->cached_cons, nb_entries = 0; struct xdp_desc *descs = pool->tx_descs; + u32 total_descs = 0, nr_frags = 0; + /* track first entry, if stumble upon *any* invalid descriptor, rewind + * current packet that consists of frags and stop the processing + */ while (cached_cons != q->cached_prod && nb_entries < max) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx = cached_cons & q->ring_mask; + struct parsed_desc parsed; descs[nb_entries] = ring->desc[idx]; - if (unlikely(!xskq_cons_is_valid_desc(q, &descs[nb_entries], pool))) { - /* Skip the entry */ - cached_cons++; - continue; - } + cached_cons++; + parse_desc(q, pool, &descs[nb_entries], &parsed); + if (unlikely(!parsed.valid)) + break; nb_entries++; - cached_cons++; + if (likely(!parsed.mb)) { + total_descs += (nr_frags + 1); + nr_frags = 0; + } else { + nr_frags++; + } } + cached_cons -= nr_frags; /* Release valid plus any invalid entries */ xskq_cons_release_n(q, cached_cons - q->cached_cons); - return nb_entries; + return total_descs; } /* Functions for consumers */ From patchwork Mon May 29 15:50:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258757 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7576411196; Mon, 29 May 2023 15:51:02 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AF40CF; Mon, 29 May 2023 08:51:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375460; x=1716911460; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KIxYAJTX1HXf9qG9oPpa2qNiCKMKWw47fIZSNt4aIhU=; b=YmXicTpn7dSrRg93rYI6JwWmxN41eyuYHK6oJs4G3v8CZunL/i8eRhLt zCD14zuqdivBSB6J6n2kAIDGAutWjgJ3594WGSsqSh+20XKrurhFjrlz3 xGMU5rR/3X2k4duaFfwXoCb5gu9AoTDWnD4ffMk3sD6LZIexT536zFa1p P4dzi/mCIA/ng526KY7ZWfgy+OwSQVpWAzEM2bPbgaiwcCVUnbZpIfLa2 +E12j3XQggV0zx1lSCc4EuTl069ndXwcSLTz1rXthHzgIt6LE5GGGBZq3 t30utdWNKjI8V68yqWcQtrj/2TPffqPcJ+MhxRfntd9wRZtlBPtO+hdB5 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229035" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229035" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441159" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441159" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:50:58 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 13/22] xsk: report ZC multi-buffer capability via xdp_features Date: Mon, 29 May 2023 17:50:15 +0200 Message-Id: <20230529155024.222213-14-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Introduce new xdp_feature NETDEV_XDP_ACT_NDO_ZC_SG that will be used to find out if user space that wants to do ZC multi-buffer will be able to do so against underlying ZC driver. Signed-off-by: Maciej Fijalkowski --- include/uapi/linux/netdev.h | 4 ++-- net/xdp/xsk_buff_pool.c | 6 ++++++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 639524b59930..bfca07224f7b 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -33,8 +33,8 @@ enum netdev_xdp_act { NETDEV_XDP_ACT_HW_OFFLOAD = 16, NETDEV_XDP_ACT_RX_SG = 32, NETDEV_XDP_ACT_NDO_XMIT_SG = 64, - - NETDEV_XDP_ACT_MASK = 127, + NETDEV_XDP_ACT_NDO_ZC_SG = 128, + NETDEV_XDP_ACT_MASK = 255, }; enum { diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 0a9f8ea68de3..43cca5fa90cf 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -189,6 +189,12 @@ int xp_assign_dev(struct xsk_buff_pool *pool, goto err_unreg_pool; } + if (!(netdev->xdp_features & NETDEV_XDP_ACT_NDO_ZC_SG) && + flags & XDP_USE_SG) { + err = -EOPNOTSUPP; + goto err_unreg_pool; + } + bpf.command = XDP_SETUP_XSK_POOL; bpf.xsk.pool = pool; bpf.xsk.queue_id = queue_id; From patchwork Mon May 29 15:50:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258758 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AC4E11C8F; Mon, 29 May 2023 15:51:04 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB708CF; Mon, 29 May 2023 08:51:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375462; x=1716911462; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CZmuAExqMPHrqjmWo0tgaKrbzmD5ZXtil+aS9Wy4/cI=; b=igX25azM/Vnar2BTpy7hKmIrhOppUB5N0VMkwEDBCsl02/yAc0uq/HLb UVJlWSsyqWYRhwPvE88MbHoMuedAJnDqb88PjB6W5LDWeUWAb4sNzn03K LJZrqO2ZaI719kOyn2Y8QFPMZYkl5R2zvnTH2odrYIj8Q00Qm00vlJzSm XW1dX5d+D+masSCv6bxaDHZvEHQKJ2n23dJtrcQzLdv1vk71D6HUkHn49 uX4ziYLG+qmtt7Qp2TECw0yKv/WPSrbhVadjynvcPbZ/tm+Rls3f+e3PJ Jj6D7eTmyJg1OJMtgws68a5PlwyaklEfWXv5SlasnhIm2X6oUz+0MCi78 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229042" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229042" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441175" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441175" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:00 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 14/22] ice: xsk: Tx multi-buffer support Date: Mon, 29 May 2023 17:50:16 +0200 Message-Id: <20230529155024.222213-15-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Most of this patch is about actually supporting XDP_TX action. Pure Tx ZC support is only about looking at XDP_PKT_CONTD presence at options field and based on that generating EOP bit on Tx HW descriptor. This is that simple due to the implementation on xsk_tx_peek_release_desc_batch() where we are making sure that last produced descriptor is an EOP one. Report via xdp_features that this driver is now capable of consuming multi-buffer packets on both Rx and Tx sides. Signed-off-by: Maciej Fijalkowski --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 83 ++++++++++++++++------- 2 files changed, 61 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index a1f7c8edc22f..bd16c9de1153 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3385,7 +3385,7 @@ static void ice_set_ops(struct ice_vsi *vsi) netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY | - NETDEV_XDP_ACT_RX_SG; + NETDEV_XDP_ACT_RX_SG | NETDEV_XDP_ACT_NDO_ZC_SG; } /** diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 920cf2b16836..d8bf774b5f6d 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -616,7 +616,7 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp) * ice_clean_xdp_irq_zc - produce AF_XDP descriptors to CQ * @xdp_ring: XDP Tx ring */ -static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) +static u32 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) { u16 ntc = xdp_ring->next_to_clean; struct ice_tx_desc *tx_desc; @@ -638,7 +638,7 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) } if (!completed_frames) - return; + return 0; if (likely(!xdp_ring->xdp_tx_active)) { xsk_frames = completed_frames; @@ -668,6 +668,8 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) xdp_ring->next_to_clean -= cnt; if (xsk_frames) xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames); + + return completed_frames; } /** @@ -685,37 +687,72 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring) static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring) { + struct skb_shared_info *sinfo = NULL; u32 size = xdp->data_end - xdp->data; u32 ntu = xdp_ring->next_to_use; struct ice_tx_desc *tx_desc; struct ice_tx_buf *tx_buf; - dma_addr_t dma; + struct xdp_buff *head; + u32 nr_frags = 0; + u32 free_space; + u32 frag = 0; - if (ICE_DESC_UNUSED(xdp_ring) < ICE_RING_QUARTER(xdp_ring)) { - ice_clean_xdp_irq_zc(xdp_ring); - if (!ICE_DESC_UNUSED(xdp_ring)) { - xdp_ring->ring_stats->tx_stats.tx_busy++; - return ICE_XDP_CONSUMED; - } - } + free_space = ICE_DESC_UNUSED(xdp_ring); + if (free_space < ICE_RING_QUARTER(xdp_ring)) + free_space += ice_clean_xdp_irq_zc(xdp_ring); - dma = xsk_buff_xdp_get_dma(xdp); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size); + if (unlikely(!free_space)) + goto busy; + + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + if (free_space < nr_frags + 1) + goto busy; + } - tx_buf = &xdp_ring->tx_buf[ntu]; - tx_buf->xdp = xdp; - tx_buf->type = ICE_TX_BUF_XSK_TX; tx_desc = ICE_TX_DESC(xdp_ring, ntu); - tx_desc->buf_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, - 0, size, 0); - xdp_ring->xdp_tx_active++; + tx_buf = &xdp_ring->tx_buf[ntu]; + head = xdp; + + for (;;) { + dma_addr_t dma; + + dma = xsk_buff_xdp_get_dma(xdp); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size); + + tx_buf->xdp = xdp; + tx_buf->type = ICE_TX_BUF_XSK_TX; + tx_desc->buf_addr = cpu_to_le64(dma); + tx_desc->cmd_type_offset_bsz = ice_build_ctob(0, 0, size, 0); + /* account for each xdp_buff from xsk_buff_pool */ + xdp_ring->xdp_tx_active++; + + if (++ntu == xdp_ring->count) + ntu = 0; + + if (frag == nr_frags) + break; + + tx_desc = ICE_TX_DESC(xdp_ring, ntu); + tx_buf = &xdp_ring->tx_buf[ntu]; + + xdp = xsk_buff_get_frag(head); + size = skb_frag_size(&sinfo->frags[frag]); + frag++; + } - if (++ntu == xdp_ring->count) - ntu = 0; xdp_ring->next_to_use = ntu; + /* update last descriptor from a frame with EOP */ + tx_desc->cmd_type_offset_bsz |= + cpu_to_le64(ICE_TX_DESC_CMD_EOP << ICE_TXD_QW1_CMD_S); return ICE_XDP_TX; + +busy: + xdp_ring->ring_stats->tx_stats.tx_busy++; + + return ICE_XDP_CONSUMED; } /** @@ -963,7 +1000,7 @@ static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc, tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++); tx_desc->buf_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, + tx_desc->cmd_type_offset_bsz = ice_build_ctob(xsk_is_eop_desc(desc), 0, desc->len, 0); *total_bytes += desc->len; @@ -990,7 +1027,7 @@ static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *de tx_desc = ICE_TX_DESC(xdp_ring, ntu++); tx_desc->buf_addr = cpu_to_le64(dma); - tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, + tx_desc->cmd_type_offset_bsz = ice_build_ctob(xsk_is_eop_desc(&descs[i]), 0, descs[i].len, 0); *total_bytes += descs[i].len; From patchwork Mon May 29 15:50:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258759 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42717125B8; Mon, 29 May 2023 15:51:07 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FD7FCF; Mon, 29 May 2023 08:51:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375465; x=1716911465; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hxoXSYVUi5pkxXecemnvbbG7MisatNQOgKuDLXNnKgs=; b=XoZ1N8Fls/bqqkuQHxvrYx4BXCbjbsYb0ykD4iFpmRSnFQf4PSDc3Sj0 xxevT+os9LsgJNGO5pOkAsXpQHudukZgmEtDi8Cgnol6FWBvetdzogWZ1 QKk/siN5BEobV8gRM6GdkxYd+fNg7SZF4HBi7YsZm1A7CuavGjCtrfjrZ 4s8YKx+OQsR6pNe5F4ANYWceQhDCm+dM7GF9mUipE0YkE5SyVtro0QXxW xBSNVQlIUJylBQr/vc3HBwOvWuU69J4kh3exbs8Li1CfH7smHYwEIIAiE YmgpE76LbxCHNAfdLv1NSEIwhD6cJeGCWhDZJFuODoCTKL7wMiFrHPHyH w==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229049" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229049" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441183" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441183" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:02 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 15/22] xsk: add multi-buffer documentation Date: Mon, 29 May 2023 17:50:17 +0200 Message-Id: <20230529155024.222213-16-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add AF_XDP multi-buffer support documentation including two pseudo-code samples. Signed-off-by: Magnus Karlsson --- Documentation/networking/af_xdp.rst | 177 ++++++++++++++++++++++++++++ 1 file changed, 177 insertions(+) diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index 247c6c4127e9..2b583f58967b 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -453,6 +453,93 @@ XDP_OPTIONS getsockopt Gets options from an XDP socket. The only one supported so far is XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not. +Multi-Buffer Support +-------------------- + +With multi-buffer support, programs using AF_XDP sockets can receive +and transmit packets consisting of multiple buffers both in copy and +zero-copy mode. For example, a packet can consist of two +frames/buffers, one with the header and the other one with the data, +or a 9K Ethernet jumbo frame can be constructed by chaining together +three 4K frames. + +Some definitions: + +* A packet consists of one or more frames + +* A descriptor in one of the AF_XDP rings always refers to a single + frame. In the case the packet consists of a single frame, the + descriptor refers to the whole packet. + +To enable multi-buffer support for an AF_XDP socket, use the new bind +flag XDP_USE_SG. If this is not provided, all multi-buffer packets +will be dropped just as before. Note that the XDP program loaded also +needs to be in multi-buffer mode. This can be accomplished by using +"xdp.frags" as the section name of the XDP program used. + +To represent a packet consisting of multiple frames, a new flag called +XDP_PKT_CONTD is introduced in the options field of the Rx and Tx +descriptors. If it is true (1) the packet continues with the next +descriptor and if it is false (0) it means this is the last descriptor +of the packet. Why the reverse logic of end-of-packet (eop) flag found +in many NICs? Just to preserve compatibility with non-multi-buffer +applications that have this bit set to false for all packets on Rx, +and the apps set the options field to zero for Tx, as anything else +will be treated as an invalid descriptor. + +These are the semantics for producing packets onto AF_XDP Tx ring +consisting of multiple frames: + +* When an invalid descriptor is found, all the other + descriptors/frames of this packet are marked as invalid and not + completed. The next descriptor is treated as the start of a new + packet, even if this was not the intent (because we cannot guess + the intent). As before, if your program is producing invalid + descriptors you have a bug that must be fixed. + +* Zero length descriptors are treated as invalid descriptors. + +* For copy mode, the maximum supported number of frames in a packet is + equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all + descriptors accumulated so far are dropped and treated as + invalid. To produce an application that will work on any system + regardless of this config setting, limit the number of frags to 18, + as the minimum value of the config is 17. + +* For zero-copy mode, the limit is up to what the NIC HW + supports. Usually at least five on the NICs we have checked. We + consciously chose to not enforce a rigid limit (such as + CONFIG_MAX_SKB_FRAGS + 1) for zero-copy mode, as it would have + resulted in copy actions under the hood to fit into what limit + the NIC supports. Kind of defeats the purpose of zero-copy mode. + +* The ZC batch API guarantees that it will provide a batch of Tx + descriptors that ends with full packet at the end. If not, ZC + drivers would have to gather the full packet on their side. The + approach we picked makes ZC drivers' life much easier (at least on + Tx side). + +On the Rx path in copy-mode, the xsk core copies the XDP data into +multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as +detailed before. Zero-copy mode works the same, though the data is not +copied. When the application gets a descriptor with the XDP_PKT_CONTD +flag set to one, it means that the packet consists of multiple buffers +and it continues with the next buffer in the following +descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it +means that this is the last buffer of the packet. AF_XDP guarantees +that only a complete packet (all frames in the packet) is sent to the +application. + +If application reads a batch of descriptors, using for example the libxdp +interfaces, it is not guaranteed that the batch will end with a full +packet. It might end in the middle of a packet and the rest of the +buffers of that packet will arrive at the beginning of the next batch, +since the libxdp interface does not read the whole ring (unless you +have an enormous batch size or a very small ring size). + +An example program each for Rx and Tx multi-buffer support can be found +later in this document. + Usage ===== @@ -532,6 +619,96 @@ like this: But please use the libbpf functions as they are optimized and ready to use. Will make your life easier. +Usage Multi-Buffer Rx +===================== + +Here is a simple Rx path pseudo-code example (using libxdp interfaces +for simplicity). Error paths have been excluded to keep it short: + +.. code-block:: c + + void rx_packets(struct xsk_socket_info *xsk) + { + static bool new_packet = true; + u32 idx_rx = 0, idx_fq = 0; + static char *pkt; + + int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx); + + xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq); + + for (int i = 0; i < rcvd; i++) { + struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); + char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr); + bool eop = !(desc->options & XDP_PKT_CONTD); + + if (new_packet) + pkt = frag; + else + add_frag_to_pkt(pkt, frag); + + if (eop) + process_pkt(pkt); + + new_packet = eop; + + *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr; + } + + xsk_ring_prod__submit(&xsk->umem->fq, rcvd); + xsk_ring_cons__release(&xsk->rx, rcvd); + } + +Usage Multi-Buffer Tx +===================== + +Here is an example Tx path pseudo-code (using libxdp interfaces for +simplicity) ignoring that the umem is finite in size, and that we +eventually will run out of packets to send. Also assumes pkts.addr +points to a valid location in the umem. + +.. code-block:: c + + void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts, + int batch_size) + { + u32 idx, i, pkt_nb = 0; + + xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx); + + for (i = 0; i < batch_size;) { + u64 addr = pkts[pkt_nb].addr; + u32 len = pkts[pkt_nb].size; + + do { + struct xdp_desc *tx_desc; + + tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++); + tx_desc->addr = addr; + + if (len > xsk_frame_size) { + tx_desc->len = xsk_frame_size; + tx_desc->options = XDP_PKT_CONTD; + } else { + tx_desc->len = len; + tx_desc->options = 0; + pkt_nb++; + } + len -= tx_desc->len; + addr += xsk_frame_size; + + if (i == batch_size) { + /* Remember len, addr, pkt_nb for next iteration. + * Skipped for simplicity. + */ + break; + } + } while (len); + } + + xsk_ring_prod__submit(&xsk->tx, i); + } + Sample application ================== From patchwork Mon May 29 15:50:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258760 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF5BD125B8; Mon, 29 May 2023 15:51:09 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3136CD; Mon, 29 May 2023 08:51:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375467; x=1716911467; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gbu4pbRoXHAXM26Hvs5n+5X0u3mopEV3k95/yYAlCKg=; b=TtmrNAKgy4XCoX1V1d0wxzqKWcw8SwWOyJZYV1d7umh+ig26MttcIrbA osNj3R8AO45Y9y8BRQi1LIHsKELFC/zZEvJqFOlNuq5J05V2V6aVb+zEu Fm9REzqTaxQmt45G5Tt7onRBuwa1P7eDhxb7Ep7wNEIAf5tbqBOEQ2TE5 z/d/pfx/vEEHi5Ec6VMkJpgTAEb8YFX9Jm9+zrFGq3cPangBFPp8jHmZK GrSIS7yikiBa9avFN75SJnGNb93pNk0h5aHBZJH7alX1yLkUf0riYU8tv oXpLQYyIa5iqpvYcCY2aFk0qkDe4ABftGGcyWOZXX1/52tefx4B5cvXce A==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229058" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229058" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441195" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441195" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:05 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 16/22] selftests/xsk: transmit and receive multi-buffer packets Date: Mon, 29 May 2023 17:50:18 +0200 Message-Id: <20230529155024.222213-17-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add the ability to send and receive packets that are larger than the size of a umem frame, using the AF_XDP /XDP multi-buffer support. There are three pieces of code that need to be changed to achieve this: the Rx path, the Tx path, and the validation logic. Both the Rx path and Tx could only deal with a single fragment per packet. The Tx path is extended with a new function called pkt_nb_frags() that can be used to retrieve the number of fragments a packet will consume. We then create these many fragments in a loop and fill the N-1 first ones to the max size limit to use the buffer space efficiently, and the Nth one with whatever data that is left. This goes on until we have filled in at the most BATCH_SIZE worth of descriptors and fragments. If we detect that the next packet would lead to BATCH_SIZE number of fragments sent being exceeded, we do not send this packet and finish the batch. This packet is instead sent in the next iteration of BATCH_SIZE fragments. For Rx, we loop over all fragments we receive as usual, but for every descriptor that we receive we call a new validation function called is_frag_valid() to validate the consistency of this fragment. The code then checks if the packet continues in the next frame. If so, it loops over the next packet and performs the same validation. once we have received the last fragment of the packet we also call the function is_pkt_valid() to validate the packet as a whole. If we get to the end of the batch and we are not at the end of the current packet, we back out the partial packet and end the loop. Once we get into the receive loop next time, we start over from the beginning of that packet. This so the code becomes simpler at the cost of some performance. The validation function is_frag_valid() checks that the sequence and packet numbers are correct at the start and end of each fragment. Signed-off-by: Magnus Karlsson --- tools/include/uapi/linux/if_xdp.h | 3 + tools/testing/selftests/bpf/xskxceiver.c | 167 ++++++++++++++++++----- tools/testing/selftests/bpf/xskxceiver.h | 3 +- 3 files changed, 139 insertions(+), 34 deletions(-) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index a78a8096f4ce..80245f5b4dd7 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -106,6 +106,9 @@ struct xdp_desc { __u32 options; }; +/* Flag indicating packet constitutes of multiple buffers*/ +#define XDP_PKT_CONTD (1 << 0) + /* UMEM descriptor is __u64 */ #endif /* _LINUX_IF_XDP_H */ diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 218d7f694e5c..5e29e8850488 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -533,6 +533,11 @@ static struct pkt_stream *__pkt_stream_alloc(u32 nb_pkts) return pkt_stream; } +static bool pkt_continues(const struct xdp_desc *desc) +{ + return desc->options & XDP_PKT_CONTD; +} + static u32 ceil_u32(u32 a, u32 b) { return (a + b - 1) / b; @@ -549,7 +554,7 @@ static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt, int offset, u32 { pkt->offset = offset; pkt->len = len; - if (len > umem->frame_size - XDP_PACKET_HEADROOM - MIN_PKT_SIZE * 2 - umem->frame_headroom) + if (len > MAX_ETH_JUMBO_SIZE) pkt->valid = false; else pkt->valid = true; @@ -637,6 +642,11 @@ static u64 pkt_get_addr(struct pkt *pkt, struct xsk_umem_info *umem) return pkt->offset + umem_alloc_buffer(umem); } +static void pkt_stream_cancel(struct pkt_stream *pkt_stream) +{ + pkt_stream->current_pkt_nb--; +} + static void pkt_generate(struct ifobject *ifobject, u64 addr, u32 len, u32 pkt_nb, u32 bytes_written) { @@ -765,43 +775,81 @@ static bool is_metadata_correct(struct pkt *pkt, void *buffer, u64 addr) return true; } -static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) +static bool is_frag_valid(struct xsk_umem_info *umem, u64 addr, u32 len, u32 expected_pkt_nb, + u32 bytes_processed) { - void *data = xsk_umem__get_data(buffer, addr); - u32 seqnum, pkt_data; + u32 seqnum, pkt_nb, *pkt_data, words_to_end, expected_seqnum; + void *data = xsk_umem__get_data(umem->buffer, addr); - if (!pkt) { - ksft_print_msg("[%s] too many packets received\n", __func__); - goto error; + addr -= umem->base_addr; + + if (addr >= umem->num_frames * umem->frame_size || + addr + len > umem->num_frames * umem->frame_size) { + ksft_print_msg("Frag invalid addr: %llx len: %u\n", addr, len); + return false; + } + if (!umem->unaligned_mode && addr % umem->frame_size + len > umem->frame_size) { + ksft_print_msg("Frag crosses frame boundary addr: %llx len: %u\n", addr, len); + return false; } - if (len < MIN_PKT_SIZE || pkt->len < MIN_PKT_SIZE) { - /* Do not try to verify packets that are smaller than minimum size. */ - return true; + pkt_data = data; + if (!bytes_processed) { + pkt_data += PKT_HDR_SIZE / sizeof(*pkt_data); + len -= PKT_HDR_SIZE; + } else { + bytes_processed -= PKT_HDR_SIZE; } - if (pkt->len != len) { - ksft_print_msg("[%s] expected length [%d], got length [%d]\n", - __func__, pkt->len, len); + expected_seqnum = bytes_processed / sizeof(*pkt_data); + seqnum = ntohl(*pkt_data) & 0xffff; + pkt_nb = ntohl(*pkt_data) >> 16; + + if (expected_pkt_nb != pkt_nb) { + ksft_print_msg("[%s] expected pkt_nb [%u], got pkt_nb [%u]\n", + __func__, expected_pkt_nb, pkt_nb); + goto error; + } + if (expected_seqnum != seqnum) { + ksft_print_msg("[%s] expected seqnum at start [%u], got seqnum [%u]\n", + __func__, expected_seqnum, seqnum); goto error; } - pkt_data = ntohl(*((u32 *)(data + PKT_HDR_SIZE))); - seqnum = pkt_data >> 16; - - if (pkt->pkt_nb != seqnum) { - ksft_print_msg("[%s] expected seqnum [%d], got seqnum [%d]\n", - __func__, pkt->pkt_nb, seqnum); + words_to_end = len / sizeof(*pkt_data) - 1; + pkt_data += words_to_end; + seqnum = ntohl(*pkt_data) & 0xffff; + expected_seqnum += words_to_end; + if (expected_seqnum != seqnum) { + ksft_print_msg("[%s] expected seqnum at end [%u], got seqnum [%u]\n", + __func__, expected_seqnum, seqnum); goto error; } return true; error: - pkt_dump(data, len, true); + pkt_dump(data, len, !bytes_processed); return false; } +static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) +{ + if (!pkt) { + ksft_print_msg("[%s] too many packets received\n", __func__); + return false; + } + + if (pkt->len != len) { + ksft_print_msg("[%s] expected packet length [%d], got length [%d]\n", + __func__, pkt->len, len); + pkt_dump(xsk_umem__get_data(buffer, addr), len, true); + return false; + } + + return true; +} + static void kick_tx(struct xsk_socket_info *xsk) { int ret; @@ -854,8 +902,8 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) { struct timeval tv_end, tv_now, tv_timeout = {THREAD_TMOUT, 0}; struct pkt_stream *pkt_stream = test->ifobj_rx->pkt_stream; - u32 idx_rx = 0, idx_fq = 0, rcvd, i, pkts_sent = 0; struct xsk_socket_info *xsk = test->ifobj_rx->xsk; + u32 idx_rx = 0, idx_fq = 0, rcvd, pkts_sent = 0; struct ifobject *ifobj = test->ifobj_rx; struct xsk_umem_info *umem = xsk->umem; struct pkt *pkt; @@ -868,6 +916,9 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) pkt = pkt_stream_get_next_rx_pkt(pkt_stream, &pkts_sent); while (pkt) { + u32 frags_processed = 0, nb_frags = 0, pkt_len = 0; + u64 first_addr; + ret = gettimeofday(&tv_now, NULL); if (ret) exit_with_error(errno); @@ -913,27 +964,53 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) } } - for (i = 0; i < rcvd; i++) { + while (frags_processed < rcvd) { const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); u64 addr = desc->addr, orig; orig = xsk_umem__extract_addr(addr); addr = xsk_umem__add_offset_to_addr(addr); - if (!is_pkt_valid(pkt, umem->buffer, addr, desc->len) || + if (!is_frag_valid(umem, addr, desc->len, pkt->pkt_nb, pkt_len) || !is_offset_correct(umem, pkt, addr) || (ifobj->use_metadata && !is_metadata_correct(pkt, umem->buffer, addr))) return TEST_FAILURE; + if (!nb_frags++) + first_addr = addr; + frags_processed++; + pkt_len += desc->len; if (ifobj->use_fill_ring) *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = orig; + + if (pkt_continues(desc)) + continue; + + /* The complete packet has been received */ + if (!is_pkt_valid(pkt, umem->buffer, first_addr, pkt_len) || + !is_offset_correct(umem, pkt, addr)) + return TEST_FAILURE; + pkt = pkt_stream_get_next_rx_pkt(pkt_stream, &pkts_sent); + nb_frags = 0; + pkt_len = 0; + } + + if (nb_frags) { + /* In the middle of a packet. Start over from beginning of packet. */ + idx_rx -= nb_frags; + xsk_ring_cons__cancel(&xsk->rx, nb_frags); + if (ifobj->use_fill_ring) { + idx_fq -= nb_frags; + xsk_ring_prod__cancel(&umem->fq, nb_frags); + } + frags_processed -= nb_frags; } if (ifobj->use_fill_ring) - xsk_ring_prod__submit(&umem->fq, rcvd); + xsk_ring_prod__submit(&umem->fq, frags_processed); if (ifobj->release_rx) - xsk_ring_cons__release(&xsk->rx, rcvd); + xsk_ring_cons__release(&xsk->rx, frags_processed); pthread_mutex_lock(&pacing_mutex); pkts_in_flight -= pkts_sent; @@ -946,13 +1023,14 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeout) { + u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len; + struct pkt_stream *pkt_stream = ifobject->pkt_stream; struct xsk_socket_info *xsk = ifobject->xsk; struct xsk_umem_info *umem = ifobject->umem; - u32 i, idx = 0, valid_pkts = 0, buffer_len; bool use_poll = ifobject->use_poll; int ret; - buffer_len = pkt_get_buffer_len(umem, ifobject->pkt_stream->max_pkt_len); + buffer_len = pkt_get_buffer_len(umem, pkt_stream->max_pkt_len); /* pkts_in_flight might be negative if many invalid packets are sent */ if (pkts_in_flight >= (int)((umem_size(umem) - BATCH_SIZE * buffer_len) / buffer_len)) { kick_tx(xsk); @@ -983,17 +1061,40 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo } for (i = 0; i < BATCH_SIZE; i++) { - struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); - struct pkt *pkt = pkt_stream_get_next_tx_pkt(ifobject->pkt_stream); + struct pkt *pkt = pkt_stream_get_next_tx_pkt(pkt_stream); + u32 nb_frags, bytes_written = 0; if (!pkt) break; - tx_desc->addr = pkt_get_addr(pkt, umem); - tx_desc->len = pkt->len; + nb_frags = pkt_nb_frags(umem->frame_size, pkt); + if (nb_frags > BATCH_SIZE - i) { + pkt_stream_cancel(pkt_stream); + xsk_ring_prod__cancel(&xsk->tx, BATCH_SIZE - i); + break; + } + if (pkt->valid) { valid_pkts++; - pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, 0); + valid_frags += nb_frags; + } + + while (nb_frags--) { + struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); + + tx_desc->addr = pkt_get_addr(pkt, ifobject->umem); + if (nb_frags) { + tx_desc->len = umem->frame_size; + tx_desc->options = XDP_PKT_CONTD; + i++; + } else { + tx_desc->len = pkt->len - bytes_written; + tx_desc->options = 0; + } + if (pkt->valid) + pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, + bytes_written); + bytes_written += tx_desc->len; } } @@ -1002,7 +1103,7 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo pthread_mutex_unlock(&pacing_mutex); xsk_ring_prod__submit(&xsk->tx, i); - xsk->outstanding_tx += valid_pkts; + xsk->outstanding_tx += valid_frags; if (use_poll) { ret = poll(fds, 1, POLL_TMOUT); diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index aaf27e067640..310b48ad8a3a 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -38,6 +38,7 @@ #define MAX_TEARDOWN_ITER 10 #define PKT_HDR_SIZE (sizeof(struct ethhdr) + 2) /* Just to align the data in the packet */ #define MIN_PKT_SIZE 64 +#define MAX_ETH_JUMBO_SIZE 9000 #define USLEEP_MAX 10000 #define SOCK_RECONF_CTR 10 #define BATCH_SIZE 64 @@ -47,7 +48,7 @@ #define DEFAULT_UMEM_BUFFERS (DEFAULT_PKT_CNT / 4) #define RX_FULL_RXQSIZE 32 #define UMEM_HEADROOM_TEST_SIZE 128 -#define XSK_UMEM__INVALID_FRAME_SIZE (XSK_UMEM__DEFAULT_FRAME_SIZE + 1) +#define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1) #define HUGEPAGE_SIZE (2 * 1024 * 1024) #define PKT_DUMP_NB_TO_PRINT 16 From patchwork Mon May 29 15:50:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258761 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 33D6B125B8; Mon, 29 May 2023 15:51:12 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D0BFCD; Mon, 29 May 2023 08:51:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375470; x=1716911470; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=j6MnTSSiiGK4xaZkIUN+lb1SDZdKaD0wY03VsqAtELs=; b=fKIN6zMsm2dNR5lepibvbgk3HHlUYS6oN1bI6lbe2RGH3Y2wj3NxFcbb qxVVnQwHCWYdZ8z2BfKQ1e3ABwiY+c8X7y3+kndtzi15yaWttnrfxw8sp qDDiuqf4A36K7HtkMaSODLrNoN+PZqFNph/GGXeaRrlS3R2vfMtAaJfUc 2aBRiRpNbkF/oGWZ8LVsiZmp5KIJYFtgrsSJUQlDmsjWVOYK/lFSqawRc KJO4/jtBXwBziVvdvD0KXj74NTFjKbw3qcmx8sNQ3xri/a6s54220Vkfw yqZqkmr2qL9x+96720TM8O2ZHgHPt72gKlwqYgH8geyivxRUH/80PWPUM A==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229066" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229066" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441204" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441204" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:07 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 17/22] selftests/xsk: add basic multi-buffer test Date: Mon, 29 May 2023 17:50:19 +0200 Message-Id: <20230529155024.222213-18-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add the first basic multi-buffer test that sends a stream of 9K packets and validates that they are received at the other end. In order to enable sending and receiving multi-buffer packets, code that sets the MTU is introduced as well as modifications to the XDP programs so that they signal that they are multi-buffer enabled. Signed-off-by: Magnus Karlsson --- tools/include/uapi/linux/if_xdp.h | 6 + .../selftests/bpf/progs/xsk_xdp_progs.c | 4 +- tools/testing/selftests/bpf/xsk.c | 135 ++++++++++++++++++ tools/testing/selftests/bpf/xsk.h | 2 + tools/testing/selftests/bpf/xskxceiver.c | 58 ++++++++ tools/testing/selftests/bpf/xskxceiver.h | 5 + 6 files changed, 208 insertions(+), 2 deletions(-) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 80245f5b4dd7..73a47da885dc 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -25,6 +25,12 @@ * application. */ #define XDP_USE_NEED_WAKEUP (1 << 3) +/* By setting this option, userspace application indicates that it can + * handle multiple descriptors per packet thus enabling xsk core to split + * multi-buffer XDP frames into multiple Rx descriptors. Without this set + * such frames will be dropped by xsk. + */ +#define XDP_USE_SG (1 << 4) /* Flags for xsk_umem_config flags */ #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) diff --git a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c index a630c95c7471..ac76e7363776 100644 --- a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c +++ b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c @@ -15,12 +15,12 @@ struct { static unsigned int idx; int count = 0; -SEC("xdp") int xsk_def_prog(struct xdp_md *xdp) +SEC("xdp.frags") int xsk_def_prog(struct xdp_md *xdp) { return bpf_redirect_map(&xsk, 0, XDP_DROP); } -SEC("xdp") int xsk_xdp_drop(struct xdp_md *xdp) +SEC("xdp.frags") int xsk_xdp_drop(struct xdp_md *xdp) { /* Drop every other packet */ if (idx++ % 2) diff --git a/tools/testing/selftests/bpf/xsk.c b/tools/testing/selftests/bpf/xsk.c index 687d83e707f8..006946fa1bc0 100644 --- a/tools/testing/selftests/bpf/xsk.c +++ b/tools/testing/selftests/bpf/xsk.c @@ -18,10 +18,13 @@ #include #include #include +#include #include #include #include #include +#include +#include #include #include #include @@ -81,6 +84,12 @@ struct xsk_socket { int fd; }; +struct nl_mtu_req { + struct nlmsghdr nh; + struct ifinfomsg msg; + char buf[512]; +}; + int xsk_umem__fd(const struct xsk_umem *umem) { return umem ? umem->fd : -EINVAL; @@ -286,6 +295,132 @@ bool xsk_is_in_mode(u32 ifindex, int mode) return false; } +/* Lifted from netlink.c in tools/lib/bpf */ +static int netlink_recvmsg(int sock, struct msghdr *mhdr, int flags) +{ + int len; + + do { + len = recvmsg(sock, mhdr, flags); + } while (len < 0 && (errno == EINTR || errno == EAGAIN)); + + if (len < 0) + return -errno; + return len; +} + +/* Lifted from netlink.c in tools/lib/bpf */ +static int alloc_iov(struct iovec *iov, int len) +{ + void *nbuf; + + nbuf = realloc(iov->iov_base, len); + if (!nbuf) + return -ENOMEM; + + iov->iov_base = nbuf; + iov->iov_len = len; + return 0; +} + +/* Original version lifted from netlink.c in tools/lib/bpf */ +static int netlink_recv(int sock) +{ + struct iovec iov = {}; + struct msghdr mhdr = { + .msg_iov = &iov, + .msg_iovlen = 1, + }; + bool multipart = true; + struct nlmsgerr *err; + struct nlmsghdr *nh; + int len, ret; + + ret = alloc_iov(&iov, 4096); + if (ret) + goto done; + + while (multipart) { + multipart = false; + len = netlink_recvmsg(sock, &mhdr, MSG_PEEK | MSG_TRUNC); + if (len < 0) { + ret = len; + goto done; + } + + if (len > iov.iov_len) { + ret = alloc_iov(&iov, len); + if (ret) + goto done; + } + + len = netlink_recvmsg(sock, &mhdr, 0); + if (len < 0) { + ret = len; + goto done; + } + + if (len == 0) + break; + + for (nh = (struct nlmsghdr *)iov.iov_base; NLMSG_OK(nh, len); + nh = NLMSG_NEXT(nh, len)) { + if (nh->nlmsg_flags & NLM_F_MULTI) + multipart = true; + switch (nh->nlmsg_type) { + case NLMSG_ERROR: + err = (struct nlmsgerr *)NLMSG_DATA(nh); + if (!err->error) + continue; + ret = err->error; + goto done; + case NLMSG_DONE: + ret = 0; + goto done; + default: + break; + } + } + } + ret = 0; +done: + free(iov.iov_base); + return ret; +} + +int xsk_set_mtu(int ifindex, int mtu) +{ + struct nl_mtu_req req; + struct rtattr *rta; + int fd, ret; + + fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE); + if (fd < 0) + return fd; + + memset(&req, 0, sizeof(req)); + req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; + req.nh.nlmsg_type = RTM_NEWLINK; + req.msg.ifi_family = AF_UNSPEC; + req.msg.ifi_index = ifindex; + rta = (struct rtattr *)(((char *)&req) + NLMSG_ALIGN(req.nh.nlmsg_len)); + rta->rta_type = IFLA_MTU; + rta->rta_len = RTA_LENGTH(sizeof(unsigned int)); + req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_LENGTH(sizeof(mtu)); + memcpy(RTA_DATA(rta), &mtu, sizeof(mtu)); + + ret = send(fd, &req, req.nh.nlmsg_len, 0); + if (ret < 0) { + close(fd); + return errno; + } + + ret = netlink_recv(fd); + close(fd); + return ret; +} + int xsk_attach_xdp_program(struct bpf_program *prog, int ifindex, u32 xdp_flags) { int prog_fd; diff --git a/tools/testing/selftests/bpf/xsk.h b/tools/testing/selftests/bpf/xsk.h index 8da8d557768b..d93200fdaa8d 100644 --- a/tools/testing/selftests/bpf/xsk.h +++ b/tools/testing/selftests/bpf/xsk.h @@ -239,6 +239,8 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr, int xsk_umem__delete(struct xsk_umem *umem); void xsk_socket__delete(struct xsk_socket *xsk); +int xsk_set_mtu(int ifindex, int mtu); + #ifdef __cplusplus } /* extern "C" */ #endif diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 5e29e8850488..41b8450561c9 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -49,6 +49,7 @@ * h. tests for invalid and corner case Tx descriptors so that the correct ones * are discarded and let through, respectively. * i. 2K frame size tests + * j. If multi-buffer is supported, send 9k packets divided into 3 frames * * Total tests: 12 * @@ -77,6 +78,7 @@ #include #include #include +#include #include #include #include @@ -253,6 +255,8 @@ static int __xsk_configure_socket(struct xsk_socket_info *xsk, struct xsk_umem_i cfg.bind_flags = ifobject->bind_flags; if (shared) cfg.bind_flags |= XDP_SHARED_UMEM; + if (ifobject->pkt_stream && ifobject->mtu > MAX_ETH_PKT_SIZE) + cfg.bind_flags |= XDP_USE_SG; txr = ifobject->tx_on ? &xsk->tx : NULL; rxr = ifobject->rx_on ? &xsk->rx : NULL; @@ -415,6 +419,7 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx, test->total_steps = 1; test->nb_sockets = 1; test->fail = false; + test->mtu = MAX_ETH_PKT_SIZE; test->xdp_prog_rx = ifobj_rx->xdp_progs->progs.xsk_def_prog; test->xskmap_rx = ifobj_rx->xdp_progs->maps.xsk; test->xdp_prog_tx = ifobj_tx->xdp_progs->progs.xsk_def_prog; @@ -468,6 +473,26 @@ static void test_spec_set_xdp_prog(struct test_spec *test, struct bpf_program *x test->xskmap_tx = xskmap_tx; } +static int test_spec_set_mtu(struct test_spec *test, int mtu) +{ + int err; + + if (test->ifobj_rx->mtu != mtu) { + err = xsk_set_mtu(test->ifobj_rx->ifindex, mtu); + if (err) + return err; + test->ifobj_rx->mtu = mtu; + } + if (test->ifobj_tx->mtu != mtu) { + err = xsk_set_mtu(test->ifobj_tx->ifindex, mtu); + if (err) + return err; + test->ifobj_tx->mtu = mtu; + } + + return 0; +} + static void pkt_stream_reset(struct pkt_stream *pkt_stream) { if (pkt_stream) @@ -1516,6 +1541,18 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i struct ifobject *ifobj2) { pthread_t t0, t1; + int err; + + if (test->mtu > MAX_ETH_PKT_SIZE && (!ifobj1->multi_buff_supp || + (ifobj2 && !ifobj2->multi_buff_supp))) { + ksft_test_result_skip("Multi buffer not supported.\n"); + return TEST_SKIP; + } + err = test_spec_set_mtu(test, test->mtu); + if (err) { + ksft_print_msg("Error, could not set mtu.\n"); + exit_with_error(err); + } if (ifobj2) { if (pthread_barrier_init(&barr, NULL, 2)) @@ -1725,6 +1762,15 @@ static int testapp_single_pkt(struct test_spec *test) return testapp_validate_traffic(test); } +static int testapp_multi_buffer(struct test_spec *test) +{ + test_spec_set_name(test, "RUN_TO_COMPLETION_9K_PACKETS"); + test->mtu = MAX_ETH_JUMBO_SIZE; + pkt_stream_replace(test, DEFAULT_PKT_CNT, MAX_ETH_JUMBO_SIZE); + + return testapp_validate_traffic(test); +} + static int testapp_invalid_desc(struct test_spec *test) { struct xsk_umem_info *umem = test->ifobj_tx->umem; @@ -1858,6 +1904,7 @@ static bool hugepages_present(void) static void init_iface(struct ifobject *ifobj, const char *dst_mac, const char *src_mac, thread_func_t func_ptr) { + LIBBPF_OPTS(bpf_xdp_query_opts, query_opts); int err; memcpy(ifobj->dst_mac, dst_mac, ETH_ALEN); @@ -1873,6 +1920,14 @@ static void init_iface(struct ifobject *ifobj, const char *dst_mac, const char * if (hugepages_present()) ifobj->unaligned_supp = true; + + err = bpf_xdp_query(ifobj->ifindex, XDP_FLAGS_DRV_MODE, &query_opts); + if (err) { + ksft_print_msg("Error querrying XDP capabilities\n"); + exit_with_error(-err); + } + if (query_opts.feature_flags & NETDEV_XDP_ACT_RX_SG) + ifobj->multi_buff_supp = true; } static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_type type) @@ -1905,6 +1960,9 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ test_spec_set_name(test, "RUN_TO_COMPLETION"); ret = testapp_validate_traffic(test); break; + case TEST_TYPE_RUN_TO_COMPLETION_MB: + ret = testapp_multi_buffer(test); + break; case TEST_TYPE_RUN_TO_COMPLETION_SINGLE_PKT: test_spec_set_name(test, "RUN_TO_COMPLETION_SINGLE_PKT"); ret = testapp_single_pkt(test); diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index 310b48ad8a3a..c2091b1512f5 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -38,6 +38,7 @@ #define MAX_TEARDOWN_ITER 10 #define PKT_HDR_SIZE (sizeof(struct ethhdr) + 2) /* Just to align the data in the packet */ #define MIN_PKT_SIZE 64 +#define MAX_ETH_PKT_SIZE 1518 #define MAX_ETH_JUMBO_SIZE 9000 #define USLEEP_MAX 10000 #define SOCK_RECONF_CTR 10 @@ -84,6 +85,7 @@ enum test_type { TEST_TYPE_BPF_RES, TEST_TYPE_XDP_DROP_HALF, TEST_TYPE_XDP_METADATA_COUNT, + TEST_TYPE_RUN_TO_COMPLETION_MB, TEST_TYPE_MAX }; @@ -142,6 +144,7 @@ struct ifobject { struct bpf_program *xdp_prog; enum test_mode mode; int ifindex; + int mtu; u32 bind_flags; bool tx_on; bool rx_on; @@ -152,6 +155,7 @@ struct ifobject { bool shared_umem; bool use_metadata; bool unaligned_supp; + bool multi_buff_supp; u8 dst_mac[ETH_ALEN]; u8 src_mac[ETH_ALEN]; }; @@ -165,6 +169,7 @@ struct test_spec { struct bpf_program *xdp_prog_tx; struct bpf_map *xskmap_rx; struct bpf_map *xskmap_tx; + int mtu; u16 total_steps; u16 current_step; u16 nb_sockets; From patchwork Mon May 29 15:50:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258762 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 905E912B77; Mon, 29 May 2023 15:51:13 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68E64C4; Mon, 29 May 2023 08:51:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375472; x=1716911472; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DjtwMP2aDBHMT1I9aUhD7mqXomqhz7U3PVtqcr9+d+s=; b=FjA+0Fl7/l+ntBxFFLNid08HElxW7FAXZeycDl9HrdVibzTZe7WGsDkm iHXkBvwN4weLDhWdIG1HhBExJzE2ayI3R6TADiDb1kRCc7bV8GLpWzHYI O0XsnZlNWmJZ6GmMMYemHHsu0gEXbVUq7sl6LLrjZwdD7o6PSGEGARIfc oXw0y5Wr4XTm/AgOcMiD6kJwOkL3TBSrYadtdnJKzY5l/EAYqp1gqiZQV XrUtZerJhsVgusBUwaQ/zrH0PbT1CjpIRfSTCgM5vJP3EcxaB5pv31UFx Ol/lrDYUvhEX6XuIzq7gT4Vs5gIv3elpQO65Q4MZQu2JedPZcuvFmN5Rg Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229072" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229072" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441216" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441216" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:10 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 18/22] selftests/xsk: add unaligned mode test for multi-buffer Date: Mon, 29 May 2023 17:50:20 +0200 Message-Id: <20230529155024.222213-19-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add a test for multi-buffer AF_XDP when using unaligned mode. The test sends 4096 9K-buffers. Signed-off-by: Magnus Karlsson --- tools/testing/selftests/bpf/xskxceiver.c | 15 +++++++++++++++ tools/testing/selftests/bpf/xskxceiver.h | 1 + 2 files changed, 16 insertions(+) diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 41b8450561c9..8ad98eda90ad 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -50,6 +50,8 @@ * are discarded and let through, respectively. * i. 2K frame size tests * j. If multi-buffer is supported, send 9k packets divided into 3 frames + * k. If multi-buffer and huge pages are supported, send 9k packets in a single frame + * using unaligned mode * * Total tests: 12 * @@ -1754,6 +1756,16 @@ static int testapp_unaligned(struct test_spec *test) return testapp_validate_traffic(test); } +static int testapp_unaligned_mb(struct test_spec *test) +{ + test_spec_set_name(test, "UNALIGNED_MODE_9K"); + test->mtu = MAX_ETH_JUMBO_SIZE; + test->ifobj_tx->umem->unaligned_mode = true; + test->ifobj_rx->umem->unaligned_mode = true; + pkt_stream_replace(test, DEFAULT_PKT_CNT, MAX_ETH_JUMBO_SIZE); + return testapp_validate_traffic(test); +} + static int testapp_single_pkt(struct test_spec *test) { struct pkt pkts[] = {{0, MIN_PKT_SIZE, 0, true}}; @@ -2028,6 +2040,9 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ case TEST_TYPE_UNALIGNED: ret = testapp_unaligned(test); break; + case TEST_TYPE_UNALIGNED_MB: + ret = testapp_unaligned_mb(test); + break; case TEST_TYPE_HEADROOM: ret = testapp_headroom(test); break; diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index c2091b1512f5..42bde17e8d17 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -86,6 +86,7 @@ enum test_type { TEST_TYPE_XDP_DROP_HALF, TEST_TYPE_XDP_METADATA_COUNT, TEST_TYPE_RUN_TO_COMPLETION_MB, + TEST_TYPE_UNALIGNED_MB, TEST_TYPE_MAX }; From patchwork Mon May 29 15:50:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258763 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8EAEF8C11; Mon, 29 May 2023 15:51:17 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B96FCD; Mon, 29 May 2023 08:51:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375475; x=1716911475; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sc6QA9IRq+G37CyvplPWPfsvOoWWGxhKlH3q/bDvsVw=; b=JI5nO1djGgZQWRgBU0Q0pXg6gf1KHNAVbsU6n1wQYeSHra2Eh2Fqv54R y+51MvKPMrCg1BrWM4SPlXrfJZpDJTSL3nbDCzgAp8NxqAliKJ+n91f8B jMl2MEUK+3Us+45yu3QZTzn/Vp17MBmi4X3dTlnWW82aZpUGCiTQB11EQ N4XyYi9aKdjh7IB1PaqyaynT7ypLS2jyFukSSMCUe4inmwDTrQ73x3438 KH9U+W0fimP4vgpjtl523e+DOiX98NZQZ0u9emzMpo8Y3s4QHRf7VUVmN jDwItzTqwf8fByJzjEIEWnLNwl91u3m8BZNglKn8x0F0kGArFhesPSXZv A==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229082" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229082" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441243" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441243" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:12 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 19/22] selftests/xsk: add invalid descriptor test for multi-buffer Date: Mon, 29 May 2023 17:50:21 +0200 Message-Id: <20230529155024.222213-20-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add a test that produces lots of nasty descriptors testing the corner cases of the descriptor validation. Some of these descriptors are valid and some are not as indicated by the valid flag. For a description of all the test combinations, please see the code. To stress the API, we need to be able to generate combinations of descriptors that make little sense. A new verbatim mode is introduced for the packet_stream to accomplish this. In this mode, all packets in the packet_stream are sent as is. We do not try to chop them up into frames that are of the right size that we know are going to work as we would normally do. The packets are just written into the Tx ring even if we know they make no sense. Signed-off-by: Magnus Karlsson Signed-off-by: Maciej Fijalkowski # adjusted valid flags for frags --- tools/testing/selftests/bpf/xskxceiver.c | 185 ++++++++++++++++++----- tools/testing/selftests/bpf/xskxceiver.h | 7 + 2 files changed, 151 insertions(+), 41 deletions(-) diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 8ad98eda90ad..1e1e9422ca0a 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -52,8 +52,8 @@ * j. If multi-buffer is supported, send 9k packets divided into 3 frames * k. If multi-buffer and huge pages are supported, send 9k packets in a single frame * using unaligned mode - * - * Total tests: 12 + * l. If multi-buffer is supported, try various nasty combinations of descriptors to + * check if they pass the validation or not * * Flow: * ----- @@ -76,7 +76,6 @@ #include #include #include -#include #include #include #include @@ -95,7 +94,6 @@ #include #include #include -#include #include #include "xsk_xdp_progs.skel.h" @@ -560,9 +558,9 @@ static struct pkt_stream *__pkt_stream_alloc(u32 nb_pkts) return pkt_stream; } -static bool pkt_continues(const struct xdp_desc *desc) +static bool pkt_continues(u32 options) { - return desc->options & XDP_PKT_CONTD; + return options & XDP_PKT_CONTD; } static u32 ceil_u32(u32 a, u32 b) @@ -570,11 +568,32 @@ static u32 ceil_u32(u32 a, u32 b) return (a + b - 1) / b; } -static u32 pkt_nb_frags(u32 frame_size, struct pkt *pkt) +static u32 pkt_nb_frags(u32 frame_size, struct pkt_stream *pkt_stream, struct pkt *pkt) { - if (!pkt || !pkt->valid) + u32 nb_frags = 1, next_frag; + + if (!pkt) return 1; - return ceil_u32(pkt->len, frame_size); + + if (!pkt_stream->verbatim) { + if (!pkt->valid || !pkt->len) + return 1; + return ceil_u32(pkt->len, frame_size); + } + + /* Search for the end of the packet in verbatim mode */ + if (!pkt_continues(pkt->options)) + return nb_frags; + + next_frag = pkt_stream->current_pkt_nb; + pkt++; + while (next_frag++ < pkt_stream->nb_pkts) { + nb_frags++; + if (!pkt_continues(pkt->options) || !pkt->valid) + break; + pkt++; + } + return nb_frags; } static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt, int offset, u32 len) @@ -694,34 +713,59 @@ static void pkt_generate(struct ifobject *ifobject, u64 addr, u32 len, u32 pkt_n write_payload(data, pkt_nb, bytes_written, len); } -static void __pkt_stream_generate_custom(struct ifobject *ifobj, - struct pkt *pkts, u32 nb_pkts) +static struct pkt_stream *__pkt_stream_generate_custom(struct ifobject *ifobj, struct pkt *frames, + u32 nb_frames, bool verbatim) { + u32 i, len = 0, pkt_nb = 0, payload = 0; struct pkt_stream *pkt_stream; - u32 i; - pkt_stream = __pkt_stream_alloc(nb_pkts); + pkt_stream = __pkt_stream_alloc(nb_frames); if (!pkt_stream) exit_with_error(ENOMEM); - for (i = 0; i < nb_pkts; i++) { - struct pkt *pkt = &pkt_stream->pkts[i]; + for (i = 0; i < nb_frames; i++) { + struct pkt *pkt = &pkt_stream->pkts[pkt_nb]; + struct pkt *frame = &frames[i]; - pkt->offset = pkts[i].offset; - pkt->len = pkts[i].len; - pkt->pkt_nb = i; - pkt->valid = pkts[i].valid; - if (pkt->len > pkt_stream->max_pkt_len) + pkt->offset = frame->offset; + if (verbatim) { + *pkt = *frame; + pkt->pkt_nb = payload; + if (!frame->valid || !pkt_continues(frame->options)) + payload++; + } else { + if (frame->valid) + len += frame->len; + if (frame->valid && pkt_continues(frame->options)) + continue; + + pkt->pkt_nb = pkt_nb; + pkt->len = len; + pkt->valid = frame->valid; + pkt->options = 0; + + len = 0; + } + + if (pkt->valid && pkt->len > pkt_stream->max_pkt_len) pkt_stream->max_pkt_len = pkt->len; + pkt_nb++; } - ifobj->pkt_stream = pkt_stream; + pkt_stream->nb_pkts = pkt_nb; + pkt_stream->verbatim = verbatim; + return pkt_stream; } static void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts, u32 nb_pkts) { - __pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts); - __pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts); + struct pkt_stream *pkt_stream; + + pkt_stream = __pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts, true); + test->ifobj_tx->pkt_stream = pkt_stream; + + pkt_stream = __pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts, false); + test->ifobj_rx->pkt_stream = pkt_stream; } static void pkt_print_data(u32 *data, u32 cnt) @@ -862,11 +906,6 @@ static bool is_frag_valid(struct xsk_umem_info *umem, u64 addr, u32 len, u32 exp static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) { - if (!pkt) { - ksft_print_msg("[%s] too many packets received\n", __func__); - return false; - } - if (pkt->len != len) { ksft_print_msg("[%s] expected packet length [%d], got length [%d]\n", __func__, pkt->len, len); @@ -966,7 +1005,6 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) ksft_print_msg("ERROR: [%s] Poll timed out\n", __func__); return TEST_FAILURE; - } if (!(fds->revents & POLLIN)) @@ -998,6 +1036,12 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) orig = xsk_umem__extract_addr(addr); addr = xsk_umem__add_offset_to_addr(addr); + if (!pkt) { + ksft_print_msg("[%s] received too many packets addr: %lx len %u\n", + __func__, addr, desc->len); + return TEST_FAILURE; + } + if (!is_frag_valid(umem, addr, desc->len, pkt->pkt_nb, pkt_len) || !is_offset_correct(umem, pkt, addr) || (ifobj->use_metadata && !is_metadata_correct(pkt, umem->buffer, addr))) @@ -1010,7 +1054,7 @@ static int receive_pkts(struct test_spec *test, struct pollfd *fds) if (ifobj->use_fill_ring) *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = orig; - if (pkt_continues(desc)) + if (pkt_continues(desc->options)) continue; /* The complete packet has been received */ @@ -1089,31 +1133,29 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo for (i = 0; i < BATCH_SIZE; i++) { struct pkt *pkt = pkt_stream_get_next_tx_pkt(pkt_stream); - u32 nb_frags, bytes_written = 0; + u32 nb_frags_left, nb_frags, bytes_written = 0; if (!pkt) break; - nb_frags = pkt_nb_frags(umem->frame_size, pkt); + nb_frags = pkt_nb_frags(umem->frame_size, pkt_stream, pkt); if (nb_frags > BATCH_SIZE - i) { pkt_stream_cancel(pkt_stream); xsk_ring_prod__cancel(&xsk->tx, BATCH_SIZE - i); break; } + nb_frags_left = nb_frags; - if (pkt->valid) { - valid_pkts++; - valid_frags += nb_frags; - } - - while (nb_frags--) { + while (nb_frags_left--) { struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); tx_desc->addr = pkt_get_addr(pkt, ifobject->umem); - if (nb_frags) { + if (pkt_stream->verbatim) { + tx_desc->len = pkt->len; + tx_desc->options = pkt->options; + } else if (nb_frags_left) { tx_desc->len = umem->frame_size; tx_desc->options = XDP_PKT_CONTD; - i++; } else { tx_desc->len = pkt->len - bytes_written; tx_desc->options = 0; @@ -1122,6 +1164,17 @@ static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeo pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, bytes_written); bytes_written += tx_desc->len; + + if (nb_frags_left) { + i++; + if (pkt_stream->verbatim) + pkt = pkt_stream_get_next_tx_pkt(pkt_stream); + } + } + + if (pkt && pkt->valid) { + valid_pkts++; + valid_frags += nb_frags; } } @@ -1350,7 +1403,7 @@ static void xsk_populate_fill_ring(struct xsk_umem_info *umem, struct pkt_stream u64 addr; u32 i; - for (i = 0; i < pkt_nb_frags(rx_frame_size, pkt); i++) { + for (i = 0; i < pkt_nb_frags(rx_frame_size, pkt_stream, pkt); i++) { if (!pkt) { if (!fill_up) break; @@ -1783,6 +1836,46 @@ static int testapp_multi_buffer(struct test_spec *test) return testapp_validate_traffic(test); } +static int testapp_invalid_desc_mb(struct test_spec *test) +{ + struct xsk_umem_info *umem = test->ifobj_tx->umem; + u64 umem_size = umem->num_frames * umem->frame_size; + struct pkt pkts[] = { + /* Valid packet for synch to start with */ + {0, MIN_PKT_SIZE, 0, true, 0}, + /* Zero frame len is not legal */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, 0, 0, false, 0}, + /* Invalid address in the second frame */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Invalid len in the middle */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Invalid options in the middle */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION}, + /* Transmit 2 frags, receive 3 */ + {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, XDP_PKT_CONTD}, + {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, 0}, + /* Middle frame crosses chunk boundary with small length */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {-MIN_PKT_SIZE / 2, MIN_PKT_SIZE, 0, false, 0}, + /* Valid packet for synch so that something is received */ + {0, MIN_PKT_SIZE, 0, true, 0}}; + + if (umem->unaligned_mode) { + /* Crossing a chunk boundary allowed */ + pkts[12].valid = true; + pkts[13].valid = true; + } + + test->mtu = MAX_ETH_JUMBO_SIZE; + pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts)); + return testapp_validate_traffic(test); +} + static int testapp_invalid_desc(struct test_spec *test) { struct xsk_umem_info *umem = test->ifobj_tx->umem; @@ -2037,6 +2130,16 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ ret = testapp_invalid_desc(test); break; } + case TEST_TYPE_ALIGNED_INV_DESC_MB: + test_spec_set_name(test, "ALIGNED_INV_DESC_MULTI_BUFF"); + ret = testapp_invalid_desc_mb(test); + break; + case TEST_TYPE_UNALIGNED_INV_DESC_MB: + test_spec_set_name(test, "UNALIGNED_INV_DESC_MULTI_BUFF"); + test->ifobj_tx->umem->unaligned_mode = true; + test->ifobj_rx->umem->unaligned_mode = true; + ret = testapp_invalid_desc_mb(test); + break; case TEST_TYPE_UNALIGNED: ret = testapp_unaligned(test); break; diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index 42bde17e8d17..de781357ea15 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -50,6 +50,9 @@ #define RX_FULL_RXQSIZE 32 #define UMEM_HEADROOM_TEST_SIZE 128 #define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1) +#define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024) +#define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024) +#define XSK_DESC__INVALID_OPTION (0xffff) #define HUGEPAGE_SIZE (2 * 1024 * 1024) #define PKT_DUMP_NB_TO_PRINT 16 @@ -87,6 +90,8 @@ enum test_type { TEST_TYPE_XDP_METADATA_COUNT, TEST_TYPE_RUN_TO_COMPLETION_MB, TEST_TYPE_UNALIGNED_MB, + TEST_TYPE_ALIGNED_INV_DESC_MB, + TEST_TYPE_UNALIGNED_INV_DESC_MB, TEST_TYPE_MAX }; @@ -119,6 +124,7 @@ struct pkt { u32 len; u32 pkt_nb; bool valid; + u16 options; }; struct pkt_stream { @@ -126,6 +132,7 @@ struct pkt_stream { u32 current_pkt_nb; struct pkt *pkts; u32 max_pkt_len; + bool verbatim; }; struct ifobject; From patchwork Mon May 29 15:50:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258764 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 96A46134B4; Mon, 29 May 2023 15:51:18 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BFBBA3; Mon, 29 May 2023 08:51:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375477; x=1716911477; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JEcwyDi2S/o9tBSXLAY7AqTGgMfcnOLSTWP8FSJ2hp4=; b=nw9B6d+waK3Bnuee2jsVJ2tXZG1t9dnkYVpH9Gs/TtB6b33jR6q6ioUg 9CAJe0m6CDdNmLovlifvXJsX/rU3XtutkATcllExhlizjW6Ie55Mp3u/j 35843DjGZUFOXD6WZOWsBhBG7ivEe8NlXbyzisIChgMT4xQ/zoaegQDpe 9julaSe4rwaDCeb+ILFdvZoLwR/uyi/fQwfeVeD7jB/DYF8PnD067Vlu2 ruLDsKTVGI8KhZfyIJHGksXdOZiINKrJxA4g09PXMi0LWaknZu1h4gENB h2UP6g3aze3byo5aHra8BdDYOAoAjYYpnlcTTwslCPj2yHcLPjw5i38R6 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229092" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229092" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441259" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441259" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:14 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 20/22] selftests/xsk: add metadata copy test for multi-buff Date: Mon, 29 May 2023 17:50:22 +0200 Message-Id: <20230529155024.222213-21-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Enable the already existing metadata copy test to also run in multi-buffer mode with 9K packets. Signed-off-by: Magnus Karlsson --- tools/testing/selftests/bpf/progs/xsk_xdp_progs.c | 2 +- tools/testing/selftests/bpf/xskxceiver.c | 7 ++++++- tools/testing/selftests/bpf/xskxceiver.h | 1 + 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c index ac76e7363776..24369f242853 100644 --- a/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c +++ b/tools/testing/selftests/bpf/progs/xsk_xdp_progs.c @@ -29,7 +29,7 @@ SEC("xdp.frags") int xsk_xdp_drop(struct xdp_md *xdp) return bpf_redirect_map(&xsk, 0, XDP_DROP); } -SEC("xdp") int xsk_xdp_populate_metadata(struct xdp_md *xdp) +SEC("xdp.frags") int xsk_xdp_populate_metadata(struct xdp_md *xdp) { void *data, *data_meta; struct xdp_info *meta; diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 1e1e9422ca0a..658e978053d4 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -1942,7 +1942,6 @@ static int testapp_xdp_metadata_count(struct test_spec *test) int count = 0; int key = 0; - test_spec_set_name(test, "XDP_METADATA_COUNT"); test_spec_set_xdp_prog(test, skel_rx->progs.xsk_xdp_populate_metadata, skel_tx->progs.xsk_xdp_populate_metadata, skel_rx->maps.xsk, skel_tx->maps.xsk); @@ -2153,6 +2152,12 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ ret = testapp_xdp_drop(test); break; case TEST_TYPE_XDP_METADATA_COUNT: + test_spec_set_name(test, "XDP_METADATA_COUNT"); + ret = testapp_xdp_metadata_count(test); + break; + case TEST_TYPE_XDP_METADATA_COUNT_MB: + test_spec_set_name(test, "XDP_METADATA_COUNT_MULTI_BUFF"); + test->mtu = MAX_ETH_JUMBO_SIZE; ret = testapp_xdp_metadata_count(test); break; default: diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index de781357ea15..786c2dfab7a6 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -88,6 +88,7 @@ enum test_type { TEST_TYPE_BPF_RES, TEST_TYPE_XDP_DROP_HALF, TEST_TYPE_XDP_METADATA_COUNT, + TEST_TYPE_XDP_METADATA_COUNT_MB, TEST_TYPE_RUN_TO_COMPLETION_MB, TEST_TYPE_UNALIGNED_MB, TEST_TYPE_ALIGNED_INV_DESC_MB, From patchwork Mon May 29 15:50:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258765 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8A6BC14265; Mon, 29 May 2023 15:51:20 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 751F0A3; Mon, 29 May 2023 08:51:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375479; x=1716911479; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=95GqTvlmuQ59hVLj2UomxKfzOqq0J8bLQWQg/CLG5/M=; b=jvZpJJfEqJa8K8L+a15FilDhz++O5IsF1Js/9BgEIqrdjyAjs+/wEv15 Ho8XswPQ7qgBFod8botNfj1RU4mzYr0+Rp9XjTVaVitUAI9M6hFBpy3oN ou0x1Qz+9qLxaNZHKqTiMSzZFbDsk8mRSO4dcTOQZ7vOZO13HkmeH9MwD ZjXGIcopU4nK6TpaxFmXfs9T8mOj/CrW0dtzxJUTUkXWwQ8xDvm8cET6n ndoxR9WpIsbtSYRPoxq8TyN2U77vMqKkjsO9sFZ0nrrgXz3aRwbfED+Bs /P3RDpDHqIpftgvyXHGmRMaYTwUtiCQJLT2Wcs9ACOy3Vnlo3xp11n/Mf g==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229104" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229104" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441270" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441270" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:16 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v2 bpf-next 21/22] selftests/xsk: add test for too many frags Date: Mon, 29 May 2023 17:50:23 +0200 Message-Id: <20230529155024.222213-22-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net From: Magnus Karlsson Add a test for testing that a packet consisting of more than 17 frags is discarded. This is only valid for SKB and DRV mode since in zero-copy mode, this limit is up to the HW and what it supports. Signed-off-by: Magnus Karlsson --- tools/testing/selftests/bpf/xskxceiver.c | 37 ++++++++++++++++++++++++ tools/testing/selftests/bpf/xskxceiver.h | 2 ++ 2 files changed, 39 insertions(+) diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c index 658e978053d4..430ff0cec24c 100644 --- a/tools/testing/selftests/bpf/xskxceiver.c +++ b/tools/testing/selftests/bpf/xskxceiver.c @@ -1975,6 +1975,40 @@ static int testapp_poll_rxq_tmout(struct test_spec *test) return testapp_validate_traffic_single_thread(test, test->ifobj_rx); } +static int testapp_too_many_frags(struct test_spec *test) +{ + struct pkt pkts[XSK_DESC__MAX_FRAGS + 3] = {}; + u32 i; + + test_spec_set_name(test, "TOO_MANY_FRAGS"); + if (test->mode == TEST_MODE_ZC) { + /* Limit is up to driver for zero-copy mode so not testable. */ + ksft_test_result_skip("Cannot be run for zero-copy mode.\n"); + return TEST_SKIP; + } + + test->mtu = MAX_ETH_JUMBO_SIZE; + + /* Valid packet for synch */ + pkts[0].len = MIN_PKT_SIZE; + pkts[0].valid = true; + + /* Do not signal end-of-packet in the 17th frag. This is not legal. */ + for (i = 1; i < XSK_DESC__MAX_FRAGS + 2; i++) { + pkts[i].len = MIN_PKT_SIZE; + pkts[i].options = XDP_PKT_CONTD; + pkts[i].valid = true; + } + pkts[XSK_DESC__MAX_FRAGS + 1].valid = false; + + /* Valid packet for synch */ + pkts[XSK_DESC__MAX_FRAGS + 2].len = MIN_PKT_SIZE; + pkts[XSK_DESC__MAX_FRAGS + 2].valid = true; + + pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts)); + return testapp_validate_traffic(test); +} + static int xsk_load_xdp_programs(struct ifobject *ifobj) { ifobj->xdp_progs = xsk_xdp_progs__open_and_load(); @@ -2160,6 +2194,9 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_ test->mtu = MAX_ETH_JUMBO_SIZE; ret = testapp_xdp_metadata_count(test); break; + case TEST_TYPE_TOO_MANY_FRAGS: + ret = testapp_too_many_frags(test); + break; default: break; } diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h index 786c2dfab7a6..bb9d36a7e544 100644 --- a/tools/testing/selftests/bpf/xskxceiver.h +++ b/tools/testing/selftests/bpf/xskxceiver.h @@ -53,6 +53,7 @@ #define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024) #define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024) #define XSK_DESC__INVALID_OPTION (0xffff) +#define XSK_DESC__MAX_FRAGS 17 #define HUGEPAGE_SIZE (2 * 1024 * 1024) #define PKT_DUMP_NB_TO_PRINT 16 @@ -93,6 +94,7 @@ enum test_type { TEST_TYPE_UNALIGNED_MB, TEST_TYPE_ALIGNED_INV_DESC_MB, TEST_TYPE_UNALIGNED_INV_DESC_MB, + TEST_TYPE_TOO_MANY_FRAGS, TEST_TYPE_MAX }; From patchwork Mon May 29 15:50:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maciej Fijalkowski X-Patchwork-Id: 13258766 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1308174FB; Mon, 29 May 2023 15:51:22 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73B70A3; Mon, 29 May 2023 08:51:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1685375481; x=1716911481; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gSB+hgR2vfbiw29TZSXitUEJtG7n/EKVboWM74wIYpY=; b=Xv8r6h4eDeKa3SgZCwyBhNCGkNsgO0jsXivf50OC787RySwMq9xXS7xY JyqiHCWvYaiOx+pj5O2Mc4hXHgUN36Zuf9qhADZbR+zymNhSiAl3IqVRl lp9y12Bs80QhHfs046R8G6oLUuQajo3cZw0qGKxavaA5vfvlWOGO9HPSG gXfONAtexwyZ8FPY8ANH0/+mj1g7bDSKNOJlvAunrBzma5yz1DbA3b2c8 Hk9eneATfrE7vo/mVTaDo6llWzgw886hpE+2/jALTzFNXYx52DxYSFiUZ B3Vhib8MBNccslTakGVT2op1wmeah7ZrpaBL7fQ1zTKJyrQr3biXt6gIT A==; X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="344229111" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="344229111" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 May 2023 08:51:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10725"; a="880441287" X-IronPort-AV: E=Sophos;i="6.00,201,1681196400"; d="scan'208";a="880441287" Received: from boxer.igk.intel.com ([10.102.20.173]) by orsmga005.jf.intel.com with ESMTP; 29 May 2023 08:51:19 -0700 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, tirthendu.sarkar@intel.com, Maciej Fijalkowski Subject: [PATCH v2 bpf-next 22/22] selftests/xsk: reset NIC settings to default after running test suite Date: Mon, 29 May 2023 17:50:24 +0200 Message-Id: <20230529155024.222213-23-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230529155024.222213-1-maciej.fijalkowski@intel.com> References: <20230529155024.222213-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Currently, when running ZC test suite, after finishing first run of test suite and then switching to busy-poll tests within xskxceiver, such errors are observed: libbpf: Kernel error message: ice: MTU is too large for linear frames and XDP prog does not support frags 1..26 libbpf: Kernel error message: Native and generic XDP can't be active at the same time Error attaching XDP program not ok 1 [xskxceiver.c:xsk_reattach_xdp:1568]: ERROR: 17/"File exists" this is because test suite ends with 9k MTU and native xdp program being loaded. Busy-poll tests start non-multi-buffer tests for generic mode. To fix this, let us introduce bash function that will reset NIC settings to default (e.g. 1500 MTU and no xdp progs loaded) so that test suite can continue without interrupts. It also means that after busy-poll tests NIC will have those default settings, whereas right now it is left with 9k MTU and xdp prog loaded in native mode. Signed-off-by: Maciej Fijalkowski --- tools/testing/selftests/bpf/test_xsk.sh | 5 +++++ tools/testing/selftests/bpf/xsk_prereqs.sh | 7 +++++++ 2 files changed, 12 insertions(+) diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh index c2ad50f26b63..2aa5a3445056 100755 --- a/tools/testing/selftests/bpf/test_xsk.sh +++ b/tools/testing/selftests/bpf/test_xsk.sh @@ -171,7 +171,10 @@ exec_xskxceiver if [ -z $ETH ]; then cleanup_exit ${VETH0} ${VETH1} +else + cleanup_iface ${ETH} ${MTU} fi + TEST_NAME="XSK_SELFTESTS_${VETH0}_BUSY_POLL" busy_poll=1 @@ -184,6 +187,8 @@ exec_xskxceiver if [ -z $ETH ]; then cleanup_exit ${VETH0} ${VETH1} +else + cleanup_iface ${ETH} ${MTU} fi failures=0 diff --git a/tools/testing/selftests/bpf/xsk_prereqs.sh b/tools/testing/selftests/bpf/xsk_prereqs.sh index ae697a10a056..29175682c44d 100755 --- a/tools/testing/selftests/bpf/xsk_prereqs.sh +++ b/tools/testing/selftests/bpf/xsk_prereqs.sh @@ -53,6 +53,13 @@ test_exit() exit 1 } +cleanup_iface() +{ + ip link set $1 mtu $2 + ip link set $1 xdp off + ip link set $1 xdpgeneric off +} + clear_configs() { [ $(ip link show $1 &>/dev/null; echo $?;) == 0 ] &&