From patchwork Mon Aug 7 12:15:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344042 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 090C84436 for ; Mon, 7 Aug 2023 12:15:25 +0000 (UTC) Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com [IPv6:2607:f8b0:4864:20::102e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38D26E44 for ; Mon, 7 Aug 2023 05:15:24 -0700 (PDT) Received: by mail-pj1-x102e.google.com with SMTP id 98e67ed59e1d1-26854159c05so2364617a91.2 for ; Mon, 07 Aug 2023 05:15:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691410523; x=1692015323; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uLQYdihKrZnqzA8pyJFR0KmRfev22Y9Nk5IK1CsNWSQ=; b=bEAsqH3EjC2u0IGUnDABZVNA3fbPjqYFRA+b6OttdreALvj5DL7l7Ckx1qTRiEI13l /MNN1P3o3BUJAbMyJ5RZk7d1fAqiWmKhRsmLQRel5SV+JfC6akLzqW9B/w0QwXMG+Axx EbhFL6nmoNWKymjJleYouiGVXtulkdgMMReMJ48ag1EerhFmBMPRRxPb4+VYYK78FigG H8+2tCLHg7Q5ZF78UsXOv+7taDaXBCWTD0bY26hvadEh1o3oxyl8/xWhkUrT6uL19SQs ifTEX5J9eqOjtREpgVr+LfniIynboIEd/NlIcpVj3S468oO+PrfuwNV2Lw0N/Icu+Bif mfvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691410523; x=1692015323; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uLQYdihKrZnqzA8pyJFR0KmRfev22Y9Nk5IK1CsNWSQ=; b=MnvtguvUWTIdjEn7p6SbN7ANvJTOHGAFgqlTr9vQ4nkyWORJdEu0Ult8MKPSNQ1lz0 nH3kPO4Ds0OysDm/WjGyguI7lqoq6F3XO9nivvwHOOb5e5cbqpHy3Ijoa7kkqBIa7DXo H5F95uNH5c4ELEW+gPsQBbuDL3TRcnhR2ZjQNivSIJ04oCmiHNjvXVDzugnRWlUOBkHb PDczbXWqskLropIy2S2McdOR+BiBfMRpALXKpL2uOCSBF33LzAPc2FqfZ2jTKpT1fde9 /0H58MEUtqNE25wHaql6iOOde4vy2BuxUHGemYbTTsVkArgast1NNTneTgqoQWjXDHt4 R9GQ== X-Gm-Message-State: AOJu0Yy0WmCdJbus0TxPtpBLw7o9uJNfbuy6Z+VNuKt8EfZJFnn4VQGe FVk/a2jn7o235XcYvoowTWJbzA== X-Google-Smtp-Source: AGHT+IEFVp6KMtDbFDIQLBkpU/u99kKhhVeErfazpqq/dYJkpUEs2Le54iKKZ2BwOj5rl02nFUtamA== X-Received: by 2002:a17:90b:3e88:b0:269:3771:7342 with SMTP id rj8-20020a17090b3e8800b0026937717342mr4621802pjb.18.1691410523675; Mon, 07 Aug 2023 05:15:23 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id n12-20020a17090ac68c00b00268320ab9f2sm8645761pjt.6.2023.08.07.05.15.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:15:23 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list) Subject: [RFC v2 Optimizing veth xsk performance 1/9] veth: Implement ethtool's get_ringparam() callback Date: Mon, 7 Aug 2023 20:15:10 +0800 Message-Id: <20230807121510.84113-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC some xsk library calls get_ringparam() API to get the queue length to init the xsk umem. Implement that in veth so those scenarios can work properly. Signed-off-by: Albert Huang --- drivers/net/veth.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 614f3e3efab0..77e12d52ca2b 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -255,6 +255,17 @@ static void veth_get_channels(struct net_device *dev, static int veth_set_channels(struct net_device *dev, struct ethtool_channels *ch); +static void veth_get_ringparam(struct net_device *dev, + struct ethtool_ringparam *ring, + struct kernel_ethtool_ringparam *kernel_ring, + struct netlink_ext_ack *extack) +{ + ring->rx_max_pending = VETH_RING_SIZE; + ring->tx_max_pending = VETH_RING_SIZE; + ring->rx_pending = VETH_RING_SIZE; + ring->tx_pending = VETH_RING_SIZE; +} + static const struct ethtool_ops veth_ethtool_ops = { .get_drvinfo = veth_get_drvinfo, .get_link = ethtool_op_get_link, @@ -265,6 +276,7 @@ static const struct ethtool_ops veth_ethtool_ops = { .get_ts_info = ethtool_op_get_ts_info, .get_channels = veth_get_channels, .set_channels = veth_set_channels, + .get_ringparam = veth_get_ringparam, }; /* general routines */ From patchwork Mon Aug 7 12:19:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344044 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4E800FC17 for ; Mon, 7 Aug 2023 12:20:01 +0000 (UTC) Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B6BD1BE8 for ; Mon, 7 Aug 2023 05:19:37 -0700 (PDT) Received: by mail-pl1-x629.google.com with SMTP id d9443c01a7336-1bc6624623cso12030565ad.3 for ; Mon, 07 Aug 2023 05:19:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691410772; x=1692015572; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CSkaRhEoax/kU+S7BeIv84prtFC3VRpkmwXx3cyNhW4=; b=lNSdc/+v7DVvIEdGXmh9m6TWjMuePx5ixHEq5A3moRUidZPbc2VkVmgz1HIHwnketG 
iYbjjR8DZZK5NGxcmtk9G8i7h01IWnAeFmogRREbWXxQdB2EObvt44H1JoEpCjKbiuHC T57RpuMx0LOtogAgDRmTcNHhLJhyJTLgUnc2UgsniuZpFVkr4JH6dprrwRSG2kSfmWgt wadkCo10VcLkZ+OEiFVhdCQdxybRGB5eoyhqF5gj/aQr4p6d9334in9xO74pK7dl6UVJ 4d7212QuSx1U5p8qpETqLnQRyUeq2pHeC2RruKBBb4hgNQeHaypp1chUN9oxFR4w2mK9 +A3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691410772; x=1692015572; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CSkaRhEoax/kU+S7BeIv84prtFC3VRpkmwXx3cyNhW4=; b=aqOh7xBfD7oocaf0Fjrvu6plwak6VwovHsFTkj+SLRrzQY+cSgFnMlCy/qwa9aloni 9znHEbJHJdfoGdFMTUsqjerC/8/3E8+BHO3dbDyu9luHXOYfR9i8UAVl0Kkwo/GEgJCF yIu8t6JKT036LWNS2NxoZ48z4NJeflf7rYWIPhqsipxSu0T2CdG77GIbZt1K7OAgW7D6 CJ91Ucaex1vVNIHHHR3LmWr62DRMfKuOXudvX9SbfiK+0YTCwPpb1dqYNzwT8hTs0+yt nt+5KGqJoVe+MFnGQAb67HkWjv0d6C7bTuTv/dRdMKt5krRYqhRxFMgu6bzBElTdNIZX 6ELQ== X-Gm-Message-State: AOJu0YxfR+BuNDydQDH0rpsRUWDCgmBaLA/d8E9Srfn1Ome40QlpBofE PHthjTjWpBNqjO2ZYYCqwSpHKA== X-Google-Smtp-Source: AGHT+IG0EOJYPbVCZdHWLuCBSIWm1P+FwFwIuu5YzmQQNlmkkc7aqTPCBYttrj4ssxYQsG4YxAEaPw== X-Received: by 2002:a17:902:e743:b0:1bb:8e13:deba with SMTP id p3-20020a170902e74300b001bb8e13debamr11182082plf.11.1691410772101; Mon, 07 Aug 2023 05:19:32 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id m1-20020a170902db0100b001b9d95945afsm6758444plx.155.2023.08.07.05.19.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:19:31 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , =?utf-8?b?QmrDtnJuIFTDtnBl?= =?utf-8?b?bA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)), bpf@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)), linux-kernel@vger.kernel.org (open list) Subject: [RFC v2 Optimizing veth xsk performance 2/9] xsk: add dma_check_skip for skipping dma check Date: Mon, 7 Aug 2023 20:19:17 +0800 Message-Id: <20230807121917.84905-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC for the virtual net device such as veth, there is no need to do dma check if we support zero copy. add this flag after unaligned. beacause there is 4 bytes hole pahole -V ./net/xdp/xsk_buff_pool.o: ----------- ... /* --- cacheline 3 boundary (192 bytes) --- */ u32 chunk_size; /* 192 4 */ u32 frame_len; /* 196 4 */ u8 cached_need_wakeup; /* 200 1 */ bool uses_need_wakeup; /* 201 1 */ bool dma_need_sync; /* 202 1 */ bool unaligned; /* 203 1 */ /* XXX 4 bytes hole, try to pack */ void * addrs; /* 208 8 */ spinlock_t cq_lock; /* 216 4 */ ... 
----------- Signed-off-by: Albert Huang --- include/net/xsk_buff_pool.h | 1 + net/xdp/xsk_buff_pool.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index b0bdff26fc88..fe31097dc11b 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -81,6 +81,7 @@ struct xsk_buff_pool { bool uses_need_wakeup; bool dma_need_sync; bool unaligned; + bool dma_check_skip; void *addrs; /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect: * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index b3f7b310811e..ed251b8e8773 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -85,6 +85,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, XDP_PACKET_HEADROOM; pool->umem = umem; pool->addrs = umem->addrs; + pool->dma_check_skip = false; INIT_LIST_HEAD(&pool->free_list); INIT_LIST_HEAD(&pool->xskb_list); INIT_LIST_HEAD(&pool->xsk_tx_list); @@ -202,7 +203,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool, if (err) goto err_unreg_pool; - if (!pool->dma_pages) { + if (!pool->dma_pages && !pool->dma_check_skip) { WARN(1, "Driver did not DMA map zero-copy buffers"); err = -EINVAL; goto err_unreg_xsk; From patchwork Mon Aug 7 12:22:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344071 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2FE77101DF for ; Mon, 7 Aug 2023 12:22:57 +0000 (UTC) Received: from mail-pl1-x62a.google.com (mail-pl1-x62a.google.com [IPv6:2607:f8b0:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46DB2E74 for ; Mon, 7 Aug 2023 05:22:47 -0700 (PDT) Received: by mail-pl1-x62a.google.com with SMTP id d9443c01a7336-1bc6bfc4b58so4829885ad.1 for ; Mon, 07 Aug 2023 05:22:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691410967; x=1692015767; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9C7bHaMSd+SY+pPB7Vtx4/6UZIJ72tWmfxoauVOSVx0=; b=l4xJiWEEbz2IEuorkINwxBVQ/5PnK9fhGmaLj2DODmbsWX93ppJtAyvJqtF6Ztvco0 qKs5yAkD9hAZSetvftFm3zxR/knZZ++YSY80dIXgMn379PWrH9U+ANOEFtJ5x6RdTS6y XYQRI5TmfzgqmA/jmahEN6XMtndm/UPo/TG3A38K7ZeoLVOpfRYn0RtvpgZ6YhmhUr1J CI+kA1hDqOrfZX6Sz9COvQCAz3m1QiMtcNQs5WrsThDatldjRHsjJs4H/HTeQfp2eoCk DWewbiMqktZxszv2kbOsjYlz7Zxvm2rJ0/8QkdTvMZdbCRocQG6aL3AflCYjgw6/LhuT Q8vA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691410967; x=1692015767; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9C7bHaMSd+SY+pPB7Vtx4/6UZIJ72tWmfxoauVOSVx0=; b=Z40IspDf6nVZ/H6qjRg7pc4dVw0MBjvs9sq2J1EBQCut1J7BPyPTFNlWQB78PYE1td GlH1VR2g4O7YsifV0cpGpjOUvMpFvLpETTDVGBnCtuppwyU1hUmYY8YdJ8omIogj3RKb pYOEfq+5eHavZvdcPBrdtM+Co7QfvCmnGfbWwdZU0ycvX45WL/JjCP/TXMRMjayK42L3 g0Iu4v0GoRv/CdU9KdzCNQtkKISsYfeX38rlkPqQWM/tFZCaOwfVMR6jF5A+WoHX5Cx5 KAQS3Fxcy5oSRiAoClevqGUyROUf3otdTY/Eq+gpWWY235y6mDvyYvwUf5RJkCVK2jcl 
vMRQ== X-Gm-Message-State: AOJu0YwJQdyDAYu2Bs3XeuciXEkP95GBrwkQ3gzzEqCIWIYD6jLIChat gJGO+dBUKyATLVe1oJvZKigfRQ== X-Google-Smtp-Source: AGHT+IE89np5AUQBiPFRPVGYnS82GPk320IaciVQZYSLKkUoZ+GQA8NGhOsYaOaNrp4N9CL1ZFsYag== X-Received: by 2002:a17:902:82c5:b0:1b8:6cae:4400 with SMTP id u5-20020a17090282c500b001b86cae4400mr7028201plz.37.1691410966789; Mon, 07 Aug 2023 05:22:46 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id jk21-20020a170903331500b001bbfa86ca3bsm6819846plb.78.2023.08.07.05.22.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:22:46 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP (eXpress Data Path)) Subject: [RFC v2 Optimizing veth xsk performance 3/9] veth: add support for send queue Date: Mon, 7 Aug 2023 20:22:38 +0800 Message-Id: <20230807122238.85463-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC in order to support native af_xdp for veth. we need support for send queue for napi tx. the upcoming patch will make use of it. 
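For illustration only (not part of this patch): each TX queue now gets its own NAPI context, so once the rest of the series is in place an AF_XDP socket can be bound per TX queue id of the veth device. A minimal user-space sketch of such a bind with the stock libxdp helpers (header <xdp/xsk.h>; with older libbpf the same helpers live in <bpf/xsk.h>) could look as follows; the frame count, ring sizes and queue id are placeholders and error handling is trimmed:

/* user-space sketch (illustration only, not part of the patch):
 * bind one AF_XDP socket to a given veth TX/RX queue with libxdp.
 */
#include <sys/mman.h>
#include <xdp/xsk.h>

#define NUM_FRAMES	4096
#define FRAME_SIZE	XSK_UMEM__DEFAULT_FRAME_SIZE

static struct xsk_ring_prod tx_ring, fill_ring;
static struct xsk_ring_cons rx_ring, comp_ring;
static struct xsk_umem *umem;
static struct xsk_socket *xsk;
static void *umem_area;

static int bind_veth_queue(const char *ifname, __u32 qid)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.bind_flags = XDP_USE_NEED_WAKEUP,	/* see patches 6/9 and 9/9 */
	};

	umem_area = mmap(NULL, (size_t)NUM_FRAMES * FRAME_SIZE,
			 PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (umem_area == MAP_FAILED)
		return -1;

	if (xsk_umem__create(&umem, umem_area, (__u64)NUM_FRAMES * FRAME_SIZE,
			     &fill_ring, &comp_ring, NULL))
		return -1;

	/* qid has to stay below the device's real_num_tx_queues */
	return xsk_socket__create(&xsk, ifname, qid, umem,
				  &rx_ring, &tx_ring, &cfg);
}
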
Signed-off-by: Albert Huang --- drivers/net/veth.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 77e12d52ca2b..25faba879505 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -56,6 +56,11 @@ struct veth_rq_stats { struct u64_stats_sync syncp; }; +struct veth_sq_stats { + struct veth_stats vs; + struct u64_stats_sync syncp; +}; + struct veth_rq { struct napi_struct xdp_napi; struct napi_struct __rcu *napi; /* points to xdp_napi when the latter is initialized */ @@ -69,11 +74,25 @@ struct veth_rq { struct page_pool *page_pool; }; +struct veth_sq { + struct napi_struct xdp_napi; + struct net_device *dev; + struct xdp_mem_info xdp_mem; + struct veth_sq_stats stats; + u32 queue_index; + /* for xsk */ + struct { + struct xsk_buff_pool __rcu *pool; + u32 last_cpu; + } xsk; +}; + struct veth_priv { struct net_device __rcu *peer; atomic64_t dropped; struct bpf_prog *_xdp_prog; struct veth_rq *rq; + struct veth_sq *sq; unsigned int requested_headroom; }; @@ -1495,6 +1514,15 @@ static int veth_alloc_queues(struct net_device *dev) u64_stats_init(&priv->rq[i].stats.syncp); } + priv->sq = kcalloc(dev->num_tx_queues, sizeof(*priv->sq), GFP_KERNEL); + if (!priv->sq) + return -ENOMEM; + + for (i = 0; i < dev->num_tx_queues; i++) { + priv->sq[i].dev = dev; + u64_stats_init(&priv->sq[i].stats.syncp); + } + return 0; } @@ -1503,6 +1531,7 @@ static void veth_free_queues(struct net_device *dev) struct veth_priv *priv = netdev_priv(dev); kfree(priv->rq); + kfree(priv->sq); } static int veth_dev_init(struct net_device *dev) From patchwork Mon Aug 7 12:23:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344072 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C986B101EE for ; Mon, 7 Aug 2023 12:23:45 +0000 (UTC) Received: from mail-pl1-x62b.google.com (mail-pl1-x62b.google.com [IPv6:2607:f8b0:4864:20::62b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ECFDC128 for ; Mon, 7 Aug 2023 05:23:43 -0700 (PDT) Received: by mail-pl1-x62b.google.com with SMTP id d9443c01a7336-1bc7b25c699so589865ad.1 for ; Mon, 07 Aug 2023 05:23:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691411023; x=1692015823; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=foBrBXdOwtDKCq/cydfflnj4IbGd0R2nKQL39Yldago=; b=GnmYBiNpOf2ZKoNDZSeQ7wUkAcu+xKnDmwbHrK34Y7Ua2erUZDLoT8JJxk5fl8TGiB OoHuCCiTUoQ2wtEX7kztN7VB5MCCVQGv/wy5MNOVhivYLoAGl9+OeQx/Kw/+jDyh+IZk Xf34XG9jJ7w+QASgtv2fyUxkkyrMGyP6/I0vtcn4k1nIAQ0LStSdIMZUtZi7KPcerJJm ttyh2eo2Mr2IAudL5/+IPgdfq3sDTaPtRnlakn9I6901GtcSgiHbI4JxFOOvxNdmgD/v L3YtslWMidkywD9iDeTKaE3PmiyYyUFd8OWljozaVGFRKrWaaUvz+k4RQubOkOeKf9I2 XGDA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691411023; x=1692015823; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=foBrBXdOwtDKCq/cydfflnj4IbGd0R2nKQL39Yldago=; b=M0r5t1nZ+lQ0Z5uhbbT3WerIBxmOpUMoFTFDKhCmKTldZgRe2sScH79OKb2V7QRQMv 
J+Xb1LMnIcXb7M78yCozgeZl3yVNGbU2RiKTTCkOcIiPKcdJ79TvLMva7tXEmswV9bBk /hvqJnpcpbsTT85wTvqv5a8o7mzB2s4vjzkJah7JT9nkmJ7ea2sFpu6tYVbASpyXo50e kXq50PTBZxU+v3KLss98Bur/WrowHYhvNRhh6SqoY6EbJQt3ha/NACzGDLVkCKy9FATn qBHGIBqaRCpuBoGTrPNocKs4ciaxravdTZI7EWdvWsVgMRSDp58xYU8LbTDf98PZHker 05Cw== X-Gm-Message-State: AOJu0YyBgVTl9BICb4HROMtT7vT0hmO6Gun4QIyGFCtSeOYNHI9h4pjw 2meYd7z4wjD8o31EpINlcYSguA== X-Google-Smtp-Source: AGHT+IHC5uCrys4OzS5COe5Y4fFinKxUPnbKTC2Fp/O9uWUOctQ/U/MnuwwTmOi5qU48U+ui/3Lqbg== X-Received: by 2002:a17:903:18d:b0:1bc:5d0:e8e8 with SMTP id z13-20020a170903018d00b001bc05d0e8e8mr8191638plg.20.1691411023337; Mon, 07 Aug 2023 05:23:43 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id u16-20020a170902e81000b001ab2b4105ddsm6766864plg.60.2023.08.07.05.23.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:23:42 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , =?utf-8?b?QmrDtnJuIFTDtnBl?= =?utf-8?b?bA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)), bpf@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)), linux-kernel@vger.kernel.org (open list) Subject: [RFC v2 Optimizing veth xsk performance 4/9] xsk: add xsk_tx_completed_addr function Date: Mon, 7 Aug 2023 20:23:32 +0800 Message-Id: <20230807122332.85628-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Return desc to the cq by using the descriptor address. 
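For illustration only (not part of this patch): as I read it, the existing xsk_tx_completed() publishes a batch of already-reserved completion entries by count, while the veth TX path added later in this series wants to complete one specific descriptor at a time (from its error paths, right after the copy in patch 5/9, and from the skb destructor in patch 8/9 once the peer has consumed the data), hence completion by umem address. User space is unaffected and keeps reaping the completion ring the usual way, just without any ordering guarantee. A sketch continuing the earlier fragment, where recycle_frame() is a placeholder for the application's own frame allocator:

/* user-space sketch (illustration only): reap TX completions from the
 * completion ring; addresses may come back in any order.
 */
void recycle_frame(__u64 addr);	/* placeholder: the app's frame allocator */

static unsigned int reap_tx_completions(void)
{
	unsigned int reaped, i;
	__u32 idx;

	reaped = xsk_ring_cons__peek(&comp_ring, 64, &idx);
	for (i = 0; i < reaped; i++) {
		__u64 addr = *xsk_ring_cons__comp_addr(&comp_ring, idx + i);

		/* 'addr' is free again and may carry a new TX frame */
		recycle_frame(addr);
	}
	if (reaped)
		xsk_ring_cons__release(&comp_ring, reaped);

	return reaped;
}
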
Signed-off-by: Albert Huang --- include/net/xdp_sock_drv.h | 1 + net/xdp/xsk.c | 6 ++++++ net/xdp/xsk_queue.h | 10 ++++++++++ 3 files changed, 17 insertions(+) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 1f6fc8c7a84c..5220454bff5c 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -15,6 +15,7 @@ #ifdef CONFIG_XDP_SOCKETS void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries); +void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr); bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc); u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max); void xsk_tx_release(struct xsk_buff_pool *pool); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 4f1e0599146e..b2b8aa7b0bcf 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -396,6 +396,12 @@ void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) } EXPORT_SYMBOL(xsk_tx_completed); +void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr) +{ + xskq_prod_submit_addr(pool->cq, addr); +} +EXPORT_SYMBOL(xsk_tx_completed_addr); + void xsk_tx_release(struct xsk_buff_pool *pool) { struct xdp_sock *xs; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 13354a1e4280..3a5e26a81dc2 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -428,6 +428,16 @@ static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx) smp_store_release(&q->ring->producer, idx); /* B, matches C */ } +static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr) +{ + struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; + u32 idx = q->ring->producer; + + ring->desc[idx++ & q->ring_mask] = addr; + + __xskq_prod_submit(q, idx); +} + static inline void xskq_prod_submit(struct xsk_queue *q) { __xskq_prod_submit(q, q->cached_prod); From patchwork Mon Aug 7 12:24:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344073 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4CE84436 for ; Mon, 7 Aug 2023 12:25:01 +0000 (UTC) Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 620B7E53 for ; Mon, 7 Aug 2023 05:24:59 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1bc34b32785so28032835ad.3 for ; Mon, 07 Aug 2023 05:24:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691411099; x=1692015899; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tzvycUSp6fj3lhy6v0s47NdlhPCC8jzqygliYQ4HaF8=; b=kZ3jGdcXWEZ4oyg0oP0T/xcd6VEXp/7aa/nPih+VNkFf6OkCznLwQRb9/jRzCzHOCk 3Vi0RhO2IHMdpKPcwYMUEb2tKvyMWFfgmC83p57OWVM5GdbK7Yh3YguUZMlVEAzOmpb9 QaYVLP38DAub6T6A99e8dHKpv5/z/3099Ssdut5thWefm6PuSyc6y5r0mED2/XGGgsiH q7HTKm58Nd6dnoI2I9ZF1gotW+WfkyGGGKLUwJP4EBLEGjCDhCBPYri9J4TBngYufbN6 RmJwVW1DngJ7cVDGCf3Z/HGNPPArBcaE+fAAjSeprwM8uDvTYDbTJXU1ThQ5I23awoaU mr/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691411099; x=1692015899; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tzvycUSp6fj3lhy6v0s47NdlhPCC8jzqygliYQ4HaF8=; b=EiV/41EOOM20/L4G/zaGhbp3yXsggXhndRFAyO+95z7/hKGDvz82ULGbMS4AoxLoko 5zuuCWqgvN5i53ulKiDaAEULp5+4I1o7cKFC9HLmAJvzTzpH4Gzvb3iTZo5lvsWYufDw FjC7WD7HnNb0QtE3NAyW8Y4KQZjRnsyh4TOy+4bRAEMn9uzNW06PT3lBKH44eXUyeJxH ez77Wv/5vyXZhjRwsLAqodI0B5QzDOy2vZQ9h2QRtw+Ruh+VJk4hA6KX7F2ypquEmbQm wTTocPayFnQA2/nTj20EiF6CYyKgSE4j6A2yg5hGAZ7dbazq/l3UreTukPpBxkZ+7ISV LXSA== X-Gm-Message-State: AOJu0YyboJczSJHTADecdpzXZAYJ38aHJE8XaP4mM3LCanIyJ6n8YzGl xzyVk9i2POqYgQg/Y16B7+Tidw== X-Google-Smtp-Source: AGHT+IEboynI5V/8G38VpIJprkiWU7gjnmIYItz7meICAcRwBMHQ5y3CElE3+GIyHu1Z1mmgIEzXvA== X-Received: by 2002:a17:902:e84e:b0:1b6:4bbd:c3a7 with SMTP id t14-20020a170902e84e00b001b64bbdc3a7mr8900815plg.66.1691411098793; Mon, 07 Aug 2023 05:24:58 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([139.177.225.252]) by smtp.gmail.com with ESMTPSA id b10-20020a170902a9ca00b001bc16bc9f5fsm6746674plr.284.2023.08.07.05.24.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:24:58 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP (eXpress Data Path)) Subject: [RFC v2 Optimizing veth xsk performance 5/9] veth: use send queue tx napi to xmit xsk tx desc Date: Mon, 7 Aug 2023 20:24:47 +0800 Message-Id: <20230807122447.85725-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC use send queue tx napi to xmit xsk tx desc Signed-off-by: Albert Huang --- drivers/net/veth.c | 230 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 229 insertions(+), 1 deletion(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 25faba879505..28b891dd8dc9 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -27,6 +27,8 @@ #include #include #include +#include +#include #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -1061,6 +1063,141 @@ static int veth_poll(struct napi_struct *napi, int budget) return done; } +static struct sk_buff *veth_build_skb(void *head, int headroom, int len, + int buflen) +{ + struct sk_buff *skb; + + skb = build_skb(head, buflen); + if (!skb) + return NULL; + + skb_reserve(skb, headroom); + skb_put(skb, len); + + return skb; +} + +static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget) +{ + struct veth_priv *priv, *peer_priv; + struct net_device *dev, *peer_dev; + struct veth_stats stats = {}; + struct sk_buff *skb = NULL; + struct veth_rq *peer_rq; + struct xdp_desc desc; + int done = 0; + + dev = sq->dev; + priv = netdev_priv(dev); + peer_dev = priv->peer; + peer_priv = netdev_priv(peer_dev); + 
+ /* todo: queue index must set before this */ + peer_rq = &peer_priv->rq[sq->queue_index]; + + /* set xsk wake up flag, to do: where to disable */ + if (xsk_uses_need_wakeup(xsk_pool)) + xsk_set_tx_need_wakeup(xsk_pool); + + while (budget-- > 0) { + unsigned int truesize = 0; + struct page *page; + void *vaddr; + void *addr; + + if (!xsk_tx_peek_desc(xsk_pool, &desc)) + break; + + addr = xsk_buff_raw_get_data(xsk_pool, desc.addr); + + /* can not hold all data in a page */ + truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + truesize += desc.len + xsk_pool->headroom; + if (truesize > PAGE_SIZE) { + xsk_tx_completed_addr(xsk_pool, desc.addr); + stats.xdp_drops++; + break; + } + + page = dev_alloc_page(); + if (!page) { + xsk_tx_completed_addr(xsk_pool, desc.addr); + stats.xdp_drops++; + break; + } + vaddr = page_to_virt(page); + + memcpy(vaddr + xsk_pool->headroom, addr, desc.len); + xsk_tx_completed_addr(xsk_pool, desc.addr); + + skb = veth_build_skb(vaddr, xsk_pool->headroom, desc.len, PAGE_SIZE); + if (!skb) { + put_page(page); + stats.xdp_drops++; + break; + } + skb->protocol = eth_type_trans(skb, peer_dev); + napi_gro_receive(&peer_rq->xdp_napi, skb); + + stats.xdp_bytes += desc.len; + done++; + } + + /* release, move consumer,and wakeup the producer */ + if (done) { + napi_schedule(&peer_rq->xdp_napi); + xsk_tx_release(xsk_pool); + } + + u64_stats_update_begin(&sq->stats.syncp); + sq->stats.vs.xdp_packets += done; + sq->stats.vs.xdp_bytes += stats.xdp_bytes; + sq->stats.vs.xdp_drops += stats.xdp_drops; + u64_stats_update_end(&sq->stats.syncp); + + return done; +} + +static int veth_poll_tx(struct napi_struct *napi, int budget) +{ + struct veth_sq *sq = container_of(napi, struct veth_sq, xdp_napi); + struct xsk_buff_pool *pool; + int done = 0; + + sq->xsk.last_cpu = smp_processor_id(); + + /* xmit for tx queue */ + rcu_read_lock(); + pool = rcu_dereference(sq->xsk.pool); + if (pool) + done = veth_xsk_tx_xmit(sq, pool, budget); + + rcu_read_unlock(); + + if (done < budget) { + /* if done < budget, the tx ring is no buffer */ + napi_complete_done(napi, done); + } + + return done; +} + +static int veth_napi_add_tx(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + int i; + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_sq *sq = &priv->sq[i]; + + netif_napi_add(dev, &sq->xdp_napi, veth_poll_tx); + napi_enable(&sq->xdp_napi); + } + + return 0; +} + static int veth_create_page_pool(struct veth_rq *rq) { struct page_pool_params pp_params = { @@ -1153,6 +1290,19 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end) } } +static void veth_napi_del_tx(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + int i; + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_sq *sq = &priv->sq[i]; + + napi_disable(&sq->xdp_napi); + __netif_napi_del(&sq->xdp_napi); + } +} + static void veth_napi_del(struct net_device *dev) { veth_napi_del_range(dev, 0, dev->real_num_rx_queues); @@ -1360,7 +1510,7 @@ static void veth_set_xdp_features(struct net_device *dev) struct veth_priv *priv_peer = netdev_priv(peer); xdp_features_t val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | - NETDEV_XDP_ACT_RX_SG; + NETDEV_XDP_ACT_RX_SG | NETDEV_XDP_ACT_XSK_ZEROCOPY; if (priv_peer->_xdp_prog || veth_gro_requested(peer)) val |= NETDEV_XDP_ACT_NDO_XMIT | @@ -1737,11 +1887,89 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, return err; } +static int veth_xsk_pool_enable(struct net_device *dev, 
struct xsk_buff_pool *pool, u16 qid) +{ + struct veth_priv *peer_priv; + struct veth_priv *priv = netdev_priv(dev); + struct net_device *peer_dev = priv->peer; + int err = 0; + + if (qid >= dev->real_num_tx_queues) + return -EINVAL; + + if (!peer_dev) + return -EINVAL; + + /* no dma, so we just skip dma skip in xsk zero copy */ + pool->dma_check_skip = true; + + peer_priv = netdev_priv(peer_dev); + + /* enable peer tx xdp here, this side + * xdp is enable by veth_xdp_set + * to do: we need to check whther this side is already enable xdp + * maybe it do not have xdp prog + */ + if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) { + /* peer should enable napi*/ + err = veth_napi_enable(peer_dev); + if (err) + return err; + } + + /* Here is already protected by rtnl_lock, so rcu_assign_pointer + * is safe. + */ + rcu_assign_pointer(priv->sq[qid].xsk.pool, pool); + + veth_napi_add_tx(dev); + + return err; +} + +static int veth_xsk_pool_disable(struct net_device *dev, u16 qid) +{ + struct veth_priv *peer_priv; + struct veth_priv *priv = netdev_priv(dev); + struct net_device *peer_dev = priv->peer; + int err = 0; + + if (qid >= dev->real_num_tx_queues) + return -EINVAL; + + if (!peer_dev) + return -EINVAL; + + peer_priv = netdev_priv(peer_dev); + + /* to do: this may be failed */ + if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) { + /* disable peer napi */ + veth_napi_del(peer_dev); + } + + veth_napi_del_tx(dev); + + rcu_assign_pointer(priv->sq[qid].xsk.pool, NULL); + return err; +} + +/* this is for setup xdp */ +static int veth_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp) +{ + if (xdp->xsk.pool) + return veth_xsk_pool_enable(dev, xdp->xsk.pool, xdp->xsk.queue_id); + else + return veth_xsk_pool_disable(dev, xdp->xsk.queue_id); +} + static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) { switch (xdp->command) { case XDP_SETUP_PROG: return veth_xdp_set(dev, xdp->prog, xdp->extack); + case XDP_SETUP_XSK_POOL: + return veth_xsk_pool_setup(dev, xdp); default: return -EINVAL; } From patchwork Mon Aug 7 12:25:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344074 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61A68101DE for ; Mon, 7 Aug 2023 12:25:33 +0000 (UTC) Received: from mail-pg1-x52b.google.com (mail-pg1-x52b.google.com [IPv6:2607:f8b0:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D344E70 for ; Mon, 7 Aug 2023 05:25:31 -0700 (PDT) Received: by mail-pg1-x52b.google.com with SMTP id 41be03b00d2f7-564b8ea94c1so1868465a12.1 for ; Mon, 07 Aug 2023 05:25:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691411131; x=1692015931; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ICV3oSZrEepyfkdFh294C2oc/HDgt3q4tnvL6u1XOSo=; b=b6g1SIob3LrwBUcULHenYei5xyOCHfj5Kh/mGyOEaHiip8n13U8ZcRqE1k4T3MDRRM x2lyr1DY3nGrczG7wBhPI60UBpEUdI9Yp4s5Q5XKKOaIvcIYIfsq0SHygtuZHrrI7/sJ Dlw1F7V3yl9asBUH5U+jnwECabs8oBnRRvhpSAs5tbaWVm8eZsL1DqJT3hUePtkjCdh7 bvda0PHpxMRijzygp+Y8ikxGAeYHfTAnRmjG5DASvvWfIX12EvDGN49oe5Z6lsQwmrxz 
JgtNTjH/Frz9Dr2Sj/uWIuSuX/buM8Vv/M5TcyzdDutZU8EX80YmYzJCZuv/0ObD+jQG sWaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691411131; x=1692015931; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ICV3oSZrEepyfkdFh294C2oc/HDgt3q4tnvL6u1XOSo=; b=CpQ7NZb3ddFp5msg/YQ+FJSnvAAwlyxUqXbbv4F5Zyb909IA1E9SZHvAEmCJaKzeV1 KemSrThu0f+IgEVut9cREZdh1h1ORs1p6gmPh702UUI3K373Y/L8KYvoefmFVcNNnD91 UFHvs5hJgDSUjDb6PBzSKnRan0ZeH37n46gCRiP6fGTP5SncJkGb+Q/xZNJKpAJSssN2 mIcN5Wmthvmop9BYvbAElNK+if/meXI0CsNY+wnuDcHdmJUKbpY9awK3R9AcTNK3N5M2 1HZ7nvu3EMVxO8//tISrcsMTEwBro1eZetDT5ea9r92gwv11z2Yqskf26z2mb7ka1C46 FiuA== X-Gm-Message-State: AOJu0YwrRzI7Ncw2vM5yXfzgMP8vGUemmF8D3PrWmPqpOkyPc/aZSR1P zOn7q0beazaSglTbb0Dnc+6mXg== X-Google-Smtp-Source: AGHT+IHci0yjfkSv+HxTsOnZgFme2ZxQQuK4ytn/nitEZzci3avLuAUQEYY9VIvr9bZB5W1yGy+2Gg== X-Received: by 2002:a17:90b:692:b0:268:6e30:600f with SMTP id m18-20020a17090b069200b002686e30600fmr7060392pjz.32.1691411130684; Mon, 07 Aug 2023 05:25:30 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id o14-20020a17090a4b4e00b00268b439a0cbsm5942391pjl.23.2023.08.07.05.25.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:25:30 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP (eXpress Data Path)) Subject: [RFC v2 Optimizing veth xsk performance 6/9] veth: add ndo_xsk_wakeup callback for veth Date: Mon, 7 Aug 2023 20:25:22 +0800 Message-Id: <20230807122522.85762-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC add ndo_xsk_wakeup callback for veth, this is used to wakeup napi tx. 
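For illustration only (not part of this patch): ndo_xsk_wakeup() is what a sendto() or poll() on the XSK file descriptor ultimately reaches, so this hook is how user space kicks the per-queue TX NAPI after filling the TX ring, and the smp_call_function_single() path keeps that NAPI on the CPU that last polled it. From the application side the kick is the usual dummy sendto(), continuing the earlier sketch:

/* user-space sketch (illustration only): poke the kernel after queueing
 * TX descriptors; this syscall ends up in veth_xsk_wakeup().
 */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

static void kick_tx(void)
{
	int ret = sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

	/* these errnos only mean "busy, try again later", not a hard failure */
	if (ret < 0 && errno != EAGAIN && errno != EBUSY &&
	    errno != ENOBUFS && errno != ENETDOWN)
		perror("kick_tx");
}
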
Signed-off-by: Albert Huang --- drivers/net/veth.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 28b891dd8dc9..ac78d6a87416 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1805,6 +1805,44 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr) rcu_read_unlock(); } +static void veth_xsk_remote_trigger_napi(void *info) +{ + struct veth_sq *sq = info; + + napi_schedule(&sq->xdp_napi); +} + +static int veth_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag) +{ + struct veth_priv *priv; + struct veth_sq *sq; + u32 last_cpu, cur_cpu; + + if (!netif_running(dev)) + return -ENETDOWN; + + if (qid >= dev->real_num_rx_queues) + return -EINVAL; + + priv = netdev_priv(dev); + sq = &priv->sq[qid]; + + if (napi_if_scheduled_mark_missed(&sq->xdp_napi)) + return 0; + + last_cpu = sq->xsk.last_cpu; + cur_cpu = get_cpu(); + + /* raise a napi */ + if (last_cpu == cur_cpu) + napi_schedule(&sq->xdp_napi); + else + smp_call_function_single(last_cpu, veth_xsk_remote_trigger_napi, sq, true); + + put_cpu(); + return 0; +} + static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, struct netlink_ext_ack *extack) { @@ -2019,6 +2057,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_set_rx_headroom = veth_set_rx_headroom, .ndo_bpf = veth_xdp, .ndo_xdp_xmit = veth_ndo_xdp_xmit, + .ndo_xsk_wakeup = veth_xsk_wakeup, .ndo_get_peer_dev = veth_peer_dev, }; From patchwork Mon Aug 7 12:26:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344082 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71E3C4436 for ; Mon, 7 Aug 2023 12:26:29 +0000 (UTC) Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3C98B128 for ; Mon, 7 Aug 2023 05:26:27 -0700 (PDT) Received: by mail-pf1-x430.google.com with SMTP id d2e1a72fcca58-686b9964ae2so3025676b3a.3 for ; Mon, 07 Aug 2023 05:26:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691411186; x=1692015986; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VNxERTokHn2QOzCO7nu1QH5j2GU8N7cGLZ9USDe1a7I=; b=UxOEpaG2o+xqx/J3NAFr9T/TXCrWhWtpPYaNl+rxWd6X06ltJfGQuZyB+mySSCYSOH OiBmgM1AndvMiLctAaeJGlS8Nn3YPNEnH3rXxClTohGrSXieLByWmK064JmlnmJzTOJt KKYC0h4iWIwAua6np92zxX6F9LG30TlbRT7XV2GLSrzz8L4quPJ4kcYB9PUWUwGLrzSX UQEdcqd9F687AKaceiRKcM9L7hxt5agKkQbKQkV8TKJO6EnR3PMkUHEJ825nfROlZeJR lXSUzwtEqxNLlKTYMCZ04bZkxWvNoAVqgwpuJvFols1+9RpvLvplwJyVfgpJxmcOSges /0dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691411186; x=1692015986; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VNxERTokHn2QOzCO7nu1QH5j2GU8N7cGLZ9USDe1a7I=; b=Yb6Zg505Bx260BldfoXMw8T4OjpqoY0GCBs1RKmJP745ZZlVGFl+Xysb2RpAI8+JIw dZ2Up5j1wv04YEPnzX1E6rlHwyp9syjTT0FqQu0V+MmucexPQrEyk0XNX+j33L9VjLr8 
K+U821/1X3DQdtjX74P3QbhG5MYQ5fWYzRITuEHsWfAUM+KR26rV4TcfAk8C15AZqNFR fM/HTIcn3GPcJik8MAW1FovqS3SsFVIUjR5Fc6lIMKRUpOSK/de6MVFMI3xazVQJX/bW Xk928vRuNX8yzxRAfHoYSRRtZpHkPnsMZ22B6ZcrpH4ujjTpreVAtrJtidcYa3GIPulI C35g== X-Gm-Message-State: AOJu0YxU5namwuu7rmsZvKmbMgyvFxqN8X7y9SZsdXMgiJ9gKhnxJ71N 8KMCJ6bTWz54jIOydVOyvmbW5Q== X-Google-Smtp-Source: AGHT+IGkLvXck+BgnTaLCKmOThWuPtHwOWLg4xPo4gzumYuiSTN4y35ZcW1uUAKYtat1ZPHip9hoIg== X-Received: by 2002:a05:6a00:a17:b0:666:b0e7:10ea with SMTP id p23-20020a056a000a1700b00666b0e710eamr9238745pfh.31.1691411186600; Mon, 07 Aug 2023 05:26:26 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id v19-20020aa78093000000b00672401787c6sm6060354pff.109.2023.08.07.05.26.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:26:25 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP (eXpress Data Path)) Subject: [RFC v2 Optimizing veth xsk performance 8/9] veth: af_xdp tx batch support for ipv4 udp Date: Mon, 7 Aug 2023 20:26:17 +0800 Message-Id: <20230807122617.85882-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC A typical topology is shown below: veth<--------veth-peer 1 | |2 | bridge<------->eth0(such as mlnx5 NIC) If you use af_xdp to send packets from veth to a physical NIC, it needs to go through some software paths, so we can refer to the implementation of kernel GSO. When af_xdp sends packets out from veth, consider aggregating packets and send a large packet from the veth virtual NIC to the physical NIC. performance:(test weth libxdp lib) AF_XDP without batch : 480 Kpps (with ksoftirqd 100% cpu) AF_XDP with batch : 1.5 Mpps (with ksoftirqd 15% cpu) With af_xdp batch, the libxdp user-space program reaches a bottleneck. Therefore, the softirq did not reach the limit. 
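For illustration only (not part of this patch): the aggregation only applies to back-to-back descriptors that share the same IPv4/UDP 5-tuple and the same UDP length, and a batch is flushed once MAX_SKB_FRAGS segments are collected, so the measured gain depends on user space submitting such bursts contiguously. A sketch of the producer side, continuing the earlier fragments; build_udp_frame() is a placeholder for the application's packet writer and the addressing assumes one packet per umem frame:

/* user-space sketch (illustration only): submit a burst the veth batch
 * path can coalesce — every frame carries the same Ethernet/IPv4/UDP
 * headers (same 5-tuple) and the same UDP length.
 */
void build_udp_frame(void *frame, unsigned int len);	/* placeholder */

static int send_same_flow_burst(unsigned int nframes, unsigned int frame_len)
{
	unsigned int i;
	__u32 idx;

	if (xsk_ring_prod__reserve(&tx_ring, nframes, &idx) != nframes)
		return -1;	/* TX ring full, reap completions first */

	for (i = 0; i < nframes; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(&tx_ring, idx + i);

		desc->addr = (__u64)i * FRAME_SIZE;	/* one umem frame per packet */
		desc->len = frame_len;
		build_udp_frame(xsk_umem__get_data(umem_area, desc->addr), frame_len);
	}

	xsk_ring_prod__submit(&tx_ring, nframes);
	kick_tx();

	return 0;
}
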
Signed-off-by: Albert Huang --- drivers/net/veth.c | 408 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 387 insertions(+), 21 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index ac78d6a87416..70489d017b51 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -29,6 +29,7 @@ #include #include #include +#include #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -103,6 +104,23 @@ struct veth_xdp_tx_bq { unsigned int count; }; +struct veth_batch_tuple { + __u8 protocol; + __be32 saddr; + __be32 daddr; + __be16 source; + __be16 dest; + __be16 batch_size; + __be16 batch_segs; + bool batch_enable; + bool batch_flush; +}; + +struct veth_seg_info { + u32 segs; + u64 desc[] ____cacheline_aligned_in_smp; +}; + /* * ethtool interface */ @@ -1078,11 +1096,340 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static void veth_xsk_destruct_skb(struct sk_buff *skb) +{ + struct skb_shared_info *si = skb_shinfo(skb); + struct xsk_buff_pool *pool = (struct xsk_buff_pool *)si->destructor_arg_xsk_pool; + struct veth_seg_info *seg_info = (struct veth_seg_info *)si->destructor_arg; + unsigned long flags; + u32 index = 0; + u64 addr; + + /* release cq */ + spin_lock_irqsave(&pool->cq_lock, flags); + for (index = 0; index < seg_info->segs; index++) { + addr = (u64)(long)seg_info->desc[index]; + xsk_tx_completed_addr(pool, addr); + } + spin_unlock_irqrestore(&pool->cq_lock, flags); + + kfree(seg_info); + si->destructor_arg = NULL; + si->destructor_arg_xsk_pool = NULL; +} + +static struct sk_buff *veth_build_gso_head_skb(struct net_device *dev, + char *buff, u32 tot_len, + u32 headroom, u32 iph_len, + u32 th_len) +{ + struct sk_buff *skb = NULL; + int err = 0; + + skb = alloc_skb(tot_len, GFP_KERNEL); + if (unlikely(!skb)) + return NULL; + + /* header room contains the eth header */ + skb_reserve(skb, headroom - ETH_HLEN); + skb_put(skb, ETH_HLEN + iph_len + th_len); + skb_shinfo(skb)->gso_segs = 0; + + err = skb_store_bits(skb, 0, buff, ETH_HLEN + iph_len + th_len); + if (unlikely(err)) { + kfree_skb(skb); + return NULL; + } + + skb->protocol = eth_type_trans(skb, dev); + skb->network_header = skb->mac_header + ETH_HLEN; + skb->transport_header = skb->network_header + iph_len; + skb->ip_summed = CHECKSUM_PARTIAL; + + return skb; +} + +/* only ipv4 udp match + * to do: tcp and ipv6 + */ +static inline bool veth_segment_match(struct veth_batch_tuple *tuple, + struct iphdr *iph, struct udphdr *udph) +{ + if (tuple->protocol == iph->protocol && + tuple->saddr == iph->saddr && + tuple->daddr == iph->daddr && + tuple->source == udph->source && + tuple->dest == udph->dest && + tuple->batch_size == ntohs(udph->len)) { + tuple->batch_flush = false; + return true; + } + + tuple->batch_flush = true; + return false; +} + +static inline void veth_tuple_init(struct veth_batch_tuple *tuple, + struct iphdr *iph, struct udphdr *udph) +{ + tuple->protocol = iph->protocol; + tuple->saddr = iph->saddr; + tuple->daddr = iph->daddr; + tuple->source = udph->source; + tuple->dest = udph->dest; + tuple->batch_flush = false; + tuple->batch_size = ntohs(udph->len); + tuple->batch_segs = 0; +} + +static inline bool veth_batch_ip_check_v4(struct iphdr *iph, u32 len) +{ + if (len <= (ETH_HLEN + sizeof(*iph))) + return false; + + if (iph->ihl < 5 || iph->version != 4 || len < (iph->ihl * 4 + ETH_HLEN)) + return false; + + return true; +} + +static struct sk_buff *veth_build_skb_batch_udp(struct net_device *dev, + struct xsk_buff_pool *pool, + struct xdp_desc 
*desc, + struct veth_batch_tuple *tuple, + struct sk_buff *prev_skb) +{ + u32 hr, len, ts, index, iph_len, th_len, data_offset, data_len, tot_len; + struct veth_seg_info *seg_info; + void *buffer; + struct udphdr *udph; + struct iphdr *iph; + struct sk_buff *skb; + struct page *page; + u32 seg_len = 0; + int hh_len = 0; + u64 addr; + + addr = desc->addr; + len = desc->len; + + /* l2 reserved len */ + hh_len = LL_RESERVED_SPACE(dev); + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(hh_len)); + + /* data points to eth header */ + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr); + + iph = (struct iphdr *)(buffer + ETH_HLEN); + iph_len = iph->ihl * 4; + + udph = (struct udphdr *)(buffer + ETH_HLEN + iph_len); + th_len = sizeof(struct udphdr); + + if (tuple->batch_flush) + veth_tuple_init(tuple, iph, udph); + + ts = pool->unaligned ? len : pool->chunk_size; + + data_offset = offset_in_page(buffer) + ETH_HLEN + iph_len + th_len; + data_len = len - (ETH_HLEN + iph_len + th_len); + + /* head is null or this is a new 5 tuple */ + if (!prev_skb || !veth_segment_match(tuple, iph, udph)) { + tot_len = hr + iph_len + th_len; + skb = veth_build_gso_head_skb(dev, buffer, tot_len, hr, iph_len, th_len); + if (!skb) { + /* to do: handle here for skb */ + return NULL; + } + + /* store information for gso */ + seg_len = struct_size(seg_info, desc, MAX_SKB_FRAGS); + seg_info = kmalloc(seg_len, GFP_KERNEL); + if (!seg_info) { + /* to do */ + kfree_skb(skb); + return NULL; + } + } else { + skb = prev_skb; + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4 | SKB_GSO_PARTIAL; + skb_shinfo(skb)->gso_size = data_len; + skb->ip_summed = CHECKSUM_PARTIAL; + + /* max segment is MAX_SKB_FRAGS */ + if (skb_shinfo(skb)->gso_segs >= MAX_SKB_FRAGS - 1) + tuple->batch_flush = true; + + seg_info = (struct veth_seg_info *)skb_shinfo(skb)->destructor_arg; + } + + /* offset in umem pool buffer */ + addr = buffer - pool->addrs; + + /* get the page of the desc */ + page = pool->umem->pgs[addr >> PAGE_SHIFT]; + + /* in order to avoid to get freed by kfree_skb */ + get_page(page); + + /* desc.data can not hold in two */ + skb_fill_page_desc(skb, skb_shinfo(skb)->gso_segs, page, data_offset, data_len); + + skb->len += data_len; + skb->data_len += data_len; + skb->truesize += ts; + skb->dev = dev; + + /* later we will support gso for this */ + index = skb_shinfo(skb)->gso_segs; + seg_info->desc[index] = desc->addr; + seg_info->segs = ++index; + skb_shinfo(skb)->gso_segs++; + + skb_shinfo(skb)->destructor_arg = (void *)(long)seg_info; + skb_shinfo(skb)->destructor_arg_xsk_pool = (void *)(long)pool; + skb->destructor = veth_xsk_destruct_skb; + + /* to do: + * add skb to sock. may be there is no need to do for this + * and this might be multiple xsk sockets involved, so it's + * difficult to determine which socket is sending the data. 
+ * refcount_add(ts, &xs->sk.sk_wmem_alloc); + */ + return skb; +} + +static inline struct sk_buff *veth_build_skb_def(struct net_device *dev, + struct xsk_buff_pool *pool, struct xdp_desc *desc) +{ + struct sk_buff *skb = NULL; + struct page *page; + void *buffer; + void *vaddr; + + page = dev_alloc_page(); + if (!page) + return NULL; + + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, desc->addr); + + vaddr = page_to_virt(page); + memcpy(vaddr + pool->headroom, buffer, desc->len); + skb = veth_build_skb(vaddr, pool->headroom, desc->len, PAGE_SIZE); + if (!skb) { + put_page(page); + return NULL; + } + + skb->protocol = eth_type_trans(skb, dev); + + return skb; +} + +/* To call the following function, the following conditions must be met: + * 1.The data packet must be a standard Ethernet data packet + * 2. Data packets support batch sending + */ +static inline struct sk_buff *veth_build_skb_batch_v4(struct net_device *dev, + struct xsk_buff_pool *pool, + struct xdp_desc *desc, + struct veth_batch_tuple *tuple, + struct sk_buff *prev_skb) +{ + struct iphdr *iph; + void *buffer; + u64 addr; + + addr = desc->addr; + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr); + iph = (struct iphdr *)(buffer + ETH_HLEN); + if (!veth_batch_ip_check_v4(iph, desc->len)) + goto normal; + + switch (iph->protocol) { + case IPPROTO_UDP: + return veth_build_skb_batch_udp(dev, pool, desc, tuple, prev_skb); + default: + break; + } +normal: + tuple->batch_enable = false; + return veth_build_skb_def(dev, pool, desc); +} + +/* Zero copy needs to meet the following conditions: + * 1. The data content of tx desc must be within one page + * 2、the tx desc must support batch xmit, which seted by userspace + */ +static inline bool veth_batch_desc_check(void *buff, u32 len) +{ + u32 offset; + + offset = offset_in_page(buff); + if (PAGE_SIZE - offset < len) + return false; + + return true; +} + +/* here must be a ipv4 or ipv6 packet */ +static inline struct sk_buff *veth_build_skb_batch(struct net_device *dev, + struct xsk_buff_pool *pool, + struct xdp_desc *desc, + struct veth_batch_tuple *tuple, + struct sk_buff *prev_skb) +{ + const struct ethhdr *eth; + void *buffer; + + buffer = xsk_buff_raw_get_data(pool, desc->addr); + if (!veth_batch_desc_check(buffer, desc->len)) + goto normal; + + eth = (struct ethhdr *)buffer; + switch (ntohs(eth->h_proto)) { + case ETH_P_IP: + tuple->batch_enable = true; + return veth_build_skb_batch_v4(dev, pool, desc, tuple, prev_skb); + /* to do: not support yet, just build skb, no batch */ + case ETH_P_IPV6: + fallthrough; + default: + break; + } + +normal: + tuple->batch_flush = false; + tuple->batch_enable = false; + return veth_build_skb_def(dev, pool, desc); +} + +/* just support ipv4 udp batch + * to do: ipv4 tcp and ipv6 + */ +static inline void veth_skb_batch_checksum(struct sk_buff *skb) +{ + struct iphdr *iph = ip_hdr(skb); + struct udphdr *uh = udp_hdr(skb); + int ip_tot_len = skb->len; + int udp_len = skb->len - (skb->transport_header - skb->network_header); + + iph->tot_len = htons(ip_tot_len); + ip_send_check(iph); + uh->len = htons(udp_len); + uh->check = 0; + + udp4_hwcsum(skb, iph->saddr, iph->daddr); +} + static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget) { struct veth_priv *priv, *peer_priv; struct net_device *dev, *peer_dev; + struct veth_batch_tuple tuple; struct veth_stats stats = {}; + struct sk_buff *prev_skb = NULL; struct sk_buff *skb = NULL; struct veth_rq *peer_rq; struct xdp_desc desc; @@ -1093,24 +1440,23 
@@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, peer_dev = priv->peer; peer_priv = netdev_priv(peer_dev); - /* todo: queue index must set before this */ + /* queue_index set in napi enable + * to do:may be we should select rq by 5-tuple or hash + */ peer_rq = &peer_priv->rq[sq->queue_index]; + memset(&tuple, 0, sizeof(tuple)); + /* set xsk wake up flag, to do: where to disable */ if (xsk_uses_need_wakeup(xsk_pool)) xsk_set_tx_need_wakeup(xsk_pool); while (budget-- > 0) { unsigned int truesize = 0; - struct page *page; - void *vaddr; - void *addr; if (!xsk_tx_peek_desc(xsk_pool, &desc)) break; - addr = xsk_buff_raw_get_data(xsk_pool, desc.addr); - /* can not hold all data in a page */ truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); truesize += desc.len + xsk_pool->headroom; @@ -1120,30 +1466,50 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, break; } - page = dev_alloc_page(); - if (!page) { + skb = veth_build_skb_batch(peer_dev, xsk_pool, &desc, &tuple, prev_skb); + if (!skb) { + stats.rx_drops++; xsk_tx_completed_addr(xsk_pool, desc.addr); - stats.xdp_drops++; - break; + if (prev_skb != skb) { + napi_gro_receive(&peer_rq->xdp_napi, prev_skb); + prev_skb = NULL; + } + continue; } - vaddr = page_to_virt(page); - - memcpy(vaddr + xsk_pool->headroom, addr, desc.len); - xsk_tx_completed_addr(xsk_pool, desc.addr); - skb = veth_build_skb(vaddr, xsk_pool->headroom, desc.len, PAGE_SIZE); - if (!skb) { - put_page(page); - stats.xdp_drops++; - break; + if (!tuple.batch_enable) { + xsk_tx_completed_addr(xsk_pool, desc.addr); + /* flush the prev skb first to avoid out of order */ + if (prev_skb != skb && prev_skb) { + veth_skb_batch_checksum(prev_skb); + napi_gro_receive(&peer_rq->xdp_napi, prev_skb); + prev_skb = NULL; + } + napi_gro_receive(&peer_rq->xdp_napi, skb); + skb = NULL; + } else { + if (prev_skb && tuple.batch_flush) { + veth_skb_batch_checksum(prev_skb); + napi_gro_receive(&peer_rq->xdp_napi, prev_skb); + if (prev_skb == skb) + prev_skb = skb = NULL; + else + prev_skb = skb; + } else { + prev_skb = skb; + } } - skb->protocol = eth_type_trans(skb, peer_dev); - napi_gro_receive(&peer_rq->xdp_napi, skb); stats.xdp_bytes += desc.len; done++; } + /* means there is a skb need to send to peer_rq (batch)*/ + if (skb) { + veth_skb_batch_checksum(skb); + napi_gro_receive(&peer_rq->xdp_napi, skb); + } + /* release, move consumer,and wakeup the producer */ if (done) { napi_schedule(&peer_rq->xdp_napi); From patchwork Mon Aug 7 12:26:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13344105 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A193D101CE for ; Mon, 7 Aug 2023 12:26:50 +0000 (UTC) Received: from mail-pl1-x632.google.com (mail-pl1-x632.google.com [IPv6:2607:f8b0:4864:20::632]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3439C130 for ; Mon, 7 Aug 2023 05:26:49 -0700 (PDT) Received: by mail-pl1-x632.google.com with SMTP id d9443c01a7336-1bc34b32785so28046895ad.3 for ; Mon, 07 Aug 2023 05:26:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691411208; x=1692016008; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DZBR/L7n+c0K4+2jvj2qF9IKHrGGr2P5oMHLWIibsdA=; b=dngqcxXkDTjUguZrMGIXCZRSIUUg9AZAyokTIPWxIrcInPtG7Oom8LrZYQCNQfAxfM HfuxYkkNZ92dsmzFnU31gFlDgQ0QnbwdGfT9Es48c0ImSHzs/d0j7HH9+TEWMgk9Tr8i iKjXU4rtNS7XpURqDZLWy7NBO1O1V8h1eQ4pgieMGX4/yDGfQz7qyaIZ49ua5sPJbjxy 4LJtbYzALFrjnewrODhYVDl5AMUkle2pxD9GEFBQSPMvXnNEY61rNykWv5GjFEcfUFLo KiPxWohvcAjtUDfEA2/xHi2IoyZpyJZzi3/kisTUicQ9gqTEhWgVrhSUXP1Mt7ljP3Qm 6Jkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691411208; x=1692016008; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DZBR/L7n+c0K4+2jvj2qF9IKHrGGr2P5oMHLWIibsdA=; b=Cw01MU/rdwFqkOp1WfPpr46DBmzg2SvqW1EJFjxOTUH3ClPIlKmpqvXfilqL1DHWQa VEaM7ygcJy0eO6XQAIX7aNozrKshhs5BQbIet2YxL1MkpVxn0KOsb3Mr+vzk9uGxCyrc FYVixy484NVWokl0iZKXjkM5LKow4EYMkcjTAQJyga9Qneq/Whzsf7jgrHInWkzUpYdc ls4JaGSvvA7KWW1ZnSgDf2QyGnUKCNRPOj5SHDCtLkbb761NbMcfSxKQgi/OE4o2K7zZ ShpxJSsM9GNMr9u3HVQdZjbhOAy3fGSfUtSG/JxaBDRJ6FMTpfi79XDOm0t1P9T61xxJ TLtQ== X-Gm-Message-State: AOJu0Yx842FwWj72kgofLytafTlGbVads9QO383PsRXfMSNnPhr12wk9 PqVTPGAO0TnOx/hi9yF0S9TnZg== X-Google-Smtp-Source: AGHT+IE0UTxZOMHvSuq9wsKJiCX3RPP6CxfzwfTLYdpnBG8UJD0b3bu3kQMZ6+Be3716D2hcPya8Yg== X-Received: by 2002:a17:903:1cb:b0:1b8:2ba0:c9c0 with SMTP id e11-20020a17090301cb00b001b82ba0c9c0mr8983752plh.59.1691411208720; Mon, 07 Aug 2023 05:26:48 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id m3-20020a170902768300b001b8a3dd5a4asm6792670pll.283.2023.08.07.05.26.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 05:26:48 -0700 (PDT) From: Albert Huang To: Cc: Albert Huang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list) Subject: [RFC v2 Optimizing veth xsk performance 9/9] veth: add support for AF_XDP tx need_wakup feature Date: Mon, 7 Aug 2023 20:26:40 +0800 Message-Id: <20230807122641.85940-1-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230807120434.83644-1-huangjie.albert@bytedance.com> References: <20230807120434.83644-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC this patch only support for tx need_wakup feature. 
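For illustration only (not part of this patch): with the flag now cleared while the TX NAPI is running and set again once veth_poll_tx() runs out of work, user space can skip the sendto() kick whenever no wakeup is requested, which is where the syscall savings of XDP_USE_NEED_WAKEUP come from. Continuing the earlier fragments:

/* user-space sketch (illustration only): only kick when the driver asks. */
static void kick_tx_if_needed(void)
{
	if (xsk_ring_prod__needs_wakeup(&tx_ring))
		kick_tx();	/* reaches veth_xsk_wakeup() */
}
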
Signed-off-by: Albert Huang --- drivers/net/veth.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 70489d017b51..7c60c64ef10b 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1447,9 +1447,9 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, memset(&tuple, 0, sizeof(tuple)); - /* set xsk wake up flag, to do: where to disable */ + /* clear xsk wake up flag */ if (xsk_uses_need_wakeup(xsk_pool)) - xsk_set_tx_need_wakeup(xsk_pool); + xsk_clear_tx_need_wakeup(xsk_pool); while (budget-- > 0) { unsigned int truesize = 0; @@ -1539,12 +1539,15 @@ static int veth_poll_tx(struct napi_struct *napi, int budget) if (pool) done = veth_xsk_tx_xmit(sq, pool, budget); - rcu_read_unlock(); - if (done < budget) { + /* set xsk wake up flag */ + if (xsk_uses_need_wakeup(pool)) + xsk_set_tx_need_wakeup(pool); + /* if done < budget, the tx ring is no buffer */ napi_complete_done(napi, done); } + rcu_read_unlock(); return done; }
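
For illustration only (series-level note, not part of any patch): instead of busy-spinning on the rings, an application can park in poll() on the XSK descriptor; when the driver has requested a wakeup, the poll() call itself drives the TX path, and POLLOUT reports room in the TX ring. A rough event loop tying the earlier fragments together (burst size and timeout are placeholders; with this patch applied, the unconditional kick inside send_same_flow_burst() can be replaced by kick_tx_if_needed()):

/* user-space sketch (illustration only): poll()-driven TX loop built from
 * the earlier fragments.
 */
#include <poll.h>

static void tx_loop(const char *ifname)
{
	struct pollfd pfd;

	if (bind_veth_queue(ifname, 0))
		return;

	pfd.fd = xsk_socket__fd(xsk);
	pfd.events = POLLOUT;

	for (;;) {
		if (send_same_flow_burst(32, 256) < 0) {
			/* TX ring full: wait until the driver drained it */
			if (poll(&pfd, 1, 1000) < 0)
				break;
		}
		reap_tx_completions();
	}
}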