From patchwork Tue Aug 8 03:19:05 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345658
X-Patchwork-Delegate: kuba@kernel.org
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 1/9] veth: Implement ethtool's get_ringparam() callback
Date: Tue, 8 Aug 2023 11:19:05 +0800
Message-Id: <20230808031913.46965-2-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Some xsk libraries call the get_ringparam() API to query the queue length and
use it to initialize the xsk umem. Implement this ethtool callback in veth so
that those scenarios can work properly.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..77e12d52ca2b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -255,6 +255,17 @@ static void veth_get_channels(struct net_device *dev,
 static int veth_set_channels(struct net_device *dev,
			     struct ethtool_channels *ch);
 
+static void veth_get_ringparam(struct net_device *dev,
+			       struct ethtool_ringparam *ring,
+			       struct kernel_ethtool_ringparam *kernel_ring,
+			       struct netlink_ext_ack *extack)
+{
+	ring->rx_max_pending = VETH_RING_SIZE;
+	ring->tx_max_pending = VETH_RING_SIZE;
+	ring->rx_pending = VETH_RING_SIZE;
+	ring->tx_pending = VETH_RING_SIZE;
+}
+
 static const struct ethtool_ops veth_ethtool_ops = {
	.get_drvinfo		= veth_get_drvinfo,
	.get_link		= ethtool_op_get_link,
@@ -265,6 +276,7 @@ static const struct ethtool_ops veth_ethtool_ops = {
	.get_ts_info		= ethtool_op_get_ts_info,
	.get_channels		= veth_get_channels,
	.set_channels		= veth_set_channels,
+	.get_ringparam		= veth_get_ringparam,
 };
 
 /* general routines */
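
For context on how user space consumes this: below is a minimal sketch of the
pattern the commit message refers to, querying the ring sizes through the
classic ethtool ioctl before sizing the umem and rings. It is illustrative
only (the interface name, error handling and sizing policy are assumptions);
the ETHTOOL_GRINGPARAM/SIOCETHTOOL interface itself is the standard uapi one.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Query the tx/rx ring sizes of an interface the way xsk libraries
 * typically do before deciding how many umem frames to allocate.
 */
static int query_ring_sizes(const char *ifname, __u32 *rx, __u32 *tx)
{
	struct ethtool_ringparam ring = { .cmd = ETHTOOL_GRINGPARAM };
	struct ifreq ifr = {};
	int fd, err;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (void *)&ring;

	err = ioctl(fd, SIOCETHTOOL, &ifr);
	close(fd);
	if (err)
		return -1;	/* without this patch, veth has no get_ringparam() and this likely fails */

	*rx = ring.rx_pending;	/* e.g. used to size the fill/rx rings */
	*tx = ring.tx_pending;	/* e.g. used to size the tx/completion rings */
	return 0;
}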

From patchwork Tue Aug 8 03:19:06 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345659
X-Patchwork-Delegate: kuba@kernel.org
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 2/9] xsk: add dma_check_skip for skipping dma check
Date: Tue, 8 Aug 2023 11:19:06 +0800
Message-Id: <20230808031913.46965-3-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

For a virtual net device such as veth there is no need to do a DMA check if we
support zero copy. Add this flag right after 'unaligned', because there is a
4-byte hole there:

pahole -V ./net/xdp/xsk_buff_pool.o:
-----------
...
	/* --- cacheline 3 boundary (192 bytes) --- */
	u32                        chunk_size;            /* 192     4 */
	u32                        frame_len;             /* 196     4 */
	u8                         cached_need_wakeup;    /* 200     1 */
	bool                       uses_need_wakeup;      /* 201     1 */
	bool                       dma_need_sync;         /* 202     1 */
	bool                       unaligned;              /* 203     1 */

	/* XXX 4 bytes hole, try to pack */

	void *                     addrs;                  /* 208     8 */
	spinlock_t                 cq_lock;                /* 216     4 */
...
-----------

Signed-off-by: Albert Huang
---
 include/net/xsk_buff_pool.h | 1 +
 net/xdp/xsk_buff_pool.c     | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index b0bdff26fc88..fe31097dc11b 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -81,6 +81,7 @@ struct xsk_buff_pool {
	bool uses_need_wakeup;
	bool dma_need_sync;
	bool unaligned;
+	bool dma_check_skip;
	void *addrs;
	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b3f7b310811e..ed251b8e8773 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -85,6 +85,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
		XDP_PACKET_HEADROOM;
	pool->umem = umem;
	pool->addrs = umem->addrs;
+	pool->dma_check_skip = false;
	INIT_LIST_HEAD(&pool->free_list);
	INIT_LIST_HEAD(&pool->xskb_list);
	INIT_LIST_HEAD(&pool->xsk_tx_list);
@@ -202,7 +203,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
	if (err)
		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (!pool->dma_pages && !pool->dma_check_skip) {
		WARN(1, "Driver did not DMA map zero-copy buffers");
		err = -EINVAL;
		goto err_unreg_xsk;
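
For reference, the expected layout after this patch (illustrative, offsets
derived from the pahole output quoted in the commit message rather than from
an actual rebuild): the new flag occupies the first byte of the former hole,
so the structure size and cacheline layout are unchanged.

	bool                       uses_need_wakeup;      /* 201     1 */
	bool                       dma_need_sync;         /* 202     1 */
	bool                       unaligned;              /* 203     1 */
	bool                       dma_check_skip;         /* 204     1 */

	/* XXX 3 bytes hole, try to pack */

	void *                     addrs;                  /* 208     8 */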

From patchwork Tue Aug 8 03:19:07 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345660
X-Patchwork-Delegate: kuba@kernel.org
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 3/9] veth: add support for send queue
Date: Tue, 8 Aug 2023 11:19:07 +0800
Message-Id: <20230808031913.46965-4-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

In order to support native AF_XDP for veth, we need send-queue support for
NAPI tx. The upcoming patch will make use of it.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 77e12d52ca2b..25faba879505 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -56,6 +56,11 @@ struct veth_rq_stats {
	struct u64_stats_sync syncp;
 };
 
+struct veth_sq_stats {
+	struct veth_stats vs;
+	struct u64_stats_sync syncp;
+};
+
 struct veth_rq {
	struct napi_struct xdp_napi;
	struct napi_struct __rcu *napi; /* points to xdp_napi when the latter is initialized */
@@ -69,11 +74,25 @@ struct veth_rq {
	struct page_pool *page_pool;
 };
 
+struct veth_sq {
+	struct napi_struct xdp_napi;
+	struct net_device *dev;
+	struct xdp_mem_info xdp_mem;
+	struct veth_sq_stats stats;
+	u32 queue_index;
+	/* for xsk */
+	struct {
+		struct xsk_buff_pool __rcu *pool;
+		u32 last_cpu;
+	} xsk;
+};
+
 struct veth_priv {
	struct net_device __rcu *peer;
	atomic64_t dropped;
	struct bpf_prog *_xdp_prog;
	struct veth_rq *rq;
+	struct veth_sq *sq;
	unsigned int requested_headroom;
 };
 
@@ -1495,6 +1514,15 @@ static int veth_alloc_queues(struct net_device *dev)
		u64_stats_init(&priv->rq[i].stats.syncp);
	}
 
+	priv->sq = kcalloc(dev->num_tx_queues, sizeof(*priv->sq), GFP_KERNEL);
+	if (!priv->sq)
+		return -ENOMEM;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		priv->sq[i].dev = dev;
+		u64_stats_init(&priv->sq[i].stats.syncp);
+	}
+
	return 0;
 }
 
@@ -1503,6 +1531,7 @@ static void veth_free_queues(struct net_device *dev)
	struct veth_priv *priv = netdev_priv(dev);
 
	kfree(priv->rq);
+	kfree(priv->sq);
 }
 
 static int veth_dev_init(struct net_device *dev)
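
To make the intended use of the new per-queue state concrete, here is a small
illustrative sketch (not part of the patch; the helper name is made up, and
the real consumers arrive in patches 5/9 and 6/9): a tx path selects the
veth_sq by queue index and reads its xsk pool under RCU.

/* Illustrative only: veth_sq_get_pool() is a hypothetical helper showing
 * how later patches access the per-queue xsk state added here.
 */
static struct xsk_buff_pool *veth_sq_get_pool(struct net_device *dev, u16 qid)
{
	struct veth_priv *priv = netdev_priv(dev);
	struct veth_sq *sq = &priv->sq[qid];

	/* the pool pointer is published with rcu_assign_pointer() when an
	 * AF_XDP socket binds to this queue, so readers use RCU
	 */
	return rcu_dereference(sq->xsk.pool);
}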

From patchwork Tue Aug 8 03:19:08 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345661
X-Patchwork-Delegate: kuba@kernel.org
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 4/9] xsk: add xsk_tx_completed_addr function
Date: Tue, 8 Aug 2023 11:19:08 +0800
Message-Id: <20230808031913.46965-5-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Return desc to the cq by using the descriptor address.

Signed-off-by: Albert Huang
---
 include/net/xdp_sock_drv.h |  5 +++++
 net/xdp/xsk.c              |  6 ++++++
 net/xdp/xsk_queue.h        | 10 ++++++++++
 3 files changed, 21 insertions(+)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 1f6fc8c7a84c..de82c596e48f 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -15,6 +15,7 @@
 #ifdef CONFIG_XDP_SOCKETS
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
+void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr);
 bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
 u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max);
 void xsk_tx_release(struct xsk_buff_pool *pool);
@@ -188,6 +189,10 @@ static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 {
 }
 
+static inline void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr)
+{
+}
+
 static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool,
				    struct xdp_desc *desc)
 {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4f1e0599146e..b2b8aa7b0bcf 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -396,6 +396,12 @@ void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 }
 EXPORT_SYMBOL(xsk_tx_completed);
 
+void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr)
+{
+	xskq_prod_submit_addr(pool->cq, addr);
+}
+EXPORT_SYMBOL(xsk_tx_completed_addr);
+
 void xsk_tx_release(struct xsk_buff_pool *pool)
 {
	struct xdp_sock *xs;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 13354a1e4280..3a5e26a81dc2 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -428,6 +428,16 @@ static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
	smp_store_release(&q->ring->producer, idx); /* B, matches C */
 }
 
+static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+	u32 idx = q->ring->producer;
+
+	ring->desc[idx++ & q->ring_mask] = addr;
+
+	__xskq_prod_submit(q, idx);
+}
+
 static inline void xskq_prod_submit(struct xsk_queue *q)
 {
	__xskq_prod_submit(q, q->cached_prod);
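
To illustrate the difference from the existing count-based helper, a short
sketch of an intended caller (a hypothetical, simplified version of the veth
usage that appears later in this series): the consumer copies a descriptor
out of the umem and immediately hands that exact address back to the
completion queue, which stays correct even when callers complete out of
order.

/* Illustrative consumer only (helper name is made up). */
static void example_consume_desc(struct xsk_buff_pool *pool,
				 struct xdp_desc *desc, void *dst)
{
	void *src = xsk_buff_raw_get_data(pool, desc->addr);

	memcpy(dst, src, desc->len);

	/* per-address completion, unlike xsk_tx_completed(pool, n) which
	 * completes the next n entries in ring order
	 */
	xsk_tx_completed_addr(pool, desc->addr);
}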

From patchwork Tue Aug 8 03:19:09 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345662
X-Patchwork-Delegate: kuba@kernel.org
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 5/9] veth: use send queue tx napi to xmit xsk tx desc
Date: Tue, 8 Aug 2023 11:19:09 +0800
Message-Id: <20230808031913.46965-6-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Use the send-queue tx NAPI to transmit xsk tx descriptors.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 230 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 229 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 25faba879505..28b891dd8dc9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -27,6 +27,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #define DRV_NAME "veth"
 #define DRV_VERSION "1.0"
@@ -1061,6 +1063,141 @@ static int veth_poll(struct napi_struct *napi, int budget)
	return done;
 }
 
+static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
+				      int buflen)
+{
+	struct sk_buff *skb;
+
+	skb = build_skb(head, buflen);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+
+	return skb;
+}
+
+static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget)
+{
+	struct veth_priv *priv, *peer_priv;
+	struct net_device *dev, *peer_dev;
+	struct veth_stats stats = {};
+	struct sk_buff *skb = NULL;
+	struct veth_rq *peer_rq;
+	struct xdp_desc desc;
+	int done = 0;
+
+	dev = sq->dev;
+	priv = netdev_priv(dev);
+	peer_dev = priv->peer;
+	peer_priv = netdev_priv(peer_dev);
+
+	/* todo: queue index must set before this */
+	peer_rq = &peer_priv->rq[sq->queue_index];
+
+	/* set xsk wake up flag, to do: where to disable */
+	if (xsk_uses_need_wakeup(xsk_pool))
+		xsk_set_tx_need_wakeup(xsk_pool);
+
+	while (budget-- > 0) {
+		unsigned int truesize = 0;
+		struct page *page;
+		void *vaddr;
+		void *addr;
+
+		if (!xsk_tx_peek_desc(xsk_pool, &desc))
+			break;
+
+		addr = xsk_buff_raw_get_data(xsk_pool, desc.addr);
+
+		/* can not hold all data in a page */
+		truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+		truesize += desc.len + xsk_pool->headroom;
+		if (truesize > PAGE_SIZE) {
+			xsk_tx_completed_addr(xsk_pool, desc.addr);
+			stats.xdp_drops++;
+			break;
+		}
+
+		page = dev_alloc_page();
+		if (!page) {
+			xsk_tx_completed_addr(xsk_pool, desc.addr);
+			stats.xdp_drops++;
+			break;
+		}
+		vaddr = page_to_virt(page);
+
+		memcpy(vaddr + xsk_pool->headroom, addr, desc.len);
+		xsk_tx_completed_addr(xsk_pool, desc.addr);
+
+		skb = veth_build_skb(vaddr, xsk_pool->headroom, desc.len, PAGE_SIZE);
+		if (!skb) {
+			put_page(page);
+			stats.xdp_drops++;
+			break;
+		}
+		skb->protocol = eth_type_trans(skb, peer_dev);
+		napi_gro_receive(&peer_rq->xdp_napi, skb);
+
+		stats.xdp_bytes += desc.len;
+		done++;
+	}
+
+	/* release, move consumer,and wakeup the producer */
+	if (done) {
+		napi_schedule(&peer_rq->xdp_napi);
+		xsk_tx_release(xsk_pool);
+	}
+
+	u64_stats_update_begin(&sq->stats.syncp);
+	sq->stats.vs.xdp_packets += done;
+	sq->stats.vs.xdp_bytes += stats.xdp_bytes;
+	sq->stats.vs.xdp_drops += stats.xdp_drops;
+	u64_stats_update_end(&sq->stats.syncp);
+
+	return done;
+}
+
+static int veth_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct veth_sq *sq = container_of(napi, struct veth_sq, xdp_napi);
+	struct xsk_buff_pool *pool;
+	int done = 0;
+
+	sq->xsk.last_cpu = smp_processor_id();
+
+	/* xmit for tx queue */
+	rcu_read_lock();
+	pool = rcu_dereference(sq->xsk.pool);
+	if (pool)
+		done = veth_xsk_tx_xmit(sq, pool, budget);
+
+	rcu_read_unlock();
+
+	if (done < budget) {
+		/* if done < budget, the tx ring is no buffer */
+		napi_complete_done(napi, done);
+	}
+
+	return done;
+}
+
+static int veth_napi_add_tx(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	int i;
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_sq *sq = &priv->sq[i];
+
+		netif_napi_add(dev, &sq->xdp_napi, veth_poll_tx);
+		napi_enable(&sq->xdp_napi);
+	}
+
+	return 0;
+}
+
 static int veth_create_page_pool(struct veth_rq *rq)
 {
	struct page_pool_params pp_params = {
@@ -1153,6 +1290,19 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
	}
 }
 
+static void veth_napi_del_tx(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	int i;
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_sq *sq = &priv->sq[i];
+
+		napi_disable(&sq->xdp_napi);
+		__netif_napi_del(&sq->xdp_napi);
+	}
+}
+
 static void veth_napi_del(struct net_device *dev)
 {
	veth_napi_del_range(dev, 0, dev->real_num_rx_queues);
@@ -1360,7 +1510,7 @@ static void veth_set_xdp_features(struct net_device *dev)
		struct veth_priv *priv_peer = netdev_priv(peer);
		xdp_features_t val = NETDEV_XDP_ACT_BASIC |
				     NETDEV_XDP_ACT_REDIRECT |
-				     NETDEV_XDP_ACT_RX_SG;
+				     NETDEV_XDP_ACT_RX_SG | NETDEV_XDP_ACT_XSK_ZEROCOPY;
 
		if (priv_peer->_xdp_prog || veth_gro_requested(peer))
			val |= NETDEV_XDP_ACT_NDO_XMIT |
@@ -1737,11 +1887,89 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
	return err;
 }
 
+static int veth_xsk_pool_enable(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid)
+{
+	struct veth_priv *peer_priv;
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer_dev = priv->peer;
+	int err = 0;
+
+	if (qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (!peer_dev)
+		return -EINVAL;
+
+	/* no dma, so we just skip dma skip in xsk zero copy */
+	pool->dma_check_skip = true;
+
+	peer_priv = netdev_priv(peer_dev);
+
+	/* enable peer tx xdp here, this side
+	 * xdp is enable by veth_xdp_set
+	 * to do: we need to check whther this side is already enable xdp
+	 * maybe it do not have xdp prog
+	 */
+	if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) {
+		/* peer should enable napi*/
+		err = veth_napi_enable(peer_dev);
+		if (err)
+			return err;
+	}
+
+	/* Here is already protected by rtnl_lock, so rcu_assign_pointer
+	 * is safe.
+	 */
+	rcu_assign_pointer(priv->sq[qid].xsk.pool, pool);
+
+	veth_napi_add_tx(dev);
+
+	return err;
+}
+
+static int veth_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+	struct veth_priv *peer_priv;
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer_dev = priv->peer;
+	int err = 0;
+
+	if (qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (!peer_dev)
+		return -EINVAL;
+
+	peer_priv = netdev_priv(peer_dev);
+
+	/* to do: this may be failed */
+	if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) {
+		/* disable peer napi */
+		veth_napi_del(peer_dev);
+	}
+
+	veth_napi_del_tx(dev);
+
+	rcu_assign_pointer(priv->sq[qid].xsk.pool, NULL);
+	return err;
+}
+
+/* this is for setup xdp */
+static int veth_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	if (xdp->xsk.pool)
+		return veth_xsk_pool_enable(dev, xdp->xsk.pool, xdp->xsk.queue_id);
+	else
+		return veth_xsk_pool_disable(dev, xdp->xsk.queue_id);
+}
+
 static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
	switch (xdp->command) {
	case XDP_SETUP_PROG:
		return veth_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_SETUP_XSK_POOL:
+		return veth_xsk_pool_setup(dev, xdp);
	default:
		return -EINVAL;
	}
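
To exercise this path from user space, an AF_XDP socket is bound to a veth
queue with zero-copy requested. A minimal sketch using libxdp follows; the
library calls are libxdp's standard API, while the interface name, queue id
and umem handling are assumptions for illustration.

#include <net/if.h>
#include <linux/if_xdp.h>
#include <xdp/xsk.h>

/* Bind an AF_XDP socket to queue 0 of "veth0" in zero-copy mode.
 * umem_area must be a page-aligned buffer of umem_size bytes.
 */
static int veth_zc_socket(void *umem_area, __u64 umem_size,
			  struct xsk_umem **umem, struct xsk_socket **xsk,
			  struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
			  struct xsk_ring_prod *tx, struct xsk_ring_cons *rx)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.bind_flags = XDP_ZEROCOPY | XDP_USE_NEED_WAKEUP,
	};
	int err;

	err = xsk_umem__create(umem, umem_area, umem_size, fill, comp, NULL);
	if (err)
		return err;

	/* the XDP_SETUP_XSK_POOL command reaches veth_xsk_pool_enable()
	 * during this bind
	 */
	return xsk_socket__create(xsk, "veth0", 0 /* queue id */, *umem, rx, tx, &cfg);
}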

From patchwork Tue Aug 8 03:19:10 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345663
X-Patchwork-Delegate: kuba@kernel.org
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 6/9] veth: add ndo_xsk_wakeup callback for veth
Date: Tue, 8 Aug 2023 11:19:10 +0800
Message-Id: <20230808031913.46965-7-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Add an ndo_xsk_wakeup callback for veth; it is used to wake up the tx NAPI.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 28b891dd8dc9..ac78d6a87416 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1805,6 +1805,44 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
	rcu_read_unlock();
 }
 
+static void veth_xsk_remote_trigger_napi(void *info)
+{
+	struct veth_sq *sq = info;
+
+	napi_schedule(&sq->xdp_napi);
+}
+
+static int veth_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct veth_priv *priv;
+	struct veth_sq *sq;
+	u32 last_cpu, cur_cpu;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= dev->real_num_rx_queues)
+		return -EINVAL;
+
+	priv = netdev_priv(dev);
+	sq = &priv->sq[qid];
+
+	if (napi_if_scheduled_mark_missed(&sq->xdp_napi))
+		return 0;
+
+	last_cpu = sq->xsk.last_cpu;
+	cur_cpu = get_cpu();
+
+	/* raise a napi */
+	if (last_cpu == cur_cpu)
+		napi_schedule(&sq->xdp_napi);
+	else
+		smp_call_function_single(last_cpu, veth_xsk_remote_trigger_napi, sq, true);
+
+	put_cpu();
+	return 0;
+}
+
 static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
			struct netlink_ext_ack *extack)
 {
@@ -2019,6 +2057,7 @@ static const struct net_device_ops veth_netdev_ops = {
	.ndo_set_rx_headroom	= veth_set_rx_headroom,
	.ndo_bpf		= veth_xdp,
	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
+	.ndo_xsk_wakeup		= veth_xsk_wakeup,
	.ndo_get_peer_dev	= veth_peer_dev,
 };
 

From patchwork Tue Aug 8 03:19:11 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345664
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
 Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
 "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 7/9] sk_buff: add destructor_arg_xsk_pool for zero copy
Date: Tue, 8 Aug 2023 11:19:11 +0800
Message-Id: <20230808031913.46965-8-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

This member is added so that dummy (virtual) devices can support zero copy.

Signed-off-by: Albert Huang
---
 include/linux/skbuff.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 16a49ba534e4..db999056022e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -592,6 +592,8 @@ struct skb_shared_info {
	/* Intermediate layers must ensure that destructor_arg
	 * remains valid until skb destructor */
	void *		destructor_arg;
+	/* just for dummy device xsk zero copy */
+	void *destructor_arg_xsk_pool;
 
	/* must be last field, see pskb_expand_head() */
	skb_frag_t	frags[MAX_SKB_FRAGS];
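
How the new field pairs with the existing destructor_arg in practice is
illustrated by the assignments that patch 8/9 makes when it builds a
zero-copy skb (slightly simplified here, shown only to motivate why a second
slot is needed):

	/* one skb must remember two things for its destructor */
	skb_shinfo(skb)->destructor_arg = seg_info;		/* list of umem addresses in this skb */
	skb_shinfo(skb)->destructor_arg_xsk_pool = pool;	/* pool whose completion queue they return to */
	skb->destructor = veth_xsk_destruct_skb;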
(PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691464879; x=1692069679; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VNxERTokHn2QOzCO7nu1QH5j2GU8N7cGLZ9USDe1a7I=; b=C4mBcxo9HLIzgPmkZwvVwMe2Y8+oOUbmzCrCDJqkOO4K6EXNq7sy0sXwURk4dnYWN/ nqw603ksEycjKPz+q7Gb7cwNFSYIK+vVrjiyNyqw3L/+v1thar9dzpcpeGKEFR3o4wLH S6xLfrbCC72IxnIkr5RURV71aSuBQadDIjBi3PHym4cpsgU/YZrISBPUzd4jceT+L/0a 307O/xTRmtCcwfALjTjmrGApGmrIdSdjks+lyuqttA5hf1liWdxzdc0Cqo6COCriBkaS QPLzDbZufpr2FBVXchXRpN1W/43dNgJJikedWzqrHrtBXTwNFXaAoiDaBXhaBmaMmOlA CRuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691464879; x=1692069679; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VNxERTokHn2QOzCO7nu1QH5j2GU8N7cGLZ9USDe1a7I=; b=Jqt3sH/Yz1rN2NsSRrrr7eks8yTgRbZY+7vjX4+UyMsyyF6dbm5xPaXmnf7KjBxzfG zDAX5fHdb5NOoSxcdzduzRlOCUpnS0Q/EvyBU7fqxYRBC5UC4nr0tgEeA4/yrRIL+h6t oVK3HkjDmQ4JT6sRgncGhGcJhVxturTmM3L95Y2YYuExdEKB+XaKZ5hfAltpRRJcztFy sOF4iX62V4sqLBawC3LpiSRfk+yZQuIKXCBQSuBwiTn+MPdTIylIzEZe97vNHrAF4jzr tGyzUMCTmPi7l5H/y2ZtlS/Wn60/BSWaenQqm95PWjez72e+B5eKB6V/1kTAFep6moLl Zd7Q== X-Gm-Message-State: AOJu0YxFfOpaaYui4d+LUihl92rERnLkl6cdTBtjl1+Max1pS3Fh97os zcMSerZYO07MUy3XnKdVB0mTDQ== X-Google-Smtp-Source: AGHT+IEOYdq2Iq7uEVVkuLq5Idoxa0lnhD1ymimKbq0IIAGWG6YIbaqjqtDvHW0ImxBWxvTQ4G+7Nw== X-Received: by 2002:a05:6a20:3206:b0:138:2fb8:6a14 with SMTP id hl6-20020a056a20320600b001382fb86a14mr11332615pzc.3.1691464878793; Mon, 07 Aug 2023 20:21:18 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id 13-20020a170902c10d00b001b896686c78sm7675800pli.66.2023.08.07.20.21.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 20:21:18 -0700 (PDT) From: Albert Huang To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Cc: Albert Huang , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Pavel Begunkov , Yunsheng Lin , Kees Cook , Richard Gobert , "open list:NETWORKING DRIVERS" , open list , "open list:XDP (eXpress Data Path)" Subject: [RFC v3 Optimizing veth xsk performance 8/9] veth: af_xdp tx batch support for ipv4 udp Date: Tue, 8 Aug 2023 11:19:12 +0800 Message-Id: <20230808031913.46965-9-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com> References: <20230808031913.46965-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC A typical topology is shown below: veth<--------veth-peer 1 | |2 | bridge<------->eth0(such as mlnx5 NIC) If you use af_xdp to send packets from veth to a physical NIC, it needs to go through some software paths, so we can refer to the implementation of 
kernel GSO. When af_xdp sends packets out from veth, consider aggregating packets and send a large packet from the veth virtual NIC to the physical NIC. performance:(test weth libxdp lib) AF_XDP without batch : 480 Kpps (with ksoftirqd 100% cpu) AF_XDP with batch : 1.5 Mpps (with ksoftirqd 15% cpu) With af_xdp batch, the libxdp user-space program reaches a bottleneck. Therefore, the softirq did not reach the limit. Signed-off-by: Albert Huang --- drivers/net/veth.c | 408 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 387 insertions(+), 21 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index ac78d6a87416..70489d017b51 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -29,6 +29,7 @@ #include #include #include +#include #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -103,6 +104,23 @@ struct veth_xdp_tx_bq { unsigned int count; }; +struct veth_batch_tuple { + __u8 protocol; + __be32 saddr; + __be32 daddr; + __be16 source; + __be16 dest; + __be16 batch_size; + __be16 batch_segs; + bool batch_enable; + bool batch_flush; +}; + +struct veth_seg_info { + u32 segs; + u64 desc[] ____cacheline_aligned_in_smp; +}; + /* * ethtool interface */ @@ -1078,11 +1096,340 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static void veth_xsk_destruct_skb(struct sk_buff *skb) +{ + struct skb_shared_info *si = skb_shinfo(skb); + struct xsk_buff_pool *pool = (struct xsk_buff_pool *)si->destructor_arg_xsk_pool; + struct veth_seg_info *seg_info = (struct veth_seg_info *)si->destructor_arg; + unsigned long flags; + u32 index = 0; + u64 addr; + + /* release cq */ + spin_lock_irqsave(&pool->cq_lock, flags); + for (index = 0; index < seg_info->segs; index++) { + addr = (u64)(long)seg_info->desc[index]; + xsk_tx_completed_addr(pool, addr); + } + spin_unlock_irqrestore(&pool->cq_lock, flags); + + kfree(seg_info); + si->destructor_arg = NULL; + si->destructor_arg_xsk_pool = NULL; +} + +static struct sk_buff *veth_build_gso_head_skb(struct net_device *dev, + char *buff, u32 tot_len, + u32 headroom, u32 iph_len, + u32 th_len) +{ + struct sk_buff *skb = NULL; + int err = 0; + + skb = alloc_skb(tot_len, GFP_KERNEL); + if (unlikely(!skb)) + return NULL; + + /* header room contains the eth header */ + skb_reserve(skb, headroom - ETH_HLEN); + skb_put(skb, ETH_HLEN + iph_len + th_len); + skb_shinfo(skb)->gso_segs = 0; + + err = skb_store_bits(skb, 0, buff, ETH_HLEN + iph_len + th_len); + if (unlikely(err)) { + kfree_skb(skb); + return NULL; + } + + skb->protocol = eth_type_trans(skb, dev); + skb->network_header = skb->mac_header + ETH_HLEN; + skb->transport_header = skb->network_header + iph_len; + skb->ip_summed = CHECKSUM_PARTIAL; + + return skb; +} + +/* only ipv4 udp match + * to do: tcp and ipv6 + */ +static inline bool veth_segment_match(struct veth_batch_tuple *tuple, + struct iphdr *iph, struct udphdr *udph) +{ + if (tuple->protocol == iph->protocol && + tuple->saddr == iph->saddr && + tuple->daddr == iph->daddr && + tuple->source == udph->source && + tuple->dest == udph->dest && + tuple->batch_size == ntohs(udph->len)) { + tuple->batch_flush = false; + return true; + } + + tuple->batch_flush = true; + return false; +} + +static inline void veth_tuple_init(struct veth_batch_tuple *tuple, + struct iphdr *iph, struct udphdr *udph) +{ + tuple->protocol = iph->protocol; + tuple->saddr = iph->saddr; + tuple->daddr = iph->daddr; + tuple->source = udph->source; + tuple->dest = udph->dest; + tuple->batch_flush = false; + tuple->batch_size = 
ntohs(udph->len); + tuple->batch_segs = 0; +} + +static inline bool veth_batch_ip_check_v4(struct iphdr *iph, u32 len) +{ + if (len <= (ETH_HLEN + sizeof(*iph))) + return false; + + if (iph->ihl < 5 || iph->version != 4 || len < (iph->ihl * 4 + ETH_HLEN)) + return false; + + return true; +} + +static struct sk_buff *veth_build_skb_batch_udp(struct net_device *dev, + struct xsk_buff_pool *pool, + struct xdp_desc *desc, + struct veth_batch_tuple *tuple, + struct sk_buff *prev_skb) +{ + u32 hr, len, ts, index, iph_len, th_len, data_offset, data_len, tot_len; + struct veth_seg_info *seg_info; + void *buffer; + struct udphdr *udph; + struct iphdr *iph; + struct sk_buff *skb; + struct page *page; + u32 seg_len = 0; + int hh_len = 0; + u64 addr; + + addr = desc->addr; + len = desc->len; + + /* l2 reserved len */ + hh_len = LL_RESERVED_SPACE(dev); + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(hh_len)); + + /* data points to eth header */ + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr); + + iph = (struct iphdr *)(buffer + ETH_HLEN); + iph_len = iph->ihl * 4; + + udph = (struct udphdr *)(buffer + ETH_HLEN + iph_len); + th_len = sizeof(struct udphdr); + + if (tuple->batch_flush) + veth_tuple_init(tuple, iph, udph); + + ts = pool->unaligned ? len : pool->chunk_size; + + data_offset = offset_in_page(buffer) + ETH_HLEN + iph_len + th_len; + data_len = len - (ETH_HLEN + iph_len + th_len); + + /* head is null or this is a new 5 tuple */ + if (!prev_skb || !veth_segment_match(tuple, iph, udph)) { + tot_len = hr + iph_len + th_len; + skb = veth_build_gso_head_skb(dev, buffer, tot_len, hr, iph_len, th_len); + if (!skb) { + /* to do: handle here for skb */ + return NULL; + } + + /* store information for gso */ + seg_len = struct_size(seg_info, desc, MAX_SKB_FRAGS); + seg_info = kmalloc(seg_len, GFP_KERNEL); + if (!seg_info) { + /* to do */ + kfree_skb(skb); + return NULL; + } + } else { + skb = prev_skb; + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4 | SKB_GSO_PARTIAL; + skb_shinfo(skb)->gso_size = data_len; + skb->ip_summed = CHECKSUM_PARTIAL; + + /* max segment is MAX_SKB_FRAGS */ + if (skb_shinfo(skb)->gso_segs >= MAX_SKB_FRAGS - 1) + tuple->batch_flush = true; + + seg_info = (struct veth_seg_info *)skb_shinfo(skb)->destructor_arg; + } + + /* offset in umem pool buffer */ + addr = buffer - pool->addrs; + + /* get the page of the desc */ + page = pool->umem->pgs[addr >> PAGE_SHIFT]; + + /* in order to avoid to get freed by kfree_skb */ + get_page(page); + + /* desc.data can not hold in two */ + skb_fill_page_desc(skb, skb_shinfo(skb)->gso_segs, page, data_offset, data_len); + + skb->len += data_len; + skb->data_len += data_len; + skb->truesize += ts; + skb->dev = dev; + + /* later we will support gso for this */ + index = skb_shinfo(skb)->gso_segs; + seg_info->desc[index] = desc->addr; + seg_info->segs = ++index; + skb_shinfo(skb)->gso_segs++; + + skb_shinfo(skb)->destructor_arg = (void *)(long)seg_info; + skb_shinfo(skb)->destructor_arg_xsk_pool = (void *)(long)pool; + skb->destructor = veth_xsk_destruct_skb; + + /* to do: + * add skb to sock. may be there is no need to do for this + * and this might be multiple xsk sockets involved, so it's + * difficult to determine which socket is sending the data. 
+ * refcount_add(ts, &xs->sk.sk_wmem_alloc); + */ + return skb; +} + +static inline struct sk_buff *veth_build_skb_def(struct net_device *dev, + struct xsk_buff_pool *pool, struct xdp_desc *desc) +{ + struct sk_buff *skb = NULL; + struct page *page; + void *buffer; + void *vaddr; + + page = dev_alloc_page(); + if (!page) + return NULL; + + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, desc->addr); + + vaddr = page_to_virt(page); + memcpy(vaddr + pool->headroom, buffer, desc->len); + skb = veth_build_skb(vaddr, pool->headroom, desc->len, PAGE_SIZE); + if (!skb) { + put_page(page); + return NULL; + } + + skb->protocol = eth_type_trans(skb, dev); + + return skb; +} + +/* To call the following function, the following conditions must be met: + * 1.The data packet must be a standard Ethernet data packet + * 2. Data packets support batch sending + */ +static inline struct sk_buff *veth_build_skb_batch_v4(struct net_device *dev, + struct xsk_buff_pool *pool, + struct xdp_desc *desc, + struct veth_batch_tuple *tuple, + struct sk_buff *prev_skb) +{ + struct iphdr *iph; + void *buffer; + u64 addr; + + addr = desc->addr; + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr); + iph = (struct iphdr *)(buffer + ETH_HLEN); + if (!veth_batch_ip_check_v4(iph, desc->len)) + goto normal; + + switch (iph->protocol) { + case IPPROTO_UDP: + return veth_build_skb_batch_udp(dev, pool, desc, tuple, prev_skb); + default: + break; + } +normal: + tuple->batch_enable = false; + return veth_build_skb_def(dev, pool, desc); +} + +/* Zero copy needs to meet the following conditions: + * 1. The data content of tx desc must be within one page + * 2、the tx desc must support batch xmit, which seted by userspace + */ +static inline bool veth_batch_desc_check(void *buff, u32 len) +{ + u32 offset; + + offset = offset_in_page(buff); + if (PAGE_SIZE - offset < len) + return false; + + return true; +} + +/* here must be a ipv4 or ipv6 packet */ +static inline struct sk_buff *veth_build_skb_batch(struct net_device *dev, + struct xsk_buff_pool *pool, + struct xdp_desc *desc, + struct veth_batch_tuple *tuple, + struct sk_buff *prev_skb) +{ + const struct ethhdr *eth; + void *buffer; + + buffer = xsk_buff_raw_get_data(pool, desc->addr); + if (!veth_batch_desc_check(buffer, desc->len)) + goto normal; + + eth = (struct ethhdr *)buffer; + switch (ntohs(eth->h_proto)) { + case ETH_P_IP: + tuple->batch_enable = true; + return veth_build_skb_batch_v4(dev, pool, desc, tuple, prev_skb); + /* to do: not support yet, just build skb, no batch */ + case ETH_P_IPV6: + fallthrough; + default: + break; + } + +normal: + tuple->batch_flush = false; + tuple->batch_enable = false; + return veth_build_skb_def(dev, pool, desc); +} + +/* just support ipv4 udp batch + * to do: ipv4 tcp and ipv6 + */ +static inline void veth_skb_batch_checksum(struct sk_buff *skb) +{ + struct iphdr *iph = ip_hdr(skb); + struct udphdr *uh = udp_hdr(skb); + int ip_tot_len = skb->len; + int udp_len = skb->len - (skb->transport_header - skb->network_header); + + iph->tot_len = htons(ip_tot_len); + ip_send_check(iph); + uh->len = htons(udp_len); + uh->check = 0; + + udp4_hwcsum(skb, iph->saddr, iph->daddr); +} + static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget) { struct veth_priv *priv, *peer_priv; struct net_device *dev, *peer_dev; + struct veth_batch_tuple tuple; struct veth_stats stats = {}; + struct sk_buff *prev_skb = NULL; struct sk_buff *skb = NULL; struct veth_rq *peer_rq; struct xdp_desc desc; @@ -1093,24 +1440,23 
 	peer_dev = priv->peer;
 	peer_priv = netdev_priv(peer_dev);
 
-	/* todo: queue index must set before this */
+	/* queue_index is set when napi is enabled
+	 * TODO: maybe we should select the rq by 5-tuple or hash
+	 */
 	peer_rq = &peer_priv->rq[sq->queue_index];
 
+	memset(&tuple, 0, sizeof(tuple));
+
 	/* set xsk wake up flag, to do: where to disable */
 	if (xsk_uses_need_wakeup(xsk_pool))
 		xsk_set_tx_need_wakeup(xsk_pool);
 
 	while (budget-- > 0) {
 		unsigned int truesize = 0;
-		struct page *page;
-		void *vaddr;
-		void *addr;
 
 		if (!xsk_tx_peek_desc(xsk_pool, &desc))
 			break;
 
-		addr = xsk_buff_raw_get_data(xsk_pool, desc.addr);
-		/* can not hold all data in a page */
 		truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 		truesize += desc.len + xsk_pool->headroom;
@@ -1120,30 +1466,50 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 			break;
 		}
 
-		page = dev_alloc_page();
-		if (!page) {
+		skb = veth_build_skb_batch(peer_dev, xsk_pool, &desc, &tuple, prev_skb);
+		if (!skb) {
+			stats.rx_drops++;
 			xsk_tx_completed_addr(xsk_pool, desc.addr);
-			stats.xdp_drops++;
-			break;
+			if (prev_skb != skb) {
+				napi_gro_receive(&peer_rq->xdp_napi, prev_skb);
+				prev_skb = NULL;
+			}
+			continue;
 		}
 
-		vaddr = page_to_virt(page);
-
-		memcpy(vaddr + xsk_pool->headroom, addr, desc.len);
-		xsk_tx_completed_addr(xsk_pool, desc.addr);
-		skb = veth_build_skb(vaddr, xsk_pool->headroom, desc.len, PAGE_SIZE);
-		if (!skb) {
-			put_page(page);
-			stats.xdp_drops++;
-			break;
+		if (!tuple.batch_enable) {
+			xsk_tx_completed_addr(xsk_pool, desc.addr);
+			/* flush the previous skb first to avoid out-of-order delivery */
+			if (prev_skb != skb && prev_skb) {
+				veth_skb_batch_checksum(prev_skb);
+				napi_gro_receive(&peer_rq->xdp_napi, prev_skb);
+				prev_skb = NULL;
+			}
+			napi_gro_receive(&peer_rq->xdp_napi, skb);
+			skb = NULL;
+		} else {
+			if (prev_skb && tuple.batch_flush) {
+				veth_skb_batch_checksum(prev_skb);
+				napi_gro_receive(&peer_rq->xdp_napi, prev_skb);
+				if (prev_skb == skb)
+					prev_skb = skb = NULL;
+				else
+					prev_skb = skb;
+			} else {
+				prev_skb = skb;
+			}
 		}
 
-		skb->protocol = eth_type_trans(skb, peer_dev);
-		napi_gro_receive(&peer_rq->xdp_napi, skb);
 		stats.xdp_bytes += desc.len;
 		done++;
 	}
 
+	/* a batched skb may still need to be sent to peer_rq */
+	if (skb) {
+		veth_skb_batch_checksum(skb);
+		napi_gro_receive(&peer_rq->xdp_napi, skb);
+	}
+
 	/* release, move consumer,and wakeup the producer */
 	if (done) {
 		napi_schedule(&peer_rq->xdp_napi);
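For reference, the traffic pattern the batch path above targets is a burst of frames sharing one UDP 5-tuple, submitted back to back on the AF_XDP TX ring. Below is a minimal userspace sketch of such a sender, assuming libxdp's <xsk/xsk.h> helpers (older libbpf ships them as <bpf/xsk.h>) and an xsk socket/umem already created and bound to the veth queue; send_udp_burst(), NUM_FRAMES and FRAME_SIZE are illustrative names, not part of this series.

#include <string.h>
#include <sys/socket.h>
#include <xsk/xsk.h>

#define NUM_FRAMES 4096		/* assumed umem chunk count (power of two) */
#define FRAME_SIZE 2048		/* assumed umem chunk size */

/* frame points to one pre-built Ethernet/IPv4/UDP frame; every copy shares
 * the same 5-tuple, so the veth batch path can merge the burst into a single
 * GSO skb on the peer.  Completion-ring accounting is omitted for brevity.
 */
static int send_udp_burst(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
			  void *umem_area, const void *frame, __u32 frame_len,
			  __u32 nb)
{
	__u32 idx, i;

	/* reserve nb slots in the TX ring; give up if the ring is full */
	if (xsk_ring_prod__reserve(tx, nb, &idx) != nb)
		return -1;

	for (i = 0; i < nb; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		/* one frame per umem chunk, so a descriptor never spans a page */
		desc->addr = (__u64)((idx + i) & (NUM_FRAMES - 1)) * FRAME_SIZE;
		desc->len = frame_len;
		memcpy(xsk_umem__get_data(umem_area, desc->addr), frame, frame_len);
	}
	xsk_ring_prod__submit(tx, nb);

	/* kick the kernel only when it asked for a wakeup (see patch 9/9) */
	if (xsk_ring_prod__needs_wakeup(tx))
		sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

	return 0;
}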
From patchwork Tue Aug 8 03:19:13 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 13345667
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
    John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
    Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
    "open list:NETWORKING DRIVERS", open list,
    "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 9/9] veth: add support for AF_XDP tx need_wakeup feature
Date: Tue, 8 Aug 2023 11:19:13 +0800
Message-Id: <20230808031913.46965-10-huangjie.albert@bytedance.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Mailing-List: bpf@vger.kernel.org

This patch adds support for the tx need_wakeup feature only.
Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 70489d017b51..7c60c64ef10b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1447,9 +1447,9 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 
 	memset(&tuple, 0, sizeof(tuple));
 
-	/* set xsk wake up flag, to do: where to disable */
+	/* clear the xsk wakeup flag */
 	if (xsk_uses_need_wakeup(xsk_pool))
-		xsk_set_tx_need_wakeup(xsk_pool);
+		xsk_clear_tx_need_wakeup(xsk_pool);
 
 	while (budget-- > 0) {
 		unsigned int truesize = 0;
@@ -1539,12 +1539,15 @@ static int veth_poll_tx(struct napi_struct *napi, int budget)
 	if (pool)
 		done = veth_xsk_tx_xmit(sq, pool, budget);
 
-	rcu_read_unlock();
-
 	if (done < budget) {
+		/* set the xsk wakeup flag */
+		if (xsk_uses_need_wakeup(pool))
+			xsk_set_tx_need_wakeup(pool);
+		/* done < budget means the tx ring has no more buffers to send */
 		napi_complete_done(napi, done);
 	}
 
+	rcu_read_unlock();
 	return done;
 }
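For context, userspace consumes the tx need_wakeup flag maintained above roughly as follows: while veth_xsk_tx_xmit() is actively draining the TX ring the flag stays cleared, so senders can skip the sendto() kick and its syscall cost; once veth_poll_tx() completes NAPI with done < budget the flag is set again and the next burst issues a single kick. A minimal sketch, assuming libxdp's <xsk/xsk.h> helpers; kick_tx_if_needed() and reap_completions() are illustrative names, not part of this series.

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <xsk/xsk.h>

static void kick_tx_if_needed(struct xsk_socket *xsk, struct xsk_ring_prod *tx)
{
	/* flag cleared: the veth NAPI poller is still draining the TX ring,
	 * so no kick is required
	 */
	if (!xsk_ring_prod__needs_wakeup(tx))
		return;

	/* flag set again once done < budget: one empty sendto() restarts
	 * TX NAPI processing
	 */
	if (sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0) < 0 &&
	    errno != EAGAIN && errno != EBUSY)
		perror("sendto");
}

/* reclaim completed descriptors so their umem chunks can be reused */
static unsigned int reap_completions(struct xsk_ring_cons *cq, unsigned int nb)
{
	__u32 idx;
	unsigned int done = xsk_ring_cons__peek(cq, nb, &idx);

	if (done)
		xsk_ring_cons__release(cq, done);
	return done;
}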