From patchwork Thu Aug 3 14:04:27 2023
X-Patchwork-Submitter: Huang Jie (黄杰)
X-Patchwork-Id: 13340134
X-Patchwork-Delegate: kuba@kernel.org
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 01/10] veth: Implement ethtool's get_ringparam() callback
Date: Thu, 3 Aug 2023 22:04:27 +0800
Message-Id:
 <20230803140441.53596-2-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Some xsk libraries call the get_ringparam() API to get the queue length and
use it to initialize the xsk umem. Implement that ethtool callback in veth
so those scenarios work properly.

Signed-off-by: huangjie.albert
---
 drivers/net/veth.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..c2b431a7a017 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -255,6 +255,17 @@ static void veth_get_channels(struct net_device *dev,
 static int veth_set_channels(struct net_device *dev,
 			     struct ethtool_channels *ch);
 
+static void veth_get_ringparam(struct net_device *dev,
+			       struct ethtool_ringparam *ring,
+			       struct kernel_ethtool_ringparam *kernel_ring,
+			       struct netlink_ext_ack *extack)
+{
+	ring->rx_max_pending = VETH_RING_SIZE;
+	ring->tx_max_pending = VETH_RING_SIZE;
+	ring->rx_pending = VETH_RING_SIZE;
+	ring->tx_pending = VETH_RING_SIZE;
+}
+
 static const struct ethtool_ops veth_ethtool_ops = {
 	.get_drvinfo		= veth_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
@@ -265,6 +276,7 @@ static const struct ethtool_ops veth_ethtool_ops = {
 	.get_ts_info		= ethtool_op_get_ts_info,
 	.get_channels		= veth_get_channels,
 	.set_channels		= veth_set_channels,
+	.get_ringparam		= veth_get_ringparam,
 };
 
 /* general routines */

From patchwork Thu Aug 3 14:04:28 2023
X-Patchwork-Id: 13340135
X-Patchwork-Delegate: kuba@kernel.org
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 02/10] xsk: add dma_check_skip for skipping dma check
Date: Thu, 3 Aug 2023 22:04:28 +0800
Message-Id: <20230803140441.53596-3-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

For a virtual net device such as veth there is no need to do the DMA check
if we support zero copy. Add this flag after 'unaligned', because there is
a 4-byte hole there:

pahole -V ./net/xdp/xsk_buff_pool.o:
-----------
...
	/* --- cacheline 3 boundary (192 bytes) --- */
	u32          chunk_size;          /* 192   4 */
	u32          frame_len;           /* 196   4 */
	u8           cached_need_wakeup;  /* 200   1 */
	bool         uses_need_wakeup;    /* 201   1 */
	bool         dma_need_sync;       /* 202   1 */
	bool         unaligned;           /* 203   1 */

	/* XXX 4 bytes hole, try to pack */

	void *       addrs;               /* 208   8 */
	spinlock_t   cq_lock;             /* 216   4 */
...
-----------

Signed-off-by: huangjie.albert
---
 include/net/xsk_buff_pool.h | 1 +
 net/xdp/xsk_buff_pool.c     | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index b0bdff26fc88..fe31097dc11b 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -81,6 +81,7 @@ struct xsk_buff_pool {
 	bool uses_need_wakeup;
 	bool dma_need_sync;
 	bool unaligned;
+	bool dma_check_skip;
 	void *addrs;
 	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
 	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b3f7b310811e..ed251b8e8773 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -85,6 +85,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 		XDP_PACKET_HEADROOM;
 	pool->umem = umem;
 	pool->addrs = umem->addrs;
+	pool->dma_check_skip = false;
 	INIT_LIST_HEAD(&pool->free_list);
 	INIT_LIST_HEAD(&pool->xskb_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
@@ -202,7 +203,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (!pool->dma_pages && !pool->dma_check_skip) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;

From patchwork Thu Aug 3 14:04:29 2023
X-Patchwork-Id: 13340146
X-Patchwork-Delegate: kuba@kernel.org
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 03/10] veth: add support for send queue
Date: Thu, 3 Aug 2023 22:04:29 +0800
Message-Id: <20230803140441.53596-4-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

In order to support native AF_XDP for veth, we need a send queue for NAPI
TX. The upcoming patches will make use of it; userspace binds one AF_XDP
socket per queue id, as shown in the sketch below.
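For context, each AF_XDP socket is bound to a single queue id of the device,
so every bound TX ring needs a matching driver-side send queue to drive its
NAPI TX work. The following is a minimal userspace sketch using the
libxdp/libbpf xsk.h helpers; the interface name, queue id, frame count and
the NULL (default) configs are illustrative and not part of this series.

#include <stddef.h>
#include <sys/mman.h>
#include <linux/types.h>
#include <xdp/xsk.h>            /* libxdp; older setups use <bpf/xsk.h> */

#define NUM_FRAMES 4096

/* Create one AF_XDP socket bound to a single queue of a device such as
 * veth0. The kernel then has exactly one TX ring to service for this
 * queue id, which is what the per-queue send queue added here backs.
 */
static struct xsk_socket *bind_one_queue(const char *ifname, __u32 queue_id)
{
	struct xsk_ring_prod fq, tx;
	struct xsk_ring_cons cq, rx;
	struct xsk_socket *xsk = NULL;
	struct xsk_umem *umem = NULL;
	size_t len = (size_t)NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
	void *bufs;

	/* page-aligned umem backing store */
	bufs = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bufs == MAP_FAILED)
		return NULL;

	/* NULL configs select the library defaults */
	if (xsk_umem__create(&umem, bufs, len, &fq, &cq, NULL))
		goto out_unmap;

	if (xsk_socket__create(&xsk, ifname, queue_id, umem, &rx, &tx, NULL))
		goto out_umem;

	return xsk;

out_umem:
	xsk_umem__delete(umem);
out_unmap:
	munmap(bufs, len);
	return NULL;
}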
Signed-off-by: huangjie.albert
---
 drivers/net/veth.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index c2b431a7a017..63c3ebe4c5d0 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -56,6 +56,11 @@ struct veth_rq_stats {
 	struct u64_stats_sync	syncp;
 };
 
+struct veth_sq_stats {
+	struct veth_stats	vs;
+	struct u64_stats_sync	syncp;
+};
+
 struct veth_rq {
 	struct napi_struct	xdp_napi;
 	struct napi_struct __rcu *napi; /* points to xdp_napi when the latter is initialized */
@@ -69,11 +74,25 @@ struct veth_rq {
 	struct page_pool	*page_pool;
 };
 
+struct veth_sq {
+	struct napi_struct	xdp_napi;
+	struct net_device	*dev;
+	struct xdp_mem_info	xdp_mem;
+	struct veth_sq_stats	stats;
+	u32			queue_index;
+	/* this is for xsk */
+	struct {
+		struct xsk_buff_pool __rcu *pool;
+		u32 last_cpu;
+	} xsk;
+};
+
 struct veth_priv {
 	struct net_device __rcu	*peer;
 	atomic64_t		dropped;
 	struct bpf_prog		*_xdp_prog;
 	struct veth_rq		*rq;
+	struct veth_sq		*sq;
 	unsigned int		requested_headroom;
 };
 
@@ -1495,6 +1514,15 @@ static int veth_alloc_queues(struct net_device *dev)
 		u64_stats_init(&priv->rq[i].stats.syncp);
 	}
 
+	priv->sq = kcalloc(dev->num_tx_queues, sizeof(*priv->sq), GFP_KERNEL);
+	if (!priv->sq)
+		return -ENOMEM;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		priv->sq[i].dev = dev;
+		u64_stats_init(&priv->sq[i].stats.syncp);
+	}
+
 	return 0;
 }
 
@@ -1503,6 +1531,7 @@ static void veth_free_queues(struct net_device *dev)
 	struct veth_priv *priv = netdev_priv(dev);
 
 	kfree(priv->rq);
+	kfree(priv->sq);
 }
 
 static int veth_dev_init(struct net_device *dev)

From patchwork Thu Aug 3 14:04:30 2023
X-Patchwork-Id: 13340147
X-Patchwork-Delegate: kuba@kernel.org
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 04/10] xsk: add xsk_tx_completed_addr function
Date: Thu, 3 Aug 2023 22:04:30 +0800
Message-Id: <20230803140441.53596-5-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Return a descriptor to the completion queue (cq) by its umem address,
rather than by advancing the producer by a count of entries.
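For reference, completion ring entries are plain umem addresses, so whether
the kernel completes them in bulk with xsk_tx_completed() or one at a time
with the new xsk_tx_completed_addr(), userspace reclaims them the same way.
A minimal sketch of that consumer side with the standard xsk.h ring helpers
(the function name and batch handling are illustrative):

#include <linux/types.h>
#include <xdp/xsk.h>            /* libxdp; older setups use <bpf/xsk.h> */

/* Reap up to 'budget' completed TX frames from the completion ring.
 * Each completion entry is just the umem address that was placed in the
 * TX descriptor, i.e. exactly what xsk_tx_completed_addr() writes back.
 */
static unsigned int reap_tx_completions(struct xsk_ring_cons *cq,
					unsigned int budget)
{
	__u32 idx = 0;
	unsigned int i, done;

	done = xsk_ring_cons__peek(cq, budget, &idx);
	for (i = 0; i < done; i++) {
		__u64 addr = *xsk_ring_cons__comp_addr(cq, idx + i);

		/* hand the frame at 'addr' back to the app's frame allocator */
		(void)addr;
	}
	if (done)
		xsk_ring_cons__release(cq, done);

	return done;
}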
Signed-off-by: huangjie.albert
---
 include/net/xdp_sock_drv.h |  1 +
 net/xdp/xsk.c              |  6 ++++++
 net/xdp/xsk_queue.h        | 11 +++++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 1f6fc8c7a84c..5220454bff5c 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -15,6 +15,7 @@ #ifdef CONFIG_XDP_SOCKETS
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
+void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr);
 bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
 u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max);
 void xsk_tx_release(struct xsk_buff_pool *pool);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4f1e0599146e..b2b8aa7b0bcf 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -396,6 +396,12 @@ void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 }
 EXPORT_SYMBOL(xsk_tx_completed);
 
+void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr)
+{
+	xskq_prod_submit_addr(pool->cq, addr);
+}
+EXPORT_SYMBOL(xsk_tx_completed_addr);
+
 void xsk_tx_release(struct xsk_buff_pool *pool)
 {
 	struct xdp_sock *xs;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 13354a1e4280..a494d1dcb1c3 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -428,6 +428,17 @@ static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
 	smp_store_release(&q->ring->producer, idx); /* B, matches C */
 }
 
+static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+	u32 idx = q->ring->producer;
+
+	ring->desc[idx++ & q->ring_mask] = addr;
+
+	__xskq_prod_submit(q, idx);
+}
+
 static inline void xskq_prod_submit(struct xsk_queue *q)
 {
 	__xskq_prod_submit(q, q->cached_prod);

From patchwork Thu Aug 3 14:04:31 2023
X-Patchwork-Id: 13340148
X-Patchwork-Delegate: kuba@kernel.org
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ldmbgmFCPgTqW2fPQLY5TbtgDmSxsPryWGSwRaZKrPQ=; b=hopQomlmWPIGIGJ9QRANW76qKN9CPQTjj3hnpt+DSfA0rv5NOh+4mfVsLGaWpuCofC N7ThQaFk4ZLzYa73p5f6SJZfuXDrFaEfw6ZigQ+KR13k4N9u2rZ0YT/4KLx9dCln0pv6 GUqagIyDZuZzOlEr+vnESqjpxQq+M0pIWHI5ZgDP3tcMBzNSjpH0L1UE52Gv3y0VxQBl 0K/Ydd+sf3hoBIT/Ee6bDP1lQMYa6LL959zbvXXXTLkRjE6AQAQEVBAr/WSFvp8SMiZm mZRZpUMwS2eZAENOZaC+m0L2J7f1g+T8+doSJ7b43ziF3KsELWeF07iA/kE09IrzjqGD Iu8w== X-Gm-Message-State: ABy/qLbaBPhYCnjAcb7M0AB80n5kObOXwh/V4LpNDOOwUbFg1Y5CZmc2 GW50Jjuh265MOkYFHXW0O813zQ== X-Google-Smtp-Source: APBJJlEA3L6q3zQI/3gCKshzrzY08HPMuOIT+b6ya61ZrKbTkkVkNjwAzY46zwqr+lQnEuvfOxbG9w== X-Received: by 2002:a17:902:d2cd:b0:1bc:239:a7e3 with SMTP id n13-20020a170902d2cd00b001bc0239a7e3mr15632332plc.44.1691071568602; Thu, 03 Aug 2023 07:06:08 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2001:c10:ff04:0:1000::8]) by smtp.gmail.com with ESMTPSA id ji11-20020a170903324b00b001b8a897cd26sm14367485plb.195.2023.08.03.07.06.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Aug 2023 07:06:08 -0700 (PDT) From: "huangjie.albert" To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Cc: "huangjie.albert" , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Pavel Begunkov , Kees Cook , Menglong Dong , Richard Gobert , Yunsheng Lin , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP (eXpress Data Path)) Subject: [RFC Optimizing veth xsk performance 05/10] veth: use send queue tx napi to xmit xsk tx desc Date: Thu, 3 Aug 2023 22:04:31 +0800 Message-Id: <20230803140441.53596-6-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230803140441.53596-1-huangjie.albert@bytedance.com> References: <20230803140441.53596-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Signed-off-by: huangjie.albert --- drivers/net/veth.c | 265 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 264 insertions(+), 1 deletion(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 63c3ebe4c5d0..944761807ca4 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -27,6 +27,8 @@ #include #include #include +#include +#include #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -1061,6 +1063,176 @@ static int veth_poll(struct napi_struct *napi, int budget) return done; } +static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget) +{ + struct veth_priv *priv, *peer_priv; + struct net_device *dev, *peer_dev; + struct veth_rq *peer_rq; + struct veth_stats peer_stats = {}; + struct veth_stats stats = {}; + struct veth_xdp_tx_bq bq; + struct xdp_desc desc; + void *xdpf; + int done = 0; + + bq.count = 0; + dev = sq->dev; + priv = netdev_priv(dev); + peer_dev = priv->peer; + peer_priv = 
netdev_priv(peer_dev); + + /* todo: queue index must set before this */ + peer_rq = &peer_priv->rq[sq->queue_index]; + + /* set xsk wake up flag, to do: where to disable */ + if (xsk_uses_need_wakeup(xsk_pool)) + xsk_set_tx_need_wakeup(xsk_pool); + + while (budget-- > 0) { + unsigned int truesize = 0; + struct xdp_frame *p_frame; + struct page *page; + void *new_addr; + void *addr; + + /* + * get a desc from xsk pool + */ + if (!xsk_tx_peek_desc(xsk_pool, &desc)) { + break; + } + + /* + * Get a xmit addr + * desc.addr is a offset, so we should to convert to real virtual address + */ + addr = xsk_buff_raw_get_data(xsk_pool, desc.addr); + + /* can not hold all data in a page */ + truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + desc.len + sizeof(struct xdp_frame); + if (truesize > PAGE_SIZE) { + stats.xdp_drops++; + xsk_tx_completed_addr(xsk_pool, desc.addr); + continue; + } + + page = dev_alloc_page(); + if (!page) { + /* + * error , release xdp frame and increase drops + */ + xsk_tx_completed_addr(xsk_pool, desc.addr); + stats.xdp_drops++; + break; + } + new_addr = page_to_virt(page); + + p_frame = new_addr; + new_addr += sizeof(struct xdp_frame); + p_frame->data = new_addr; + p_frame->len = desc.len; + + /* frame should change to the page size, beacause the (struct skb_shared_info) is so large, + * if we build skb in veth_xdp_rcv_one, skb->tail may larger than skb->end which could triger a skb_panic + */ + p_frame->headroom = 0; + p_frame->metasize = 0; + p_frame->frame_sz = PAGE_SIZE; + p_frame->flags = 0; + p_frame->mem.type = MEM_TYPE_PAGE_SHARED; + memcpy(p_frame->data, addr, p_frame->len); + xsk_tx_completed_addr(xsk_pool, desc.addr); + + /* if peer have xdp prog, if it has ,just send to peer */ + p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats); + /* if no xdp with this queue, convert to skb to xmit*/ + if (p_frame) { + xdpf = p_frame; + veth_xdp_rcv_bulk_skb(peer_rq, &xdpf, 1, &bq, &peer_stats); + p_frame = NULL; + } + + stats.xdp_bytes += desc.len; + + done++; + } + + /* release, move consumer,and wakeup the producer */ + if (done) { + napi_schedule(&peer_rq->xdp_napi); + xsk_tx_release(xsk_pool); + } + + + + /* just for peer rq */ + if (peer_stats.xdp_tx > 0) + veth_xdp_flush(peer_rq, &bq); + if (peer_stats.xdp_redirect > 0) + xdp_do_flush(); + + /* update peer rq stats, or maybe we do not need to do this */ + u64_stats_update_begin(&peer_rq->stats.syncp); + peer_rq->stats.vs.xdp_redirect += peer_stats.xdp_redirect; + peer_rq->stats.vs.xdp_packets += done; + peer_rq->stats.vs.xdp_bytes += stats.xdp_bytes; + peer_rq->stats.vs.xdp_drops += peer_stats.xdp_drops; + peer_rq->stats.vs.rx_drops += peer_stats.rx_drops; + peer_rq->stats.vs.xdp_tx += peer_stats.xdp_tx; + u64_stats_update_end(&peer_rq->stats.syncp); + + /* update sq stats */ + u64_stats_update_begin(&sq->stats.syncp); + sq->stats.vs.xdp_packets += done; + sq->stats.vs.xdp_bytes += stats.xdp_bytes; + sq->stats.vs.xdp_drops += stats.xdp_drops; + u64_stats_update_end(&sq->stats.syncp); + + return done; +} + +static int veth_poll_tx(struct napi_struct *napi, int budget) +{ + struct veth_sq *sq = container_of(napi, struct veth_sq, xdp_napi); + struct xsk_buff_pool *pool; + int done = 0; + xdp_set_return_frame_no_direct(); + + sq->xsk.last_cpu = smp_processor_id(); + + /* xmit for tx queue */ + rcu_read_lock(); + pool = rcu_dereference(sq->xsk.pool); + if (pool) { + done = veth_xsk_tx_xmit(sq, pool, budget); + } + rcu_read_unlock(); + + if (done < budget) { + /* if done < budget, the tx ring is no buffer 
*/ + napi_complete_done(napi, done); + } + + xdp_clear_return_frame_no_direct(); + + return done; +} + + +static int veth_napi_add_tx(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + int i; + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_sq *sq = &priv->sq[i]; + netif_napi_add(dev, &sq->xdp_napi, veth_poll_tx); + napi_enable(&sq->xdp_napi); + } + + return 0; +} + static int veth_create_page_pool(struct veth_rq *rq) { struct page_pool_params pp_params = { @@ -1153,6 +1325,19 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end) } } +static void veth_napi_del_tx(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + int i; + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_sq *sq = &priv->sq[i]; + + napi_disable(&sq->xdp_napi); + __netif_napi_del(&sq->xdp_napi); + } +} + static void veth_napi_del(struct net_device *dev) { veth_napi_del_range(dev, 0, dev->real_num_rx_queues); @@ -1360,7 +1545,7 @@ static void veth_set_xdp_features(struct net_device *dev) struct veth_priv *priv_peer = netdev_priv(peer); xdp_features_t val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | - NETDEV_XDP_ACT_RX_SG; + NETDEV_XDP_ACT_RX_SG | NETDEV_XDP_ACT_XSK_ZEROCOPY; if (priv_peer->_xdp_prog || veth_gro_requested(peer)) val |= NETDEV_XDP_ACT_NDO_XMIT | @@ -1737,11 +1922,89 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, return err; } +static int veth_xsk_pool_enable(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid) +{ + struct veth_priv *peer_priv; + struct veth_priv *priv = netdev_priv(dev); + struct net_device *peer_dev = priv->peer; + int err = 0; + + if (qid >= dev->real_num_tx_queues) + return -EINVAL; + + if(!peer_dev) + return -EINVAL; + + /* no dma, so we just skip dma skip in xsk zero copy */ + pool->dma_check_skip = true; + + peer_priv = netdev_priv(peer_dev); + /* + * enable peer tx xdp here, this side + * xdp is enable by veth_xdp_set + * to do: we need to check whther this side is already enable xdp + * maybe it do not have xdp prog + */ + if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) { + /* peer should enable napi*/ + err = veth_napi_enable(peer_dev); + if (err) + return err; + } + + /* Here is already protected by rtnl_lock, so rcu_assign_pointer + * is safe. 
+ */ + rcu_assign_pointer(priv->sq[qid].xsk.pool, pool); + + veth_napi_add_tx(dev); + + return err; +} + +static int veth_xsk_pool_disable(struct net_device *dev, u16 qid) +{ + struct veth_priv *peer_priv; + struct veth_priv *priv = netdev_priv(dev); + struct net_device *peer_dev = priv->peer; + int err = 0; + + if (qid >= dev->real_num_tx_queues) + return -EINVAL; + + if(!peer_dev) + return -EINVAL; + + peer_priv = netdev_priv(peer_dev); + + /* to do: this may be failed */ + if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) { + /* disable peer napi */ + veth_napi_del(peer_dev); + } + + veth_napi_del_tx(dev); + + rcu_assign_pointer(priv->sq[qid].xsk.pool, NULL); + return err; +} + +/* this is for setup xdp */ +static int veth_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp) +{ + if (xdp->xsk.pool) + return veth_xsk_pool_enable(dev, xdp->xsk.pool, xdp->xsk.queue_id); + else + return veth_xsk_pool_disable(dev, xdp->xsk.queue_id); +} + static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) { switch (xdp->command) { case XDP_SETUP_PROG: return veth_xdp_set(dev, xdp->prog, xdp->extack); + case XDP_SETUP_XSK_POOL: + return veth_xsk_pool_setup(dev, xdp); default: return -EINVAL; } From patchwork Thu Aug 3 14:04:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13340149 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 592B1200D0 for ; Thu, 3 Aug 2023 14:06:29 +0000 (UTC) Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42F1B46AB for ; Thu, 3 Aug 2023 07:06:21 -0700 (PDT) Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1bbf8cb694aso8779355ad.3 for ; Thu, 03 Aug 2023 07:06:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691071580; x=1691676380; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uAdtU7SLip58WY/jMpbQZS/1BOrBTm47cvrvRETW+jo=; b=gje/ZNVdV37PsuBXcn9JcvDF1Ybdczh0zKLeYvcGuYmGxjT6hPC415oZTPIOtzx83g 7FFCFIT6Kmor5VCwN/i3qktqe92oFscFmNOAH082Ja7YR86GHFserclV4lc3CJNE1JVd +JiXjBg0V2wOrubU8ZMJeKXuoUv0Ozj0un1MiQjMdSXoHWHZZSf8vRs5OnkLjIX36jjU ig5yH157Hd/7+McdeYRCdJOIPawm3lbgYqB1DJYMWZGMAML63qPftJeILVLdeNK3uyOL pQHZQMLoxkAmw95IlOXouBPXN7IMMwa9psr71neaanIiaQcBn8UqW9AyIOJ3Td1T4F+e 6mDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691071580; x=1691676380; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uAdtU7SLip58WY/jMpbQZS/1BOrBTm47cvrvRETW+jo=; b=d+Ebk8Y8LEPCeQ+qJ/BJ8R/NNft6K54laD4sC38V1mBBK6oQo481NhBNctCyhDJdnf 4ECmjrF/fkRoGrld/HysU4sD37EhVCChK43y7M5VNBXqaF/paNGvu2kPzdhU08eDCuZo D97b+Joi0WeVufmvrQd3GhEsAzA/iP5JOmnULKW5nUZ8eZQgFCpE8JHnaVq6TfMwPA1F Jjb/hmZ6SGrlbekIyCpHAuGc7EwfGmxCDLd5FePE+wHXSub1EnhbVlS8YL8cf079+dVA Qvc3v+MfK4157ZHK0Uw2WjD9TvnBHO/mYsaUptTZpDoNfVgKWTyt5Hj6EqhTBfrR/9o8 L7Og== X-Gm-Message-State: ABy/qLZtshcdh3ks16E4Yo83K8gkwFZVTXCQe+i/5NI1ar5xZ9cQh8t7 cVws2fynl33oK45G3MuazCr2ag== 
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 06/10] veth: add ndo_xsk_wakeup callback for veth
Date: Thu, 3 Aug 2023 22:04:32 +0800
Message-Id: <20230803140441.53596-7-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

Signed-off-by: huangjie.albert
---
 drivers/net/veth.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 944761807ca4..600225e27e9e 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1840,6 +1840,45 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 	rcu_read_unlock();
 }
 
+static void veth_xsk_remote_trigger_napi(void *info)
+{
+	struct veth_sq *sq = info;
+
+	napi_schedule(&sq->xdp_napi);
+}
+
+static int veth_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct veth_priv *priv;
+	struct veth_sq *sq;
+	u32 last_cpu, cur_cpu;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= dev->real_num_rx_queues)
+		return -EINVAL;
+
+	priv = netdev_priv(dev);
+	sq = &priv->sq[qid];
+
+	if (napi_if_scheduled_mark_missed(&sq->xdp_napi))
+		return 0;
+
+	last_cpu = sq->xsk.last_cpu;
+	cur_cpu = get_cpu();
+
+	/* raise a napi */
+	if (last_cpu == cur_cpu) {
+		napi_schedule(&sq->xdp_napi);
+	} else {
+		smp_call_function_single(last_cpu, veth_xsk_remote_trigger_napi, sq, true);
+	}
+
+	put_cpu();
+	return 0;
+}
+
 static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			struct netlink_ext_ack *extack)
 {
@@ -2054,6 +2093,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_set_rx_headroom	= veth_set_rx_headroom,
 	.ndo_bpf		= veth_xdp,
 	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
+	.ndo_xsk_wakeup		= veth_xsk_wakeup,
 	.ndo_get_peer_dev	= veth_peer_dev,
 };

From patchwork Thu Aug 3 14:04:33 2023
X-Patchwork-Id: 13340150
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 07/10] sk_buff: add destructor_arg_xsk_pool for zero copy
Date: Thu, 3 Aug 2023 22:04:33 +0800
Message-Id: <20230803140441.53596-8-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

This member is added for dummy (virtual) devices to support zero copy.

Signed-off-by: huangjie.albert
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 16a49ba534e4..fa9577d233a4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -592,6 +592,7 @@ struct skb_shared_info {
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
 	void		*destructor_arg;
+	void		*destructor_arg_xsk_pool; /* just for dummy device xsk zero copy */
 
 	/* must be last field, see pskb_expand_head() */
 	skb_frag_t	frags[MAX_SKB_FRAGS];

From patchwork Thu Aug 3 14:04:34 2023
X-Patchwork-Id: 13340151
X-Patchwork-Delegate: kuba@kernel.org
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 08/10] xdp: add xdp_mem_type MEM_TYPE_XSK_BUFF_POOL_TX
Date: Thu, 3 Aug 2023 22:04:34 +0800
Message-Id: <20230803140441.53596-9-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

This xdp memory type will be used for zero copy in a later patch.

Signed-off-by: huangjie.albert
---
 include/net/xdp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index d1c5381fc95f..cb1621b5a0c9 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -42,6 +42,7 @@ enum xdp_mem_type {
 	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
 	MEM_TYPE_PAGE_POOL,
 	MEM_TYPE_XSK_BUFF_POOL,
+	MEM_TYPE_XSK_BUFF_POOL_TX,
 	MEM_TYPE_MAX,
 };

From patchwork Thu Aug 3 14:04:35 2023
X-Patchwork-Id: 13340152
X-Patchwork-Delegate: kuba@kernel.org
From: "huangjie.albert"
Subject: [RFC Optimizing veth xsk performance 09/10] veth: support zero copy for af xdp
Date: Thu, 3 Aug 2023 22:04:35 +0800
Message-Id: <20230803140441.53596-10-huangjie.albert@bytedance.com>
X-Patchwork-State: RFC

The following conditions need to be satisfied to achieve zero copy:
1. The TX descriptor has enough space to store the xdp_frame and skb_shared_info.
2. The memory the TX descriptor points to lies within a single page.
A userspace configuration that satisfies both is sketched below.
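To actually hit this zero-copy path, the socket has to be configured so
each TX descriptor leaves headroom for the kernel's metadata and never
crosses a page boundary. A hedged userspace sketch of such a configuration
follows; the 64-byte headroom value is an illustrative assumption (the
kernel-side requirement added in this patch is only that the headroom be at
least sizeof(struct xdp_frame)), and these structs would be passed to
xsk_umem__create() and xsk_socket__create() in place of the NULL defaults
used in the earlier sketch.

#include <linux/if_xdp.h>
#include <xdp/xsk.h>            /* libxdp; older setups use <bpf/xsk.h> */

/* Umem layout that satisfies the two conditions above:
 *  - frame_headroom reserves room in front of each frame for the kernel
 *    to place its xdp_frame metadata (64 is assumed to be comfortably
 *    larger than sizeof(struct xdp_frame)),
 *  - a 4096-byte frame size with aligned chunks keeps every descriptor
 *    inside a single page.
 */
static const struct xsk_umem_config zc_umem_cfg = {
	.fill_size      = XSK_RING_PROD__DEFAULT_NUM_DESCS,
	.comp_size      = XSK_RING_CONS__DEFAULT_NUM_DESCS,
	.frame_size     = 4096,
	.frame_headroom = 64,
	.flags          = 0,    /* aligned chunk mode */
};

/* Explicitly request a zero-copy bind; the bind fails if the driver
 * cannot do zero copy for this queue.
 */
static const struct xsk_socket_config zc_sock_cfg = {
	.rx_size    = XSK_RING_CONS__DEFAULT_NUM_DESCS,
	.tx_size    = XSK_RING_PROD__DEFAULT_NUM_DESCS,
	.bind_flags = XDP_ZEROCOPY,
};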
test zero copy with libxdp Performance: |MSS (bytes) | Packet rate (PPS) AF_XDP | 1300 | 480k AF_XDP with zero copy| 1300 | 540K signed-off-by: huangjie.albert --- drivers/net/veth.c | 207 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 178 insertions(+), 29 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 600225e27e9e..e4f1a8345f42 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -103,6 +103,11 @@ struct veth_xdp_tx_bq { unsigned int count; }; +struct veth_seg_info { + u32 segs; + u64 desc[] ____cacheline_aligned_in_smp; +}; + /* * ethtool interface */ @@ -645,6 +650,100 @@ static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp, return 0; } +static struct sk_buff *veth_build_skb(void *head, int headroom, int len, + int buflen) +{ + struct sk_buff *skb; + + skb = build_skb(head, buflen); + if (!skb) + return NULL; + + skb_reserve(skb, headroom); + skb_put(skb, len); + + return skb; +} + +static void veth_xsk_destruct_skb(struct sk_buff *skb) +{ + struct veth_seg_info *seg_info = (struct veth_seg_info *)skb_shinfo(skb)->destructor_arg; + struct xsk_buff_pool *pool = (struct xsk_buff_pool *)skb_shinfo(skb)->destructor_arg_xsk_pool; + unsigned long flags; + u32 index = 0; + u64 addr; + + /* release cq */ + spin_lock_irqsave(&pool->cq_lock, flags); + for (index = 0; index < seg_info->segs; index++) { + addr = (u64)(long)seg_info->desc[index]; + xsk_tx_completed_addr(pool, addr); + } + spin_unlock_irqrestore(&pool->cq_lock, flags); + + kfree(seg_info); + skb_shinfo(skb)->destructor_arg = NULL; + skb_shinfo(skb)->destructor_arg_xsk_pool = NULL; +} + +static struct sk_buff *veth_build_skb_zerocopy(struct net_device *dev, struct xsk_buff_pool *pool, + struct xdp_desc *desc) +{ + struct veth_seg_info *seg_info; + struct sk_buff *skb; + struct page *page; + void *hard_start; + u32 len, ts; + void *buffer; + int headroom; + u64 addr; + u32 index; + + addr = desc->addr; + len = desc->len; + buffer = xsk_buff_raw_get_data(pool, addr); + ts = pool->unaligned ? len : pool->chunk_size; + + headroom = offset_in_page(buffer); + + /* offset in umem pool buffer */ + addr = buffer - pool->addrs; + + /* get the page of the desc */ + page = pool->umem->pgs[addr >> PAGE_SHIFT]; + + /* in order to avoid to get freed by kfree_skb */ + get_page(page); + + hard_start = page_to_virt(page); + + skb = veth_build_skb(hard_start, headroom, len, ts); + seg_info = (struct veth_seg_info *)kmalloc(struct_size(seg_info, desc, MAX_SKB_FRAGS), GFP_KERNEL); + if (!seg_info) + { + printk("here must to deal with\n"); + } + + /* later we will support gso for this */ + index = skb_shinfo(skb)->gso_segs; + seg_info->desc[index] = desc->addr; + seg_info->segs = ++index; + + skb->truesize += ts; + skb->dev = dev; + skb_shinfo(skb)->destructor_arg = (void *)(long)seg_info; + skb_shinfo(skb)->destructor_arg_xsk_pool = (void *)(long)pool; + skb->destructor = veth_xsk_destruct_skb; + + /* set the mac header */ + skb->protocol = eth_type_trans(skb, dev); + + /* to do, add skb to sock. 
may be there is no need to do for this + * refcount_add(ts, &xs->sk.sk_wmem_alloc); + */ + return skb; +} + static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq, struct xdp_frame *frame, struct veth_xdp_tx_bq *bq, @@ -1063,6 +1162,20 @@ static int veth_poll(struct napi_struct *napi, int budget) return done; } +/* if buffer contain in a page */ +static inline bool buffer_in_page(void *buffer, u32 len) +{ + u32 offset; + + offset = offset_in_page(buffer); + + if(PAGE_SIZE - offset >= len) { + return true; + } else { + return false; + } +} + static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget) { struct veth_priv *priv, *peer_priv; @@ -1073,6 +1186,9 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, struct veth_xdp_tx_bq bq; struct xdp_desc desc; void *xdpf; + struct sk_buff *skb = NULL; + bool zc = xsk_pool->umem->zc; + u32 xsk_headroom = xsk_pool->headroom; int done = 0; bq.count = 0; @@ -1102,12 +1218,6 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, break; } - /* - * Get a xmit addr - * desc.addr is a offset, so we should to convert to real virtual address - */ - addr = xsk_buff_raw_get_data(xsk_pool, desc.addr); - /* can not hold all data in a page */ truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + desc.len + sizeof(struct xdp_frame); if (truesize > PAGE_SIZE) { @@ -1116,16 +1226,39 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, continue; } - page = dev_alloc_page(); - if (!page) { - /* - * error , release xdp frame and increase drops - */ - xsk_tx_completed_addr(xsk_pool, desc.addr); - stats.xdp_drops++; - break; + /* + * Get a xmit addr + * desc.addr is a offset, so we should to convert to real virtual address + */ + addr = xsk_buff_raw_get_data(xsk_pool, desc.addr); + + /* + * in order to support zero copy, headroom must have enough space to hold xdp_frame + */ + if (zc && (xsk_headroom < sizeof(struct xdp_frame))) + zc = false; + + /* + * if desc not contain in a page, also do not support zero copy + */ + if (!buffer_in_page(addr, desc.len)) + zc = false; + + if (zc) { + /* headroom is reserved for xdp_frame */ + new_addr = addr - sizeof(struct xdp_frame); + } else { + page = dev_alloc_page(); + if (!page) { + /* + * error , release xdp frame and increase drops + */ + xsk_tx_completed_addr(xsk_pool, desc.addr); + stats.xdp_drops++; + break; + } + new_addr = page_to_virt(page); } - new_addr = page_to_virt(page); p_frame = new_addr; new_addr += sizeof(struct xdp_frame); @@ -1137,19 +1270,37 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, */ p_frame->headroom = 0; p_frame->metasize = 0; - p_frame->frame_sz = PAGE_SIZE; p_frame->flags = 0; - p_frame->mem.type = MEM_TYPE_PAGE_SHARED; - memcpy(p_frame->data, addr, p_frame->len); - xsk_tx_completed_addr(xsk_pool, desc.addr); - - /* if peer have xdp prog, if it has ,just send to peer */ - p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats); - /* if no xdp with this queue, convert to skb to xmit*/ - if (p_frame) { - xdpf = p_frame; - veth_xdp_rcv_bulk_skb(peer_rq, &xdpf, 1, &bq, &peer_stats); - p_frame = NULL; + + if (zc) { + p_frame->frame_sz = xsk_pool->frame_len; + /* to do: if there is a xdp, how to recycle the tx desc */ + p_frame->mem.type = MEM_TYPE_XSK_BUFF_POOL_TX; + /* no need to copy address for af+xdp */ + p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats); + if (p_frame) { + skb = veth_build_skb_zerocopy(peer_dev, 
xsk_pool, &desc); + if (skb) { + napi_gro_receive(&peer_rq->xdp_napi, skb); + skb = NULL; + } else { + xsk_tx_completed_addr(xsk_pool, desc.addr); + } + } + } else { + p_frame->frame_sz = PAGE_SIZE; + p_frame->mem.type = MEM_TYPE_PAGE_SHARED; + memcpy(p_frame->data, addr, p_frame->len); + xsk_tx_completed_addr(xsk_pool, desc.addr); + + /* if peer have xdp prog, if it has ,just send to peer */ + p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats); + /* if no xdp with this queue, convert to skb to xmit*/ + if (p_frame) { + xdpf = p_frame; + veth_xdp_rcv_bulk_skb(peer_rq, &xdpf, 1, &bq, &peer_stats); + p_frame = NULL; + } } stats.xdp_bytes += desc.len; @@ -1163,8 +1314,6 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, xsk_tx_release(xsk_pool); } - - /* just for peer rq */ if (peer_stats.xdp_tx > 0) veth_xdp_flush(peer_rq, &bq); From patchwork Thu Aug 3 14:04:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?6buE5p2w?= X-Patchwork-Id: 13340153 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F74121D57 for ; Thu, 3 Aug 2023 14:07:13 +0000 (UTC) Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACD53212D for ; Thu, 3 Aug 2023 07:07:08 -0700 (PDT) Received: by mail-pl1-x631.google.com with SMTP id d9443c01a7336-1bb893e6365so6855305ad.2 for ; Thu, 03 Aug 2023 07:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691071628; x=1691676428; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/2IExsisHLl5wetq8sgZJvv6YFTcLdIhC4TMC2ESGVY=; b=Oo5SDTLnzyXY8qDJYDdgxiSm5KInG31WNp+6s7Aw/7vgUIpls7XV7PDYHeze+UBg5n DNl2lunWdtdOaDHXuIxE5JrJyhPZkrGSa5Td1Oqf49qpwWw8PUrIhrB7fyBHyv9odtE/ 2qhsRMSNlJy/L5ER8K/Vh3MdsTTXMR3Gnygwt5BAmPJvMOF3dflvGX+gQaOJXTqq16QM P5ltuB+AVoHAG8A95Q946y5BIfsKc/I92jLgMKQ0AYzhnBfHhIGGNmOqSujdFbxrxmja ioZjmD0Md1lh9M3POXnCgEhPxjLWOm1sgM3OhCN/Kk0TToqHwyEuF4xyR9Y7y+KH8Otp jFCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691071628; x=1691676428; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/2IExsisHLl5wetq8sgZJvv6YFTcLdIhC4TMC2ESGVY=; b=CQn9CVMsLPP657foD1P8IpRueZybwHq3wQHSA9ZGnZjrPy72Ob4zaaFCuSs9exR3+g Yr2fEu1K+j96Bll5jJ047r3lV7N2TF/AbBTPIVLzXcCvuQd07hywRn0whtTEDBr9FapZ TB/c65baTq02358v7NdhoP6LQq1XlyfIszYIEKdwMsY4twmj2Zxzzb7c6y9vfJngunuH jGQENZ9+lx4+jW2Ktba7rUDlnN6ZqkoIN7W+2z5WByKIFXC/e/Y1hZt0bryEPn/HaWYX yZS3lzUJSI3mjUTJWC9445obAvP4yRfKcUYVvI9SaYjOaC3PditAfY0tLxU9DlnJTYvk VBtQ== X-Gm-Message-State: ABy/qLYDfGq6eTSUS8ERP/SRMTKYvA9KQU8BrOb88WHyfU/C9DW7eHoE MI2wniszGbMYTDxVvQhmXIUxjc62mBn5l7c4dCV+zw== X-Google-Smtp-Source: APBJJlGM8RoAE6cD3VioW4CUSFeL+hHwLfuzL1GJ4xLv4TINBL02vq+XY8MB4/1oSJTMpPWFx/47lg== X-Received: by 2002:a17:902:ea08:b0:1bb:893e:5df5 with SMTP id s8-20020a170902ea0800b001bb893e5df5mr22673401plg.34.1691071627752; Thu, 03 Aug 2023 07:07:07 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net 
([2001:c10:ff04:0:1000::8]) by smtp.gmail.com with ESMTPSA id ji11-20020a170903324b00b001b8a897cd26sm14367485plb.195.2023.08.03.07.07.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Aug 2023 07:07:07 -0700 (PDT) From: "huangjie.albert" To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Cc: "huangjie.albert" , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Pavel Begunkov , Yunsheng Lin , Kees Cook , Richard Gobert , netdev@vger.kernel.org (open list:NETWORKING DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP (eXpress Data Path)) Subject: [RFC Optimizing veth xsk performance 10/10] veth: af_xdp tx batch support for ipv4 udp Date: Thu, 3 Aug 2023 22:04:36 +0800 Message-Id: <20230803140441.53596-11-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230803140441.53596-1-huangjie.albert@bytedance.com> References: <20230803140441.53596-1-huangjie.albert@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC

A typical topology is shown below:

    veth <-------- veth-peer
                  1 |
                    | 2
                    |
                 bridge <-------> eth0 (such as a mlx5 NIC)

When AF_XDP is used to send packets from a veth device to a physical NIC, the traffic still has to traverse several software paths, so we can borrow the idea behind kernel GSO: when AF_XDP transmits from veth, aggregate consecutive packets and hand one large packet from the veth virtual NIC to the physical NIC.

Performance (tested with the libxdp library):
AF_XDP without batch : 480 Kpps (ksoftirqd at 100% CPU)
AF_XDP with batch    : 1.5 Mpps (ksoftirqd at 15% CPU)

With AF_XDP batching, the libxdp user-space program becomes the bottleneck, so the softirq never reaches its limit.
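For context, the numbers above come from a libxdp-based sender. The fragment below is illustrative only (send_udp_batch() is a made-up helper; umem/socket setup, frame construction and error handling are omitted); it shows the kind of batched TX submission that lets the aggregation in this patch kick in, assuming the frames have already been written into the umem and share one IPv4/UDP 5-tuple:

#include <linux/types.h>
#include <sys/socket.h>
#include <xdp/xsk.h>

static int send_udp_batch(struct xsk_socket *xsk,
			  struct xsk_ring_prod *tx,
			  struct xsk_ring_cons *cq,
			  const __u64 *frame_addrs, __u32 frame_len,
			  __u32 batch)
{
	__u32 idx, cq_idx, completed, i;

	/* reserve 'batch' TX descriptors in one call */
	if (xsk_ring_prod__reserve(tx, batch, &idx) != batch)
		return -1;

	for (i = 0; i < batch; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = frame_addrs[i];	/* umem offset of frame i */
		desc->len = frame_len;		/* same MSS for every frame */
	}
	xsk_ring_prod__submit(tx, batch);

	/* kick the kernel so the veth xsk TX path drains the ring */
	if (xsk_ring_prod__needs_wakeup(tx))
		sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

	/* reap completions so the umem frames can be reused */
	completed = xsk_ring_cons__peek(cq, batch, &cq_idx);
	if (completed)
		xsk_ring_cons__release(cq, completed);

	return 0;
}

Submitting the whole batch before the single sendto() wakeup is what gives the driver a run of back-to-back descriptors it can merge into one large skb.
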
Signed-off-by: huangjie.albert --- drivers/net/veth.c | 264 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 249 insertions(+), 15 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index e4f1a8345f42..b0dbd21089c8 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -29,6 +29,7 @@ #include #include #include +#include #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -103,6 +104,18 @@ struct veth_xdp_tx_bq { unsigned int count; }; +struct veth_gso_tuple { + __u8 protocol; + __be32 saddr; + __be32 daddr; + __be16 source; + __be16 dest; + __be16 gso_size; + __be16 gso_segs; + bool gso_enable; + bool gso_flush; +}; + struct veth_seg_info { u32 segs; u64 desc[] ____cacheline_aligned_in_smp; @@ -650,6 +663,84 @@ static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp, return 0; } +static struct sk_buff *veth_build_gso_head_skb(struct net_device *dev, char *buff, u32 tot_len, u32 headroom, u32 iph_len, u32 th_len) +{ + struct sk_buff *skb = NULL; + int err = 0; + + skb = alloc_skb(tot_len, GFP_KERNEL); + if (unlikely(!skb)) + return NULL; + + /* header room contains the eth header */ + skb_reserve(skb, headroom - ETH_HLEN); + + skb_put(skb, ETH_HLEN + iph_len + th_len); + + skb_shinfo(skb)->gso_segs = 0; + + err = skb_store_bits(skb, 0, buff, ETH_HLEN + iph_len + th_len); + if (unlikely(err)) { + kfree_skb(skb); + return NULL; + } + + skb->protocol = eth_type_trans(skb, dev); + skb->network_header = skb->mac_header + ETH_HLEN; + skb->transport_header = skb->network_header + iph_len; + skb->ip_summed = CHECKSUM_PARTIAL; + + return skb; +} + +static inline bool gso_segment_match(struct veth_gso_tuple *gso_tuple, struct iphdr *iph, struct udphdr *udph) +{ + if (gso_tuple->protocol == iph->protocol && + gso_tuple->saddr == iph->saddr && + gso_tuple->daddr == iph->daddr && + gso_tuple->source == udph->source && + gso_tuple->dest == udph->dest && + gso_tuple->gso_size == ntohs(udph->len)) + { + gso_tuple->gso_flush = false; + return true; + } else { + gso_tuple->gso_flush = true; + return false; + } +} + +static inline void gso_tuple_init(struct veth_gso_tuple *gso_tuple, struct iphdr *iph, struct udphdr *udph) +{ + gso_tuple->protocol = iph->protocol; + gso_tuple->saddr = iph->saddr; + gso_tuple->daddr = iph->daddr; + gso_tuple->source = udph->source; + gso_tuple->dest = udph->dest; + gso_tuple->gso_flush = false; + gso_tuple->gso_size = ntohs(udph->len); + gso_tuple->gso_segs = 0; +} + +/* only ipv4 udp support gso now */ +static inline bool ip_hdr_gso_check(unsigned char *buff, u32 len) +{ + struct iphdr *iph; + + if (len <= (ETH_HLEN + sizeof(*iph))) + return false; + + iph = (struct iphdr *)(buff + ETH_HLEN); + + /* + * check for ip headers, if the data support gso + */ + if (iph->ihl < 5 || iph->version != 4 || len < (iph->ihl * 4 + ETH_HLEN) || iph->protocol != IPPROTO_UDP) + return false; + + return true; +} + static struct sk_buff *veth_build_skb(void *head, int headroom, int len, int buflen) { @@ -686,8 +777,8 @@ static void veth_xsk_destruct_skb(struct sk_buff *skb) skb_shinfo(skb)->destructor_arg_xsk_pool = NULL; } -static struct sk_buff *veth_build_skb_zerocopy(struct net_device *dev, struct xsk_buff_pool *pool, - struct xdp_desc *desc) +static struct sk_buff *veth_build_skb_zerocopy_normal(struct net_device *dev, + struct xsk_buff_pool *pool, struct xdp_desc *desc) { struct veth_seg_info *seg_info; struct sk_buff *skb; @@ -698,45 +789,133 @@ static struct sk_buff *veth_build_skb_zerocopy(struct net_device *dev, struct xs int headroom; u64 
addr; u32 index; - addr = desc->addr; len = desc->len; buffer = xsk_buff_raw_get_data(pool, addr); ts = pool->unaligned ? len : pool->chunk_size; - headroom = offset_in_page(buffer); - /* offset in umem pool buffer */ addr = buffer - pool->addrs; - /* get the page of the desc */ page = pool->umem->pgs[addr >> PAGE_SHIFT]; - /* in order to avoid to get freed by kfree_skb */ get_page(page); - hard_start = page_to_virt(page); - skb = veth_build_skb(hard_start, headroom, len, ts); seg_info = (struct veth_seg_info *)kmalloc(struct_size(seg_info, desc, MAX_SKB_FRAGS), GFP_KERNEL); if (!seg_info) { printk("here must to deal with\n"); } - /* later we will support gso for this */ index = skb_shinfo(skb)->gso_segs; seg_info->desc[index] = desc->addr; seg_info->segs = ++index; - skb->truesize += ts; skb->dev = dev; skb_shinfo(skb)->destructor_arg = (void *)(long)seg_info; skb_shinfo(skb)->destructor_arg_xsk_pool = (void *)(long)pool; skb->destructor = veth_xsk_destruct_skb; - /* set the mac header */ skb->protocol = eth_type_trans(skb, dev); + /* to do, add skb to sock. may be there is no need to do for this + * refcount_add(ts, &xs->sk.sk_wmem_alloc); + */ + return skb; +} + +static struct sk_buff *veth_build_skb_zerocopy_gso(struct net_device *dev, struct xsk_buff_pool *pool, + struct xdp_desc *desc, struct veth_gso_tuple *gso_tuple, struct sk_buff *prev_skb) +{ + u32 hr, len, ts, index, iph_len, th_len, data_offset, data_len, tot_len; + struct veth_seg_info *seg_info; + void *buffer; + struct udphdr *udph; + struct iphdr *iph; + struct sk_buff *skb; + struct page *page; + int hh_len = 0; + u64 addr; + + addr = desc->addr; + len = desc->len; + + /* l2 reserved len */ + hh_len = LL_RESERVED_SPACE(dev); + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(hh_len)); + + /* data points to eth header */ + buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr); + + iph = (struct iphdr *)(buffer + ETH_HLEN); + iph_len = iph->ihl * 4; + + udph = (struct udphdr *)(buffer + ETH_HLEN + iph_len); + th_len = sizeof(struct udphdr); + + if (gso_tuple->gso_flush) + gso_tuple_init(gso_tuple, iph, udph); + + ts = pool->unaligned ? 
len : pool->chunk_size; + + data_offset = offset_in_page(buffer) + ETH_HLEN + iph_len + th_len; + data_len = len - (ETH_HLEN + iph_len + th_len); + + /* head is null or this is a new 5 tuple */ + if (NULL == prev_skb || !gso_segment_match(gso_tuple, iph, udph)) { + tot_len = hr + iph_len + th_len; + skb = veth_build_gso_head_skb(dev, buffer, tot_len, hr, iph_len, th_len); + if (!skb) { + /* to do: handle here for skb */ + return NULL; + } + + /* store information for gso */ + seg_info = (struct veth_seg_info *)kmalloc(struct_size(seg_info, desc, MAX_SKB_FRAGS), GFP_KERNEL); + if (!seg_info) { + /* to do */ + kfree_skb(skb); + return NULL; + } + } else { + skb = prev_skb; + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4 | SKB_GSO_PARTIAL; + skb_shinfo(skb)->gso_size = data_len; + skb->ip_summed = CHECKSUM_PARTIAL; + + /* max segment is MAX_SKB_FRAGS */ + if(skb_shinfo(skb)->gso_segs >= MAX_SKB_FRAGS - 1) { + gso_tuple->gso_flush = true; + } + seg_info = (struct veth_seg_info *)skb_shinfo(skb)->destructor_arg; + } + + /* offset in umem pool buffer */ + addr = buffer - pool->addrs; + + /* get the page of the desc */ + page = pool->umem->pgs[addr >> PAGE_SHIFT]; + + /* in order to avoid to get freed by kfree_skb */ + get_page(page); + + /* desc.data can not hold in two */ + skb_fill_page_desc(skb, skb_shinfo(skb)->gso_segs, page, data_offset, data_len); + + skb->len += data_len; + skb->data_len += data_len; + skb->truesize += ts; + skb->dev = dev; + + /* later we will support gso for this */ + index = skb_shinfo(skb)->gso_segs; + seg_info->desc[index] = desc->addr; + seg_info->segs = ++index; + skb_shinfo(skb)->gso_segs++; + + skb_shinfo(skb)->destructor_arg = (void *)(long)seg_info; + skb_shinfo(skb)->destructor_arg_xsk_pool = (void *)(long)pool; + skb->destructor = veth_xsk_destruct_skb; /* to do, add skb to sock. 
may be there is no need to do for this * refcount_add(ts, &xs->sk.sk_wmem_alloc); @@ -744,6 +923,22 @@ static struct sk_buff *veth_build_skb_zerocopy(struct net_device *dev, struct xs return skb; } +static inline struct sk_buff *veth_build_skb_zerocopy(struct net_device *dev, struct xsk_buff_pool *pool, + struct xdp_desc *desc, struct veth_gso_tuple *gso_tuple, struct sk_buff *prev_skb) +{ + void *buffer; + + buffer = xsk_buff_raw_get_data(pool, desc->addr); + if (ip_hdr_gso_check(buffer, desc->len)) { + gso_tuple->gso_enable = true; + return veth_build_skb_zerocopy_gso(dev, pool, desc, gso_tuple, prev_skb); + } else { + gso_tuple->gso_flush = false; + gso_tuple->gso_enable = false; + return veth_build_skb_zerocopy_normal(dev, pool, desc); + } +} + static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq, struct xdp_frame *frame, struct veth_xdp_tx_bq *bq, @@ -1176,16 +1371,33 @@ static inline bool buffer_in_page(void *buffer, u32 len) } } +static inline void veth_skb_gso_check_update(struct sk_buff *skb) +{ + struct iphdr *iph = ip_hdr(skb); + struct udphdr *uh = udp_hdr(skb); + int ip_tot_len = skb->len; + int udp_len = skb->len - (skb->transport_header - skb->network_header); + iph->tot_len = htons(ip_tot_len); + ip_send_check(iph); + uh->len = htons(udp_len); + uh->check = 0; + + /* udp4 checksum update */ + udp4_hwcsum(skb, iph->saddr, iph->daddr); +} + static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, int budget) { struct veth_priv *priv, *peer_priv; struct net_device *dev, *peer_dev; + struct veth_gso_tuple gso_tuple; struct veth_rq *peer_rq; struct veth_stats peer_stats = {}; struct veth_stats stats = {}; struct veth_xdp_tx_bq bq; struct xdp_desc desc; void *xdpf; + struct sk_buff *prev_skb = NULL; struct sk_buff *skb = NULL; bool zc = xsk_pool->umem->zc; u32 xsk_headroom = xsk_pool->headroom; @@ -1200,6 +1412,8 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, /* todo: queue index must set before this */ peer_rq = &peer_priv->rq[sq->queue_index]; + memset(&gso_tuple, 0, sizeof(gso_tuple)); + /* set xsk wake up flag, to do: where to disable */ if (xsk_uses_need_wakeup(xsk_pool)) xsk_set_tx_need_wakeup(xsk_pool); @@ -1279,12 +1493,26 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, /* no need to copy address for af+xdp */ p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats); if (p_frame) { - skb = veth_build_skb_zerocopy(peer_dev, xsk_pool, &desc); - if (skb) { + skb = veth_build_skb_zerocopy(peer_dev, xsk_pool, &desc, &gso_tuple, prev_skb); + if (!gso_tuple.gso_enable) { napi_gro_receive(&peer_rq->xdp_napi, skb); skb = NULL; } else { - xsk_tx_completed_addr(xsk_pool, desc.addr); + if (prev_skb && gso_tuple.gso_flush) { + veth_skb_gso_check_update(prev_skb); + napi_gro_receive(&peer_rq->xdp_napi, prev_skb); + + if (prev_skb == skb) { + skb = NULL; + prev_skb = NULL; + } else { + prev_skb = skb; + } + } else if (NULL == skb){ + xsk_tx_completed_addr(xsk_pool, desc.addr); + } else { + prev_skb = skb; + } } } } else { @@ -1308,6 +1536,12 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool, done++; } + /* gso skb */ + if (NULL!=skb) { + veth_skb_gso_check_update(skb); + napi_gro_receive(&peer_rq->xdp_napi, skb); + } + /* release, move consumer,and wakeup the producer */ if (done) { napi_schedule(&peer_rq->xdp_napi);