From patchwork Tue Jan 5 09:11:39 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 11998413
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: netdev@vger.kernel.org
Cc: dust.li@linux.alibaba.com, tonylu@linux.alibaba.com,
    "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
    Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
    Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
    virtualization@lists.linux-foundation.org (open list:VIRTIO CORE AND NET DRIVERS),
    linux-kernel@vger.kernel.org (open list),
    bpf@vger.kernel.org (open list:XDP SOCKETS (AF_XDP))
Subject: [PATCH netdev 1/5] xsk: support get page for drv
Date: Tue, 5 Jan 2021 17:11:39 +0800

For some drivers, such as virtio-net, we do not configure DMA when binding
an xsk socket; we only look up the page at send time. This patch introduces
a need_dma field that is passed to the driver while the pool is being set
up. If the device does not use DMA, the driver should set this value to
false. A function xsk_buff_raw_get_page() is also added so that the driver
can get the page backing a given addr.
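As a reading aid (not part of the series), a minimal driver-side sketch of
how a device that never DMA-maps the umem, which is the virtio-net case this
series targets, might consume the new hook. The sketch_* names are invented,
and the transmit fragment assumes the descriptor does not cross a page
boundary:

#include <linux/mm.h>
#include <linux/netdevice.h>
#include <linux/scatterlist.h>
#include <net/xdp_sock_drv.h>

/* ndo_bpf() handler fragment: tell __xp_assign_dev() that the missing
 * pool->dma_pages is expected for this device.
 */
static int sketch_setup_xsk_pool(struct net_device *dev, struct netdev_bpf *xdp)
{
	xdp->xsk.need_dma = false;
	/* driver-private bookkeeping of xdp->xsk.pool would go here */
	return 0;
}

/* Transmit-path fragment: build one scatterlist entry from an xsk
 * descriptor without any DMA address, using only the backing page and the
 * offset of the payload within that page.
 */
static void sketch_fill_sg(struct xsk_buff_pool *pool, struct scatterlist *sg,
			   const struct xdp_desc *desc)
{
	void *data = xsk_buff_raw_get_data(pool, desc->addr);
	struct page *page = xsk_buff_raw_get_page(pool, desc->addr);

	sg_set_page(sg, page, desc->len, offset_in_page(data));
}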
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 include/linux/netdevice.h   |  1 +
 include/net/xdp_sock_drv.h  | 10 ++++++++++
 include/net/xsk_buff_pool.h |  1 +
 net/xdp/xsk_buff_pool.c     | 10 +++++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bf1679..b8baef9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -915,6 +915,7 @@ struct netdev_bpf {
 		struct {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
+			bool need_dma;
 		} xsk;
 	};
 };
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4e295541..e9c7e25 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -100,6 +100,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return xp_raw_get_data(pool, addr);
 }
 
+static inline struct page *xsk_buff_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return xp_raw_get_page(pool, addr);
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
@@ -232,6 +237,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return NULL;
 }
 
+static inline struct page *xsk_buff_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return NULL;
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 01755b8..54e461d 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -103,6 +103,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count);
 void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr);
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr);
+struct page *xp_raw_get_page(struct xsk_buff_pool *pool, u64 addr);
 static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb)
 {
 	return xskb->dma;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 67a4494..9bb058f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -167,12 +167,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 	bpf.command = XDP_SETUP_XSK_POOL;
 	bpf.xsk.pool = pool;
 	bpf.xsk.queue_id = queue_id;
+	bpf.xsk.need_dma = true;
 
 	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (bpf.xsk.need_dma && !pool->dma_pages) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;
@@ -536,6 +537,13 @@ void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 }
 EXPORT_SYMBOL(xp_raw_get_data);
 
+struct page *xp_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
+	return pool->umem->pgs[addr >> PAGE_SHIFT];
+}
+EXPORT_SYMBOL(xp_raw_get_page);
+
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
 {
 	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;

From patchwork Tue Jan 5 09:11:40 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 11998409
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: netdev@vger.kernel.org
Subject: [PATCH netdev 2/5] virtio-net: support XDP_TX when not more queues
Date: Tue, 5 Jan 2021 17:11:40 +0800

The number of queues implemented by many virtio backends is limited, while
some machines have a large number of CPUs, so it is often impossible to
allocate a dedicated queue for XDP_TX on every CPU. This patch allows
XDP_TX to run on a shared send queue, protected by the tx queue lock, when
there are not enough queues.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f65eea6..f2349b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -194,6 +194,7 @@ struct virtnet_info {
 
 	/* # of XDP queue pairs currently used by the driver */
 	u16 xdp_queue_pairs;
+	bool xdp_enable;
 
 	/* I like... big packets and I cannot lie!
*/ bool big_packets; @@ -481,14 +482,34 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi, return 0; } -static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi) +static struct send_queue *virtnet_get_xdp_sq(struct virtnet_info *vi) { unsigned int qp; + struct netdev_queue *txq; + + if (vi->curr_queue_pairs > nr_cpu_ids) { + qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id(); + } else { + qp = smp_processor_id() % vi->curr_queue_pairs; + txq = netdev_get_tx_queue(vi->dev, qp); + __netif_tx_lock(txq, raw_smp_processor_id()); + } - qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id(); return &vi->sq[qp]; } +static void virtnet_put_xdp_sq(struct virtnet_info *vi) +{ + unsigned int qp; + struct netdev_queue *txq; + + if (vi->curr_queue_pairs <= nr_cpu_ids) { + qp = smp_processor_id() % vi->curr_queue_pairs; + txq = netdev_get_tx_queue(vi->dev, qp); + __netif_tx_unlock(txq); + } +} + static int virtnet_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags) { @@ -512,7 +533,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, if (!xdp_prog) return -ENXIO; - sq = virtnet_xdp_sq(vi); + sq = virtnet_get_xdp_sq(vi); if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) { ret = -EINVAL; @@ -560,12 +581,13 @@ static int virtnet_xdp_xmit(struct net_device *dev, sq->stats.kicks += kicks; u64_stats_update_end(&sq->stats.syncp); + virtnet_put_xdp_sq(vi); return ret; } static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { - return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; + return vi->xdp_enable ? VIRTIO_XDP_HEADROOM : 0; } /* We copy the packet for XDP in the following cases: @@ -1457,12 +1479,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget) xdp_do_flush(); if (xdp_xmit & VIRTIO_XDP_TX) { - sq = virtnet_xdp_sq(vi); + sq = virtnet_get_xdp_sq(vi); if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) { u64_stats_update_begin(&sq->stats.syncp); sq->stats.kicks++; u64_stats_update_end(&sq->stats.syncp); } + virtnet_put_xdp_sq(vi); } return received; @@ -2415,10 +2438,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, /* XDP requires extra queues for XDP_TX */ if (curr_qp + xdp_qp > vi->max_queue_pairs) { - NL_SET_ERR_MSG_MOD(extack, "Too few free TX rings available"); - netdev_warn(dev, "request %i queues but max is %i\n", - curr_qp + xdp_qp, vi->max_queue_pairs); - return -ENOMEM; + xdp_qp = 0; } old_prog = rtnl_dereference(vi->rq[0].xdp_prog); @@ -2451,12 +2471,14 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp); vi->xdp_queue_pairs = xdp_qp; + vi->xdp_enable = false; if (prog) { for (i = 0; i < vi->max_queue_pairs; i++) { rcu_assign_pointer(vi->rq[i].xdp_prog, prog); if (i == 0 && !old_prog) virtnet_clear_guest_offloads(vi); } + vi->xdp_enable = true; } for (i = 0; i < vi->max_queue_pairs; i++) { @@ -2524,7 +2546,7 @@ static int virtnet_set_features(struct net_device *dev, int err; if ((dev->features ^ features) & NETIF_F_LRO) { - if (vi->xdp_queue_pairs) + if (vi->xdp_enable) return -EBUSY; if (features & NETIF_F_LRO) From patchwork Tue Jan 5 09:11:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xuan Zhuo X-Patchwork-Id: 11998407 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: 
No
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: netdev@vger.kernel.org
Subject: [PATCH netdev 3/5] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
Date: Tue, 5 Jan 2021 17:11:41 +0800

With xsk support, a second kind of pointer shows up while freeing the old
tx buffers. To tell a ctx submitted by XDP_TX apart from a ctx submitted by
xsk, this patch adds a small tag struct: virtnet_xdp_type.type identifies
which kind of ctx this is, and virtnet_xdp_type.offset records the offset
between the "true ctx" and the virtnet_xdp_type tag. The newly added
virtnet_xsk_hdr will be used for xsk.
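As an illustration (not part of the patch), a small sketch of the tag round
trip described above. It reuses the struct and the xdp_to_ptr()/
ptr_to_xtype() helpers introduced by this patch, while demo_tag_frame() and
demo_complete() are invented names:

/* Tag an xdp_frame before handing it to the virtqueue: the tag lives right
 * after struct xdp_frame and remembers how far back the real frame sits.
 */
static void *demo_tag_frame(struct xdp_frame *xdpf)
{
	struct virtnet_xdp_type *xtype = (struct virtnet_xdp_type *)(xdpf + 1);

	xtype->offset = (char *)xdpf - (char *)xtype;	/* negative offset */
	xtype->type = XDP_TYPE_TX;
	return xdp_to_ptr(xtype);		/* sets VIRTIO_XDP_FLAG */
}

/* On tx completion, recover the original object from the tag. */
static struct xdp_frame *demo_complete(void *ptr)
{
	struct virtnet_xdp_type *xtype = ptr_to_xtype(ptr);

	if (xtype->type == XDP_TYPE_XSK)	/* it is a virtnet_xsk_hdr */
		return NULL;

	return (struct xdp_frame *)((char *)xtype + xtype->offset);
}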
Signed-off-by: Xuan Zhuo --- drivers/net/virtio_net.c | 77 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 62 insertions(+), 15 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index f2349b8..df38a9f 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -94,6 +94,22 @@ struct virtnet_rq_stats { u64 kicks; }; +enum { + XDP_TYPE_XSK, + XDP_TYPE_TX, +}; + +struct virtnet_xdp_type { + int offset:24; + unsigned type:8; +}; + +struct virtnet_xsk_hdr { + struct virtnet_xdp_type type; + struct virtio_net_hdr_mrg_rxbuf hdr; + u32 len; +}; + #define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m) #define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m) @@ -252,14 +268,19 @@ static bool is_xdp_frame(void *ptr) return (unsigned long)ptr & VIRTIO_XDP_FLAG; } -static void *xdp_to_ptr(struct xdp_frame *ptr) +static void *xdp_to_ptr(struct virtnet_xdp_type *ptr) { return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG); } -static struct xdp_frame *ptr_to_xdp(void *ptr) +static struct virtnet_xdp_type *ptr_to_xtype(void *ptr) { - return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG); + return (struct virtnet_xdp_type *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG); +} + +static void *xtype_got_ptr(struct virtnet_xdp_type *xdptype) +{ + return (char *)xdptype + xdptype->offset; } /* Converting between virtqueue no. and kernel tx/rx queue no. @@ -460,11 +481,16 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi, struct xdp_frame *xdpf) { struct virtio_net_hdr_mrg_rxbuf *hdr; + struct virtnet_xdp_type *xdptype; int err; - if (unlikely(xdpf->headroom < vi->hdr_len)) + if (unlikely(xdpf->headroom < vi->hdr_len + sizeof(*xdptype))) return -EOVERFLOW; + xdptype = (struct virtnet_xdp_type *)(xdpf + 1); + xdptype->offset = (char *)xdpf - (char *)xdptype; + xdptype->type = XDP_TYPE_TX; + /* Make room for virtqueue hdr (also change xdpf->headroom?) */ xdpf->data -= vi->hdr_len; /* Zero header and leave csum up to XDP layers */ @@ -474,7 +500,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi, sg_init_one(sq->sg, xdpf->data, xdpf->len); - err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf), + err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdptype), GFP_ATOMIC); if (unlikely(err)) return -ENOSPC; /* Caller handle free/refcnt */ @@ -544,8 +570,11 @@ static int virtnet_xdp_xmit(struct net_device *dev, /* Free up any pending old buffers before queueing new ones. 
*/ while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { if (likely(is_xdp_frame(ptr))) { - struct xdp_frame *frame = ptr_to_xdp(ptr); + struct virtnet_xdp_type *xtype; + struct xdp_frame *frame; + xtype = ptr_to_xtype(ptr); + frame = xtype_got_ptr(xtype); bytes += frame->len; xdp_return_frame(frame); } else { @@ -1395,24 +1424,34 @@ static int virtnet_receive(struct receive_queue *rq, int budget, static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi) { - unsigned int len; unsigned int packets = 0; unsigned int bytes = 0; - void *ptr; + unsigned int len; + struct virtnet_xdp_type *xtype; + struct xdp_frame *frame; + struct virtnet_xsk_hdr *xskhdr; + struct sk_buff *skb; + void *ptr; while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { if (likely(!is_xdp_frame(ptr))) { - struct sk_buff *skb = ptr; + skb = ptr; pr_debug("Sent skb %p\n", skb); bytes += skb->len; napi_consume_skb(skb, in_napi); } else { - struct xdp_frame *frame = ptr_to_xdp(ptr); + xtype = ptr_to_xtype(ptr); - bytes += frame->len; - xdp_return_frame(frame); + if (xtype->type == XDP_TYPE_XSK) { + xskhdr = (struct virtnet_xsk_hdr *)xtype; + bytes += xskhdr->len; + } else { + frame = xtype_got_ptr(xtype); + xdp_return_frame(frame); + bytes += frame->len; + } } packets++; } @@ -2675,14 +2714,22 @@ static void free_unused_bufs(struct virtnet_info *vi) { void *buf; int i; + struct send_queue *sq; for (i = 0; i < vi->max_queue_pairs; i++) { struct virtqueue *vq = vi->sq[i].vq; + sq = vi->sq + i; while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) { - if (!is_xdp_frame(buf)) + if (!is_xdp_frame(buf)) { dev_kfree_skb(buf); - else - xdp_return_frame(ptr_to_xdp(buf)); + } else { + struct virtnet_xdp_type *xtype; + + xtype = ptr_to_xtype(buf); + + if (xtype->type != XDP_TYPE_XSK) + xdp_return_frame(xtype_got_ptr(xtype)); + } } } From patchwork Tue Jan 5 09:11:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xuan Zhuo X-Patchwork-Id: 11998411 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA34EC4332B for ; Tue, 5 Jan 2021 09:13:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A8DC922AB9 for ; Tue, 5 Jan 2021 09:13:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727755AbhAEJMf (ORCPT ); Tue, 5 Jan 2021 04:12:35 -0500 Received: from out30-131.freemail.mail.aliyun.com ([115.124.30.131]:42207 "EHLO out30-131.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727657AbhAEJMd (ORCPT ); Tue, 5 Jan 2021 04:12:33 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=xuanzhuo@linux.alibaba.com;NM=1;PH=DS;RN=22;SR=0;TI=SMTPD_---0UKoYpdO_1609837908; Received: from localhost(mailfrom:xuanzhuo@linux.alibaba.com fp:SMTPD_---0UKoYpdO_1609837908) by smtp.aliyun-inc.com(127.0.0.1); Tue, 05 Jan 2021 17:11:49 +0800 From: Xuan Zhuo To: netdev@vger.kernel.org 
Cc: dust.li@linux.alibaba.com, tonylu@linux.alibaba.com, "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Jakub Kicinski , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Magnus Karlsson , Jonathan Lemon , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , virtualization@lists.linux-foundation.org (open list:VIRTIO CORE AND NET DRIVERS), linux-kernel@vger.kernel.org (open list), bpf@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)) Subject: [PATCH netdev 4/5] xsk, virtio-net: prepare for support xsk Date: Tue, 5 Jan 2021 17:11:42 +0800 Message-Id: <4c424e0980420dfff194a9d1c8e66609b2fa6cba.1609837120.git.xuanzhuo@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Split function free_old_xmit_skbs, add sub-function __free_old_xmit_ptr, which is convenient to call with other statistical information, and supports the parameter 'xsk_wakeup' required for processing xsk. Use netif stop check as a function virtnet_sq_stop_check, which will be used when adding xsk support. Signed-off-by: Xuan Zhuo --- drivers/net/virtio_net.c | 95 ++++++++++++++++++++++++++---------------------- 1 file changed, 52 insertions(+), 43 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index df38a9f..e744dce 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -263,6 +263,11 @@ struct padded_vnet_hdr { char padding[4]; }; +static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, + bool xsk_wakeup, + unsigned int *_packets, unsigned int *_bytes); +static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi); + static bool is_xdp_frame(void *ptr) { return (unsigned long)ptr & VIRTIO_XDP_FLAG; @@ -376,6 +381,37 @@ static void skb_xmit_done(struct virtqueue *vq) netif_wake_subqueue(vi->dev, vq2txq(vq)); } +static void virtnet_sq_stop_check(struct send_queue *sq, bool in_napi) +{ + struct virtnet_info *vi = sq->vq->vdev->priv; + struct net_device *dev = vi->dev; + int qnum = sq - vi->sq; + + /* If running out of space, stop queue to avoid getting packets that we + * are then unable to transmit. + * An alternative would be to force queuing layer to requeue the skb by + * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be + * returned in a normal path of operation: it means that driver is not + * maintaining the TX queue stop/start state properly, and causes + * the stack to do a non-trivial amount of useless work. + * Since most packets only take 1 or 2 ring slots, stopping the queue + * early means 16 slots are typically wasted. + */ + + if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) { + netif_stop_subqueue(dev, qnum); + if (!sq->napi.weight && + unlikely(!virtqueue_enable_cb_delayed(sq->vq))) { + /* More just got used, free them then recheck. 
*/ + free_old_xmit_skbs(sq, in_napi); + if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) { + netif_start_subqueue(dev, qnum); + virtqueue_disable_cb(sq->vq); + } + } + } +} + #define MRG_CTX_HEADER_SHIFT 22 static void *mergeable_len_to_ctx(unsigned int truesize, unsigned int headroom) @@ -543,13 +579,11 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct receive_queue *rq = vi->rq; struct bpf_prog *xdp_prog; struct send_queue *sq; - unsigned int len; int packets = 0; int bytes = 0; int drops = 0; int kicks = 0; int ret, err; - void *ptr; int i; /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this @@ -567,24 +601,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, goto out; } - /* Free up any pending old buffers before queueing new ones. */ - while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { - if (likely(is_xdp_frame(ptr))) { - struct virtnet_xdp_type *xtype; - struct xdp_frame *frame; - - xtype = ptr_to_xtype(ptr); - frame = xtype_got_ptr(xtype); - bytes += frame->len; - xdp_return_frame(frame); - } else { - struct sk_buff *skb = ptr; - - bytes += skb->len; - napi_consume_skb(skb, false); - } - packets++; - } + __free_old_xmit_ptr(sq, false, true, &packets, &bytes); for (i = 0; i < n; i++) { struct xdp_frame *xdpf = frames[i]; @@ -1422,7 +1439,9 @@ static int virtnet_receive(struct receive_queue *rq, int budget, return stats.packets; } -static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi) +static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, + bool xsk_wakeup, + unsigned int *_packets, unsigned int *_bytes) { unsigned int packets = 0; unsigned int bytes = 0; @@ -1456,6 +1475,17 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi) packets++; } + *_packets = packets; + *_bytes = bytes; +} + +static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi) +{ + unsigned int packets = 0; + unsigned int bytes = 0; + + __free_old_xmit_ptr(sq, in_napi, true, &packets, &bytes); + /* Avoid overhead when no packets have been processed * happens when called speculatively from start_xmit. */ @@ -1672,28 +1702,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) nf_reset_ct(skb); } - /* If running out of space, stop queue to avoid getting packets that we - * are then unable to transmit. - * An alternative would be to force queuing layer to requeue the skb by - * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be - * returned in a normal path of operation: it means that driver is not - * maintaining the TX queue stop/start state properly, and causes - * the stack to do a non-trivial amount of useless work. - * Since most packets only take 1 or 2 ring slots, stopping the queue - * early means 16 slots are typically wasted. - */ - if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { - netif_stop_subqueue(dev, qnum); - if (!use_napi && - unlikely(!virtqueue_enable_cb_delayed(sq->vq))) { - /* More just got used, free them then recheck. 
*/ - free_old_xmit_skbs(sq, false); - if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) { - netif_start_subqueue(dev, qnum); - virtqueue_disable_cb(sq->vq); - } - } - } + virtnet_sq_stop_check(sq, false); if (kick || netif_xmit_stopped(txq)) { if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {

From patchwork Tue Jan 5 09:11:43 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 11998405
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: netdev@vger.kernel.org
Subject: [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx
Date: Tue, 5 Jan 2021 17:11:43 +0800
Message-Id: <65b5d0af6c4ed878cbcfa53c925d9dcbb09ecc55.1609837120.git.xuanzhuo@linux.alibaba.com>

Add xdp socket (AF_XDP) zero copy tx support to virtio-net. The module
param "napi_tx" must be enabled to use this feature.

In practice, various virtio implementations have some problems:
1. The tx interrupt may be lost.
2. The tx interrupt may arrive with a relatively large delay.

This raises two issues:
1. If a wakeup only raises a tx interrupt, or only schedules a napi on the
   current cpu, packet sending is delayed.
2. When the tx ring is full, a lost or delayed tx interrupt means the ring
   is not recovered in time.

So I choose to send part of the data directly during wakeup.
If the sending has not been completed, I will start a napi to complete the subsequent sending work. Since the possible delay or loss of tx interrupt occurs when the tx ring is full, I added a timer to solve this problem. The performance of udp sending based on virtio net + xsk is 6 times that of ordinary kernel udp send. * xsk_check_timeout: when the dev full or all xsk.hdr used, start timer to check the xsk.hdr is avail. the unit is us. * xsk_num_max: the xsk.hdr max num * xsk_num_percent: the max hdr num be the percent of the virtio ring size. The real xsk hdr num will the min of xsk_num_max and the percent of the num of virtio ring * xsk_budget: the budget for xsk run Signed-off-by: Xuan Zhuo Reported-by: kernel test robot Reported-by: kernel test robot Reported-by: Dan Carpenter Reported-by: kernel test robot Reported-by: kernel test robot --- drivers/net/virtio_net.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 434 insertions(+), 3 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index e744dce..76319e7 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -22,10 +22,21 @@ #include #include #include +#include static int napi_weight = NAPI_POLL_WEIGHT; module_param(napi_weight, int, 0444); +static int xsk_check_timeout = 100; +static int xsk_num_max = 1024; +static int xsk_num_percent = 80; +static int xsk_budget = 128; + +module_param(xsk_check_timeout, int, 0644); +module_param(xsk_num_max, int, 0644); +module_param(xsk_num_percent, int, 0644); +module_param(xsk_budget, int, 0644); + static bool csum = true, gso = true, napi_tx = true; module_param(csum, bool, 0444); module_param(gso, bool, 0444); @@ -110,6 +121,9 @@ struct virtnet_xsk_hdr { u32 len; }; +#define VIRTNET_STATE_XSK_WAKEUP BIT(0) +#define VIRTNET_STATE_XSK_TIMER BIT(1) + #define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m) #define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m) @@ -149,6 +163,32 @@ struct send_queue { struct virtnet_sq_stats stats; struct napi_struct napi; + + struct { + struct xsk_buff_pool __rcu *pool; + struct virtnet_xsk_hdr __rcu *hdr; + + unsigned long state; + u64 hdr_con; + u64 hdr_pro; + u64 hdr_n; + struct xdp_desc last_desc; + bool wait_slot; + /* tx interrupt issues + * 1. that may be lost + * 2. that too slow, 200/s or delay 10ms + * + * timer for: + * 1. recycle the desc.(no check for performance, see below) + * 2. check the nic ring is avali. when nic ring is full + * + * Here, the regular check is performed for dev full. The + * application layer must ensure that the number of cq is + * sufficient, otherwise there may be insufficient cq in use. 
+ * + */ + struct hrtimer timer; + } xsk; }; /* Internal representation of a receive virtqueue */ @@ -267,6 +307,8 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, bool xsk_wakeup, unsigned int *_packets, unsigned int *_bytes); static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi); +static int virtnet_xsk_run(struct send_queue *sq, + struct xsk_buff_pool *pool, int budget); static bool is_xdp_frame(void *ptr) { @@ -1439,6 +1481,40 @@ static int virtnet_receive(struct receive_queue *rq, int budget, return stats.packets; } +static void virt_xsk_complete(struct send_queue *sq, u32 num, bool xsk_wakeup) +{ + struct xsk_buff_pool *pool; + int n; + + rcu_read_lock(); + + WRITE_ONCE(sq->xsk.hdr_pro, sq->xsk.hdr_pro + num); + + pool = rcu_dereference(sq->xsk.pool); + if (!pool) { + if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n) { + kfree(sq->xsk.hdr); + rcu_assign_pointer(sq->xsk.hdr, NULL); + } + rcu_read_unlock(); + return; + } + + xsk_tx_completed(pool, num); + + rcu_read_unlock(); + + if (!xsk_wakeup || !sq->xsk.wait_slot) + return; + + n = sq->xsk.hdr_pro - sq->xsk.hdr_con; + + if (n > sq->xsk.hdr_n / 2) { + sq->xsk.wait_slot = false; + virtqueue_napi_schedule(&sq->napi, sq->vq); + } +} + static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, bool xsk_wakeup, unsigned int *_packets, unsigned int *_bytes) @@ -1446,6 +1522,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, unsigned int packets = 0; unsigned int bytes = 0; unsigned int len; + u64 xsknum = 0; struct virtnet_xdp_type *xtype; struct xdp_frame *frame; struct virtnet_xsk_hdr *xskhdr; @@ -1466,6 +1543,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, if (xtype->type == XDP_TYPE_XSK) { xskhdr = (struct virtnet_xsk_hdr *)xtype; bytes += xskhdr->len; + xsknum += 1; } else { frame = xtype_got_ptr(xtype); xdp_return_frame(frame); @@ -1475,6 +1553,9 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi, packets++; } + if (xsknum) + virt_xsk_complete(sq, xsknum, xsk_wakeup); + *_packets = packets; *_bytes = bytes; } @@ -1595,6 +1676,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget) struct virtnet_info *vi = sq->vq->vdev->priv; unsigned int index = vq2txq(sq->vq); struct netdev_queue *txq; + struct xsk_buff_pool *pool; + int work = 0; if (unlikely(is_xdp_raw_buffer_queue(vi, index))) { /* We don't need to enable cb for XDP */ @@ -1604,15 +1687,26 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget) txq = netdev_get_tx_queue(vi->dev, index); __netif_tx_lock(txq, raw_smp_processor_id()); - free_old_xmit_skbs(sq, true); + + rcu_read_lock(); + pool = rcu_dereference(sq->xsk.pool); + if (pool) { + work = virtnet_xsk_run(sq, pool, budget); + rcu_read_unlock(); + } else { + rcu_read_unlock(); + free_old_xmit_skbs(sq, true); + } + __netif_tx_unlock(txq); - virtqueue_napi_complete(napi, sq->vq, 0); + if (work < budget) + virtqueue_napi_complete(napi, sq->vq, 0); if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) netif_tx_wake_queue(txq); - return 0; + return work; } static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) @@ -2560,16 +2654,346 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, return err; } +static enum hrtimer_restart virtnet_xsk_timeout(struct hrtimer *timer) +{ + struct send_queue *sq; + + sq = container_of(timer, struct send_queue, xsk.timer); + + clear_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state); + + virtqueue_napi_schedule(&sq->napi, sq->vq); + + 
return HRTIMER_NORESTART; +} + +static int virtnet_xsk_pool_enable(struct net_device *dev, + struct xsk_buff_pool *pool, + u16 qid) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct send_queue *sq = &vi->sq[qid]; + struct virtnet_xsk_hdr *hdr; + int n, ret = 0; + + if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues) + return -EINVAL; + + if (qid >= vi->curr_queue_pairs) + return -EINVAL; + + rcu_read_lock(); + + ret = -EBUSY; + if (rcu_dereference(sq->xsk.pool)) + goto end; + + /* check last xsk wait for hdr been free */ + if (rcu_dereference(sq->xsk.hdr)) + goto end; + + n = virtqueue_get_vring_size(sq->vq); + n = min(xsk_num_max, n * (xsk_num_percent % 100) / 100); + + ret = -ENOMEM; + hdr = kcalloc(n, sizeof(struct virtnet_xsk_hdr), GFP_ATOMIC); + if (!hdr) + goto end; + + memset(&sq->xsk, 0, sizeof(sq->xsk)); + + sq->xsk.hdr_pro = n; + sq->xsk.hdr_n = n; + + hrtimer_init(&sq->xsk.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); + sq->xsk.timer.function = virtnet_xsk_timeout; + + rcu_assign_pointer(sq->xsk.pool, pool); + rcu_assign_pointer(sq->xsk.hdr, hdr); + + ret = 0; +end: + rcu_read_unlock(); + + return ret; +} + +static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct send_queue *sq = &vi->sq[qid]; + + if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues) + return -EINVAL; + + if (qid >= vi->curr_queue_pairs) + return -EINVAL; + + rcu_assign_pointer(sq->xsk.pool, NULL); + + hrtimer_cancel(&sq->xsk.timer); + + synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */ + + if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n) { + kfree(sq->xsk.hdr); + rcu_assign_pointer(sq->xsk.hdr, NULL); + synchronize_rcu(); + } + + return 0; +} + static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp) { switch (xdp->command) { case XDP_SETUP_PROG: return virtnet_xdp_set(dev, xdp->prog, xdp->extack); + case XDP_SETUP_XSK_POOL: + xdp->xsk.need_dma = false; + if (xdp->xsk.pool) + return virtnet_xsk_pool_enable(dev, xdp->xsk.pool, + xdp->xsk.queue_id); + else + return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id); default: return -EINVAL; } } +static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool, + struct xdp_desc *desc) +{ + struct virtnet_info *vi = sq->vq->vdev->priv; + void *data, *ptr; + struct page *page; + struct virtnet_xsk_hdr *xskhdr; + u32 idx, offset, n, i, copy, copied; + u64 addr; + int err, m; + + addr = desc->addr; + + data = xsk_buff_raw_get_data(pool, addr); + offset = offset_in_page(data); + + /* one for hdr, one for the first page */ + n = 2; + m = desc->len - (PAGE_SIZE - offset); + if (m > 0) { + n += m >> PAGE_SHIFT; + if (m & PAGE_MASK) + ++n; + + n = min_t(u32, n, ARRAY_SIZE(sq->sg)); + } + + idx = sq->xsk.hdr_con % sq->xsk.hdr_n; + xskhdr = &sq->xsk.hdr[idx]; + + /* xskhdr->hdr has been memset to zero, so not need to clear again */ + + sg_init_table(sq->sg, n); + sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len); + + copied = 0; + for (i = 1; i < n; ++i) { + copy = min_t(int, desc->len - copied, PAGE_SIZE - offset); + + page = xsk_buff_raw_get_page(pool, addr + copied); + + sg_set_page(sq->sg + i, page, copy, offset); + copied += copy; + if (offset) + offset = 0; + } + + xskhdr->len = desc->len; + ptr = xdp_to_ptr(&xskhdr->type); + + err = virtqueue_add_outbuf(sq->vq, sq->sg, n, ptr, GFP_ATOMIC); + if (unlikely(err)) + sq->xsk.last_desc = *desc; + else + sq->xsk.hdr_con++; + + return err; +} + +static bool 
virtnet_xsk_dev_is_full(struct send_queue *sq) +{ + if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) + return true; + + if (sq->xsk.hdr_con == sq->xsk.hdr_pro) + return true; + + return false; +} + +static int virtnet_xsk_xmit_zc(struct send_queue *sq, + struct xsk_buff_pool *pool, unsigned int budget) +{ + struct xdp_desc desc; + int err, packet = 0; + int ret = -EAGAIN; + + if (sq->xsk.last_desc.addr) { + err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc); + if (unlikely(err)) + return -EBUSY; + + ++packet; + sq->xsk.last_desc.addr = 0; + } + + while (budget-- > 0) { + if (virtnet_xsk_dev_is_full(sq)) { + ret = -EBUSY; + break; + } + + if (!xsk_tx_peek_desc(pool, &desc)) { + /* done */ + ret = 0; + break; + } + + err = virtnet_xsk_xmit(sq, pool, &desc); + if (unlikely(err)) { + ret = -EBUSY; + break; + } + + ++packet; + } + + if (packet) { + xsk_tx_release(pool); + + if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) { + u64_stats_update_begin(&sq->stats.syncp); + sq->stats.kicks++; + u64_stats_update_end(&sq->stats.syncp); + } + } + + return ret; +} + +static int virtnet_xsk_run(struct send_queue *sq, + struct xsk_buff_pool *pool, int budget) +{ + int err, ret = 0; + unsigned int _packets = 0; + unsigned int _bytes = 0; + + sq->xsk.wait_slot = false; + + if (test_and_clear_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state)) + hrtimer_try_to_cancel(&sq->xsk.timer); + + __free_old_xmit_ptr(sq, true, false, &_packets, &_bytes); + + err = virtnet_xsk_xmit_zc(sq, pool, xsk_budget); + if (!err) { + struct xdp_desc desc; + + clear_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state); + xsk_set_tx_need_wakeup(pool); + + /* Race breaker. If new is coming after last xmit + * but before flag change + */ + + if (!xsk_tx_peek_desc(pool, &desc)) + goto end; + + set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state); + xsk_clear_tx_need_wakeup(pool); + + sq->xsk.last_desc = desc; + ret = budget; + goto end; + } + + xsk_clear_tx_need_wakeup(pool); + + if (err == -EAGAIN) { + ret = budget; + goto end; + } + + /* -EBUSY: wait tx ring avali. 
+ * by tx interrupt or rx interrupt or start_xmit or timer + */ + + __free_old_xmit_ptr(sq, true, false, &_packets, &_bytes); + + if (!virtnet_xsk_dev_is_full(sq)) { + ret = budget; + goto end; + } + + sq->xsk.wait_slot = true; + + if (xsk_check_timeout) { + hrtimer_start(&sq->xsk.timer, + ns_to_ktime(xsk_check_timeout * 1000), + HRTIMER_MODE_REL_PINNED); + + set_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state); + } + + virtnet_sq_stop_check(sq, true); + +end: + return ret; +} + +static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct send_queue *sq; + struct xsk_buff_pool *pool; + struct netdev_queue *txq; + int work = 0; + + if (!netif_running(dev)) + return -ENETDOWN; + + if (qid >= vi->curr_queue_pairs) + return -EINVAL; + + sq = &vi->sq[qid]; + + rcu_read_lock(); + + pool = rcu_dereference(sq->xsk.pool); + if (!pool) + goto end; + + if (test_and_set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state)) + goto end; + + txq = netdev_get_tx_queue(dev, qid); + + local_bh_disable(); + __netif_tx_lock(txq, raw_smp_processor_id()); + + work = virtnet_xsk_run(sq, pool, xsk_budget); + + __netif_tx_unlock(txq); + local_bh_enable(); + + if (work == xsk_budget) + virtqueue_napi_schedule(&sq->napi, sq->vq); + +end: + rcu_read_unlock(); + return 0; +} + static int virtnet_get_phys_port_name(struct net_device *dev, char *buf, size_t len) { @@ -2624,6 +3048,7 @@ static int virtnet_set_features(struct net_device *dev, .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid, .ndo_bpf = virtnet_xdp, .ndo_xdp_xmit = virtnet_xdp_xmit, + .ndo_xsk_wakeup = virtnet_xsk_wakeup, .ndo_features_check = passthru_features_check, .ndo_get_phys_port_name = virtnet_get_phys_port_name, .ndo_set_features = virtnet_set_features, @@ -2722,6 +3147,7 @@ static void free_receive_page_frags(struct virtnet_info *vi) static void free_unused_bufs(struct virtnet_info *vi) { void *buf; + u32 n; int i; struct send_queue *sq; @@ -2740,6 +3166,11 @@ static void free_unused_bufs(struct virtnet_info *vi) xdp_return_frame(xtype_got_ptr(xtype)); } } + + n = sq->xsk.hdr_con + sq->xsk.hdr_n; + n -= sq->xsk.hdr_pro; + if (n) + virt_xsk_complete(sq, n, false); } for (i = 0; i < vi->max_queue_pairs; i++) {