[net-next,v5,14/15] virtio-net: xsk direct xmit inside xsk wakeup

Message ID: 20210610082209.91487-15-xuanzhuo@linux.alibaba.com
State: Changes Requested
Delegated to: Netdev Maintainers
Series: virtio-net: support xdp socket zero copy

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           success  CCed 15 of 15 maintainers
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              fail     Errors and warnings before: 15 this patch: 15
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 40 lines checked
netdev/build_allmodconfig_warn  fail     Errors and warnings before: 15 this patch: 15
netdev/header_inline            success  Link

Commit Message

Xuan Zhuo June 10, 2021, 8:22 a.m. UTC
Calling virtqueue_napi_schedule() in wakeup makes napi run on the
current cpu. This is not a problem when the application is not busy,
but when the application itself is busy it causes a lot of scheduling.

If the application is continuously sending packets, the constant
scheduling between the application and napi makes transmission uneven
and introduces an obvious delay (visible with tcpdump). When a channel
is driven to 100% (vhost reaches 100%), the cpu running the application
also reaches 100%.

This patch sends a small amount of data directly in wakeup in order to
trigger the tx interrupt. The tx interrupt fires on the cpu of its
affinity and kicks off napi there, so napi can continue to consume the
xsk tx queue. Two cpus are then in use: cpu0 runs the application and
cpu1 runs napi to consume the data. With a channel again driven to
100%, the utilization of cpu0 is 12.7% and the utilization of cpu1 is
2.9%.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio/xsk.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)
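
For context on how the wakeup path above is reached: an AF_XDP
application usually kicks the tx ring with a zero-length, non-blocking
sendto(), which in zero-copy mode should end up in the driver's wakeup
callback. The following is a minimal user-space sketch, not part of
this patch. The helper name xsk_kick_tx() is hypothetical, the fd is
assumed to be an AF_XDP socket that is already bound with a tx ring,
and the set of errno values treated as transient is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: ask the kernel to process the tx ring of an
 * AF_XDP socket. Returns 0 on success or on a condition that is
 * assumed to be transient (the caller may simply retry later), and
 * -errno for any other failure.
 */
static int xsk_kick_tx(int xsk_fd)
{
	/* Zero-length send: no payload, it only acts as a wakeup. */
	ssize_t ret = sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);

	if (ret >= 0)
		return 0;

	switch (errno) {
	case EAGAIN:
	case EBUSY:
	case ENOBUFS:
		/* Assumed transient: the ring or device is busy right now. */
		return 0;
	default:
		return -errno;
	}
}
```

An application that sends continuously calls something like this after
every batch it places on the tx ring, which is why the cost of the
wakeup path matters in the busy case described in the commit message.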

Comments

Jason Wang June 17, 2021, 3:07 a.m. UTC | #1
On 2021/6/10 at 4:22 PM, Xuan Zhuo wrote:
> Calling virtqueue_napi_schedule() in wakeup makes napi run on the
> current cpu. This is not a problem when the application is not busy,
> but when the application itself is busy it causes a lot of scheduling.
>
> If the application is continuously sending packets, the constant
> scheduling between the application and napi makes transmission uneven
> and introduces an obvious delay (visible with tcpdump). When a channel
> is driven to 100% (vhost reaches 100%), the cpu running the application
> also reaches 100%.
>
> This patch sends a small amount of data directly in wakeup in order to
> trigger the tx interrupt. The tx interrupt fires on the cpu of its
> affinity and kicks off napi there, so napi can continue to consume the
> xsk tx queue. Two cpus are then in use: cpu0 runs the application and
> cpu1 runs napi to consume the data. With a channel again driven to
> 100%, the utilization of cpu0 is 12.7% and the utilization of cpu1 is
> 2.9%.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio/xsk.c | 28 +++++++++++++++++++++++-----
>   1 file changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
> index 36cda2dcf8e7..3973c82d1ad2 100644
> --- a/drivers/net/virtio/xsk.c
> +++ b/drivers/net/virtio/xsk.c
> @@ -547,6 +547,7 @@ int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
>   	struct xsk_buff_pool *pool;
> +	struct netdev_queue *txq;
>   	struct send_queue *sq;
>   
>   	if (!netif_running(dev))
> @@ -559,11 +560,28 @@ int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
>   
>   	rcu_read_lock();
>   	pool = rcu_dereference(sq->xsk.pool);
> -	if (pool) {
> -		local_bh_disable();
> -		virtqueue_napi_schedule(&sq->napi, sq->vq);
> -		local_bh_enable();
> -	}
> +	if (!pool)
> +		goto end;
> +
> +	if (napi_if_scheduled_mark_missed(&sq->napi))
> +		goto end;
> +
> +	txq = netdev_get_tx_queue(dev, qid);
> +
> +	__netif_tx_lock_bh(txq);
> +
> +	/* Send some packets directly to reduce the sending delay; this also
> +	 * actively triggers a tx interrupt.
> +	 *
> +	 * If no packet is sent out, the ring of the device is full. In this
> +	 * case, we will still get a tx interrupt, and the subsequent sending
> +	 * work is handled from there.
> +	 */
> +	virtnet_xsk_run(sq, pool, sq->napi.weight, false);


This looks tricky, and it won't be efficient since there could be some 
contention on the tx lock.

I wonder if we can simulate the interrupt via IPI like what RPS did.

In the long run, we may want to extend the spec to support interrupt
trigger through the driver.

Thanks


> +
> +	__netif_tx_unlock_bh(txq);
> +
> +end:
>   	rcu_read_unlock();
>   	return 0;
>   }

Patch

diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
index 36cda2dcf8e7..3973c82d1ad2 100644
--- a/drivers/net/virtio/xsk.c
+++ b/drivers/net/virtio/xsk.c
@@ -547,6 +547,7 @@  int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct xsk_buff_pool *pool;
+	struct netdev_queue *txq;
 	struct send_queue *sq;
 
 	if (!netif_running(dev))
@@ -559,11 +560,28 @@  int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
 
 	rcu_read_lock();
 	pool = rcu_dereference(sq->xsk.pool);
-	if (pool) {
-		local_bh_disable();
-		virtqueue_napi_schedule(&sq->napi, sq->vq);
-		local_bh_enable();
-	}
+	if (!pool)
+		goto end;
+
+	if (napi_if_scheduled_mark_missed(&sq->napi))
+		goto end;
+
+	txq = netdev_get_tx_queue(dev, qid);
+
+	__netif_tx_lock_bh(txq);
+
+	/* Send some packets directly to reduce the sending delay; this also
+	 * actively triggers a tx interrupt.
+	 *
+	 * If no packet is sent out, the ring of the device is full. In this
+	 * case, we will still get a tx interrupt, and the subsequent sending
+	 * work is handled from there.
+	 */
+	virtnet_xsk_run(sq, pool, sq->napi.weight, false);
+
+	__netif_tx_unlock_bh(txq);
+
+end:
 	rcu_read_unlock();
 	return 0;
 }
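
The early exit on napi_if_scheduled_mark_missed() in the hunk above
can be illustrated with a small user-space model. This is only a
sketch of how that helper is understood to behave (if napi is already
scheduled, record that more work arrived and let the poller handle it;
otherwise tell the caller to proceed). The model_* names and the bit
values are made up for illustration, and C11 atomics stand in for the
kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for the NAPI state bits; values are made up. */
enum {
	MODEL_STATE_SCHED   = 1u << 0, /* poll is scheduled or running   */
	MODEL_STATE_MISSED  = 1u << 1, /* work arrived while scheduled   */
	MODEL_STATE_DISABLE = 1u << 2, /* instance is being disabled     */
};

struct model_napi {
	_Atomic unsigned long state;
};

/* Returns true when the caller should leave the work to the poller,
 * false when nothing is scheduled and the caller must act itself.
 */
static bool model_if_scheduled_mark_missed(struct model_napi *n)
{
	unsigned long val = atomic_load(&n->state);
	unsigned long new;

	do {
		if (val & MODEL_STATE_DISABLE)
			return true;
		if (!(val & MODEL_STATE_SCHED))
			return false;
		new = val | MODEL_STATE_MISSED;
		/* On failure, val is reloaded and the checks run again. */
	} while (!atomic_compare_exchange_weak(&n->state, &val, new));

	return true;
}
```

In the patch, a true result means wakeup returns without taking the tx
lock, so the direct send only happens when napi is not already
scheduled for this queue.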