Message ID | 20250124090211.110328-1-sankararaman.jayaraman@broadcom.com (mailing list archive)
---|---
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] vmxnet3: Fix tx queue race condition with XDP
On Fri, Jan 24, 2025 at 1:00 AM Sankararaman Jayaraman
<sankararaman.jayaraman@broadcom.com> wrote:
>
> If XDP traffic runs on a CPU which is greater than or equal to
> the number of the Tx queues of the NIC, then vmxnet3_xdp_get_tq()
> always picks up queue 0 for transmission as it uses reciprocal scale
> instead of simple modulo operation.
>
> vmxnet3_xdp_xmit() and vmxnet3_xdp_xmit_frame() use the above
> returned queue without any locking which can lead to race conditions
> when multiple XDP xmits run in parallel on different CPUs.
>
> This patch uses a simple modulo scheme when the current CPU equals or
> exceeds the number of Tx queues on the NIC. It also adds locking in
> vmxnet3_xdp_xmit() and vmxnet3_xdp_xmit_frame() functions.
>
> Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.")
> Signed-off-by: Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com>
> Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
> ---

LGTM
Acked-by: William Tu <u9012063@gmail.com>
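For readers unfamiliar with the helper: reciprocal_scale(val, ep_ro) computes ((u64)val * ep_ro) >> 32, so for CPU numbers (which are all far below 2^32) the result is 0 and every "overflow" CPU lands on Tx queue 0, exactly as the commit message describes. Below is a standalone userspace sketch, not part of the patch, with the helper re-implemented using that same arithmetic, contrasting the two mappings for a hypothetical 4-queue NIC:

/* demo_tq_pick.c -- illustrative only, not driver code.
 * Shows why reciprocal_scale(cpu, tq_number) maps every realistic CPU
 * number to queue 0, while cpu % tq_number spreads the load.
 */
#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as the kernel's reciprocal_scale() helper. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

int main(void)
{
	const uint32_t tq_number = 4;	/* pretend the NIC has 4 Tx queues */

	for (uint32_t cpu = 0; cpu < 12; cpu++)
		printf("cpu %2u -> reciprocal_scale: %u  modulo: %u\n",
		       cpu, reciprocal_scale(cpu, tq_number),
		       cpu % tq_number);
	return 0;
}

Every row prints 0 in the reciprocal_scale column, while the modulo column cycles through 0-3; the file name and queue count above are arbitrary choices for the demonstration.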
On Fri, Jan 24, 2025 at 02:32:11PM +0530, Sankararaman Jayaraman wrote:
> If XDP traffic runs on a CPU which is greater than or equal to
> the number of the Tx queues of the NIC, then vmxnet3_xdp_get_tq()
> always picks up queue 0 for transmission as it uses reciprocal scale
> instead of simple modulo operation.
>
> vmxnet3_xdp_xmit() and vmxnet3_xdp_xmit_frame() use the above
> returned queue without any locking which can lead to race conditions
> when multiple XDP xmits run in parallel on different CPUs.
>
> This patch uses a simple modulo scheme when the current CPU equals or
> exceeds the number of Tx queues on the NIC. It also adds locking in
> vmxnet3_xdp_xmit() and vmxnet3_xdp_xmit_frame() functions.
>
> Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.")
> Signed-off-by: Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com>
> Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>

Reviewed-by: Simon Horman <horms@kernel.org>
On Fri, 24 Jan 2025 14:32:11 +0530 Sankararaman Jayaraman wrote:
> + * Copyright (C) 2008-2025, VMware, Inc. All Rights Reserved.

Please don't update copyright dates in a fix. It increases the size of
the patch and the risk of a conflict.

> @@ -123,7 +123,9 @@ vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
>  	struct page *page;
>  	u32 buf_size;
>  	u32 dw2;
> +	unsigned long irq_flags;

Please order variable declaration lines longest to shortest.

> +	spin_lock_irqsave(&tq->tx_lock, irq_flags);

Why _irqsave()?
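"Longest to shortest" refers to the netdev reverse-xmas-tree convention for local variable declarations. A sketch of how the quoted declarations would look reordered (illustrative only, not an actual v2 of the patch):

	/* netdev reverse-xmas-tree ordering: longest declaration line first */
	unsigned long irq_flags;
	struct page *page;
	u32 buf_size;
	u32 dw2;

The _irqsave() question presumably asks whether this path can ever run from hard-IRQ context or with interrupts already disabled; if it cannot, a cheaper variant such as plain spin_lock() would avoid saving and restoring the interrupt flags.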
diff --git a/drivers/net/vmxnet3/vmxnet3_xdp.c b/drivers/net/vmxnet3/vmxnet3_xdp.c
index 1341374a4588..5f177e77cfcb 100644
--- a/drivers/net/vmxnet3/vmxnet3_xdp.c
+++ b/drivers/net/vmxnet3/vmxnet3_xdp.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
  * Linux driver for VMware's vmxnet3 ethernet NIC.
- * Copyright (C) 2008-2023, VMware, Inc. All Rights Reserved.
+ * Copyright (C) 2008-2025, VMware, Inc. All Rights Reserved.
  * Maintained by: pv-drivers@vmware.com
  *
  */
@@ -28,7 +28,7 @@ vmxnet3_xdp_get_tq(struct vmxnet3_adapter *adapter)
 	if (likely(cpu < tq_number))
 		tq = &adapter->tx_queue[cpu];
 	else
-		tq = &adapter->tx_queue[reciprocal_scale(cpu, tq_number)];
+		tq = &adapter->tx_queue[cpu % tq_number];
 
 	return tq;
 }
@@ -123,7 +123,9 @@ vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
 	struct page *page;
 	u32 buf_size;
 	u32 dw2;
+	unsigned long irq_flags;
 
+	spin_lock_irqsave(&tq->tx_lock, irq_flags);
 	dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
 	dw2 |= xdpf->len;
 	ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
@@ -134,6 +136,7 @@ vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
 
 	if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) {
 		tq->stats.tx_ring_full++;
+		spin_unlock_irqrestore(&tq->tx_lock, irq_flags);
 		return -ENOSPC;
 	}
 
@@ -142,8 +145,10 @@ vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
 		tbi->dma_addr = dma_map_single(&adapter->pdev->dev,
 					       xdpf->data, buf_size,
 					       DMA_TO_DEVICE);
-		if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr))
+		if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) {
+			spin_unlock_irqrestore(&tq->tx_lock, irq_flags);
 			return -EFAULT;
+		}
 		tbi->map_type |= VMXNET3_MAP_SINGLE;
 	} else { /* XDP buffer from page pool */
 		page = virt_to_page(xdpf->data);
@@ -182,6 +187,7 @@ vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
 	dma_wmb();
 	gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^
 						  VMXNET3_TXD_GEN);
+	spin_unlock_irqrestore(&tq->tx_lock, irq_flags);
 
 	/* No need to handle the case when tx_num_deferred doesn't reach
 	 * threshold. Backend driver at hypervisor side will poll and reset
@@ -226,6 +232,7 @@ vmxnet3_xdp_xmit(struct net_device *dev,
 	struct vmxnet3_adapter *adapter = netdev_priv(dev);
 	struct vmxnet3_tx_queue *tq;
 	int i;
+	struct netdev_queue *nq;
 
 	if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
 		return -ENETDOWN;
@@ -236,6 +243,9 @@ vmxnet3_xdp_xmit(struct net_device *dev,
 	if (tq->stopped)
 		return -ENETDOWN;
 
+	nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
+
+	__netif_tx_lock(nq, smp_processor_id());
 	for (i = 0; i < n; i++) {
 		if (vmxnet3_xdp_xmit_frame(adapter, frames[i], tq, true)) {
 			tq->stats.xdp_xmit_err++;
 		}
 	}
 	tq->stats.xdp_xmit += i;
+	__netif_tx_unlock(nq);
 
 	return i;
 }
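As an illustrative aside, not driver code: the corruption the added locking prevents is easy to reproduce in userspace. In the sketch below, two threads advance a shared next2fill index into a fake Tx ring; without a lock some slots are filled twice and others never, while the locked run fills every slot exactly once. All names here (fake_tq, slot_used, and so on) are simplified stand-ins, not the vmxnet3 structures:

/* race_demo.c -- illustrative only; the unlocked data race is intentional. */
#include <pthread.h>
#include <stdio.h>

#define RING_SIZE 4096
#define PER_THREAD (RING_SIZE / 2)

struct fake_tq {
	pthread_spinlock_t lock;
	unsigned int next2fill;
	int use_lock;
	unsigned char slot_used[RING_SIZE];
};

static struct fake_tq tq;

static void *xmit_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < PER_THREAD; i++) {
		if (tq.use_lock)
			pthread_spin_lock(&tq.lock);
		/* Read-modify-write of the ring index; racy when unlocked. */
		unsigned int slot = tq.next2fill;
		tq.slot_used[slot % RING_SIZE]++;	/* "fill a descriptor" */
		tq.next2fill = slot + 1;		/* publish the new index */
		if (tq.use_lock)
			pthread_spin_unlock(&tq.lock);
	}
	return NULL;
}

static unsigned int run(int use_lock)
{
	pthread_t a, b;
	unsigned int filled_once = 0;

	tq.next2fill = 0;
	tq.use_lock = use_lock;
	for (int i = 0; i < RING_SIZE; i++)
		tq.slot_used[i] = 0;

	pthread_create(&a, NULL, xmit_worker, NULL);
	pthread_create(&b, NULL, xmit_worker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	for (int i = 0; i < RING_SIZE; i++)
		if (tq.slot_used[i] == 1)
			filled_once++;
	return filled_once;
}

int main(void)
{
	pthread_spin_init(&tq.lock, PTHREAD_PROCESS_PRIVATE);
	printf("unlocked: %u/%d slots filled exactly once\n", run(0), RING_SIZE);
	printf("locked:   %u/%d slots filled exactly once\n", run(1), RING_SIZE);
	return 0;
}

Compile with gcc -pthread race_demo.c; on a multi-core machine the unlocked run typically reports far fewer than 4096 slots filled exactly once, while the locked run reports all 4096.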