diff mbox series

[net,v4] atl1c: move tx cleanup processing out of interrupt

Message ID c8327d4bb516dd4741878c64fa6485cd@mikrotik.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,v4] atl1c: move tx cleanup processing out of interrupt | expand

Checks

Context Check Description
netdev/apply fail Patch does not apply to net
netdev/tree_selection success Clearly marked for net

Commit Message

Gatis Peisenieks April 6, 2021, 2:49 p.m. UTC
Tx queue cleanup happens in interrupt handler on same core as rx queue
processing. Both can take considerable amount of processing in high
packet-per-second scenarios.

Sending big amounts of packets can stall the rx processing which is 
unfair
and also can lead to out-of-memory condition since __dev_kfree_skb_irq
queues the skbs for later kfree in softirq which is not allowed to 
happen
with heavy load in interrupt handler.

This puts tx cleanup in its own napi and enables threaded napi to allow
the rx/tx queue processing to happen on different cores. Also as the 
first
in-driver user of dev_set_threaded API, need to add EXPORT_SYMBOL for 
it.

The ability to sustain equal amounts of tx/rx traffic increased:
from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
Mikrotik 10/25G NIC,
from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
---
changes since v3:
	- made scripts/checkpatch.pl happy (commit message line wrap +
	  missing comment on spinlock)
	- moved the new intr_mask_lock to be besides the intr_mask it
	  protects so they are more likely to be on same cacheline
changes since v2:
	- addressed comments from Eric Dumazet
	- added EXPORT_SYMBOL for dev_set_threaded

Sorry for reposting, noticed that scripts/checkpatch.pl was not happy.
---
  drivers/net/ethernet/atheros/atl1c/atl1c.h    |  2 +
  .../net/ethernet/atheros/atl1c/atl1c_main.c   | 44 ++++++++++++++-----
  net/core/dev.c                                |  1 +
  3 files changed, 37 insertions(+), 10 deletions(-)

  		    int (*poll)(struct napi_struct *, int), int weight)

Comments

David Miller April 6, 2021, 11:44 p.m. UTC | #1
From: Gatis Peisenieks <gatis@mikrotik.com>
Date: Tue, 06 Apr 2021 17:49:32 +0300

> Tx queue cleanup happens in interrupt handler on same core as rx queue
> processing. Both can take considerable amount of processing in high
> packet-per-second scenarios.
> 
> Sending big amounts of packets can stall the rx processing which is
> unfair
> and also can lead to out-of-memory condition since __dev_kfree_skb_irq
> queues the skbs for later kfree in softirq which is not allowed to
> happen
> with heavy load in interrupt handler.
> 
> This puts tx cleanup in its own napi and enables threaded napi to
> allow
> the rx/tx queue processing to happen on different cores. Also as the
> first
> in-driver user of dev_set_threaded API, need to add EXPORT_SYMBOL for
> it.
> 
> The ability to sustain equal amounts of tx/rx traffic increased:
> from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
> Mikrotik 10/25G NIC,
> from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.
> 
> Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
> ---
> changes since v3:
> 	- made scripts/checkpatch.pl happy (commit message line wrap +
> 	  missing comment on spinlock)
> 	- moved the new intr_mask_lock to be besides the intr_mask it
> 	  protects so they are more likely to be on same cacheline
> changes since v2:
> 	- addressed comments from Eric Dumazet
> 	- added EXPORT_SYMBOL for dev_set_threaded
> 
> Sorry for reposting, noticed that scripts/checkpatch.pl was not happy.

This does not apply to 'net',  did you mean 'net-next'?  If so, please indicate this clearly in
the Subject line as per convention.

Thank you.
Eric Dumazet April 7, 2021, 4:55 p.m. UTC | #2
On 4/6/21 4:49 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in interrupt handler on same core as rx queue
> processing. Both can take considerable amount of processing in high
> packet-per-second scenarios.
> 
> Sending big amounts of packets can stall the rx processing which is unfair
> and also can lead to out-of-memory condition since __dev_kfree_skb_irq
> queues the skbs for later kfree in softirq which is not allowed to happen
> with heavy load in interrupt handler.
> 

[ ... ]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0f72ff5d34ba..489ac60b530c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6789,6 +6789,7 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
> 
>      return err;
>  }
> +EXPORT_SYMBOL(dev_set_threaded);
> 
>  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>              int (*poll)(struct napi_struct *, int), int weight)

This has already been done in net-next

Please base your patch on top of net-next, this can not be backported to old
versions anyway, without some amount of pain.
Gatis Peisenieks April 8, 2021, 5:21 a.m. UTC | #3
On 2021-04-07 19:55, Eric Dumazet wrote:
> On 4/6/21 4:49 PM, Gatis Peisenieks wrote:
>> Tx queue cleanup happens in interrupt handler on same core as rx queue
>> processing. Both can take considerable amount of processing in high
>> packet-per-second scenarios.
>> 
>> Sending big amounts of packets can stall the rx processing which is 
>> unfair
>> and also can lead to out-of-memory condition since __dev_kfree_skb_irq
>> queues the skbs for later kfree in softirq which is not allowed to 
>> happen
>> with heavy load in interrupt handler.
>> 
> 
> [ ... ]
> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 0f72ff5d34ba..489ac60b530c 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6789,6 +6789,7 @@ int dev_set_threaded(struct net_device *dev, 
>> bool threaded)
>> 
>>      return err;
>>  }
>> +EXPORT_SYMBOL(dev_set_threaded);
>> 
>>  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>>              int (*poll)(struct napi_struct *, int), int weight)
> 
> This has already been done in net-next
> 
> Please base your patch on top of net-next, this can not be backported 
> to old
> versions anyway, without some amount of pain.

Thank you Eric, for heads up, the v5 patch sent for net-next in response 
to
David Miller comment already does that.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c.h 
b/drivers/net/ethernet/atheros/atl1c/atl1c.h
index a0562a90fb6d..28ae5c16831e 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c.h
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c.h
@@ -367,6 +367,7 @@  struct atl1c_hw {
  	u16 phy_id1;
  	u16 phy_id2;

+	spinlock_t intr_mask_lock;	/* protect the intr_mask */
  	u32 intr_mask;

  	u8 preamble_len;
@@ -506,6 +507,7 @@  struct atl1c_adapter {
  	struct net_device   *netdev;
  	struct pci_dev      *pdev;
  	struct napi_struct  napi;
+	struct napi_struct  tx_napi;
  	struct page         *rx_page;
  	unsigned int	    rx_page_offset;
  	unsigned int	    rx_frag_size;
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 3f65f2b370c5..cfa1ce91402e 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -813,6 +813,7 @@  static int atl1c_sw_init(struct atl1c_adapter 
*adapter)
  	atl1c_set_rxbufsize(adapter, adapter->netdev);
  	atomic_set(&adapter->irq_sem, 1);
  	spin_lock_init(&adapter->mdio_lock);
+	spin_lock_init(&adapter->hw.intr_mask_lock);
  	set_bit(__AT_DOWN, &adapter->flags);

  	return 0;
@@ -1530,20 +1531,19 @@  static inline void atl1c_clear_phy_int(struct 
atl1c_adapter *adapter)
  	spin_unlock(&adapter->mdio_lock);
  }

-static bool atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
-				enum atl1c_trans_queue type)
+static int atl1c_clean_tx(struct napi_struct *napi, int budget)
  {
-	struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
+	struct atl1c_adapter *adapter =
+		container_of(napi, struct atl1c_adapter, tx_napi);
+	struct atl1c_tpd_ring *tpd_ring = 
&adapter->tpd_ring[atl1c_trans_normal];
  	struct atl1c_buffer *buffer_info;
  	struct pci_dev *pdev = adapter->pdev;
  	u16 next_to_clean = atomic_read(&tpd_ring->next_to_clean);
  	u16 hw_next_to_clean;
-	u16 reg;
  	unsigned int total_bytes = 0, total_packets = 0;
+	unsigned long flags;

-	reg = type == atl1c_trans_high ? REG_TPD_PRI1_CIDX : 
REG_TPD_PRI0_CIDX;
-
-	AT_READ_REGW(&adapter->hw, reg, &hw_next_to_clean);
+	AT_READ_REGW(&adapter->hw, REG_TPD_PRI0_CIDX, &hw_next_to_clean);

  	while (next_to_clean != hw_next_to_clean) {
  		buffer_info = &tpd_ring->buffer_info[next_to_clean];
@@ -1564,7 +1564,15 @@  static bool atl1c_clean_tx_irq(struct 
atl1c_adapter *adapter,
  		netif_wake_queue(adapter->netdev);
  	}

-	return true;
+	if (total_packets < budget) {
+		napi_complete_done(napi, total_packets);
+		spin_lock_irqsave(&adapter->hw.intr_mask_lock, flags);
+		adapter->hw.intr_mask |= ISR_TX_PKT;
+		AT_WRITE_REG(&adapter->hw, REG_IMR, adapter->hw.intr_mask);
+		spin_unlock_irqrestore(&adapter->hw.intr_mask_lock, flags);
+		return total_packets;
+	}
+	return budget;
  }

  /**
@@ -1599,13 +1607,22 @@  static irqreturn_t atl1c_intr(int irq, void 
*data)
  		AT_WRITE_REG(hw, REG_ISR, status | ISR_DIS_INT);
  		if (status & ISR_RX_PKT) {
  			if (likely(napi_schedule_prep(&adapter->napi))) {
+				spin_lock(&hw->intr_mask_lock);
  				hw->intr_mask &= ~ISR_RX_PKT;
  				AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
+				spin_unlock(&hw->intr_mask_lock);
  				__napi_schedule(&adapter->napi);
  			}
  		}
-		if (status & ISR_TX_PKT)
-			atl1c_clean_tx_irq(adapter, atl1c_trans_normal);
+		if (status & ISR_TX_PKT) {
+			if (napi_schedule_prep(&adapter->tx_napi)) {
+				spin_lock(&hw->intr_mask_lock);
+				hw->intr_mask &= ~ISR_TX_PKT;
+				AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
+				spin_unlock(&hw->intr_mask_lock);
+				__napi_schedule(&adapter->tx_napi);
+			}
+		}

  		handled = IRQ_HANDLED;
  		/* check if PCIE PHY Link down */
@@ -1870,6 +1887,7 @@  static int atl1c_clean(struct napi_struct *napi, 
int budget)
  	struct atl1c_adapter *adapter =
  			container_of(napi, struct atl1c_adapter, napi);
  	int work_done = 0;
+	unsigned long flags;

  	/* Keep link state information with original netdev */
  	if (!netif_carrier_ok(adapter->netdev))
@@ -1880,8 +1898,10 @@  static int atl1c_clean(struct napi_struct *napi, 
int budget)
  	if (work_done < budget) {
  quit_polling:
  		napi_complete_done(napi, work_done);
+		spin_lock_irqsave(&adapter->hw.intr_mask_lock, flags);
  		adapter->hw.intr_mask |= ISR_RX_PKT;
  		AT_WRITE_REG(&adapter->hw, REG_IMR, adapter->hw.intr_mask);
+		spin_unlock_irqrestore(&adapter->hw.intr_mask_lock, flags);
  	}
  	return work_done;
  }
@@ -2319,6 +2339,7 @@  static int atl1c_up(struct atl1c_adapter *adapter)
  	atl1c_check_link_status(adapter);
  	clear_bit(__AT_DOWN, &adapter->flags);
  	napi_enable(&adapter->napi);
+	napi_enable(&adapter->tx_napi);
  	atl1c_irq_enable(adapter);
  	netif_start_queue(netdev);
  	return err;
@@ -2339,6 +2360,7 @@  static void atl1c_down(struct atl1c_adapter 
*adapter)
  	set_bit(__AT_DOWN, &adapter->flags);
  	netif_carrier_off(netdev);
  	napi_disable(&adapter->napi);
+	napi_disable(&adapter->tx_napi);
  	atl1c_irq_disable(adapter);
  	atl1c_free_irq(adapter);
  	/* disable ASPM if device inactive */
@@ -2587,7 +2609,9 @@  static int atl1c_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
  	adapter->mii.mdio_write = atl1c_mdio_write;
  	adapter->mii.phy_id_mask = 0x1f;
  	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
+	dev_set_threaded(netdev, true);
  	netif_napi_add(netdev, &adapter->napi, atl1c_clean, 64);
+	netif_napi_add(netdev, &adapter->tx_napi, atl1c_clean_tx, 64);
  	timer_setup(&adapter->phy_config_timer, atl1c_phy_config, 0);
  	/* setup the private structure */
  	err = atl1c_sw_init(adapter);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0f72ff5d34ba..489ac60b530c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6789,6 +6789,7 @@  int dev_set_threaded(struct net_device *dev, bool 
threaded)

  	return err;
  }
+EXPORT_SYMBOL(dev_set_threaded);

  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,