
[net-next,v3,7/7] net: piggy back on the memory barrier in bql when waking queues

Message ID 20230405223134.94665-8-kuba@kernel.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net: lockless stop/wake combo macros

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4246 this patch: 4246
netdev/cc_maintainers warning 1 maintainers not CCed: intel-wired-lan@lists.osuosl.org
netdev/build_clang success Errors and warnings before: 948 this patch: 948
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4453 this patch: 4453
netdev/checkpatch warning WARNING: Avoid unnecessary line continuations; WARNING: line length of 81 exceeds 80 columns; WARNING: line length of 88 exceeds 80 columns; WARNING: macros should not use a trailing semicolon; WARNING: memory barrier without comment
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jakub Kicinski April 5, 2023, 10:31 p.m. UTC
Drivers call netdev_tx_completed_queue() right before
netif_txq_maybe_wake(). If BQL is enabled, netdev_tx_completed_queue()
should issue a memory barrier, so we can depend on that barrier to
separate the stop check from the consumer index update, instead of
adding another barrier in netif_txq_maybe_wake().

This matters more than the barriers on the xmit path, because
the wake condition is almost always true. So we issue the
consumer-side barrier often.

Plus, since the macro now takes pkt/byte counts as arguments,
we can skip the wake attempt entirely when no packets were completed.
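
In a driver's completion path the change looks roughly like this
(a sketch modeled on the bnxt hunk below; ring, idx, pkts, bytes,
tx_avail() and wake_thresh stand in for the driver's own state):

	/* before: BQL report and wake check are separate steps,
	 * each issuing its own barrier
	 */
	netdev_tx_completed_queue(txq, pkts, bytes);
	ring->next_to_clean = idx;
	__netif_txq_maybe_wake(txq, tx_avail(ring), wake_thresh, down_cond);

	/* after: update the consumer index, then let the combo macro
	 * report to BQL, reuse its smp_mb() and re-check the stop bit
	 */
	ring->next_to_clean = idx;
	__netif_txq_completed_wake(txq, pkts, bytes,
				   tx_avail(ring), wake_thresh, down_cond);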

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - new patch

CC: michael.chan@broadcom.com
CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  6 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +++---
 include/linux/netdevice.h                     |  2 +-
 include/net/netdev_queues.h                   | 41 ++++++++++++++-----
 4 files changed, 40 insertions(+), 21 deletions(-)

Comments

Herbert Xu April 6, 2023, 7:35 a.m. UTC | #1
On Wed, Apr 05, 2023 at 03:31:34PM -0700, Jakub Kicinski wrote:
>					\
> @@ -82,10 +82,26 @@
>  		_res;							\
>  	})								\
>  
> +/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if
> + * @bytes != 0, regardless of kernel config.
> + */
> +static inline void
> +netdev_txq_completed_mb(struct netdev_queue *dev_queue,
> +			unsigned int pkts, unsigned int bytes)
> +{
> +#ifdef CONFIG_BQL
> +	netdev_tx_completed_queue(dev_queue, pkts, bytes);
> +#else
> +	if (bytes)
> +		smp_mb();
> +#endif
> +}

Minor nit, I would write this as

	if (IS_ENABLED(CONFIG_BQL))
		netdev_tx_completed_queue(dev_queue, pkts, bytes);
	else if (bytes)
		smp_mb();

Actually, why is this checking bytes while the caller is checking
pkts? Do we need to check them at all? If pkts/bytes is commonly
non-zero, then we should just do a barrier unconditionally and make
the uncommon path pay the penalty.

Thanks,
Jakub Kicinski April 7, 2023, 12:41 a.m. UTC | #2
On Thu, 6 Apr 2023 15:35:49 +0800 Herbert Xu wrote:
> Minor nit, I would write this as
> 
> 	if (IS_ENABLED(CONFIG_BQL))
> 		netdev_tx_completed_queue(dev_queue, pkts, bytes);
> 	else if (bytes)
> 		smp_mb();

Will do!

> Actually, why is this checking bytes while the caller is checking
> pkts? Do we need to check them at all? If pkts/bytes is commonly
> non-zero, then we should just do a barrier unconditionally and make
> the uncommon path pay the penalty.

I wanted to keep the same semantics as netdev_tx_completed_queue(),
which only issues a barrier if (bytes). Not least to make it obvious
to someone looking at the code of netdev_txq_completed_mb() (and not
at the comment above it) that it doesn't _always_ put a barrier in.
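
(For context: with CONFIG_BQL, netdev_tx_completed_queue() returns
before ever reaching the barrier when nothing was completed. An
abridged sketch of the helper in include/linux/netdevice.h:)

	static inline void
	netdev_tx_completed_queue(struct netdev_queue *dev_queue,
				  unsigned int pkts, unsigned int bytes)
	{
	#ifdef CONFIG_BQL
		if (unlikely(!bytes))
			return;	/* nothing completed: no DQL update, no barrier */

		dql_completed(&dev_queue->dql, bytes);

		/* make the DQL update visible before re-enabling the queue;
		 * pairs with netdev_tx_sent_queue() -- this is the smp_mb()
		 * the series piggy backs on
		 */
		smp_mb();
		/* ... re-check dql_avail() and restart the queue if needed ... */
	#endif
	}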
Herbert Xu April 12, 2023, 6:06 a.m. UTC | #3
On Thu, Apr 06, 2023 at 05:41:40PM -0700, Jakub Kicinski wrote:
>
> I wanted to keep the same semantics as netdev_tx_completed_queue(),
> which only issues a barrier if (bytes). Not least to make it obvious
> to someone looking at the code of netdev_txq_completed_mb() (and not
> at the comment above it) that it doesn't _always_ put a barrier in.

OK, but I think we should instead change netdev_tx_completed_queue
to do the smp_mb unconditionally.  We should never optimise for the
unlikely case, and it is extremely unlikely for a TX cleanup routine
to wind up with nothing to do.

Thanks,
Jakub Kicinski April 12, 2023, 1:54 p.m. UTC | #4
On Wed, 12 Apr 2023 14:06:34 +0800 Herbert Xu wrote:
> On Thu, Apr 06, 2023 at 05:41:40PM -0700, Jakub Kicinski wrote:
> > I wanted to keep the same semantics as netdev_tx_completed_queue(),
> > which only issues a barrier if (bytes). Not least to make it obvious
> > to someone looking at the code of netdev_txq_completed_mb() (and not
> > at the comment above it) that it doesn't _always_ put a barrier in.
> 
> OK, but I think we should instead change netdev_tx_completed_queue
> to do the smp_mb unconditionally.  We should never optimise for the
> unlikely case, and it is extremely unlikely for a TX cleanup routine
> to wind up with nothing to do.

I don't understand what you're trying to argue. The whole point of
the patch is to use the BQL barrier, and BQL returns early, before
the barrier.

I don't think many people actually build kernels with BQL=n, so the
other branch is more *documentation* than relevant, executed code.
Herbert Xu April 13, 2023, 2:31 a.m. UTC | #5
On Wed, Apr 12, 2023 at 06:54:08AM -0700, Jakub Kicinski wrote:
>
> I don't understand what you're trying to argue. The whole point of 
> the patch is to use the BQL barrier and BQL returns early, before 
> the barrier.

I'm saying that the smp_mb should be unconditional in all cases.
As it stands, the only time smp_mb will be skipped is when the
TX cleaner finds no work to do.  That's an extremely unlikely case,
and there is no point in skipping the barrier just for that.

> I don't think many people actually build kernels with BQL=n so the other
> branch is more *documentation* than it is relevant, executed code.

No, I'm not referring to BQL; I'm referring to bytes/pkts == 0.

Looking into the git history, the bytes == 0 check wasn't even
meant for the barrier, as the barrier was added later.  So it
was simply there to make the BQL code function correctly.  All
we have to do is move the check around the BQL code, and then
the smp_mb can be done unconditionally.
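
Something along these lines, as a sketch of that suggestion
(untested, against the current shape of the helper, error paths
abridged):

	static inline void
	netdev_tx_completed_queue(struct netdev_queue *dev_queue,
				  unsigned int pkts, unsigned int bytes)
	{
	#ifdef CONFIG_BQL
		if (bytes)
			dql_completed(&dev_queue->dql, bytes);
	#endif
		/* unconditional now; both BQL and the wake macros rely on it */
		smp_mb();
	#ifdef CONFIG_BQL
		if (!bytes || dql_avail(&dev_queue->dql) < 0)
			return;

		if (test_and_clear_bit(__QUEUE_STATE_STACK_XOFF,
				       &dev_queue->state))
			netif_schedule_queue(dev_queue);
	#endif
	}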

Cheers,

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index de97bee25249..f7602d8d79e3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -688,11 +688,11 @@  static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 		dev_kfree_skb_any(skb);
 	}
 
-	netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
 	txr->tx_cons = cons;
 
-	__netif_txq_maybe_wake(txq, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
-			       READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
+	__netif_txq_completed_wake(txq, nr_pkts, tx_bytes,
+				   bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
+				   READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbbddee55db1..f2604fc05991 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1251,15 +1251,13 @@  static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	if (ring_is_xdp(tx_ring))
 		return !!budget;
 
-	netdev_tx_completed_queue(txring_txq(tx_ring),
-				  total_packets, total_bytes);
-
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
 	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
-	if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
-	    !__netif_txq_maybe_wake(txq, ixgbe_desc_unused(tx_ring),
-				    TX_WAKE_THRESHOLD,
-				    test_bit(__IXGBE_DOWN, &adapter->state)))
+	if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
+					ixgbe_desc_unused(tx_ring),
+					TX_WAKE_THRESHOLD,
+					netif_carrier_ok(tx_ring->netdev) &&
+					test_bit(__IXGBE_DOWN, &adapter->state)))
 		++tx_ring->tx_stats.restart_queue;
 
 	return !!budget;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 18770d325499..10736fc6d85e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3538,7 +3538,7 @@  static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 	 * netdev_tx_sent_queue will miss the update and cause the queue to
 	 * be stopped forever
 	 */
-	smp_mb();
+	smp_mb(); /* NOTE: netdev_txq_completed_mb() assumes this exists */
 
 	if (unlikely(dql_avail(&dev_queue->dql) < 0))
 		return;
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 60a9ac5439b3..3e38fe8300a7 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -38,7 +38,7 @@ 
 		netif_tx_stop_queue(txq);				\
 		/* Producer index and stop bit must be visible		\
 		 * to consumer before we recheck.			\
-		 * Pairs with a barrier in __netif_txq_maybe_wake().	\
+		 * Pairs with a barrier in __netif_txq_completed_wake(). \
 		 */							\
 		smp_mb__after_atomic();					\
 									\
@@ -82,10 +82,26 @@ 
 		_res;							\
 	})								\
 
+/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if
+ * @bytes != 0, regardless of kernel config.
+ */
+static inline void
+netdev_txq_completed_mb(struct netdev_queue *dev_queue,
+			unsigned int pkts, unsigned int bytes)
+{
+#ifdef CONFIG_BQL
+	netdev_tx_completed_queue(dev_queue, pkts, bytes);
+#else
+	if (bytes)
+		smp_mb();
+#endif
+}
 
 /**
- * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed
+ * __netif_txq_completed_wake() - locklessly wake a Tx queue, if needed
  * @txq:	struct netdev_queue to stop/start
+ * @pkts:	number of packets completed
+ * @bytes:	number of bytes completed
  * @get_desc:	get current number of free descriptors (see requirements below!)
  * @start_thrs:	minimal number of descriptors to re-enable the queue
  * @down_cond:	down condition, predicate indicating that the queue should
@@ -94,22 +110,27 @@ 
  * All arguments may be evaluated multiple times.
  * @get_desc must be a formula or a function call, it must always
  * return up-to-date information when evaluated!
+ * Reports completed pkts/bytes to BQL.
  *
  * Returns:
  *	 0 if the queue was woken up
  *	 1 if the queue was already enabled (or disabled but @down_cond is true)
  *	-1 if the queue was left stopped
  */
-#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond)	\
+#define __netif_txq_completed_wake(txq, pkts, bytes,			\
+				   get_desc, start_thrs, down_cond)	\
 	({								\
 		int _res;						\
 									\
+		/* Report to BQL and piggy back on its barrier.		\
+		 * Barrier makes sure that anybody stopping the queue	\
+		 * after this point sees the new consumer index.	\
+		 * Pairs with barrier in netif_txq_try_stop().		\
+		 */							\
+		netdev_txq_completed_mb(txq, pkts, bytes);		\
+									\
 		_res = -1;						\
-		if (likely(get_desc > start_thrs)) {			\
-			/* Make sure that anybody stopping the queue after \
-			 * this sees the new next_to_clean.		\
-			 */						\
-			smp_mb();					\
+		if (pkts && likely(get_desc > start_thrs)) {		\
 			_res = 1;					\
 			if (unlikely(netif_tx_queue_stopped(txq)) &&	\
 			    !(down_cond)) {				\
@@ -120,8 +141,8 @@ 
 		_res;							\
 	})
 
-#define netif_txq_maybe_wake(txq, get_desc, start_thrs)		\
-	__netif_txq_maybe_wake(txq, get_desc, start_thrs, false)
+#define netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs) \
+	__netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs, false)
 
 /* subqueue variants follow */