Message ID | 20230405223134.94665-8-kuba@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: lockless stop/wake combo macros | expand |
On Wed, Apr 05, 2023 at 03:31:34PM -0700, Jakub Kicinski wrote: > \ > @@ -82,10 +82,26 @@ > _res; \ > }) \ > > +/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if > + * @bytes != 0, regardless of kernel config. > + */ > +static inline void > +netdev_txq_completed_mb(struct netdev_queue *dev_queue, > + unsigned int pkts, unsigned int bytes) > +{ > +#ifdef CONFIG_BQL > + netdev_tx_completed_queue(dev_queue, pkts, bytes); > +#else > + if (bytes) > + smp_mb(); > +#endif > +} Minor nit, I would write this as if (IS_ENABLED(CONFIG_BQL)) netdev_tx_completed_queue(dev_queue, pkts, bytes); else if (bytes) smp_mb(); Actually, why is this checking bytes while the caller is checking pkts? Do we need to check them at all? If pkts/bytes is commonly non-zero, then we should just do a barrier unconditionally and make the uncommon path pay the penalty. Thanks,
On Thu, 6 Apr 2023 15:35:49 +0800 Herbert Xu wrote: > Minor nit, I would write this as > > if (IS_ENABLED(CONFIG_BQL)) > netdev_tx_completed_queue(dev_queue, pkts, bytes); > else if (bytes) > smp_mb(); Will do! > Actually, why is this checking bytes while the caller is checking > pkts? Do we need to check them at all? If pkts/bytes is commonly > non-zero, then we should just do a barrier unconditionally and make > the uncommon path pay the penalty. I wanted to keep the same semantics as netdev_tx_completed_queue() which only barriers if (bytes). Not in the least to make it obvious to someone looking at the code of netdev_txq_completed_mb() (and not the comment above it) that it doesn't _always_ put a barrier in.
On Thu, Apr 06, 2023 at 05:41:40PM -0700, Jakub Kicinski wrote: > > I wanted to keep the same semantics as netdev_tx_completed_queue() > which only barriers if (bytes). Not in the least to make it obvious > to someone looking at the code of netdev_txq_completed_mb() (and not > the comment above it) that it doesn't _always_ put a barrier in. OK, but I think we should instead change netdev_tx_compelted_queue to do the smp_mb unconditionally. We should never optimise for the unlikely case, and it is extremely unlikely for a TX cleanup routine to wind up with nothing to do. Thanks,
On Wed, 12 Apr 2023 14:06:34 +0800 Herbert Xu wrote: > On Thu, Apr 06, 2023 at 05:41:40PM -0700, Jakub Kicinski wrote: > > I wanted to keep the same semantics as netdev_tx_completed_queue() > > which only barriers if (bytes). Not in the least to make it obvious > > to someone looking at the code of netdev_txq_completed_mb() (and not > > the comment above it) that it doesn't _always_ put a barrier in. > > OK, but I think we should instead change netdev_tx_compelted_queue > to do the smp_mb unconditionally. We should never optimise for the > unlikely case, and it is extremely unlikely for a TX cleanup routine > to wind up with nothing to do. I don't understand what you're trying to argue. The whole point of the patch is to use the BQL barrier and BQL returns early, before the barrier. I don't think many people actually build kernels with BQL=n so the other branch is more *documentation* than it is relevant, executed code.
On Wed, Apr 12, 2023 at 06:54:08AM -0700, Jakub Kicinski wrote: > > I don't understand what you're trying to argue. The whole point of > the patch is to use the BQL barrier and BQL returns early, before > the barrier. I'm saying that the smp_mb should be unconditional in all cases. As it stands the only time when smp_mb will be skipped is when the TX cleaner finds no work to do. That's an extremely unlikely case and there is no point in skipping the barrier just for that. > I don't think many people actually build kernels with BQL=n so the other > branch is more *documentation* than it is relevant, executed code. No I'm not referring to BQL, I'm referring to bytes/pkts == 0. Looking into the git history, the bytes == 0 check wasn't even meant for the barrier as the barrier was added later. So it was simply there to make the BQL code function correctly. All we have to do is move the check around the BQL code and then the smp_mb can be done unconditionally. Cheers,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index de97bee25249..f7602d8d79e3 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -688,11 +688,11 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) dev_kfree_skb_any(skb); } - netdev_tx_completed_queue(txq, nr_pkts, tx_bytes); txr->tx_cons = cons; - __netif_txq_maybe_wake(txq, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh, - READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING); + __netif_txq_completed_wake(txq, nr_pkts, tx_bytes, + bnxt_tx_avail(bp, txr), bp->tx_wake_thresh, + READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING); } static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping, diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index cbbddee55db1..f2604fc05991 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -1251,15 +1251,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, if (ring_is_xdp(tx_ring)) return !!budget; - netdev_tx_completed_queue(txring_txq(tx_ring), - total_packets, total_bytes); - #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2) txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index); - if (total_packets && netif_carrier_ok(tx_ring->netdev) && - !__netif_txq_maybe_wake(txq, ixgbe_desc_unused(tx_ring), - TX_WAKE_THRESHOLD, - test_bit(__IXGBE_DOWN, &adapter->state))) + if (!__netif_txq_completed_wake(txq, total_packets, total_bytes, + ixgbe_desc_unused(tx_ring), + TX_WAKE_THRESHOLD, + netif_carrier_ok(tx_ring->netdev) && + test_bit(__IXGBE_DOWN, &adapter->state))) ++tx_ring->tx_stats.restart_queue; return !!budget; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 18770d325499..10736fc6d85e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3538,7 +3538,7 @@ static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue, * netdev_tx_sent_queue will miss the update and cause the queue to * be stopped forever */ - smp_mb(); + smp_mb(); /* NOTE: netdev_txq_completed_mb() assumes this exists */ if (unlikely(dql_avail(&dev_queue->dql) < 0)) return; diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h index 60a9ac5439b3..3e38fe8300a7 100644 --- a/include/net/netdev_queues.h +++ b/include/net/netdev_queues.h @@ -38,7 +38,7 @@ netif_tx_stop_queue(txq); \ /* Producer index and stop bit must be visible \ * to consumer before we recheck. \ - * Pairs with a barrier in __netif_txq_maybe_wake(). \ + * Pairs with a barrier in __netif_txq_completed_wake(). \ */ \ smp_mb__after_atomic(); \ \ @@ -82,10 +82,26 @@ _res; \ }) \ +/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if + * @bytes != 0, regardless of kernel config. + */ +static inline void +netdev_txq_completed_mb(struct netdev_queue *dev_queue, + unsigned int pkts, unsigned int bytes) +{ +#ifdef CONFIG_BQL + netdev_tx_completed_queue(dev_queue, pkts, bytes); +#else + if (bytes) + smp_mb(); +#endif +} /** - * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed + * __netif_txq_completed_wake() - locklessly wake a Tx queue, if needed * @txq: struct netdev_queue to stop/start + * @pkts: number of packets completed + * @bytes: number of bytes completed * @get_desc: get current number of free descriptors (see requirements below!) * @start_thrs: minimal number of descriptors to re-enable the queue * @down_cond: down condition, predicate indicating that the queue should @@ -94,22 +110,27 @@ * All arguments may be evaluated multiple times. * @get_desc must be a formula or a function call, it must always * return up-to-date information when evaluated! + * Reports completed pkts/bytes to BQL. * * Returns: * 0 if the queue was woken up * 1 if the queue was already enabled (or disabled but @down_cond is true) * -1 if the queue was left stopped */ -#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond) \ +#define __netif_txq_completed_wake(txq, pkts, bytes, \ + get_desc, start_thrs, down_cond) \ ({ \ int _res; \ \ + /* Report to BQL and piggy back on its barrier. \ + * Barrier makes sure that anybody stopping the queue \ + * after this point sees the new consumer index. \ + * Pairs with barrier in netif_txq_try_stop(). \ + */ \ + netdev_txq_completed_mb(txq, pkts, bytes); \ + \ _res = -1; \ - if (likely(get_desc > start_thrs)) { \ - /* Make sure that anybody stopping the queue after \ - * this sees the new next_to_clean. \ - */ \ - smp_mb(); \ + if (pkts && likely(get_desc > start_thrs)) { \ _res = 1; \ if (unlikely(netif_tx_queue_stopped(txq)) && \ !(down_cond)) { \ @@ -120,8 +141,8 @@ _res; \ }) -#define netif_txq_maybe_wake(txq, get_desc, start_thrs) \ - __netif_txq_maybe_wake(txq, get_desc, start_thrs, false) +#define netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs) \ + __netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs, false) /* subqueue variants follow */
Drivers call netdev_tx_completed_queue() right before netif_txq_maybe_wake(). If BQL is enabled netdev_tx_completed_queue() should issue a memory barrier, so we can depend on that separating the stop check from the consumer index update, instead of adding another barrier in netif_txq_maybe_wake(). This matters more than the barriers on the xmit path, because the wake condition is almost always true. So we issue the consumer side barrier often. Plus since macro gets pkt/byte counts as arguments now - we can skip waking if there were no packets completed. Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- v3: - new patch CC: michael.chan@broadcom.com CC: jesse.brandeburg@intel.com CC: anthony.l.nguyen@intel.com --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +-- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +++--- include/linux/netdevice.h | 2 +- include/net/netdev_queues.h | 41 ++++++++++++++----- 4 files changed, 40 insertions(+), 21 deletions(-)