Message ID | 20230412015038.674023-3-kuba@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 36647b206c014a0bf3ab17bc88f2c840eefd796c |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: use READ_ONCE/WRITE_ONCE for ring index accesses | expand |
On Tue, Apr 11, 2023 at 6:50 PM Jakub Kicinski <kuba@kernel.org> wrote: > > Eric points out that we should make sure that ring index updates > are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros. > > Suggested-by: Eric Dumazet <edumazet@google.com> > Reviewed-by: Eric Dumazet <edumazet@google.com> > Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > v2: > - cover writes in bnxt_xdp.c > v1: https://lore.kernel.org/all/20230411013323.513688-3-kuba@kernel.org/ Thanks. Reviewed-by: Michael Chan <michael.chan@broadcom.com>
From: Jakub Kicinski > Sent: 12 April 2023 02:51 > > Eric points out that we should make sure that ring index updates > are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros. > ... > -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr) > +static inline u32 bnxt_tx_avail(struct bnxt *bp, > + const struct bnxt_tx_ring_info *txr) > { > - /* Tell compiler to fetch tx indices from memory. */ > - barrier(); > + u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons); > > - return bp->tx_ring_size - > - ((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask); > + return bp->tx_ring_size - (used & bp->tx_ring_mask); > } Doesn't that function only make sense if only one of the ring index can be changing? In this case I think this is being used in the transmit path so that 'tx_prod' is constant and is either already read or need not be read again. Having written a lot of 'ring access' functions over the years if the ring size is a power of 2 I'd mask the 'tx_prod' value when it is being used rather than on the increment. (So the value just wraps modulo 2**32.) This tends to make the code safer - especially since the 'ring full' and 'ring empty' conditions are different. Also that code is masking with bp->tx_ring_mask, but the increments (in hunks I've chopped) use NEXT_TX(prod). If that is masking with bp->tx_ring_mask then 'bp' should be a parameter. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, 2023-04-12 at 08:15 +0000, David Laight wrote: > From: Jakub Kicinski > > Sent: 12 April 2023 02:51 > > > > Eric points out that we should make sure that ring index updates > > are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros. > > > ... > > -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr) > > +static inline u32 bnxt_tx_avail(struct bnxt *bp, > > + const struct bnxt_tx_ring_info *txr) > > { > > - /* Tell compiler to fetch tx indices from memory. */ > > - barrier(); > > + u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons); > > > > - return bp->tx_ring_size - > > - ((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask); > > + return bp->tx_ring_size - (used & bp->tx_ring_mask); > > } > > Doesn't that function only make sense if only one of > the ring index can be changing? > In this case I think this is being used in the transmit path > so that 'tx_prod' is constant and is either already read > or need not be read again. > > Having written a lot of 'ring access' functions over the years > if the ring size is a power of 2 I'd mask the 'tx_prod' value > when it is being used rather than on the increment. > (So the value just wraps modulo 2**32.) > This tends to make the code safer - especially since the > 'ring full' and 'ring empty' conditions are different. > > Also that code is masking with bp->tx_ring_mask, but the > increments (in hunks I've chopped) use NEXT_TX(prod). > If that is masking with bp->tx_ring_mask then 'bp' should > be a parameter. AFAICS bnxt_tx_avail() is also used in TX interrupt, outside tx path/tx lock. I think all the above consideration are more suited for a driver refactor, while the current patch specifically address potential data race issues. Cheers, Paolo
From: Paolo Abeni > Sent: 13 April 2023 12:38 > > On Wed, 2023-04-12 at 08:15 +0000, David Laight wrote: > > From: Jakub Kicinski > > > Sent: 12 April 2023 02:51 > > > > > > Eric points out that we should make sure that ring index updates > > > are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros. > > > > > ... > > > -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr) > > > +static inline u32 bnxt_tx_avail(struct bnxt *bp, > > > + const struct bnxt_tx_ring_info *txr) > > > { > > > - /* Tell compiler to fetch tx indices from memory. */ > > > - barrier(); > > > + u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons); > > > > > > - return bp->tx_ring_size - > > > - ((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask); > > > + return bp->tx_ring_size - (used & bp->tx_ring_mask); > > > } > > > > Doesn't that function only make sense if only one of > > the ring index can be changing? > > In this case I think this is being used in the transmit path > > so that 'tx_prod' is constant and is either already read > > or need not be read again. > > ... > > AFAICS bnxt_tx_avail() is also used in TX interrupt, outside tx path/tx > lock. In which case both tx_prod and tx_cons are subject to possible updates. It is even possible that the two values have absolutely no relation to each other, it requires some unusual circumstances, but isn't impossible. - A high priority interrupt (eg x86 SMM mode) could separate the READ_ONCE(). - Transmit setup will increase tx_prod. - End of transmit 'reap' often done by other code paths (like rx processing or tx setup) can change tx_cons. So not only is the value immediately stale, it can be just plain wrong. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index f7602d8d79e3..92289ab2f34a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -472,7 +472,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) prod = NEXT_TX(prod); tx_push->doorbell = cpu_to_le32(DB_KEY_TX_PUSH | DB_LONG_TX_PUSH | prod); - txr->tx_prod = prod; + WRITE_ONCE(txr->tx_prod, prod); tx_buf->is_push = 1; netdev_tx_sent_queue(txq, skb->len); @@ -583,7 +583,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) wmb(); prod = NEXT_TX(prod); - txr->tx_prod = prod; + WRITE_ONCE(txr->tx_prod, prod); if (!netdev_xmit_more() || netif_xmit_stopped(txq)) bnxt_txr_db_kick(bp, txr, prod); @@ -688,7 +688,7 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) dev_kfree_skb_any(skb); } - txr->tx_cons = cons; + WRITE_ONCE(txr->tx_cons, cons); __netif_txq_completed_wake(txq, nr_pkts, tx_bytes, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh, diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 18cac98ba58e..080e73496066 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -2231,13 +2231,12 @@ struct bnxt { #define SFF_MODULE_ID_QSFP28 0x11 #define BNXT_MAX_PHY_I2C_RESP_SIZE 64 -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr) +static inline u32 bnxt_tx_avail(struct bnxt *bp, + const struct bnxt_tx_ring_info *txr) { - /* Tell compiler to fetch tx indices from memory. */ - barrier(); + u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons); - return bp->tx_ring_size - - ((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask); + return bp->tx_ring_size - (used & bp->tx_ring_mask); } static inline void bnxt_writeq(struct bnxt *bp, u64 val, diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 5843c93b1711..4efa5fe6972b 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -64,7 +64,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, int frag_len; prod = NEXT_TX(prod); - txr->tx_prod = prod; + WRITE_ONCE(txr->tx_prod, prod); /* first fill up the first buffer */ frag_tx_buf = &txr->tx_buf_ring[prod]; @@ -94,7 +94,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, /* Sync TX BD */ wmb(); prod = NEXT_TX(prod); - txr->tx_prod = prod; + WRITE_ONCE(txr->tx_prod, prod); return tx_buf; } @@ -161,7 +161,7 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) } tx_cons = NEXT_TX(tx_cons); } - txr->tx_cons = tx_cons; + WRITE_ONCE(txr->tx_cons, tx_cons); if (rx_doorbell_needed) { tx_buf = &txr->tx_buf_ring[last_tx_cons]; bnxt_db_write(bp, &rxr->rx_db, tx_buf->rx_prod);