diff mbox series

[net-next,v2,2/3] bnxt: use READ_ONCE/WRITE_ONCE for ring indexes

Message ID 20230412015038.674023-3-kuba@kernel.org (mailing list archive)
State Accepted
Commit 36647b206c014a0bf3ab17bc88f2c840eefd796c
Delegated to: Netdev Maintainers
Headers show
Series net: use READ_ONCE/WRITE_ONCE for ring index accesses | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 23 this patch: 23
netdev/cc_maintainers warning 5 maintainers not CCed: daniel@iogearbox.net john.fastabend@gmail.com bpf@vger.kernel.org ast@kernel.org hawk@kernel.org
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 22 this patch: 22
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 65 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jakub Kicinski April 12, 2023, 1:50 a.m. UTC
Eric points out that we should make sure that ring index updates
are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros.

Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
 - cover writes in bnxt_xdp.c
v1: https://lore.kernel.org/all/20230411013323.513688-3-kuba@kernel.org/
---
CC: michael.chan@broadcom.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 6 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     | 9 ++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 +++---
 3 files changed, 10 insertions(+), 11 deletions(-)

Comments

Michael Chan April 12, 2023, 1:59 a.m. UTC | #1
On Tue, Apr 11, 2023 at 6:50 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Eric points out that we should make sure that ring index updates
> are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
>  - cover writes in bnxt_xdp.c
> v1: https://lore.kernel.org/all/20230411013323.513688-3-kuba@kernel.org/

Thanks.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
David Laight April 12, 2023, 8:15 a.m. UTC | #2
From: Jakub Kicinski
> Sent: 12 April 2023 02:51
> 
> Eric points out that we should make sure that ring index updates
> are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros.
> 
...
> -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
> +static inline u32 bnxt_tx_avail(struct bnxt *bp,
> +				const struct bnxt_tx_ring_info *txr)
>  {
> -	/* Tell compiler to fetch tx indices from memory. */
> -	barrier();
> +	u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons);
> 
> -	return bp->tx_ring_size -
> -		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
> +	return bp->tx_ring_size - (used & bp->tx_ring_mask);
>  }

Doesn't that function only make sense if only one of
the ring index can be changing?
In this case I think this is being used in the transmit path
so that 'tx_prod' is constant and is either already read
or need not be read again.

Having written a lot of 'ring access' functions over the years
if the ring size is a power of 2 I'd mask the 'tx_prod' value
when it is being used rather than on the increment.
(So the value just wraps modulo 2**32.)
This tends to make the code safer - especially since the
'ring full' and 'ring empty' conditions are different.

Also that code is masking with bp->tx_ring_mask, but the
increments (in hunks I've chopped) use NEXT_TX(prod).
If that is masking with bp->tx_ring_mask then 'bp' should
be a parameter.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Paolo Abeni April 13, 2023, 11:38 a.m. UTC | #3
On Wed, 2023-04-12 at 08:15 +0000, David Laight wrote:
> From: Jakub Kicinski
> > Sent: 12 April 2023 02:51
> > 
> > Eric points out that we should make sure that ring index updates
> > are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros.
> > 
> ...
> > -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
> > +static inline u32 bnxt_tx_avail(struct bnxt *bp,
> > +				const struct bnxt_tx_ring_info *txr)
> >  {
> > -	/* Tell compiler to fetch tx indices from memory. */
> > -	barrier();
> > +	u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons);
> > 
> > -	return bp->tx_ring_size -
> > -		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
> > +	return bp->tx_ring_size - (used & bp->tx_ring_mask);
> >  }
> 
> Doesn't that function only make sense if only one of
> the ring index can be changing?
> In this case I think this is being used in the transmit path
> so that 'tx_prod' is constant and is either already read
> or need not be read again.
> 
> Having written a lot of 'ring access' functions over the years
> if the ring size is a power of 2 I'd mask the 'tx_prod' value
> when it is being used rather than on the increment.
> (So the value just wraps modulo 2**32.)
> This tends to make the code safer - especially since the
> 'ring full' and 'ring empty' conditions are different.
> 
> Also that code is masking with bp->tx_ring_mask, but the
> increments (in hunks I've chopped) use NEXT_TX(prod).
> If that is masking with bp->tx_ring_mask then 'bp' should
> be a parameter.

AFAICS bnxt_tx_avail() is also used in TX interrupt, outside tx path/tx
lock.

I think all the above consideration are more suited for a driver
refactor, while the current patch specifically address potential data
race issues.

Cheers,

Paolo
David Laight April 13, 2023, 2:20 p.m. UTC | #4
From: Paolo Abeni
> Sent: 13 April 2023 12:38
> 
> On Wed, 2023-04-12 at 08:15 +0000, David Laight wrote:
> > From: Jakub Kicinski
> > > Sent: 12 April 2023 02:51
> > >
> > > Eric points out that we should make sure that ring index updates
> > > are wrapped in the appropriate READ_ONCE/WRITE_ONCE macros.
> > >
> > ...
> > > -static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
> > > +static inline u32 bnxt_tx_avail(struct bnxt *bp,
> > > +				const struct bnxt_tx_ring_info *txr)
> > >  {
> > > -	/* Tell compiler to fetch tx indices from memory. */
> > > -	barrier();
> > > +	u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons);
> > >
> > > -	return bp->tx_ring_size -
> > > -		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
> > > +	return bp->tx_ring_size - (used & bp->tx_ring_mask);
> > >  }
> >
> > Doesn't that function only make sense if only one of
> > the ring index can be changing?
> > In this case I think this is being used in the transmit path
> > so that 'tx_prod' is constant and is either already read
> > or need not be read again.
> >
...
> 
> AFAICS bnxt_tx_avail() is also used in TX interrupt, outside tx path/tx
> lock.

In which case both tx_prod and tx_cons are subject to possible updates.
It is even possible that the two values have absolutely no relation
to each other, it requires some unusual circumstances, but isn't impossible.
- A high priority interrupt (eg x86 SMM mode) could separate the READ_ONCE().
- Transmit setup will increase tx_prod.
- End of transmit 'reap' often done by other code paths (like rx processing
  or tx setup) can change tx_cons.
So not only is the value immediately stale, it can be just plain wrong.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f7602d8d79e3..92289ab2f34a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -472,7 +472,7 @@  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		prod = NEXT_TX(prod);
 		tx_push->doorbell =
 			cpu_to_le32(DB_KEY_TX_PUSH | DB_LONG_TX_PUSH | prod);
-		txr->tx_prod = prod;
+		WRITE_ONCE(txr->tx_prod, prod);
 
 		tx_buf->is_push = 1;
 		netdev_tx_sent_queue(txq, skb->len);
@@ -583,7 +583,7 @@  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	wmb();
 
 	prod = NEXT_TX(prod);
-	txr->tx_prod = prod;
+	WRITE_ONCE(txr->tx_prod, prod);
 
 	if (!netdev_xmit_more() || netif_xmit_stopped(txq))
 		bnxt_txr_db_kick(bp, txr, prod);
@@ -688,7 +688,7 @@  static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 		dev_kfree_skb_any(skb);
 	}
 
-	txr->tx_cons = cons;
+	WRITE_ONCE(txr->tx_cons, cons);
 
 	__netif_txq_completed_wake(txq, nr_pkts, tx_bytes,
 				   bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 18cac98ba58e..080e73496066 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2231,13 +2231,12 @@  struct bnxt {
 #define SFF_MODULE_ID_QSFP28			0x11
 #define BNXT_MAX_PHY_I2C_RESP_SIZE		64
 
-static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
+static inline u32 bnxt_tx_avail(struct bnxt *bp,
+				const struct bnxt_tx_ring_info *txr)
 {
-	/* Tell compiler to fetch tx indices from memory. */
-	barrier();
+	u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons);
 
-	return bp->tx_ring_size -
-		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
+	return bp->tx_ring_size - (used & bp->tx_ring_mask);
 }
 
 static inline void bnxt_writeq(struct bnxt *bp, u64 val,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 5843c93b1711..4efa5fe6972b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -64,7 +64,7 @@  struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 		int frag_len;
 
 		prod = NEXT_TX(prod);
-		txr->tx_prod = prod;
+		WRITE_ONCE(txr->tx_prod, prod);
 
 		/* first fill up the first buffer */
 		frag_tx_buf = &txr->tx_buf_ring[prod];
@@ -94,7 +94,7 @@  struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 	/* Sync TX BD */
 	wmb();
 	prod = NEXT_TX(prod);
-	txr->tx_prod = prod;
+	WRITE_ONCE(txr->tx_prod, prod);
 
 	return tx_buf;
 }
@@ -161,7 +161,7 @@  void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 		}
 		tx_cons = NEXT_TX(tx_cons);
 	}
-	txr->tx_cons = tx_cons;
+	WRITE_ONCE(txr->tx_cons, tx_cons);
 	if (rx_doorbell_needed) {
 		tx_buf = &txr->tx_buf_ring[last_tx_cons];
 		bnxt_db_write(bp, &rxr->rx_db, tx_buf->rx_prod);