[net,6/8] ice: improve updating ice_{t,r}x_ring::xsk_pool

Message ID 20240708221416.625850-7-anthony.l.nguyen@intel.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series ice: fix AF_XDP ZC timeout and concurrency issues

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 833 this patch: 833
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 835 this patch: 835
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 837 this patch: 837
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 321 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 194 this patch: 193
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-09--21-00 (tests: 693)

Commit Message

Tony Nguyen July 8, 2024, 10:14 p.m. UTC
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

xsk_buff_pool pointers held by ice ring structs are updated via ndo_bpf,
which runs in process context, while a remote CPU can read them at the
same time from within NAPI poll. Use synchronize_net() after the pointer
update and {READ,WRITE}_ONCE() when accessing the pointer.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      | 11 ++--
 drivers/net/ethernet/intel/ice/ice_base.c |  4 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c |  4 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 78 ++++++++++++++---------
 drivers/net/ethernet/intel/ice/ice_xsk.h  |  4 +-
 6 files changed, 61 insertions(+), 42 deletions(-)
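
As a rough userspace illustration of the access pattern the commit message describes (not driver code), the sketch below uses simplified stand-ins for READ_ONCE()/WRITE_ONCE(). The demo_* names are hypothetical, and synchronize_net() has no counterpart here, so only the single marked store and the single marked load are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE()
 * (assumption: the real macros do more type handling; __typeof__ is a
 * GCC/Clang extension). The volatile access forces exactly one load/store.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical miniature of the ring/pool relationship. */
struct demo_pool { int id; };
struct demo_ring { struct demo_pool *xsk_pool; };

/* Updater side: in the driver this runs in process context and is followed
 * by synchronize_net(), which this sketch cannot reproduce.
 */
static void demo_set_pool(struct demo_ring *ring, struct demo_pool *pool)
{
	WRITE_ONCE(ring->xsk_pool, pool);
}

/* Reader side: stands in for the NAPI poll choosing a routine. */
static const char *demo_pick_path(struct demo_ring *ring)
{
	return READ_ONCE(ring->xsk_pool) ? "zero-copy" : "regular";
}

/* Usage: attach a pool, then detach it again. */
static void demo_publish_example(void)
{
	struct demo_pool pool = { .id = 1 };
	struct demo_ring ring = { .xsk_pool = NULL };

	assert(strcmp(demo_pick_path(&ring), "regular") == 0);
	demo_set_pool(&ring, &pool);
	assert(strcmp(demo_pick_path(&ring), "zero-copy") == 0);
	demo_set_pool(&ring, NULL);
	assert(strcmp(demo_pick_path(&ring), "regular") == 0);
}
```

The marked accesses alone only stop the compiler from tearing, duplicating or caching the load and store; the guarantee that a running poll never sees a stale pool comes from the synchronization on the updater side.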

Comments

Jakub Kicinski July 10, 2024, 1:45 a.m. UTC | #1
On Mon,  8 Jul 2024 15:14:12 -0700 Tony Nguyen wrote:
> @@ -1556,7 +1556,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
>  		 * comparison in the irq context instead of many inside the
>  		 * ice_clean_rx_irq function and makes the codebase cleaner.
>  		 */
> -		cleaned = rx_ring->xsk_pool ?
> +		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
>  			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
>  			  ice_clean_rx_irq(rx_ring, budget_per_ring);
>  		work_done += cleaned;


> @@ -832,8 +839,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
>   */
>  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
>  {
> +	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
>  	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> -	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
>  	u32 ntc = rx_ring->next_to_clean;
>  	u32 ntu = rx_ring->next_to_use;
>  	struct xdp_buff *first = NULL;

This looks suspicious, you need to at least explain why it's correct.
READ_ONCE() means one access per critical section, usually.
You access it at least twice in a single NAPI poll.

Fijalkowski, Maciej July 23, 2024, 11:46 p.m. UTC | #2
On Tue, Jul 09, 2024 at 06:45:24PM -0700, Jakub Kicinski wrote:
> On Mon,  8 Jul 2024 15:14:12 -0700 Tony Nguyen wrote:
> > @@ -1556,7 +1556,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
> >  		 * comparison in the irq context instead of many inside the
> >  		 * ice_clean_rx_irq function and makes the codebase cleaner.
> >  		 */
> > -		cleaned = rx_ring->xsk_pool ?
> > +		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
> >  			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
> >  			  ice_clean_rx_irq(rx_ring, budget_per_ring);
> >  		work_done += cleaned;
> 
> 
> > @@ -832,8 +839,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
> >   */
> >  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
> >  {
> > +	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
> >  	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> > -	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
> >  	u32 ntc = rx_ring->next_to_clean;
> >  	u32 ntu = rx_ring->next_to_use;
> >  	struct xdp_buff *first = NULL;
> 
> This looks suspicious, you need to at least explain why it's correct.
> READ_ONCE() means one access per critical section, usually.
> You access it at least twice in a single NAPI poll.

Hey after break! Comebacks are tough, vacation was followed by flu so bear
with me please...

Actually xsk_pool *can* be accessed multiple times during the refill of HW
Rx ring (at the end of napi poll Rx side). I thought it would be safe to
follow the scheme of xdp prog pointer handling, where we read it from ring
once per napi loop then work on local pointer.

Goal of this commit was to prevent compiler from code reorder such that NAPI
is launched before update of xsk_buff_pool pointer which is achieved with
WRITE_ONCE()/synchronize_net() pair. Then per my understanding single
READ_ONCE() within NAPI was sufficient, the one that makes the decision
which Rx routine should be called (zc or standard one). Given that bh are
disabled and updater respects RCU grace period IMHO pointer is valid for
current NAPI cycle.

If you're saying it's not correct and each and every xsk_pool reference
within NAPI has to be decorated with READ_ONCE() then so is the xdp_prog
pointer, but I'd like to hear more about this.
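
The "read it from the ring once per NAPI loop, then work on a local pointer" scheme described above can be sketched in plain C as follows. All demo_* names and the mid_poll_hook callback are made up for illustration; the hook merely simulates an updater changing the ring field while a poll is in progress. In the driver the snapshot stays valid only because the updater waits for a grace period; in this sketch the pool simply outlives the call.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro (assumption, GCC/Clang only). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct demo_pool { int frames; };

struct demo_ring {
	struct demo_pool *xsk_pool;
	/* hypothetical hook standing in for a concurrent updater */
	void (*mid_poll_hook)(struct demo_ring *ring);
};

/* The helper receives the pool as an argument instead of re-reading
 * ring->xsk_pool.
 */
static int demo_refill(struct demo_pool *pool, int count)
{
	assert(pool != NULL);
	pool->frames += count;
	return count;
}

/* One poll iteration: read the ring field once, use only the local copy. */
static int demo_poll(struct demo_ring *ring, int budget)
{
	struct demo_pool *pool = READ_ONCE(ring->xsk_pool);

	if (!pool)
		return 0; /* the driver would run the non-ZC routine instead */

	if (ring->mid_poll_hook)
		ring->mid_poll_hook(ring); /* ring->xsk_pool may change here */

	/* Still fine: 'pool' is the snapshot, not the ring field. */
	return demo_refill(pool, budget);
}

/* Simulated updater that clears the ring field in the middle of a poll. */
static void demo_clear_pool(struct demo_ring *ring)
{
	ring->xsk_pool = NULL;
}
```

Had demo_refill() dereferenced ring->xsk_pool itself, the simulated update would have handed it a NULL pointer; passing the snapshot down avoids a second read of the ring field altogether.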

Jakub Kicinski July 24, 2024, 2:57 p.m. UTC | #3
On Wed, 24 Jul 2024 01:46:11 +0200 Maciej Fijalkowski wrote:
> Goal of this commit was to prevent compiler from code reorder such that NAPI
> is launched before update of xsk_buff_pool pointer which is achieved with
> WRITE_ONCE()/synchronize_net() pair. Then per my understanding single
> READ_ONCE() within NAPI was sufficient, the one that makes the decision
> which Rx routine should be called (zc or standard one). Given that bh are
> disabled and updater respects RCU grace period IMHO pointer is valid for
> current NAPI cycle.

So if we are already in the af_xdp handler, and update path sets pool
to NULL - the af_xdp handler will be fine with the pool becoming NULL?
I guess it may be fine, it's just quite odd to call the function called
_ONCE() multiple times..

Fijalkowski, Maciej July 24, 2024, 3:49 p.m. UTC | #4
On Wed, Jul 24, 2024 at 07:57:42AM -0700, Jakub Kicinski wrote:
> On Wed, 24 Jul 2024 01:46:11 +0200 Maciej Fijalkowski wrote:
> > Goal of this commit was to prevent compiler from code reorder such that NAPI
> > is launched before update of xsk_buff_pool pointer which is achieved with
> > WRITE_ONCE()/synchronize_net() pair. Then per my understanding single
> > READ_ONCE() within NAPI was sufficient, the one that makes the decision
> > which Rx routine should be called (zc or standard one). Given that bh are
> > disabled and updater respects RCU grace period IMHO pointer is valid for
> > current NAPI cycle.
> 
> So if we are already in the af_xdp handler, and update path sets pool
> to NULL - the af_xdp handler will be fine with the pool becoming NULL?
> I guess it may be fine, it's just quite odd to call the function called
> _ONCE() multiple times..

Update path before NULLing pool will go through rcu grace period, stop
napis, disable irqs, etc. Running napi won't be exposed to nulled pool in
such case.


Jakub Kicinski July 25, 2024, 1:38 p.m. UTC | #5
On Wed, 24 Jul 2024 17:49:12 +0200 Maciej Fijalkowski wrote:
> > So if we are already in the af_xdp handler, and update path sets pool
> > to NULL - the af_xdp handler will be fine with the pool becoming NULL?
> > I guess it may be fine, it's just quite odd to call the function called
> > _ONCE() multiple times..  
> 
> Update path before NULLing pool will go through rcu grace period, stop
> napis, disable irqs, etc. Running napi won't be exposed to nulled pool in
> such case.

Could you make it clearer what condition the patch is fixing, then?
What can go wrong without this patch?

Fijalkowski, Maciej July 25, 2024, 6:31 p.m. UTC | #6
On Thu, Jul 25, 2024 at 06:38:58AM -0700, Jakub Kicinski wrote:
> On Wed, 24 Jul 2024 17:49:12 +0200 Maciej Fijalkowski wrote:
> > > So if we are already in the af_xdp handler, and update path sets pool
> > > to NULL - the af_xdp handler will be fine with the pool becoming NULL?
> > > I guess it may be fine, it's just quite odd to call the function called
> > > _ONCE() multiple times..  
> > 
> > Update path before NULLing pool will go through rcu grace period, stop
> > napis, disable irqs, etc. Running napi won't be exposed to nulled pool in
> > such case.
> 
> Could you make it clearer what condition the patch is fixing, then?
> What can go wrong without this patch?

Sorry for the confusion, but without this patch the scenario you brought up
initially *could* happen, under some wild circumstances. When I was
responding yesterday my head was around the code with this particular
patch in place, that's why I said such a pool state transition was not
possible.

Updater does this (prior to this patch):

	(...)
	ring->xsk_pool = ice_get_xp_from_qid(vsi, qid); // set to NULL
	(...)
	ice_qvec_toggle_napi(vsi, q_vector, true);
	ice_qvec_ena_irq(vsi, q_vector);

In theory compiler is allowed to transform the code in a way that xsk_pool
assignment will happen *after* triggering napi. So in ice_napi_poll():

		if (tx_ring->xsk_pool)
			wd = ice_xmit_zc(tx_ring); // call ZC routine
		else if (ice_ring_is_xdp(tx_ring))
			wd = true;
		else
			wd = ice_clean_tx_irq(tx_ring, budget);

You will initiate ZC Tx processing because xsk_pool ptr was still valid
and crash in the middle of its job once it's finally NULLed. To prevent
that:

updater:
	(...)
	WRITE_ONCE(ring->xsk_pool, ice_get_xp_from_qid(vsi, qid));
	(...)
	ice_qvec_toggle_napi(vsi, q_vector, true);
	ice_qvec_ena_irq(vsi, q_vector);
	/* make sure NAPI sees updated ice_{t,x}_ring::xsk_pool */
	synchronize_net();

reader:
		if (READ_ONCE(tx_ring->xsk_pool))
			wd = ice_xmit_zc(tx_ring);
		else if (ice_ring_is_xdp(tx_ring))
			wd = true;
		else
			wd = ice_clean_tx_irq(tx_ring, budget);

Does that make any sense now?

Jakub Kicinski July 25, 2024, 11:07 p.m. UTC | #7
On Thu, 25 Jul 2024 20:31:31 +0200 Maciej Fijalkowski wrote:
> Does that make any sense now?

Could be brain fog due to post-netdev.conf covid but no, not really.

The _ONCE() helpers basically give you the ability to store the pointer
to a variable on the stack, and that variable won't change behind your
back. But the only reason to READ_ONCE(ptr->thing) something multiple
times is to tell KCSAN that "I know what I'm doing", it just silences
potential warnings :S

Fijalkowski, Maciej July 26, 2024, 1:43 p.m. UTC | #8
On Thu, Jul 25, 2024 at 04:07:00PM -0700, Jakub Kicinski wrote:
> On Thu, 25 Jul 2024 20:31:31 +0200 Maciej Fijalkowski wrote:
> > Does that make any sense now?
> 
> Could be brain fog due to post-netdev.conf covid but no, not really.

Huh, that makes two of us.

> 
> The _ONCE() helpers basically give you the ability to store the pointer
> to a variable on the stack, and that variable won't change behind your
> back. But the only reason to READ_ONCE(ptr->thing) something multiple
> times is to tell KCSAN that "I know what I'm doing", it just silences
> potential warnings :S

I feel like you keep on referring to _ONCE (*) being used multiple times
which might be counter-intuitive whereas I was trying from the beginning
to explain my point that xsk pool from driver POV should get the very same
treatment as xdp prog has currently. So, either mark it as __rcu variable
and use rcu helpers or use _ONCE variants plus some sync.

(*) Ok, if you meant from the very beginning that two READ_ONCE against
pool per single critical section is suspicious then I didn't get that,
sorry. With diff below I would have single READ_ONCE and work on that
variable for rest of the napi. Patch was actually trying to limit xsk_pool
accesses from ring struct by working on stack variable.

Would you be okay with that?

-----8<-----

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 4c115531beba..5b27aaaa94ee 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1550,14 +1550,15 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 		budget_per_ring = budget;
 
 	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
+		struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
 		int cleaned;
 
 		/* A dedicated path for zero-copy allows making a single
 		 * comparison in the irq context instead of many inside the
 		 * ice_clean_rx_irq function and makes the codebase cleaner.
 		 */
-		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
-			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
+		cleaned = rx_ring->xsk_pool ?
+			  ice_clean_rx_irq_zc(rx_ring, xsk_pool, budget_per_ring) :
 			  ice_clean_rx_irq(rx_ring, budget_per_ring);
 		work_done += cleaned;
 		/* if we clean as many as budgeted, we must not be done */
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 492a9e54d58b..dceab7619a64 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -837,13 +837,15 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
 /**
  * ice_clean_rx_irq_zc - consumes packets from the hardware ring
  * @rx_ring: AF_XDP Rx ring
+ * @xsk_pool: AF_XDP pool ptr
  * @budget: NAPI budget
  *
  * Returns number of processed packets on success, remaining budget on failure.
  */
-int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
+int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring,
+			struct xsk_buff_pool *xsk_pool,
+			int budget)
 {
-	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u32 ntc = rx_ring->next_to_clean;
 	u32 ntu = rx_ring->next_to_use;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index 4cd2d62a0836..8c3675185699 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -20,7 +20,9 @@ struct ice_vsi;
 #ifdef CONFIG_XDP_SOCKETS
 int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
-int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
+int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring,
+			struct xsk_buff_pool *xsk_pool,
+			int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
 bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
 			  struct xsk_buff_pool *xsk_pool, u16 count);
@@ -45,6 +47,7 @@ ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
 
 static inline int
 ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
+		    struct xsk_buff_pool __always_unused *xsk_pool,
 		    int __always_unused budget)
 {
 	return 0;

----->8-----

Jakub Kicinski July 26, 2024, 2:37 p.m. UTC | #9
On Fri, 26 Jul 2024 15:43:20 +0200 Maciej Fijalkowski wrote:
> > The _ONCE() helpers basically give you the ability to store the pointer
> > to a variable on the stack, and that variable won't change behind your
> > back. But the only reason to READ_ONCE(ptr->thing) something multiple
> > times is to tell KCSAN that "I know what I'm doing", it just silences
> > potential warnings :S  
> 
> I feel like you keep on referring to _ONCE (*) being used multiple times
> which might be counter-intuitive whereas I was trying from the beginning
> to explain my point that xsk pool from driver POV should get the very same
> treatment as xdp prog has currently. So, either mark it as __rcu variable
> and use rcu helpers or use _ONCE variants plus some sync.
> 
> (*) Ok, if you meant from the very beginning that two READ_ONCE against
> pool per single critical section is suspicious then I didn't get that,
> sorry. With diff below I would have single READ_ONCE and work on that
> variable for rest of the napi. Patch was actually trying to limit xsk_pool
> accesses from ring struct by working on stack variable.
> 
> Would you be okay with that?

Yup! That diff makes sense, thanks!
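
One detail of the diff agreed on above is that the !CONFIG_XDP_SOCKETS stub gains the same extra parameter as the real function, so callers compile in either configuration. A minimal sketch of that header pattern, assuming a hypothetical DEMO_XDP_SOCKETS switch and made-up demo_* names:

```c
#include <assert.h>
#include <stddef.h>

struct demo_ring { int cleaned; };
struct demo_pool { int id; };

/* DEMO_XDP_SOCKETS is a hypothetical stand-in for CONFIG_XDP_SOCKETS. */
#ifdef DEMO_XDP_SOCKETS
/* The real implementation would live in a separate .c file. */
int demo_clean_rx_irq_zc(struct demo_ring *ring, struct demo_pool *pool,
			 int budget);
#else
/* The stub mirrors the real signature, including the new pool argument,
 * and reports that nothing was processed.
 */
static inline int demo_clean_rx_irq_zc(struct demo_ring *ring,
				       struct demo_pool *pool, int budget)
{
	(void)ring;
	(void)pool;
	(void)budget;
	return 0;
}
#endif

/* The caller is written once and does not care which variant it gets. */
static int demo_poll(struct demo_ring *ring, struct demo_pool *pool,
		     int budget)
{
	assert(budget >= 0);
	ring->cleaned += demo_clean_rx_irq_zc(ring, pool, budget);
	return ring->cleaned;
}
```

If the stub kept the old two-argument form, a build with the feature disabled would fail at the call site, which is why the diff updates both declarations together.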

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 99a75a59078e..caaa10157909 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -765,18 +765,17 @@  static inline struct xsk_buff_pool *ice_get_xp_from_qid(struct ice_vsi *vsi,
 }
 
 /**
- * ice_xsk_pool - get XSK buffer pool bound to a ring
+ * ice_rx_xsk_pool - assign XSK buff pool to Rx ring
  * @ring: Rx ring to use
  *
- * Returns a pointer to xsk_buff_pool structure if there is a buffer pool
- * present, NULL otherwise.
+ * Sets XSK buff pool pointer on Rx ring.
  */
-static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_rx_ring *ring)
+static inline void ice_rx_xsk_pool(struct ice_rx_ring *ring)
 {
 	struct ice_vsi *vsi = ring->vsi;
 	u16 qid = ring->q_index;
 
-	return ice_get_xp_from_qid(vsi, qid);
+	WRITE_ONCE(ring->xsk_pool, ice_get_xp_from_qid(vsi, qid));
 }
 
 /**
@@ -801,7 +800,7 @@  static inline void ice_tx_xsk_pool(struct ice_vsi *vsi, u16 qid)
 	if (!ring)
 		return;
 
-	ring->xsk_pool = ice_get_xp_from_qid(vsi, qid);
+	WRITE_ONCE(ring->xsk_pool, ice_get_xp_from_qid(vsi, qid));
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 5d396c1a7731..1facf179a96f 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -536,7 +536,7 @@  static int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
 				return err;
 		}
 
-		ring->xsk_pool = ice_xsk_pool(ring);
+		ice_rx_xsk_pool(ring);
 		if (ring->xsk_pool) {
 			xdp_rxq_info_unreg(&ring->xdp_rxq);
 
@@ -597,7 +597,7 @@  static int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
 			return 0;
 		}
 
-		ok = ice_alloc_rx_bufs_zc(ring, num_bufs);
+		ok = ice_alloc_rx_bufs_zc(ring, ring->xsk_pool, num_bufs);
 		if (!ok) {
 			u16 pf_q = ring->vsi->rxq_map[ring->q_index];
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 55a42aad92a5..9b075dd48889 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2949,7 +2949,7 @@  static void ice_vsi_rx_napi_schedule(struct ice_vsi *vsi)
 	ice_for_each_rxq(vsi, i) {
 		struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
 
-		if (rx_ring->xsk_pool)
+		if (READ_ONCE(rx_ring->xsk_pool))
 			napi_schedule(&rx_ring->q_vector->napi);
 	}
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 8bb743f78fcb..f4b2b1bca234 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1523,7 +1523,7 @@  int ice_napi_poll(struct napi_struct *napi, int budget)
 	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
 		bool wd;
 
-		if (tx_ring->xsk_pool)
+		if (READ_ONCE(tx_ring->xsk_pool))
 			wd = ice_xmit_zc(tx_ring);
 		else if (ice_ring_is_xdp(tx_ring))
 			wd = true;
@@ -1556,7 +1556,7 @@  int ice_napi_poll(struct napi_struct *napi, int budget)
 		 * comparison in the irq context instead of many inside the
 		 * ice_clean_rx_irq function and makes the codebase cleaner.
 		 */
-		cleaned = rx_ring->xsk_pool ?
+		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
 			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
 			  ice_clean_rx_irq(rx_ring, budget_per_ring);
 		work_done += cleaned;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 3fbe4cfadfbf..b4058c4937bc 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -250,6 +250,8 @@  static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 	ice_qvec_toggle_napi(vsi, q_vector, true);
 	ice_qvec_ena_irq(vsi, q_vector);
 
+	/* make sure NAPI sees updated ice_{t,x}_ring::xsk_pool */
+	synchronize_net();
 	ice_get_link_status(vsi->port_info, &link_up);
 	if (link_up) {
 		netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
@@ -464,6 +466,7 @@  static u16 ice_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
 /**
  * __ice_alloc_rx_bufs_zc - allocate a number of Rx buffers
  * @rx_ring: Rx ring
+ * @xsk_pool: XSK buffer pool to pick buffers to be filled by HW
  * @count: The number of buffers to allocate
  *
  * Place the @count of descriptors onto Rx ring. Handle the ring wrap
@@ -472,7 +475,8 @@  static u16 ice_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
  *
  * Returns true if all allocations were successful, false if any fail.
  */
-static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
+static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
+				   struct xsk_buff_pool *xsk_pool, u16 count)
 {
 	u32 nb_buffs_extra = 0, nb_buffs = 0;
 	union ice_32b_rx_flex_desc *rx_desc;
@@ -484,8 +488,7 @@  static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 	xdp = ice_xdp_buf(rx_ring, ntu);
 
 	if (ntu + count >= rx_ring->count) {
-		nb_buffs_extra = ice_fill_rx_descs(rx_ring->xsk_pool, xdp,
-						   rx_desc,
+		nb_buffs_extra = ice_fill_rx_descs(xsk_pool, xdp, rx_desc,
 						   rx_ring->count - ntu);
 		if (nb_buffs_extra != rx_ring->count - ntu) {
 			ntu += nb_buffs_extra;
@@ -498,7 +501,7 @@  static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 		ice_release_rx_desc(rx_ring, 0);
 	}
 
-	nb_buffs = ice_fill_rx_descs(rx_ring->xsk_pool, xdp, rx_desc, count);
+	nb_buffs = ice_fill_rx_descs(xsk_pool, xdp, rx_desc, count);
 
 	ntu += nb_buffs;
 	if (ntu == rx_ring->count)
@@ -514,6 +517,7 @@  static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 /**
  * ice_alloc_rx_bufs_zc - allocate a number of Rx buffers
  * @rx_ring: Rx ring
+ * @xsk_pool: XSK buffer pool to pick buffers to be filled by HW
  * @count: The number of buffers to allocate
  *
  * Wrapper for internal allocation routine; figure out how many tail
@@ -521,7 +525,8 @@  static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
  *
  * Returns true if all calls to internal alloc routine succeeded
  */
-bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
+bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
+			  struct xsk_buff_pool *xsk_pool, u16 count)
 {
 	u16 rx_thresh = ICE_RING_QUARTER(rx_ring);
 	u16 leftover, i, tail_bumps;
@@ -530,9 +535,9 @@  bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 	leftover = count - (tail_bumps * rx_thresh);
 
 	for (i = 0; i < tail_bumps; i++)
-		if (!__ice_alloc_rx_bufs_zc(rx_ring, rx_thresh))
+		if (!__ice_alloc_rx_bufs_zc(rx_ring, xsk_pool, rx_thresh))
 			return false;
-	return __ice_alloc_rx_bufs_zc(rx_ring, leftover);
+	return __ice_alloc_rx_bufs_zc(rx_ring, xsk_pool, leftover);
 }
 
 /**
@@ -653,7 +658,7 @@  static u32 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 	if (xdp_ring->next_to_clean >= cnt)
 		xdp_ring->next_to_clean -= cnt;
 	if (xsk_frames)
-		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
+		xsk_tx_completed(READ_ONCE(xdp_ring->xsk_pool), xsk_frames);
 
 	return completed_frames;
 }
@@ -705,7 +710,8 @@  static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp,
 		dma_addr_t dma;
 
 		dma = xsk_buff_xdp_get_dma(xdp);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size);
+		xsk_buff_raw_dma_sync_for_device(READ_ONCE(xdp_ring->xsk_pool),
+						 dma, size);
 
 		tx_buf->xdp = xdp;
 		tx_buf->type = ICE_TX_BUF_XSK_TX;
@@ -763,7 +769,8 @@  ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
 		if (!err)
 			return ICE_XDP_REDIR;
-		if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS)
+		if (xsk_uses_need_wakeup(READ_ONCE(rx_ring->xsk_pool)) &&
+		    err == -ENOBUFS)
 			result = ICE_XDP_EXIT;
 		else
 			result = ICE_XDP_CONSUMED;
@@ -832,8 +839,8 @@  ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
  */
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 {
+	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
 	u32 ntc = rx_ring->next_to_clean;
 	u32 ntu = rx_ring->next_to_use;
 	struct xdp_buff *first = NULL;
@@ -945,7 +952,8 @@  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 	rx_ring->next_to_clean = ntc;
 	entries_to_alloc = ICE_RX_DESC_UNUSED(rx_ring);
 	if (entries_to_alloc > ICE_RING_QUARTER(rx_ring))
-		failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc);
+		failure |= !ice_alloc_rx_bufs_zc(rx_ring, xsk_pool,
+						 entries_to_alloc);
 
 	ice_finalize_xdp_rx(xdp_ring, xdp_xmit, 0);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
@@ -968,17 +976,19 @@  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 /**
  * ice_xmit_pkt - produce a single HW Tx descriptor out of AF_XDP descriptor
  * @xdp_ring: XDP ring to produce the HW Tx descriptor on
+ * @xsk_pool: XSK buffer pool to pick buffers to be consumed by HW
  * @desc: AF_XDP descriptor to pull the DMA address and length from
  * @total_bytes: bytes accumulator that will be used for stats update
  */
-static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
+static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring,
+			 struct xsk_buff_pool *xsk_pool, struct xdp_desc *desc,
 			 unsigned int *total_bytes)
 {
 	struct ice_tx_desc *tx_desc;
 	dma_addr_t dma;
 
-	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc->addr);
-	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc->len);
+	dma = xsk_buff_raw_get_dma(xsk_pool, desc->addr);
+	xsk_buff_raw_dma_sync_for_device(xsk_pool, dma, desc->len);
 
 	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
 	tx_desc->buf_addr = cpu_to_le64(dma);
@@ -991,10 +1001,13 @@  static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
 /**
  * ice_xmit_pkt_batch - produce a batch of HW Tx descriptors out of AF_XDP descriptors
  * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @xsk_pool: XSK buffer pool to pick buffers to be consumed by HW
  * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
  * @total_bytes: bytes accumulator that will be used for stats update
  */
-static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
+static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring,
+			       struct xsk_buff_pool *xsk_pool,
+			       struct xdp_desc *descs,
 			       unsigned int *total_bytes)
 {
 	u16 ntu = xdp_ring->next_to_use;
@@ -1004,8 +1017,8 @@  static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *de
 	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++) {
 		dma_addr_t dma;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, descs[i].addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, descs[i].len);
+		dma = xsk_buff_raw_get_dma(xsk_pool, descs[i].addr);
+		xsk_buff_raw_dma_sync_for_device(xsk_pool, dma, descs[i].len);
 
 		tx_desc = ICE_TX_DESC(xdp_ring, ntu++);
 		tx_desc->buf_addr = cpu_to_le64(dma);
@@ -1021,21 +1034,24 @@  static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *de
 /**
  * ice_fill_tx_hw_ring - produce the number of Tx descriptors onto ring
  * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @xsk_pool: XSK buffer pool to pick buffers to be consumed by HW
  * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
  * @nb_pkts: count of packets to be send
  * @total_bytes: bytes accumulator that will be used for stats update
  */
-static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
-				u32 nb_pkts, unsigned int *total_bytes)
+static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring,
+				struct xsk_buff_pool *xsk_pool,
+				struct xdp_desc *descs, u32 nb_pkts,
+				unsigned int *total_bytes)
 {
 	u32 batched, leftover, i;
 
 	batched = ALIGN_DOWN(nb_pkts, PKTS_PER_BATCH);
 	leftover = nb_pkts & (PKTS_PER_BATCH - 1);
 	for (i = 0; i < batched; i += PKTS_PER_BATCH)
-		ice_xmit_pkt_batch(xdp_ring, &descs[i], total_bytes);
+		ice_xmit_pkt_batch(xdp_ring, xsk_pool, &descs[i], total_bytes);
 	for (; i < batched + leftover; i++)
-		ice_xmit_pkt(xdp_ring, &descs[i], total_bytes);
+		ice_xmit_pkt(xdp_ring, xsk_pool, &descs[i], total_bytes);
 }
 
 /**
@@ -1046,7 +1062,8 @@  static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring, struct xdp_desc *d
  */
 bool ice_xmit_zc(struct ice_tx_ring *xdp_ring)
 {
-	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
+	struct xsk_buff_pool *xsk_pool = READ_ONCE(xdp_ring->xsk_pool);
+	struct xdp_desc *descs = xsk_pool->tx_descs;
 	u32 nb_pkts, nb_processed = 0;
 	unsigned int total_bytes = 0;
 	int budget;
@@ -1060,25 +1077,26 @@  bool ice_xmit_zc(struct ice_tx_ring *xdp_ring)
 	budget = ICE_DESC_UNUSED(xdp_ring);
 	budget = min_t(u16, budget, ICE_RING_QUARTER(xdp_ring));
 
-	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
+	nb_pkts = xsk_tx_peek_release_desc_batch(xsk_pool, budget);
 	if (!nb_pkts)
 		return true;
 
 	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
 		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
-		ice_fill_tx_hw_ring(xdp_ring, descs, nb_processed, &total_bytes);
+		ice_fill_tx_hw_ring(xdp_ring, xsk_pool, descs, nb_processed,
+				    &total_bytes);
 		xdp_ring->next_to_use = 0;
 	}
 
-	ice_fill_tx_hw_ring(xdp_ring, &descs[nb_processed], nb_pkts - nb_processed,
-			    &total_bytes);
+	ice_fill_tx_hw_ring(xdp_ring, xsk_pool, &descs[nb_processed],
+			    nb_pkts - nb_processed, &total_bytes);
 
 	ice_set_rs_bit(xdp_ring);
 	ice_xdp_ring_update_tail(xdp_ring);
 	ice_update_tx_ring_stats(xdp_ring, nb_pkts, total_bytes);
 
-	if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
-		xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);
+	if (xsk_uses_need_wakeup(xsk_pool))
+		xsk_set_tx_need_wakeup(xsk_pool);
 
 	return nb_pkts < budget;
 }
@@ -1111,7 +1129,7 @@  ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 
 	ring = vsi->rx_rings[queue_id]->xdp_ring;
 
-	if (!ring->xsk_pool)
+	if (!READ_ONCE(ring->xsk_pool))
 		return -EINVAL;
 
 	/* The idea here is that if NAPI is running, mark a miss, so
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index 6fa181f080ef..4cd2d62a0836 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -22,7 +22,8 @@  int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
-bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
+bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
+			  struct xsk_buff_pool *xsk_pool, u16 count);
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
 void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
 void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
@@ -51,6 +52,7 @@  ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
 
 static inline bool
 ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
+		     struct xsk_buff_pool __always_unused *xsk_pool,
 		     u16 __always_unused count)
 {
 	return false;