Message ID | 20240708221416.625850-7-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | ice: fix AF_XDP ZC timeout and concurrency issues |
On Mon, 8 Jul 2024 15:14:12 -0700 Tony Nguyen wrote:
> @@ -1556,7 +1556,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
>  		 * comparison in the irq context instead of many inside the
>  		 * ice_clean_rx_irq function and makes the codebase cleaner.
>  		 */
> -		cleaned = rx_ring->xsk_pool ?
> +		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
>  			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
>  			  ice_clean_rx_irq(rx_ring, budget_per_ring);
>  		work_done += cleaned;
>
> @@ -832,8 +839,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
>   */
>  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
>  {
> +	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
>  	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> -	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
>  	u32 ntc = rx_ring->next_to_clean;
>  	u32 ntu = rx_ring->next_to_use;
>  	struct xdp_buff *first = NULL;

This looks suspicious, you need to at least explain why it's correct.
READ_ONCE() means one access per critical section, usually.
You access it at least twice in a single NAPI poll.

On Tue, Jul 09, 2024 at 06:45:24PM -0700, Jakub Kicinski wrote:
> On Mon, 8 Jul 2024 15:14:12 -0700 Tony Nguyen wrote:
> > @@ -1556,7 +1556,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
> >  		 * comparison in the irq context instead of many inside the
> >  		 * ice_clean_rx_irq function and makes the codebase cleaner.
> >  		 */
> > -		cleaned = rx_ring->xsk_pool ?
> > +		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
> >  			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
> >  			  ice_clean_rx_irq(rx_ring, budget_per_ring);
> >  		work_done += cleaned;
> >
> > @@ -832,8 +839,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
> >   */
> >  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
> >  {
> > +	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
> >  	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> > -	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
> >  	u32 ntc = rx_ring->next_to_clean;
> >  	u32 ntu = rx_ring->next_to_use;
> >  	struct xdp_buff *first = NULL;
>
> This looks suspicious, you need to at least explain why it's correct.
> READ_ONCE() means one access per critical section, usually.
> You access it at least twice in a single NAPI poll.

Hey after break! Comebacks are tough, vacation was followed by flu so bear
with me please...

Actually xsk_pool *can* be accessed multiple times during the refill of HW
Rx ring (at the end of napi poll Rx side). I thought it would be safe to
follow the scheme of xdp prog pointer handling, where we read it from ring
once per napi loop then work on local pointer.

Goal of this commit was to prevent compiler from code reorder such as NAPI
is launched before update of xsk_buff_pool pointer which is achieved with
WRITE_ONCE()/synchronize_net() pair. Then per my understanding single
READ_ONCE() within NAPI was sufficient, the one that makes the decision
which Rx routine should be called (zc or standard one). Given that bh are
disabled and updater respects RCU grace period IMHO pointer is valid for
current NAPI cycle.

If you're saying it's not correct and each and every xsk_pool reference
within NAPI has to be decorated with READ_ONCE() then so is the xdp_prog
pointer, but I'd like to hear more about this.

On Wed, 24 Jul 2024 01:46:11 +0200 Maciej Fijalkowski wrote:
> Goal of this commit was to prevent compiler from code reorder such as NAPI
> is launched before update of xsk_buff_pool pointer which is achieved with
> WRITE_ONCE()/synchronize_net() pair. Then per my understanding single
> READ_ONCE() within NAPI was sufficient, the one that makes the decision
> which Rx routine should be called (zc or standard one). Given that bh are
> disabled and updater respects RCU grace period IMHO pointer is valid for
> current NAPI cycle.

So if we are already in the af_xdp handler, and update path sets pool
to NULL - the af_xdp handler will be fine with the pool becoming NULL?
I guess it may be fine, it's just quite odd to call the function called
_ONCE() multiple times..

On Wed, Jul 24, 2024 at 07:57:42AM -0700, Jakub Kicinski wrote:
> On Wed, 24 Jul 2024 01:46:11 +0200 Maciej Fijalkowski wrote:
> > Goal of this commit was to prevent compiler from code reorder such as NAPI
> > is launched before update of xsk_buff_pool pointer which is achieved with
> > WRITE_ONCE()/synchronize_net() pair. Then per my understanding single
> > READ_ONCE() within NAPI was sufficient, the one that makes the decision
> > which Rx routine should be called (zc or standard one). Given that bh are
> > disabled and updater respects RCU grace period IMHO pointer is valid for
> > current NAPI cycle.
>
> So if we are already in the af_xdp handler, and update path sets pool
> to NULL - the af_xdp handler will be fine with the pool becoming NULL?
> I guess it may be fine, it's just quite odd to call the function called
> _ONCE() multiple times..

Update path before NULLing pool will go through rcu grace period, stop
napis, disable irqs, etc. Running napi won't be exposed to nulled pool in
such case.

On Wed, 24 Jul 2024 17:49:12 +0200 Maciej Fijalkowski wrote:
> > So if we are already in the af_xdp handler, and update path sets pool
> > to NULL - the af_xdp handler will be fine with the pool becoming NULL?
> > I guess it may be fine, it's just quite odd to call the function called
> > _ONCE() multiple times..
>
> Update path before NULLing pool will go through rcu grace period, stop
> napis, disable irqs, etc. Running napi won't be exposed to nulled pool in
> such case.

Could you make it clearer what condition the patch is fixing, then?
What can go wrong without this patch?

On Thu, Jul 25, 2024 at 06:38:58AM -0700, Jakub Kicinski wrote:
> On Wed, 24 Jul 2024 17:49:12 +0200 Maciej Fijalkowski wrote:
> > > So if we are already in the af_xdp handler, and update path sets pool
> > > to NULL - the af_xdp handler will be fine with the pool becoming NULL?
> > > I guess it may be fine, it's just quite odd to call the function called
> > > _ONCE() multiple times..
> >
> > Update path before NULLing pool will go through rcu grace period, stop
> > napis, disable irqs, etc. Running napi won't be exposed to nulled pool in
> > such case.
>
> Could you make it clearer what condition the patch is fixing, then?
> What can go wrong without this patch?

Sorry for confusion, but without this patch scenario you brought up
initially *could* happen, under some wild circumstances. When I was
responding yesterday my head was around the code with this particular
patch in place, that's why I said such pool state transition was not
possible.

Updater does this (prior to this patch):

	(...)
	ring->xsk_pool = ice_get_xp_from_qid(vsi, qid); // set to NULL
	(...)
	ice_qvec_toggle_napi(vsi, q_vector, true);
	ice_qvec_ena_irq(vsi, q_vector);

In theory compiler is allowed to transform the code in a way that xsk_pool
assignment will happen *after* triggering napi. So in ice_napi_poll():

	if (tx_ring->xsk_pool)
		wd = ice_xmit_zc(tx_ring); // call ZC routine
	else if (ice_ring_is_xdp(tx_ring))
		wd = true;
	else
		wd = ice_clean_tx_irq(tx_ring, budget);

You will initiate ZC Tx processing because xsk_pool ptr was still valid and
crash in the middle of its job once it's finally NULLed. To prevent that:

updater:
	(...)
	WRITE_ONCE(ring->xsk_pool, ice_get_xp_from_qid(vsi, qid));
	(...)
	ice_qvec_toggle_napi(vsi, q_vector, true);
	ice_qvec_ena_irq(vsi, q_vector);
	/* make sure NAPI sees updated ice_{t,x}_ring::xsk_pool */
	synchronize_net();

reader:
	if (READ_ONCE(tx_ring->xsk_pool))
		wd = ice_xmit_zc(tx_ring);
	else if (ice_ring_is_xdp(tx_ring))
		wd = true;
	else
		wd = ice_clean_tx_irq(tx_ring, budget);

Does that make any sense now?

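
Editor's sketch of the updater/reader ordering described in the message above, as a single-threaded userspace model. Everything prefixed `demo_` is hypothetical, the `READ_ONCE()`/`WRITE_ONCE()` macros are simplified volatile-cast stand-ins for the kernel's (they rely on the GCC/Clang `__typeof__` extension), and the `napi_enabled` flag is only an assumed model of "NAPI may now run". The `synchronize_net()` step from the patch is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct demo_pool { int id; };		/* hypothetical stand-in for xsk_buff_pool */

struct demo_ring {
	struct demo_pool *xsk_pool;	/* written by updater, read by poller */
	bool napi_enabled;		/* hypothetical model of "NAPI may run" */
};

enum demo_path { DEMO_IDLE, DEMO_ZC, DEMO_STANDARD };

/* Updater: publish the pool pointer first, only then let the poller run. */
static void demo_qp_enable(struct demo_ring *ring, struct demo_pool *pool)
{
	assert(ring != NULL);
	WRITE_ONCE(ring->xsk_pool, pool);
	WRITE_ONCE(ring->napi_enabled, true);
	/* The kernel patch follows this with synchronize_net(); not modelled. */
}

/* Reader: one marked load decides which routine handles this poll. */
static enum demo_path demo_poll(struct demo_ring *ring)
{
	if (!READ_ONCE(ring->napi_enabled))
		return DEMO_IDLE;
	return READ_ONCE(ring->xsk_pool) ? DEMO_ZC : DEMO_STANDARD;
}
```

Because both stores are volatile accesses, the compiler has to emit them in program order, which is the compiler-level property the message is after. The sketch says nothing about CPU-level ordering or about waiting for in-flight pollers.
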
On Thu, 25 Jul 2024 20:31:31 +0200 Maciej Fijalkowski wrote:
> Does that make any sense now?
Could be brain fog due to post-netdev.conf covid but no, not really.
The _ONCE() helpers basically give you the ability to store the pointer
to a variable on the stack, and that variable won't change behind your
back. But the only reason to READ_ONCE(ptr->thing) something multiple
times is to tell KCSAN that "I know what I'm doing", it just silences
potential warnings :S
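
Editor's sketch of the point made above: a value loaded once into a local cannot change afterwards, while a second load of the same field can disagree with the first. This is a single-threaded userspace model; the `demo_` names are hypothetical, the callback merely simulates an updater running "behind your back", and `READ_ONCE()` is a simplified volatile-cast stand-in for the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel macro (assumption). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct demo_pool { int frames; };
struct demo_ring { struct demo_pool *xsk_pool; };

/* Hypothetical hook standing in for a concurrent updater. */
typedef void (*demo_hook)(struct demo_ring *ring);

static void demo_clear_pool(struct demo_ring *ring)
{
	ring->xsk_pool = NULL;
}

/*
 * Load the pointer once and use only the local copy afterwards.
 * Returns the frame count seen through the snapshot, or -1 if no
 * pool was attached at the time of the load.
 */
static int demo_use_snapshot(struct demo_ring *ring, demo_hook interfere)
{
	struct demo_pool *pool;

	assert(ring != NULL);
	pool = READ_ONCE(ring->xsk_pool);
	if (!pool)
		return -1;
	if (interfere)
		interfere(ring);	/* ring->xsk_pool may differ now */
	return pool->frames;		/* the local copy is unaffected */
}

/* Counter-example: load the field twice and report whether both agree. */
static int demo_reread_agrees(struct demo_ring *ring, demo_hook interfere)
{
	struct demo_pool *first = READ_ONCE(ring->xsk_pool);
	struct demo_pool *second;

	if (interfere)
		interfere(ring);
	second = READ_ONCE(ring->xsk_pool);
	return first == second;
}
```

The snapshot only pins the pointer value. Whether the object behind it is still valid is a separate question, which in the thread is answered by the updater waiting for an RCU grace period.
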
On Thu, Jul 25, 2024 at 04:07:00PM -0700, Jakub Kicinski wrote:
> On Thu, 25 Jul 2024 20:31:31 +0200 Maciej Fijalkowski wrote:
> > Does that make any sense now?
>
> Could be brain fog due to post-netdev.conf covid but no, not really.

Huh, that makes two of us.

>
> The _ONCE() helpers basically give you the ability to store the pointer
> to a variable on the stack, and that variable won't change behind your
> back. But the only reason to READ_ONCE(ptr->thing) something multiple
> times is to tell KCSAN that "I know what I'm doing", it just silences
> potential warnings :S

I feel like you keep on referring to _ONCE (*) being used multiple times
which might be counter-intuitive whereas I was trying from the beginning
to explain my point that xsk pool from driver POV should get the very same
treatment as xdp prog has currently. So, either mark it as __rcu variable
and use rcu helpers or use _ONCE variants plus some sync.

(*) Ok, if you meant from the very beginning that two READ_ONCE against
pool per single critical section is suspicious then I didn't get that,
sorry. With diff below I would have single READ_ONCE and work on that
variable for rest of the napi. Patch was actually trying to limit xsk_pool
accesses from ring struct by working on stack variable.

Would you be okay with that?

-----8<-----

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 4c115531beba..5b27aaaa94ee 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1550,14 +1550,15 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 		budget_per_ring = budget;
 
 	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
+		struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
 		int cleaned;
 
 		/* A dedicated path for zero-copy allows making a single
 		 * comparison in the irq context instead of many inside the
 		 * ice_clean_rx_irq function and makes the codebase cleaner.
 		 */
-		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
-			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
+		cleaned = rx_ring->xsk_pool ?
+			  ice_clean_rx_irq_zc(rx_ring, xsk_pool, budget_per_ring) :
 			  ice_clean_rx_irq(rx_ring, budget_per_ring);
 		work_done += cleaned;
 		/* if we clean as many as budgeted, we must not be done */
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 492a9e54d58b..dceab7619a64 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -837,13 +837,15 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
 /**
  * ice_clean_rx_irq_zc - consumes packets from the hardware ring
  * @rx_ring: AF_XDP Rx ring
+ * @xsk_pool: AF_XDP pool ptr
  * @budget: NAPI budget
  *
  * Returns number of processed packets on success, remaining budget on failure.
  */
-int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
+int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring,
+			struct xsk_buff_pool *xsk_pool,
+			int budget)
 {
-	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u32 ntc = rx_ring->next_to_clean;
 	u32 ntu = rx_ring->next_to_use;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index 4cd2d62a0836..8c3675185699 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -20,7 +20,9 @@ struct ice_vsi;
 #ifdef CONFIG_XDP_SOCKETS
 int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
-int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
+int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring,
+			struct xsk_buff_pool *xsk_pool,
+			int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
 bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
 			  struct xsk_buff_pool *xsk_pool, u16 count);
@@ -45,6 +47,7 @@ ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
 
 static inline int
 ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
+		    struct xsk_buff_pool __always_unused *xsk_pool,
 		    int __always_unused budget)
 {
 	return 0;

----->8-----

On Fri, 26 Jul 2024 15:43:20 +0200 Maciej Fijalkowski wrote:
> > The _ONCE() helpers basically give you the ability to store the pointer
> > to a variable on the stack, and that variable won't change behind your
> > back. But the only reason to READ_ONCE(ptr->thing) something multiple
> > times is to tell KCSAN that "I know what I'm doing", it just silences
> > potential warnings :S
>
> I feel like you keep on referring to _ONCE (*) being used multiple times
> which might be counter-intuitive whereas I was trying from the beginning
> to explain my point that xsk pool from driver POV should get the very same
> treatment as xdp prog has currently. So, either mark it as __rcu variable
> and use rcu helpers or use _ONCE variants plus some sync.
>
> (*) Ok, if you meant from the very beginning that two READ_ONCE against
> pool per single critical section is suspicious then I didn't get that,
> sorry. With diff below I would have single READ_ONCE and work on that
> variable for rest of the napi. Patch was actually trying to limit xsk_pool
> accesses from ring struct by working on stack variable.
>
> Would you be okay with that?

Yup! That diff makes sense, thanks!

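
Editor's sketch of the shape agreed on above: the poll loop loads the pool pointer once per ring and hands the local copy down the call chain, so the helpers never touch the ring field again. This is a userspace model with hypothetical `demo_` names and made-up bookkeeping (`pending`, `free_bufs`); `READ_ONCE()` is a simplified volatile-cast stand-in for the kernel macro. Unlike the draft diff quoted in the thread, which still tests `rx_ring->xsk_pool` in the ternary, the sketch branches on the local copy.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel macro (assumption). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct demo_pool { int free_bufs; };	/* hypothetical buffer pool */

struct demo_ring {
	struct demo_pool *xsk_pool;	/* may be replaced by an updater */
	int pending;			/* packets waiting on this ring */
};

/* Refill takes the pool from its caller instead of re-reading the ring. */
static int demo_refill_zc(struct demo_pool *pool, int count)
{
	int n = count < pool->free_bufs ? count : pool->free_bufs;

	pool->free_bufs -= n;
	return n;
}

/* Zero-copy path: works only on the pool pointer it was handed. */
static int demo_clean_rx_zc(struct demo_ring *ring, struct demo_pool *pool,
			    int budget)
{
	int done = ring->pending < budget ? ring->pending : budget;

	assert(pool != NULL);
	ring->pending -= done;
	demo_refill_zc(pool, done);
	return done;
}

/* Standard path: no pool involved. */
static int demo_clean_rx(struct demo_ring *ring, int budget)
{
	int done = ring->pending < budget ? ring->pending : budget;

	ring->pending -= done;
	return done;
}

/* Poll loop: exactly one marked load of xsk_pool per ring per poll. */
static int demo_napi_poll(struct demo_ring *rings, size_t nr,
			  int budget_per_ring)
{
	int work_done = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct demo_pool *pool = READ_ONCE(rings[i].xsk_pool);

		work_done += pool ?
			     demo_clean_rx_zc(&rings[i], pool, budget_per_ring) :
			     demo_clean_rx(&rings[i], budget_per_ring);
	}
	return work_done;
}
```

Passing the pointer as a parameter makes the single-load property visible in the function signatures, which is roughly what the extra `xsk_pool` arguments in the diff do.
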
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 99a75a59078e..caaa10157909 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -765,18 +765,17 @@ static inline struct xsk_buff_pool *ice_get_xp_from_qid(struct ice_vsi *vsi,
 }
 
 /**
- * ice_xsk_pool - get XSK buffer pool bound to a ring
+ * ice_rx_xsk_pool - assign XSK buff pool to Rx ring
  * @ring: Rx ring to use
  *
- * Returns a pointer to xsk_buff_pool structure if there is a buffer pool
- * present, NULL otherwise.
+ * Sets XSK buff pool pointer on Rx ring.
  */
-static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_rx_ring *ring)
+static inline void ice_rx_xsk_pool(struct ice_rx_ring *ring)
 {
 	struct ice_vsi *vsi = ring->vsi;
 	u16 qid = ring->q_index;
 
-	return ice_get_xp_from_qid(vsi, qid);
+	WRITE_ONCE(ring->xsk_pool, ice_get_xp_from_qid(vsi, qid));
 }
 
 /**
@@ -801,7 +800,7 @@ static inline void ice_tx_xsk_pool(struct ice_vsi *vsi, u16 qid)
 	if (!ring)
 		return;
 
-	ring->xsk_pool = ice_get_xp_from_qid(vsi, qid);
+	WRITE_ONCE(ring->xsk_pool, ice_get_xp_from_qid(vsi, qid));
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 5d396c1a7731..1facf179a96f 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -536,7 +536,7 @@ static int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
 				return err;
 		}
 
-		ring->xsk_pool = ice_xsk_pool(ring);
+		ice_rx_xsk_pool(ring);
 		if (ring->xsk_pool) {
 			xdp_rxq_info_unreg(&ring->xdp_rxq);
 
@@ -597,7 +597,7 @@ static int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
 			return 0;
 		}
 
-		ok = ice_alloc_rx_bufs_zc(ring, num_bufs);
+		ok = ice_alloc_rx_bufs_zc(ring, ring->xsk_pool, num_bufs);
 		if (!ok) {
 			u16 pf_q = ring->vsi->rxq_map[ring->q_index];
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 55a42aad92a5..9b075dd48889 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2949,7 +2949,7 @@ static void ice_vsi_rx_napi_schedule(struct ice_vsi *vsi)
 	ice_for_each_rxq(vsi, i) {
 		struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
 
-		if (rx_ring->xsk_pool)
+		if (READ_ONCE(rx_ring->xsk_pool))
 			napi_schedule(&rx_ring->q_vector->napi);
 	}
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 8bb743f78fcb..f4b2b1bca234 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1523,7 +1523,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
 		bool wd;
 
-		if (tx_ring->xsk_pool)
+		if (READ_ONCE(tx_ring->xsk_pool))
 			wd = ice_xmit_zc(tx_ring);
 		else if (ice_ring_is_xdp(tx_ring))
 			wd = true;
@@ -1556,7 +1556,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 		 * comparison in the irq context instead of many inside the
 		 * ice_clean_rx_irq function and makes the codebase cleaner.
 		 */
-		cleaned = rx_ring->xsk_pool ?
+		cleaned = READ_ONCE(rx_ring->xsk_pool) ?
 			  ice_clean_rx_irq_zc(rx_ring, budget_per_ring) :
 			  ice_clean_rx_irq(rx_ring, budget_per_ring);
 		work_done += cleaned;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 3fbe4cfadfbf..b4058c4937bc 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -250,6 +250,8 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 	ice_qvec_toggle_napi(vsi, q_vector, true);
 	ice_qvec_ena_irq(vsi, q_vector);
 
+	/* make sure NAPI sees updated ice_{t,x}_ring::xsk_pool */
+	synchronize_net();
 	ice_get_link_status(vsi->port_info, &link_up);
 	if (link_up) {
 		netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
@@ -464,6 +466,7 @@ static u16 ice_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
 /**
  * __ice_alloc_rx_bufs_zc - allocate a number of Rx buffers
  * @rx_ring: Rx ring
+ * @xsk_pool: XSK buffer pool to pick buffers to be filled by HW
  * @count: The number of buffers to allocate
  *
  * Place the @count of descriptors onto Rx ring. Handle the ring wrap
@@ -472,7 +475,8 @@ static u16 ice_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
  *
  * Returns true if all allocations were successful, false if any fail.
  */
-static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
+static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
+				   struct xsk_buff_pool *xsk_pool, u16 count)
 {
 	u32 nb_buffs_extra = 0, nb_buffs = 0;
 	union ice_32b_rx_flex_desc *rx_desc;
@@ -484,8 +488,7 @@ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 	xdp = ice_xdp_buf(rx_ring, ntu);
 
 	if (ntu + count >= rx_ring->count) {
-		nb_buffs_extra = ice_fill_rx_descs(rx_ring->xsk_pool, xdp,
-						   rx_desc,
+		nb_buffs_extra = ice_fill_rx_descs(xsk_pool, xdp, rx_desc,
 						   rx_ring->count - ntu);
 		if (nb_buffs_extra != rx_ring->count - ntu) {
 			ntu += nb_buffs_extra;
@@ -498,7 +501,7 @@ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 		ice_release_rx_desc(rx_ring, 0);
 	}
 
-	nb_buffs = ice_fill_rx_descs(rx_ring->xsk_pool, xdp, rx_desc, count);
+	nb_buffs = ice_fill_rx_descs(xsk_pool, xdp, rx_desc, count);
 
 	ntu += nb_buffs;
 	if (ntu == rx_ring->count)
@@ -514,6 +517,7 @@ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 /**
  * ice_alloc_rx_bufs_zc - allocate a number of Rx buffers
  * @rx_ring: Rx ring
+ * @xsk_pool: XSK buffer pool to pick buffers to be filled by HW
  * @count: The number of buffers to allocate
  *
  * Wrapper for internal allocation routine; figure out how many tail
@@ -521,7 +525,8 @@ static bool __ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
  *
  * Returns true if all calls to internal alloc routine succeeded
  */
-bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
+bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
+			  struct xsk_buff_pool *xsk_pool, u16 count)
 {
 	u16 rx_thresh = ICE_RING_QUARTER(rx_ring);
 	u16 leftover, i, tail_bumps;
@@ -530,9 +535,9 @@ bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count)
 	leftover = count - (tail_bumps * rx_thresh);
 
 	for (i = 0; i < tail_bumps; i++)
-		if (!__ice_alloc_rx_bufs_zc(rx_ring, rx_thresh))
+		if (!__ice_alloc_rx_bufs_zc(rx_ring, xsk_pool, rx_thresh))
 			return false;
-	return __ice_alloc_rx_bufs_zc(rx_ring, leftover);
+	return __ice_alloc_rx_bufs_zc(rx_ring, xsk_pool, leftover);
 }
 
 /**
@@ -653,7 +658,7 @@ static u32 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 	if (xdp_ring->next_to_clean >= cnt)
 		xdp_ring->next_to_clean -= cnt;
 	if (xsk_frames)
-		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
+		xsk_tx_completed(READ_ONCE(xdp_ring->xsk_pool), xsk_frames);
 
 	return completed_frames;
 }
@@ -705,7 +710,8 @@ static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp,
 		dma_addr_t dma;
 
 		dma = xsk_buff_xdp_get_dma(xdp);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size);
+		xsk_buff_raw_dma_sync_for_device(READ_ONCE(xdp_ring->xsk_pool),
+						 dma, size);
 
 		tx_buf->xdp = xdp;
 		tx_buf->type = ICE_TX_BUF_XSK_TX;
@@ -763,7 +769,8 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
 		if (!err)
 			return ICE_XDP_REDIR;
-		if (xsk_uses_need_wakeup(rx_ring->xsk_pool) && err == -ENOBUFS)
+		if (xsk_uses_need_wakeup(READ_ONCE(rx_ring->xsk_pool)) &&
+		    err == -ENOBUFS)
 			result = ICE_XDP_EXIT;
 		else
 			result = ICE_XDP_CONSUMED;
@@ -832,8 +839,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
  */
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 {
+	struct xsk_buff_pool *xsk_pool = READ_ONCE(rx_ring->xsk_pool);
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
 	u32 ntc = rx_ring->next_to_clean;
 	u32 ntu = rx_ring->next_to_use;
 	struct xdp_buff *first = NULL;
@@ -945,7 +952,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 	rx_ring->next_to_clean = ntc;
 	entries_to_alloc = ICE_RX_DESC_UNUSED(rx_ring);
 	if (entries_to_alloc > ICE_RING_QUARTER(rx_ring))
-		failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc);
+		failure |= !ice_alloc_rx_bufs_zc(rx_ring, xsk_pool,
+						 entries_to_alloc);
 
 	ice_finalize_xdp_rx(xdp_ring, xdp_xmit, 0);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
@@ -968,17 +976,19 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 /**
  * ice_xmit_pkt - produce a single HW Tx descriptor out of AF_XDP descriptor
  * @xdp_ring: XDP ring to produce the HW Tx descriptor on
+ * @xsk_pool: XSK buffer pool to pick buffers to be consumed by HW
  * @desc: AF_XDP descriptor to pull the DMA address and length from
  * @total_bytes: bytes accumulator that will be used for stats update
  */
-static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
+static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring,
+			 struct xsk_buff_pool *xsk_pool, struct xdp_desc *desc,
 			 unsigned int *total_bytes)
 {
 	struct ice_tx_desc *tx_desc;
 	dma_addr_t dma;
 
-	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc->addr);
-	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc->len);
+	dma = xsk_buff_raw_get_dma(xsk_pool, desc->addr);
+	xsk_buff_raw_dma_sync_for_device(xsk_pool, dma, desc->len);
 
 	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
 	tx_desc->buf_addr = cpu_to_le64(dma);
@@ -991,10 +1001,13 @@ static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
 /**
  * ice_xmit_pkt_batch - produce a batch of HW Tx descriptors out of AF_XDP descriptors
  * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @xsk_pool: XSK buffer pool to pick buffers to be consumed by HW
  * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
  * @total_bytes: bytes accumulator that will be used for stats update
  */
-static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
+static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring,
+			       struct xsk_buff_pool *xsk_pool,
+			       struct xdp_desc *descs,
 			       unsigned int *total_bytes)
 {
 	u16 ntu = xdp_ring->next_to_use;
@@ -1004,8 +1017,8 @@ static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *de
 	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++) {
 		dma_addr_t dma;
 
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, descs[i].addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, descs[i].len);
+		dma = xsk_buff_raw_get_dma(xsk_pool, descs[i].addr);
+		xsk_buff_raw_dma_sync_for_device(xsk_pool, dma, descs[i].len);
 
 		tx_desc = ICE_TX_DESC(xdp_ring, ntu++);
 		tx_desc->buf_addr = cpu_to_le64(dma);
@@ -1021,21 +1034,24 @@ static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *de
 /**
  * ice_fill_tx_hw_ring - produce the number of Tx descriptors onto ring
  * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @xsk_pool: XSK buffer pool to pick buffers to be consumed by HW
  * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
  * @nb_pkts: count of packets to be send
  * @total_bytes: bytes accumulator that will be used for stats update
  */
-static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
-				u32 nb_pkts, unsigned int *total_bytes)
+static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring,
+				struct xsk_buff_pool *xsk_pool,
+				struct xdp_desc *descs, u32 nb_pkts,
+				unsigned int *total_bytes)
 {
 	u32 batched, leftover, i;
 
 	batched = ALIGN_DOWN(nb_pkts, PKTS_PER_BATCH);
 	leftover = nb_pkts & (PKTS_PER_BATCH - 1);
 	for (i = 0; i < batched; i += PKTS_PER_BATCH)
-		ice_xmit_pkt_batch(xdp_ring, &descs[i], total_bytes);
+		ice_xmit_pkt_batch(xdp_ring, xsk_pool, &descs[i], total_bytes);
 	for (; i < batched + leftover; i++)
-		ice_xmit_pkt(xdp_ring, &descs[i], total_bytes);
+		ice_xmit_pkt(xdp_ring, xsk_pool, &descs[i], total_bytes);
 }
 
 /**
@@ -1046,7 +1062,8 @@ static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring, struct xdp_desc *d
  */
 bool ice_xmit_zc(struct ice_tx_ring *xdp_ring)
 {
-	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
+	struct xsk_buff_pool *xsk_pool = READ_ONCE(xdp_ring->xsk_pool);
+	struct xdp_desc *descs = xsk_pool->tx_descs;
 	u32 nb_pkts, nb_processed = 0;
 	unsigned int total_bytes = 0;
 	int budget;
@@ -1060,25 +1077,26 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring)
 	budget = ICE_DESC_UNUSED(xdp_ring);
 	budget = min_t(u16, budget, ICE_RING_QUARTER(xdp_ring));
 
-	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
+	nb_pkts = xsk_tx_peek_release_desc_batch(xsk_pool, budget);
 	if (!nb_pkts)
 		return true;
 
 	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
 		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
-		ice_fill_tx_hw_ring(xdp_ring, descs, nb_processed, &total_bytes);
+		ice_fill_tx_hw_ring(xdp_ring, xsk_pool, descs, nb_processed,
+				    &total_bytes);
 		xdp_ring->next_to_use = 0;
 	}
 
-	ice_fill_tx_hw_ring(xdp_ring, &descs[nb_processed], nb_pkts - nb_processed,
-			    &total_bytes);
+	ice_fill_tx_hw_ring(xdp_ring, xsk_pool, &descs[nb_processed],
+			    nb_pkts - nb_processed, &total_bytes);
 
 	ice_set_rs_bit(xdp_ring);
 	ice_xdp_ring_update_tail(xdp_ring);
 	ice_update_tx_ring_stats(xdp_ring, nb_pkts, total_bytes);
 
-	if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
-		xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);
+	if (xsk_uses_need_wakeup(xsk_pool))
+		xsk_set_tx_need_wakeup(xsk_pool);
 
 	return nb_pkts < budget;
 }
@@ -1111,7 +1129,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 
 	ring = vsi->rx_rings[queue_id]->xdp_ring;
 
-	if (!ring->xsk_pool)
+	if (!READ_ONCE(ring->xsk_pool))
 		return -EINVAL;
 
 	/* The idea here is that if NAPI is running, mark a miss, so
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index 6fa181f080ef..4cd2d62a0836 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -22,7 +22,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
-bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
+bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring,
+			  struct xsk_buff_pool *xsk_pool, u16 count);
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
 void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
 void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
@@ -51,6 +52,7 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
 
 static inline bool
 ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
+		     struct xsk_buff_pool __always_unused *xsk_pool,
 		     u16 __always_unused count)
 {
 	return false;