Message ID | 20240604132155.3573752-2-maciej.fijalkowski@intel.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ice: fix AF_XDP ZC timeout and concurrency issues | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/apply | fail | Patch does not apply to net-0 |
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Date: Tue, 4 Jun 2024 15:21:48 +0200 > From: Michal Kubiak <michal.kubiak@intel.com> > > Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx > ring when link is either not yet fully initialized or process of > stopping the netdev has already started. To avoid this, add checks > against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc(). > One could argue that bailing out early in ice_xsk_wakeup() would be > sufficient but given the fact that we produce Tx descriptors on behalf > of NAPI that is triggered for Rx traffic, the latter is also needed. > > Bringing link up is an asynchronous event executed within > ice_service_task so even though interface has been brought up there is > still a time frame where link is not yet ok. > > Without this patch, when AF_XDP ZC Tx is used simultaneously with stack > Tx, Tx timeouts occur after going through link flap (admin brings > interface down then up again). HW seem to be unable to transmit > descriptor to the wire after HW tail register bump which in turn causes > bit __QUEUE_STATE_STACK_XOFF to be set forever as > netdev_tx_completed_queue() sees no cleaned bytes on the input. > > Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") > Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) > Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > --- > drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c > index 2015f66b0cf9..1bd4b054dd80 100644 > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c > @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) > > ice_clean_xdp_irq_zc(xdp_ring); > > + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || > + !netif_running(xdp_ring->vsi->netdev)) > + return true; Why is it checked after clean_xdp_irq_zc()? Also, unlikely()? > + > budget = ICE_DESC_UNUSED(xdp_ring); > budget = min_t(u16, budget, ICE_RING_QUARTER(xdp_ring)); > > @@ -1091,7 +1095,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, > struct ice_vsi *vsi = np->vsi; > struct ice_tx_ring *ring; > > - if (test_bit(ICE_VSI_DOWN, vsi->state)) > + if (test_bit(ICE_VSI_DOWN, vsi->state) || !netif_carrier_ok(netdev)) Same for unlikely()? > return -ENETDOWN; > > if (!ice_is_xdp_ena_vsi(vsi)) Thanks, Olek
On Tue, Jun 11, 2024 at 01:59:37PM +0200, Alexander Lobakin wrote: > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Date: Tue, 4 Jun 2024 15:21:48 +0200 > > > From: Michal Kubiak <michal.kubiak@intel.com> > > > > Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx > > ring when link is either not yet fully initialized or process of > > stopping the netdev has already started. To avoid this, add checks > > against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc(). > > One could argue that bailing out early in ice_xsk_wakeup() would be > > sufficient but given the fact that we produce Tx descriptors on behalf > > of NAPI that is triggered for Rx traffic, the latter is also needed. > > > > Bringing link up is an asynchronous event executed within > > ice_service_task so even though interface has been brought up there is > > still a time frame where link is not yet ok. > > > > Without this patch, when AF_XDP ZC Tx is used simultaneously with stack > > Tx, Tx timeouts occur after going through link flap (admin brings > > interface down then up again). HW seem to be unable to transmit > > descriptor to the wire after HW tail register bump which in turn causes > > bit __QUEUE_STATE_STACK_XOFF to be set forever as > > netdev_tx_completed_queue() sees no cleaned bytes on the input. > > > > Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") > > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") > > Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> > > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) > > Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > --- > > drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c > > index 2015f66b0cf9..1bd4b054dd80 100644 > > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c > > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c > > @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) > > > > ice_clean_xdp_irq_zc(xdp_ring); > > > > + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || > > + !netif_running(xdp_ring->vsi->netdev)) > > + return true; > > Why is it checked after clean_xdp_irq_zc()? There's nothing wrong with cleaning descriptors that have been sent previously. We don't touch anything HW nor netstack related there, just bumping ntc and producing CQ descriptors, both ops are pure SW things. > Also, unlikely()? Thought about that as well but we played it safe first. I wouldn't like to resubmit whole series due to this so maybe Tony/Jake could address this when sending to netdev, or am I asking for too much?:) > > > + > > budget = ICE_DESC_UNUSED(xdp_ring); > > budget = min_t(u16, budget, ICE_RING_QUARTER(xdp_ring)); > > > > @@ -1091,7 +1095,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, > > struct ice_vsi *vsi = np->vsi; > > struct ice_tx_ring *ring; > > > > - if (test_bit(ICE_VSI_DOWN, vsi->state)) > > + if (test_bit(ICE_VSI_DOWN, vsi->state) || !netif_carrier_ok(netdev)) > > Same for unlikely()? > > > return -ENETDOWN; > > > > if (!ice_is_xdp_ena_vsi(vsi)) > > Thanks, > Olek
On 6/11/2024 10:21 AM, Maciej Fijalkowski wrote: > On Tue, Jun 11, 2024 at 01:59:37PM +0200, Alexander Lobakin wrote: >>> ice_clean_xdp_irq_zc(xdp_ring); >>> >>> + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || >>> + !netif_running(xdp_ring->vsi->netdev)) >>> + return true; ... >> Also, unlikely()? > > Thought about that as well but we played it safe first. I wouldn't like to > resubmit whole series due to this so maybe Tony/Jake could address this > when sending to netdev, or am I asking for too much?:) I can adjust when sending it on to netdev. Thanks, Tony
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Date: Tue, 11 Jun 2024 16:21:27 +0200 > On Tue, Jun 11, 2024 at 01:59:37PM +0200, Alexander Lobakin wrote: >> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> >> Date: Tue, 4 Jun 2024 15:21:48 +0200 >> >>> From: Michal Kubiak <michal.kubiak@intel.com> >>> >>> Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx >>> ring when link is either not yet fully initialized or process of >>> stopping the netdev has already started. To avoid this, add checks >>> against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc(). >>> One could argue that bailing out early in ice_xsk_wakeup() would be >>> sufficient but given the fact that we produce Tx descriptors on behalf >>> of NAPI that is triggered for Rx traffic, the latter is also needed. >>> >>> Bringing link up is an asynchronous event executed within >>> ice_service_task so even though interface has been brought up there is >>> still a time frame where link is not yet ok. >>> >>> Without this patch, when AF_XDP ZC Tx is used simultaneously with stack >>> Tx, Tx timeouts occur after going through link flap (admin brings >>> interface down then up again). HW seem to be unable to transmit >>> descriptor to the wire after HW tail register bump which in turn causes >>> bit __QUEUE_STATE_STACK_XOFF to be set forever as >>> netdev_tx_completed_queue() sees no cleaned bytes on the input. >>> >>> Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") >>> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") >>> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> >>> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) >>> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> >>> --- >>> drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +++++- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c >>> index 2015f66b0cf9..1bd4b054dd80 100644 >>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c >>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c >>> @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) >>> >>> ice_clean_xdp_irq_zc(xdp_ring); >>> >>> + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || >>> + !netif_running(xdp_ring->vsi->netdev)) >>> + return true; >> >> Why is it checked after clean_xdp_irq_zc()? > > There's nothing wrong with cleaning descriptors that have been sent > previously. We don't touch anything HW nor netstack related there, just > bumping ntc and producing CQ descriptors, both ops are pure SW things. Sure, but do we need to do that if we don't send anything this time? Lazy cleaning and all that :p Thanks, Olek
From: Alexander Lobakin <aleksander.lobakin@intel.com> Date: Wed, 12 Jun 2024 11:09:10 +0200 > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Date: Tue, 11 Jun 2024 16:21:27 +0200 [...] >>>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c >>>> index 2015f66b0cf9..1bd4b054dd80 100644 >>>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c >>>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c >>>> @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) >>>> >>>> ice_clean_xdp_irq_zc(xdp_ring); >>>> >>>> + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || >>>> + !netif_running(xdp_ring->vsi->netdev)) Oh BTW, I noticed some time ago that netif_running() is less precise than checking for %IFF_UP. For example, in this piece (main netdev ifup function)[0], netif_running() will start returning true *before* driver's .ndo_open() is called, but %IFF_UP will be set only after .ndo_open() is done (with no issues). That means, I'd check for %IFF_UP honestly (maybe even before checking the carrier). [0] https://elixir.bootlin.com/linux/v6.10-rc3/source/net/core/dev.c#L1466 Thanks, Olek
On Wed, Jun 12, 2024 at 11:15:31AM +0200, Alexander Lobakin wrote: > From: Alexander Lobakin <aleksander.lobakin@intel.com> > Date: Wed, 12 Jun 2024 11:09:10 +0200 > > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > Date: Tue, 11 Jun 2024 16:21:27 +0200 > > [...] > > >>>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c > >>>> index 2015f66b0cf9..1bd4b054dd80 100644 > >>>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c > >>>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c > >>>> @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) > >>>> > >>>> ice_clean_xdp_irq_zc(xdp_ring); > >>>> > >>>> + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || > >>>> + !netif_running(xdp_ring->vsi->netdev)) > > Oh BTW, I noticed some time ago that netif_running() is less precise > than checking for %IFF_UP. > For example, in this piece (main netdev ifup function)[0], > netif_running() will start returning true *before* driver's .ndo_open() > is called, but %IFF_UP will be set only after .ndo_open() is done (with > no issues). I see, thanks for bringing this up! I'd like to try this out. Tony sorry for the noise, but it seems I'll go with v4 and will decorate the mentioned statements with unlikely(). > That means, I'd check for %IFF_UP honestly (maybe even before checking > the carrier). I wonder whether it is the ultimate check and two existing ones (that we are adding in this patch) could be dropped? > > [0] https://elixir.bootlin.com/linux/v6.10-rc3/source/net/core/dev.c#L1466 > > Thanks, > Olek
On Thu, Jun 13, 2024 at 05:51:25PM +0200, Maciej Fijalkowski wrote: > On Wed, Jun 12, 2024 at 11:15:31AM +0200, Alexander Lobakin wrote: > > From: Alexander Lobakin <aleksander.lobakin@intel.com> > > Date: Wed, 12 Jun 2024 11:09:10 +0200 > > > > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > > Date: Tue, 11 Jun 2024 16:21:27 +0200 > > > > [...] > > > > >>>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c > > >>>> index 2015f66b0cf9..1bd4b054dd80 100644 > > >>>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c > > >>>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c > > >>>> @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) > > >>>> > > >>>> ice_clean_xdp_irq_zc(xdp_ring); > > >>>> > > >>>> + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || > > >>>> + !netif_running(xdp_ring->vsi->netdev)) > > > > Oh BTW, I noticed some time ago that netif_running() is less precise > > than checking for %IFF_UP. > > For example, in this piece (main netdev ifup function)[0], > > netif_running() will start returning true *before* driver's .ndo_open() > > is called, but %IFF_UP will be set only after .ndo_open() is done (with > > no issues). > > I see, thanks for bringing this up! I'd like to try this out. Tony sorry > for the noise, but it seems I'll go with v4 and will decorate the > mentioned statements with unlikely(). > > > That means, I'd check for %IFF_UP honestly (maybe even before checking > > the carrier). > > I wonder whether it is the ultimate check and two existing ones (that we > are adding in this patch) could be dropped? In netdev closing path [1], __LINK_STATE_START is cleared before IFF_UP. What we were initially experiencing when netif_running() check wasn't in place was that xsk was producing a bunch of Tx descs when device was being brought down. So let me keep what we have here and add IFF_UP check on top. Better be safe than sorry as the bug we were dealing with was pretty nasty. > > > > > [0] https://elixir.bootlin.com/linux/v6.10-rc3/source/net/core/dev.c#L1466 [1]: https://elixir.bootlin.com/linux/v6.10-rc3/source/net/core/dev.c#L1532 > > > > Thanks, > > Olek
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Date: Thu, 13 Jun 2024 18:07:53 +0200 > On Thu, Jun 13, 2024 at 05:51:25PM +0200, Maciej Fijalkowski wrote: >> On Wed, Jun 12, 2024 at 11:15:31AM +0200, Alexander Lobakin wrote: >>> From: Alexander Lobakin <aleksander.lobakin@intel.com> >>> Date: Wed, 12 Jun 2024 11:09:10 +0200 >>> >>>> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> >>>> Date: Tue, 11 Jun 2024 16:21:27 +0200 >>> >>> [...] >>> >>>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c >>>>>>> index 2015f66b0cf9..1bd4b054dd80 100644 >>>>>>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c >>>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c >>>>>>> @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) >>>>>>> >>>>>>> ice_clean_xdp_irq_zc(xdp_ring); >>>>>>> >>>>>>> + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || >>>>>>> + !netif_running(xdp_ring->vsi->netdev)) >>> >>> Oh BTW, I noticed some time ago that netif_running() is less precise >>> than checking for %IFF_UP. >>> For example, in this piece (main netdev ifup function)[0], >>> netif_running() will start returning true *before* driver's .ndo_open() >>> is called, but %IFF_UP will be set only after .ndo_open() is done (with >>> no issues). >> >> I see, thanks for bringing this up! I'd like to try this out. Tony sorry >> for the noise, but it seems I'll go with v4 and will decorate the >> mentioned statements with unlikely(). >> >>> That means, I'd check for %IFF_UP honestly (maybe even before checking >>> the carrier). >> >> I wonder whether it is the ultimate check and two existing ones (that we >> are adding in this patch) could be dropped? > > In netdev closing path [1], __LINK_STATE_START is cleared before IFF_UP. Oh man, inconsistency in its best :D > What we were initially experiencing when netif_running() check wasn't in > place was that xsk was producing a bunch of Tx descs when device was being > brought down. So let me keep what we have here and add IFF_UP check on > top. Better be safe than sorry as the bug we were dealing with was pretty > nasty. Sure, I didn't know %IFF_UP's not that reliable :s Thanks, Olek
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 2015f66b0cf9..1bd4b054dd80 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -1048,6 +1048,10 @@ bool ice_xmit_zc(struct ice_tx_ring *xdp_ring) ice_clean_xdp_irq_zc(xdp_ring); + if (!netif_carrier_ok(xdp_ring->vsi->netdev) || + !netif_running(xdp_ring->vsi->netdev)) + return true; + budget = ICE_DESC_UNUSED(xdp_ring); budget = min_t(u16, budget, ICE_RING_QUARTER(xdp_ring)); @@ -1091,7 +1095,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, struct ice_vsi *vsi = np->vsi; struct ice_tx_ring *ring; - if (test_bit(ICE_VSI_DOWN, vsi->state)) + if (test_bit(ICE_VSI_DOWN, vsi->state) || !netif_carrier_ok(netdev)) return -ENETDOWN; if (!ice_is_xdp_ena_vsi(vsi))