Message ID | 20240709195845.9089-1-rwahl@gmx.de (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Series | net: ks8851: Fix potential TX stall after interface reopen |
On 7/9/2024 12:58 PM, Ronald Wahl wrote:
> From: Ronald Wahl <ronald.wahl@raritan.com>
>
> The amount of TX space in the hardware buffer is tracked in the tx_space
> variable. The initial value is currently only set during driver probing.
>
> After closing the interface and reopening it the tx_space variable has
> the last value it had before close. If it is smaller than the size of
> the first send packet after reopening the interface the queue will be
> stopped. The queue is woken up after receiving a TX interrupt but this
> will never happen since we did not send anything.
>
> This commit moves the initialization of the tx_space variable to the
> ks8851_net_open function right before starting the TX queue. Also query
> the value from the hardware instead of using a hard coded value.
>
> Only the SPI chip variant is affected by this issue because only this
> driver variant actually depends on the tx_space variable in the xmit
> function.

I'm curious if this dependency could be removed?

Otherwise:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>
> Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Simon Horman <horms@kernel.org>
> Cc: netdev@vger.kernel.org
> Cc: stable@vger.kernel.org # 5.10+
> Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
> ---
>  drivers/net/ethernet/micrel/ks8851_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
> index 6453c92f0fa7..03a554df6e7a 100644
> --- a/drivers/net/ethernet/micrel/ks8851_common.c
> +++ b/drivers/net/ethernet/micrel/ks8851_common.c
> @@ -482,6 +482,7 @@ static int ks8851_net_open(struct net_device *dev)
>  	ks8851_wrreg16(ks, KS_IER, ks->rc_ier);
>
>  	ks->queued_len = 0;
> +	ks->tx_space = ks8851_rdreg16(ks, KS_TXMIR);
>  	netif_start_queue(ks->netdev);
>
>  	netif_dbg(ks, ifup, ks->netdev, "network device up\n");
> @@ -1101,7 +1102,6 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev,
>  	int ret;
>
>  	ks->netdev = netdev;
> -	ks->tx_space = 6144;
>
>  	ks->gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
>  	ret = PTR_ERR_OR_ZERO(ks->gpio);
> --
> 2.45.2

On 11.07.24 01:48, Jacob Keller wrote:
>
>
> On 7/9/2024 12:58 PM, Ronald Wahl wrote:
>> From: Ronald Wahl <ronald.wahl@raritan.com>
>>
>> The amount of TX space in the hardware buffer is tracked in the tx_space
>> variable. The initial value is currently only set during driver probing.
>>
>> After closing the interface and reopening it the tx_space variable has
>> the last value it had before close. If it is smaller than the size of
>> the first send packet after reopening the interface the queue will be
>> stopped. The queue is woken up after receiving a TX interrupt but this
>> will never happen since we did not send anything.
>>
>> This commit moves the initialization of the tx_space variable to the
>> ks8851_net_open function right before starting the TX queue. Also query
>> the value from the hardware instead of using a hard coded value.
>>
>> Only the SPI chip variant is affected by this issue because only this
>> driver variant actually depends on the tx_space variable in the xmit
>> function.
>
> I'm curious if this dependency could be removed?

I don't think so.

The driver must ensure not to write too much data to the hardware so we
need a precise accounting of how much we can write. In the original
driver code for the SPI variant this was broken and repaired in
3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun").
Unfortunately we required some rounds of bug fixing to get it finally
working without any issues. Hopefully this was the last change in that
regard. :-)

If you ask why only the SPI version is affected then the answer is that
for the parallel interface chip there is no internal driver queuing,
i.e. it writes a single packet per xmit call. Not sure if this can also
overrun the hardware buffer if the receiver throttles via flow control.
Since I do not own this chip variant I cannot test this. In the end that
could even mean that we would need the accounting for the parallel chip
code as well.

- ron

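To illustrate why a driver that queues frames internally needs this kind of accounting, here is a rough user-space sketch. It is not the driver code: the struct `tx_acct` and the `acct_*` helpers are hypothetical names, and the per-frame cost (frame length plus a 4-byte header, rounded up to a multiple of 4) is an assumption made only for this example.

```c
#include <stdbool.h>

/* Hypothetical model of the accounting, not the ks8851 driver itself. */
struct tx_acct {
	unsigned int tx_space;   /* free bytes last reported by the chip */
	unsigned int queued_len; /* bytes accepted by xmit, not yet written out */
};

/* Assumed cost of one frame: length + 4-byte header, rounded up to 4 bytes. */
static unsigned int acct_frame_cost(unsigned int len)
{
	return (len + 4u + 3u) & ~3u;
}

/* xmit path: accept a frame only if everything queued so far still fits. */
static bool acct_queue_frame(struct tx_acct *a, unsigned int len)
{
	unsigned int need = acct_frame_cost(len);

	if (a->queued_len + need > a->tx_space)
		return false; /* a real driver would stop the TX queue here */
	a->queued_len += need;
	return true;
}

/* worker path: queued frames were written to the chip, refresh the counters. */
static void acct_flush(struct tx_acct *a, unsigned int space_reported_by_chip)
{
	a->queued_len = 0;
	a->tx_space = space_reported_by_chip;
}
```

With `tx_space` at 6144, four 1514-byte frames cost 6080 bytes under this assumption and are accepted, while a fifth is refused until `acct_flush()` reports free space again. A variant that writes exactly one packet per xmit call has nothing queued to account for, which matches the explanation above for why only the SPI code depends on `tx_space`.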
On 7/10/2024 5:20 PM, Ronald Wahl wrote:
> On 11.07.24 01:48, Jacob Keller wrote:
>>
>>
>> On 7/9/2024 12:58 PM, Ronald Wahl wrote:
>>> From: Ronald Wahl <ronald.wahl@raritan.com>
>>>
>>> The amount of TX space in the hardware buffer is tracked in the tx_space
>>> variable. The initial value is currently only set during driver probing.
>>>
>>> After closing the interface and reopening it the tx_space variable has
>>> the last value it had before close. If it is smaller than the size of
>>> the first send packet after reopening the interface the queue will be
>>> stopped. The queue is woken up after receiving a TX interrupt but this
>>> will never happen since we did not send anything.
>>>
>>> This commit moves the initialization of the tx_space variable to the
>>> ks8851_net_open function right before starting the TX queue. Also query
>>> the value from the hardware instead of using a hard coded value.
>>>
>>> Only the SPI chip variant is affected by this issue because only this
>>> driver variant actually depends on the tx_space variable in the xmit
>>> function.
>>
>> I'm curious if this dependency could be removed?
>
> I don't think so.
>
> The driver must ensure not to write too much data to the hardware so we
> need a precise accounting of how much we can write. In the original
> driver code for the SPI variant this was broken and repaired in
> 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun").
> Unfortunately we required some rounds of bug fixing to get it finally
> working without any issues. Hopefully this was the last change in that
> regard. :-)
>
> If you ask why only the SPI version is affected then the answer is that
> for the parallel interface chip there is no internal driver queuing,
> i.e. it writes a single packet per xmit call. Not sure if this can also
> overrun the hardware buffer if the receiver throttles via flow control.
> Since I do not own this chip variant I cannot test this. In the end that
> could even mean that we would need the accounting for the parallel chip
> code as well.
>
> - ron
>

That explains why only the one variation has this value.

Thanks!

On Tue, 9 Jul 2024 21:58:45 +0200 Ronald Wahl wrote:
> From: Ronald Wahl <ronald.wahl@raritan.com>
>
> The amount of TX space in the hardware buffer is tracked in the tx_space
> variable. The initial value is currently only set during driver probing.
>
> After closing the interface and reopening it the tx_space variable has
> the last value it had before close. If it is smaller than the size of
> the first send packet after reopening the interface the queue will be
> stopped. The queue is woken up after receiving a TX interrupt but this
> will never happen since we did not send anything.
>
> This commit moves the initialization of the tx_space variable to the
> ks8851_net_open function right before starting the TX queue. Also query
> the value from the hardware instead of using a hard coded value.
>
> Only the SPI chip variant is affected by this issue because only this
> driver variant actually depends on the tx_space variable in the xmit
> function.

The patchwork bot is taking long siestas in Konstantin's absence,
FWIW this patch was applied by Paolo on Tue. Thank you!

diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6453c92f0fa7..03a554df6e7a 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -482,6 +482,7 @@ static int ks8851_net_open(struct net_device *dev)
 	ks8851_wrreg16(ks, KS_IER, ks->rc_ier);
 
 	ks->queued_len = 0;
+	ks->tx_space = ks8851_rdreg16(ks, KS_TXMIR);
 	netif_start_queue(ks->netdev);
 
 	netif_dbg(ks, ifup, ks->netdev, "network device up\n");
@@ -1101,7 +1102,6 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev,
 	int ret;
 
 	ks->netdev = netdev;
-	ks->tx_space = 6144;
 
 	ks->gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
 	ret = PTR_ERR_OR_ZERO(ks->gpio);
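
The stall described in the commit message, and the effect of moving the initialization into the open path, can be reproduced with a small user-space model. This is only a sketch: `struct ks_model` and the `model_*` functions are hypothetical stand-ins, not kernel APIs. It assumes an idle chip reports the full 6144 bytes that the old probe code hard-coded, and a frame cost of 1520 bytes chosen for illustration.

```c
#include <stdbool.h>

#define HW_TX_BUFFER 6144u /* the value the old probe code hard-coded */

/* Hypothetical stand-in for the driver state; only the relevant fields. */
struct ks_model {
	unsigned int tx_space; /* cached free TX space */
	bool queue_stopped;
	bool tx_irq_pending;   /* set only when a frame was actually sent */
};

/* Stand-in for reading KS_TXMIR: assume an idle chip reports a free buffer. */
static unsigned int model_read_txmir(void)
{
	return HW_TX_BUFFER;
}

static void model_probe(struct ks_model *ks, bool fixed)
{
	ks->queue_stopped = true;
	ks->tx_irq_pending = false;
	ks->tx_space = fixed ? 0 : HW_TX_BUFFER; /* old code set it only here */
}

static void model_open(struct ks_model *ks, bool fixed)
{
	if (fixed)
		ks->tx_space = model_read_txmir(); /* the line the patch adds */
	ks->queue_stopped = false;
}

static void model_close(struct ks_model *ks)
{
	ks->queue_stopped = true;
	ks->tx_irq_pending = false; /* nothing in flight any more */
}

/* Returns true if the frame was accepted. */
static bool model_xmit(struct ks_model *ks, unsigned int need)
{
	if (need > ks->tx_space) {
		ks->queue_stopped = true; /* only a TX interrupt would wake it */
		return false;
	}
	ks->tx_space -= need;
	ks->tx_irq_pending = true;
	return true;
}

/*
 * Fill the buffer, close before any TX interrupt refreshes tx_space,
 * reopen, and try to send one frame. Returns true if the queue ends up
 * stopped with no TX interrupt left to wake it.
 */
static bool stalls_after_reopen(bool fixed)
{
	struct ks_model ks;

	model_probe(&ks, fixed);
	model_open(&ks, fixed);
	while (model_xmit(&ks, 1520))
		; /* leaves tx_space at 64 */
	model_close(&ks);
	model_open(&ks, fixed);
	model_xmit(&ks, 1520);
	return ks.queue_stopped && !ks.tx_irq_pending;
}
```

In this model `stalls_after_reopen(false)` returns true because the stale `tx_space` of 64 survives the reopen, while `stalls_after_reopen(true)` returns false because the open path refreshes the value before the queue is started.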