[PATCH/RFC,v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout

Message ID 1595246298-29260-1-git-send-email-yoshihiro.shimoda.uh@renesas.com (mailing list archive)
State Under Review
Delegated to: Geert Uytterhoeven
Series [PATCH/RFC,v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout

Commit Message

Yoshihiro Shimoda July 20, 2020, 11:58 a.m. UTC
According to the report in [1], this driver can hit the following
error in ravb_tx_timeout_work().

ravb e6800000.ethernet ethernet: failed to switch device to config mode

This error means that the hardware could not change state from
"Operation" to "Configuration" while some TX and/or RX queues are
still operating. After that, ravb_config() in ravb_dmac_init() will
fail, and the descriptors are then never allocated again, so a NULL
pointer dereference happens later in ravb_start_xmit().

To fix the issue, ravb_tx_timeout_work() should check the return
value of ravb_stop_dma() to determine whether the hardware can be
re-initialized. If ravb_stop_dma() fails, ravb_tx_timeout_work()
re-enables TX and RX and just exits.

[1]
https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/

Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 Changes from RFC v1:
 - Check the return value of ravb_stop_dma() and exit if the hardware
   cannot be re-initialized in the tx timeout.
 - Update the commit subject and description.
 - Fix some typos.
 https://patchwork.kernel.org/patch/11570217/

 Unfortunately, I have not been able to reproduce the issue yet, so
 this patch is still marked RFC.

 drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Sergei Shtylyov July 20, 2020, 5:14 p.m. UTC | #1
Hello!

On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:

> According to the report of [1], this driver is possible to cause
> the following error in ravb_tx_timeout_work().
> 
> ravb e6800000.ethernet ethernet: failed to switch device to config mode

   Hmm, maybe we need a larger timeout there? The current one amounts to only
~100 ms for all cases (maybe we should parametrize the timeout?)...
  
> This error means that the hardware could not change the state
> from "Operation" to "Configuration" while some tx and/or rx queue
> are operating. After that, ravb_config() in ravb_dmac_init() will fail,

   Are we seeing double messages from ravb_config()? I think we aren't...

> and then any descriptors will be not allocaled anymore so that NULL
> pointer dereference happens after that on ravb_start_xmit().
> 
> To fix the issue, the ravb_tx_timeout_work() should check
> the return value of ravb_stop_dma() whether this hardware can be
> re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> re-enables TX and RX and just exits.
> 
> [1]
> https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
> 
> Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

   Assuming the comment below is fixed:

Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>

> ---
>  Changes from RFC v1:
>  - Check the return value of ravb_stop_dma() and exit if the hardware
>    condition can not be initialized in the tx timeout.
>  - Update the commit subject and description.
>  - Fix some typo.
>  https://patchwork.kernel.org/patch/11570217/
> 
>  Unfortunately, I still didn't reproduce the issue yet. So, I still
>  marked RFC on this patch.

    I think the Bosch people should test this patch, as they reported the kernel oops...

> 
>  drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index a442bcf6..500f5c1 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  		ravb_ptp_stop(ndev);
>  
>  	/* Wait for DMA stopping */
> -	ravb_stop_dma(ndev);
> +	if (ravb_stop_dma(ndev)) {
> +		/* If ravb_stop_dma() fails, the hardware is still in-progress
> +		 * as "Operation" mode for TX and/or RX. So, this should not

   s/in-progress as "Operation" mode/operating/.

> +		 * call the following functions because ravb_dmac_init() is
> +		 * possible to fail too. Also, this should not retry
> +		 * ravb_stop_dma() again and again here because it's possible
> +		 * to wait forever. So, this just re-enables the TX and RX and
> +		 * skip the following re-initialization procedure.
> +		 */
> +		ravb_rcv_snd_enable(ndev);
> +		goto out;
> +	}
>  
>  	ravb_ring_free(ndev, RAVB_BE);
>  	ravb_ring_free(ndev, RAVB_NC);
> @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  	ravb_dmac_init(ndev);

   BTW, that one also may fail...

>  	ravb_emac_init(ndev);
>  
> +out:
>  	/* Initialise PTP Clock driver */
>  	if (priv->chip_id == RCAR_GEN2)
>  		ravb_ptp_init(ndev, priv->pdev);
>
Yoshihiro Shimoda July 21, 2020, 1:39 a.m. UTC | #2
Hello!

Thank you for your review!

> From: Sergei Shtylyov, Sent: Tuesday, July 21, 2020 2:15 AM
> 
> Hello!
> 
> On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:
> 
> > According to the report of [1], this driver is possible to cause
> > the following error in ravb_tx_timeout_work().
> >
> > ravb e6800000.ethernet ethernet: failed to switch device to config mode
> 
>    Hmm, maybe we need a larger timeout there? The current one amounts to only
> ~100 ms for all cases (maybe we should parametrize the timeout?)...

I don't think so, because we cannot predict when RX will finish.
For example, if another device floods the hardware with "ping -f",
the hardware keeps operating in RX for as long as the ping is running.

> > This error means that the hardware could not change the state
> > from "Operation" to "Configuration" while some tx and/or rx queue
> > are operating. After that, ravb_config() in ravb_dmac_init() will fail,
> 
>    Are we seeing double messages from ravb_config()? I think we aren't...

No, we are not seeing double messages from ravb_config(), because
ravb_stop_dma() can fail before ravb_config() is called, depending on
the state of TCCR or CSR.

> > and then any descriptors will be not allocaled anymore so that NULL
> > pointer dereference happens after that on ravb_start_xmit().
> >
> > To fix the issue, the ravb_tx_timeout_work() should check
> > the return value of ravb_stop_dma() whether this hardware can be
> > re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> > re-enables TX and RX and just exits.
> >
> > [1]
> > https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
> >
> > Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> 
>    Assuming the comment below is fixed:
> 
> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>

Thanks!

> > ---
> >  Changes from RFC v1:
> >  - Check the return value of ravb_stop_dma() and exit if the hardware
> >    condition can not be initialized in the tx timeout.
> >  - Update the commit subject and description.
> >  - Fix some typo.
> >  https://patchwork.kernel.org/patch/11570217/
> >
> >  Unfortunately, I still didn't reproduce the issue yet. So, I still
> >  marked RFC on this patch.
> 
>     I think the Bosch people should test this patch, as they reported the kernel oops...
> 
> >
> >  drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index a442bcf6..500f5c1 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >  		ravb_ptp_stop(ndev);
> >
> >  	/* Wait for DMA stopping */
> > -	ravb_stop_dma(ndev);
> > +	if (ravb_stop_dma(ndev)) {
> > +		/* If ravb_stop_dma() fails, the hardware is still in-progress
> > +		 * as "Operation" mode for TX and/or RX. So, this should not
> 
>    s/in-progress as "Operation" mode/operating/.

I'll fix it.

> > +		 * call the following functions because ravb_dmac_init() is
> > +		 * possible to fail too. Also, this should not retry
> > +		 * ravb_stop_dma() again and again here because it's possible
> > +		 * to wait forever. So, this just re-enables the TX and RX and
> > +		 * skip the following re-initialization procedure.
> > +		 */
> > +		ravb_rcv_snd_enable(ndev);
> > +		goto out;
> > +	}
> >
> >  	ravb_ring_free(ndev, RAVB_BE);
> >  	ravb_ring_free(ndev, RAVB_NC);
> > @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >  	ravb_dmac_init(ndev);
> 
>    BTW, that one also may fail...

Yes, that's true... In that case, I think this should print an error message
and stop TX and RX to avoid unexpected behavior such as a kernel panic. So, I'll
add such code.

Best regards,
Yoshihiro Shimoda

> >  	ravb_emac_init(ndev);
> >
> > +out:
> >  	/* Initialise PTP Clock driver */
> >  	if (priv->chip_id == RCAR_GEN2)
> >  		ravb_ptp_init(ndev, priv->pdev);
> >
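The follow-up described in the reply above (report the failure of ravb_dmac_init() and stop TX and RX) could look roughly like the user-space model below. This is only a sketch under stated assumptions: struct fake_ravb, fake_dmac_init() and fake_reinit() are hypothetical stand-ins invented for illustration, not driver functions, and the real change would write hardware registers instead of setting flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the device state; not the real driver structures. */
struct fake_ravb {
	bool dmac_init_fails;	/* true: the modelled ravb_dmac_init() fails */
	bool rx_tx_enabled;	/* models whether TX and RX are running */
	bool ptp_restarted;	/* models the PTP re-init at the end */
};

/* Stand-in for ravb_dmac_init(): returns a negative value on failure. */
static int fake_dmac_init(const struct fake_ravb *dev)
{
	return dev->dmac_init_fails ? -1 : 0;
}

/* Re-initialization step: on failure, report it and leave TX/RX stopped. */
static int fake_reinit(struct fake_ravb *dev)
{
	int error;

	assert(dev != NULL);

	error = fake_dmac_init(dev);
	if (error) {
		fprintf(stderr, "fake_reinit: DMAC init failed (%d), stopping TX/RX\n",
			error);
		dev->rx_tx_enabled = false;
		return error;
	}

	/* Success: the model treats the device as running again. */
	dev->rx_tx_enabled = true;
	dev->ptp_restarted = true;
	return 0;
}
```

Returning early on failure means nothing later in the handler touches descriptor rings that were never set up, which is the NULL pointer dereference this thread is trying to avoid.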
Patch

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index a442bcf6..500f5c1 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1458,7 +1458,18 @@  static void ravb_tx_timeout_work(struct work_struct *work)
 		ravb_ptp_stop(ndev);
 
 	/* Wait for DMA stopping */
-	ravb_stop_dma(ndev);
+	if (ravb_stop_dma(ndev)) {
+		/* If ravb_stop_dma() fails, the hardware is still in-progress
+		 * as "Operation" mode for TX and/or RX. So, this should not
+		 * call the following functions because ravb_dmac_init() may
+		 * fail too. Also, this should not retry ravb_stop_dma()
+		 * again and again here because it could wait forever.
+		 * So, this just re-enables TX and RX and skips the
+		 * following re-initialization procedure.
+		 */
+		ravb_rcv_snd_enable(ndev);
+		goto out;
+	}
 
 	ravb_ring_free(ndev, RAVB_BE);
 	ravb_ring_free(ndev, RAVB_NC);
@@ -1467,6 +1478,7 @@  static void ravb_tx_timeout_work(struct work_struct *work)
 	ravb_dmac_init(ndev);
 	ravb_emac_init(ndev);
 
+out:
 	/* Initialise PTP Clock driver */
 	if (priv->chip_id == RCAR_GEN2)
 		ravb_ptp_init(ndev, priv->pdev);
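
To make the control flow of the patch easier to follow outside the kernel, here is a minimal user-space model of the patched ravb_tx_timeout_work(). It is a sketch, not driver code: struct fake_ravb, fake_stop_dma() and fake_tx_timeout_work() are hypothetical names introduced here, hardware effects are reduced to boolean flags, and the RCAR_GEN2 check around the PTP calls is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device state; not the real driver structures. */
struct fake_ravb {
	bool dma_busy;		/* true: the modelled ravb_stop_dma() fails */
	bool rings_rebuilt;	/* set when the ring free/format path runs */
	bool rx_tx_enabled;	/* set once TX and RX are running again */
	bool ptp_restarted;	/* set by the code at the "out:" label */
};

/* Stand-in for ravb_stop_dma(): nonzero means the hardware is still operating. */
static int fake_stop_dma(const struct fake_ravb *dev)
{
	return dev->dma_busy ? -1 : 0;
}

/* Mirrors the branch structure of the patched timeout handler. */
static void fake_tx_timeout_work(struct fake_ravb *dev)
{
	if (fake_stop_dma(dev)) {
		/* Hardware still busy: only re-enable TX/RX, skip re-init. */
		dev->rx_tx_enabled = true;
		goto out;
	}

	/* This path is reached only once DMA has really stopped. */
	assert(!dev->dma_busy);

	/* Models ravb_ring_free()/ravb_ring_format() and the DMAC/E-MAC init. */
	dev->rings_rebuilt = true;
	dev->rx_tx_enabled = true;

out:
	/* Both paths restart PTP, as in the patch. */
	dev->ptp_restarted = true;
}
```

In the busy case the rings are left untouched, so ravb_start_xmit() keeps using the descriptors that already exist rather than ones that were freed and never re-allocated.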