diff mbox

[1/6] ravb: remove custom .nway_reset from ethtool ops

Message ID 1527160318-10958-2-git-send-email-vladimir_zapolskiy@mentor.com (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven
Headers show

Commit Message

Vladimir Zapolskiy May 24, 2018, 11:11 a.m. UTC
The change fixes a sleep in atomic context issue, which can be
always triggered by running 'ethtool -r' command, because
phy_start_aneg() protects phydev fields by a mutex.

Another note is that the change implicitly replaces phy_start_aneg()
with a newer phy_restart_aneg().

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

Comments

Andrew Lunn May 24, 2018, 1:22 p.m. UTC | #1
On Thu, May 24, 2018 at 02:11:53PM +0300, Vladimir Zapolskiy wrote:
> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.
> 
> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().
> 
> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);
> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}

Eck! phylib assumes thread context and takes a mutex while calling
into the PHY driver.

It would be good to add some sort of fixes: tag. Maybe for the commit
that added the generic nway_reset? That would at least cover some
stable kernels.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Vladimir Zapolskiy May 24, 2018, 2:11 p.m. UTC | #2
On 05/24/2018 04:22 PM, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 02:11:53PM +0300, Vladimir Zapolskiy wrote:
>> The change fixes a sleep in atomic context issue, which can be
>> always triggered by running 'ethtool -r' command, because
>> phy_start_aneg() protects phydev fields by a mutex.
>>
>> Another note is that the change implicitly replaces phy_start_aneg()
>> with a newer phy_restart_aneg().
>>
>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>> ---
>>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>>  1 file changed, 1 insertion(+), 16 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>> index 68f122140966..4a043eb0e2aa 100644
>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>>  	return error;
>>  }
>>  
>> -static int ravb_nway_reset(struct net_device *ndev)
>> -{
>> -	struct ravb_private *priv = netdev_priv(ndev);
>> -	int error = -ENODEV;
>> -	unsigned long flags;
>> -
>> -	if (ndev->phydev) {
>> -		spin_lock_irqsave(&priv->lock, flags);
>> -		error = phy_start_aneg(ndev->phydev);
>> -		spin_unlock_irqrestore(&priv->lock, flags);
>> -	}
> 
> Eck! phylib assumes thread context and takes a mutex while calling
> into the PHY driver.
> 
> It would be good to add some sort of fixes: tag. Maybe for the commit
> that added the generic nway_reset? That would at least cover some
> stable kernels.
> 
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> 

Hi Andrew, thank you for review.

generally it makes sense to add Fixes tag, but as I said in
the commit message the problem is present before reused phy_ethtool_*()
functions were added to the kernel, so some kind of juggling with
the proper kernel version would be required in assumption that
the fixes are backported as an unmodified changes.

Hopefully Sergei as the driver maintainer can verify the fixes on
older kernels and suggest the right kernel versions for backporting.

--
With best wishes,
Vladimir
Sergei Shtylyov May 24, 2018, 4:18 p.m. UTC | #3
Hello!

On 05/24/2018 05:11 PM, Vladimir Zapolskiy wrote:

>>> The change fixes a sleep in atomic context issue, which can be
>>> always triggered by running 'ethtool -r' command, because
>>> phy_start_aneg() protects phydev fields by a mutex.

  You don't say that *not* grabbing the spinlock is safe... 

>>> Another note is that the change implicitly replaces phy_start_aneg()
>>> with a newer phy_restart_aneg().

   Hm, perphaps this could be a material for a separate patch? 

>>>
>>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>>> ---
>>>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>>>  1 file changed, 1 insertion(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 68f122140966..4a043eb0e2aa 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>>>  	return error;
>>>  }
>>>  
>>> -static int ravb_nway_reset(struct net_device *ndev)
>>> -{
>>> -	struct ravb_private *priv = netdev_priv(ndev);
>>> -	int error = -ENODEV;
>>> -	unsigned long flags;
>>> -
>>> -	if (ndev->phydev) {
>>> -		spin_lock_irqsave(&priv->lock, flags);
>>> -		error = phy_start_aneg(ndev->phydev);
>>> -		spin_unlock_irqrestore(&priv->lock, flags);
>>> -	}
>>
>> Eck! phylib assumes thread context and takes a mutex while calling
>> into the PHY driver.
>>
>> It would be good to add some sort of fixes: tag. Maybe for the commit
>> that added the generic nway_reset? That would at least cover some
>> stable kernels.
>>
>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>>
> 
> Hi Andrew, thank you for review.
> 
> generally it makes sense to add Fixes tag, but as I said in
> the commit message the problem is present before reused phy_ethtool_*()
> functions were added to the kernel, so some kind of juggling with
> the proper kernel version would be required in assumption that
> the fixes are backported as an unmodified changes.

   The -stable fixes can vary from version to version, IIUC. You could be
asked to backport your patch if Greg KH (or somebody else from the -stable
kernel maintainers) gets in trouble backporting your patch. 

> Hopefully Sergei as the driver maintainer can verify the fixes on

   I'm *not* a maintainer, just a humble reviewer! :-)

> older kernels and suggest the right kernel versions for backporting.

   This would be asking too much from me, I'm afraid...
   Still, Dave, could you please give me a couple of days to spend on
this series?

> --
> With best wishes,
> Vladimir

MBR, Sergei
Andrew Lunn May 24, 2018, 4:44 p.m. UTC | #4
On Thu, May 24, 2018 at 07:18:28PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 05/24/2018 05:11 PM, Vladimir Zapolskiy wrote:
> 
> >>> The change fixes a sleep in atomic context issue, which can be
> >>> always triggered by running 'ethtool -r' command, because
> >>> phy_start_aneg() protects phydev fields by a mutex.
> 
>   You don't say that *not* grabbing the spinlock is safe... 

For it to be unsafe, i think that would mean phylib would need to call
back into the MAC driver? The only way that could happen is via the
adjust_link call. And that will deadlock, since it takes the same
lock.

Or am i/we missing something?

	    Andrew
Sergei Shtylyov May 24, 2018, 5:01 p.m. UTC | #5
On 05/24/2018 07:44 PM, Andrew Lunn wrote:

>>>>> The change fixes a sleep in atomic context issue, which can be
>>>>> always triggered by running 'ethtool -r' command, because
>>>>> phy_start_aneg() protects phydev fields by a mutex.
>>
>>   You don't say that *not* grabbing the spinlock is safe... 
> 
> For it to be unsafe, i think that would mean phylib would need to call
> back into the MAC driver? The only way that could happen is via the
> adjust_link call. And that will deadlock, since it takes the same
> lock.
> 
> Or am i/we missing something?

   It doesn't take any locks currently, only patches #3/#6 makes it do so...

> 	    Andrew

MBR, Sergei
Andrew Lunn May 24, 2018, 5:23 p.m. UTC | #6
> > For it to be unsafe, i think that would mean phylib would need to call
> > back into the MAC driver? The only way that could happen is via the
> > adjust_link call. And that will deadlock, since it takes the same
> > lock.
> > 
> > Or am i/we missing something?
> 
>    It doesn't take any locks currently, only patches #3/#6 makes it do so...

Ah, yes.

You should not be holding any spinlocks when calling into phylib.
It does its own locking, which is mutex based.

The code in this patch is not touching the MAC, so looks safe to me.

    Andrew
Vladimir Zapolskiy May 25, 2018, 6:05 a.m. UTC | #7
Hello Sergei,

On 05/24/2018 08:01 PM, Sergei Shtylyov wrote:
> On 05/24/2018 07:44 PM, Andrew Lunn wrote:
> 
>>>>>> The change fixes a sleep in atomic context issue, which can be
>>>>>> always triggered by running 'ethtool -r' command, because
>>>>>> phy_start_aneg() protects phydev fields by a mutex.
>>>
>>>   You don't say that *not* grabbing the spinlock is safe...

I say both that it is the fix and it is safe, I've already described
the function of the spinlock in my comments, and it is more or less
clear from the driver code.

>>
>> For it to be unsafe, i think that would mean phylib would need to call
>> back into the MAC driver? The only way that could happen is via the
>> adjust_link call. And that will deadlock, since it takes the same
>> lock.
>>
>> Or am i/we missing something?
> 
>    It doesn't take any locks currently, only patches #3/#6 makes it do so...

And that's the proper fix in my opinion, my tests don't unveil any issues.

--
With best wishes,
Vladimir
Sergei Shtylyov May 26, 2018, 4:56 p.m. UTC | #8
Hello.

   A formal patch review this time...

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   OK so far...

> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().

   Why? Is this necessary to fix the BUG()?

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);

   OK, removing spin_lock_irqsave() fixes the BUG()...
   Not sure what we rotect against here anyway, MAC interrupts?

> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}
> -
> -	return error;
> -}
> -
>  static u32 ravb_get_msglevel(struct net_device *ndev)
>  {
>  	struct ravb_private *priv = netdev_priv(ndev);
> @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
>  }
>  
>  static const struct ethtool_ops ravb_ethtool_ops = {
> -	.nway_reset		= ravb_nway_reset,
> +	.nway_reset		= phy_ethtool_nway_reset,

   What does this fix?

>  	.get_msglevel		= ravb_get_msglevel,
>  	.set_msglevel		= ravb_set_msglevel,
>  	.get_link		= ethtool_op_get_link,

MBR, Sergei
Sergei Shtylyov May 26, 2018, 4:56 p.m. UTC | #9
Hello.

   A formal patch review this time...

On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   OK so far...

> Another note is that the change implicitly replaces phy_start_aneg()
> with a newer phy_restart_aneg().

   Why? Is this necessary to fix the BUG()?

> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
> ---
>  drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 68f122140966..4a043eb0e2aa 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
>  	return error;
>  }
>  
> -static int ravb_nway_reset(struct net_device *ndev)
> -{
> -	struct ravb_private *priv = netdev_priv(ndev);
> -	int error = -ENODEV;
> -	unsigned long flags;
> -
> -	if (ndev->phydev) {
> -		spin_lock_irqsave(&priv->lock, flags);

   OK, removing spin_lock_irqsave() fixes the BUG()...
   Not sure what we rotect against here anyway, MAC interrupts?

> -		error = phy_start_aneg(ndev->phydev);
> -		spin_unlock_irqrestore(&priv->lock, flags);
> -	}
> -
> -	return error;
> -}
> -
>  static u32 ravb_get_msglevel(struct net_device *ndev)
>  {
>  	struct ravb_private *priv = netdev_priv(ndev);
> @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
>  }
>  
>  static const struct ethtool_ops ravb_ethtool_ops = {
> -	.nway_reset		= ravb_nway_reset,
> +	.nway_reset		= phy_ethtool_nway_reset,

   What does this fix?

>  	.get_msglevel		= ravb_get_msglevel,
>  	.set_msglevel		= ravb_set_msglevel,
>  	.get_link		= ethtool_op_get_link,

MBR, Sergei
Sergei Shtylyov May 26, 2018, 5:51 p.m. UTC | #10
On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote:

> The change fixes a sleep in atomic context issue, which can be
> always triggered by running 'ethtool -r' command, because
> phy_start_aneg() protects phydev fields by a mutex.

   BTW, I was unable to trigger the BUG() with 'ethtool -r eth0' where 'eth0'
is EtherAVB. What am I doing wrong? :-)

MBR, Sergei
diff mbox

Patch

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 68f122140966..4a043eb0e2aa 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1150,21 +1150,6 @@  static int ravb_set_link_ksettings(struct net_device *ndev,
 	return error;
 }
 
-static int ravb_nway_reset(struct net_device *ndev)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	int error = -ENODEV;
-	unsigned long flags;
-
-	if (ndev->phydev) {
-		spin_lock_irqsave(&priv->lock, flags);
-		error = phy_start_aneg(ndev->phydev);
-		spin_unlock_irqrestore(&priv->lock, flags);
-	}
-
-	return error;
-}
-
 static u32 ravb_get_msglevel(struct net_device *ndev)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
@@ -1377,7 +1362,7 @@  static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 }
 
 static const struct ethtool_ops ravb_ethtool_ops = {
-	.nway_reset		= ravb_nway_reset,
+	.nway_reset		= phy_ethtool_nway_reset,
 	.get_msglevel		= ravb_get_msglevel,
 	.set_msglevel		= ravb_set_msglevel,
 	.get_link		= ethtool_op_get_link,