diff mbox series

[RFC,net] Revert "net: phy: Fix race condition on link status change"

Message ID 20230816180944.19262-1-fancer.lancer@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net] Revert "net: phy: Fix race condition on link status change" | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Serge Semin Aug. 16, 2023, 6:09 p.m. UTC
Protecting the phy_driver.drv->handle_interrupt() callback invocation by
the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock
the mutex twice thus deadlocking on the next calls thread:
IRQ: phy_interrupt()
     +-> mutex_lock(&phydev->lock); <-------------+
         drv->handle_interrupt()                  | Deadlock due to the
         +-> phy_error()                          + nested PHY-device
             +-> phy_process_error()              | mutex lock
                 +-> mutex_lock(&phydev->lock); <-+
                     phydev->state = PHY_ERROR;
                     mutex_unlock(&phydev->lock);
         mutex_unlock(&phydev->lock);

The problem can be easily reproduced just by calling phy_error() from the
any PHY-device interrupt handler. Reverting the commit 91a7cda1f4b8 ("net:
phy: Fix race condition on link status change") fixes the deadlock.

This reverts commit 91a7cda1f4b8bdf770000a3b60640576dafe0cec.

Fixes: 91a7cda1f4b8 ("net: phy: Fix race condition on link status change")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>

---

Since obviously it would be better to fix both the deadlock and the
problem described in the blamed commit the patch is marked as RFC. I am
not aware of a better solution for now than to revert the commit caused
the regression. So let's discuss to find out whether it's possible to have
a better fix here.

---
 drivers/net/phy/phy.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Andrew Lunn Aug. 16, 2023, 7:35 p.m. UTC | #1
On Wed, Aug 16, 2023 at 09:09:40PM +0300, Serge Semin wrote:
> Protecting the phy_driver.drv->handle_interrupt() callback invocation by
> the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock
> the mutex twice thus deadlocking on the next calls thread:
> IRQ: phy_interrupt()
>      +-> mutex_lock(&phydev->lock); <-------------+
>          drv->handle_interrupt()                  | Deadlock due to the
>          +-> phy_error()                          + nested PHY-device
>              +-> phy_process_error()              | mutex lock
>                  +-> mutex_lock(&phydev->lock); <-+
>                      phydev->state = PHY_ERROR;
>                      mutex_unlock(&phydev->lock);
>          mutex_unlock(&phydev->lock);
> 
> The problem can be easily reproduced just by calling phy_error() from the
> any PHY-device interrupt handler.

https://elixir.bootlin.com/linux/v6.5-rc6/source/drivers/net/phy/phy.c#L1201

/**
 * phy_error - enter ERROR state for this PHY device
 * @phydev: target phy_device struct
 *
 * Moves the PHY to the ERROR state in response to a read
 * or write error, and tells the controller the link is down.
 * Must not be called from interrupt context, or while the
 * phydev->lock is held.
 */
void phy_error(struct phy_device *phydev)
{
	WARN_ON(1);
	phy_process_error(phydev);
}
EXPORT_SYMBOL(phy_error);

It is clearly documented you should not do this.

[Goes and looks]

Ah, there are lots of examples of

micrel.c-	irq_status = phy_read(phydev, LAN8814_INTS);
micrel.c-	if (irq_status < 0) {
micrel.c:		phy_error(phydev);
micrel.c-		return IRQ_NONE;
micrel.c-	}

I actually think phy_error() is broken here. The general pattern is
that the mutex is locked before calling into the driver. So we
actually want phy_error() to be safe to use with the lock already
taken. The exceptions when the lock is not held is stuff outside of
PHY operation, like HWMON, and suspend and resume, plus probe.

So i suggest you change phy_process_error() to remove the lock. Maybe
add a test to ensure the lock is actually held, and do a phydev_err()
if not.

The comment about interrupt context is also probably bogus. phylib
only uses threaded interrupts, and it is safe to block in this
context.

	Andrew
Serge Semin Aug. 16, 2023, 8:03 p.m. UTC | #2
On Wed, Aug 16, 2023 at 09:35:57PM +0200, Andrew Lunn wrote:
> On Wed, Aug 16, 2023 at 09:09:40PM +0300, Serge Semin wrote:
> > Protecting the phy_driver.drv->handle_interrupt() callback invocation by
> > the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock
> > the mutex twice thus deadlocking on the next calls thread:
> > IRQ: phy_interrupt()
> >      +-> mutex_lock(&phydev->lock); <-------------+
> >          drv->handle_interrupt()                  | Deadlock due to the
> >          +-> phy_error()                          + nested PHY-device
> >              +-> phy_process_error()              | mutex lock
> >                  +-> mutex_lock(&phydev->lock); <-+
> >                      phydev->state = PHY_ERROR;
> >                      mutex_unlock(&phydev->lock);
> >          mutex_unlock(&phydev->lock);
> > 
> > The problem can be easily reproduced just by calling phy_error() from the
> > any PHY-device interrupt handler.
> 
> https://elixir.bootlin.com/linux/v6.5-rc6/source/drivers/net/phy/phy.c#L1201
> 
> /**
>  * phy_error - enter ERROR state for this PHY device
>  * @phydev: target phy_device struct
>  *
>  * Moves the PHY to the ERROR state in response to a read
>  * or write error, and tells the controller the link is down.
>  * Must not be called from interrupt context, or while the
>  * phydev->lock is held.
>  */
> void phy_error(struct phy_device *phydev)
> {
> 	WARN_ON(1);
> 	phy_process_error(phydev);
> }
> EXPORT_SYMBOL(phy_error);
> 
> It is clearly documented you should not do this.
> 
> [Goes and looks]
> 
> Ah, there are lots of examples of
> 
> micrel.c-	irq_status = phy_read(phydev, LAN8814_INTS);
> micrel.c-	if (irq_status < 0) {
> micrel.c:		phy_error(phydev);
> micrel.c-		return IRQ_NONE;
> micrel.c-	}

It's literally the only pattern where the phy_error() method is utilized but
it can be found in all IRQ-capable network PHY drivers. Seeing how
widespread the function usage is I didn't even dared to think that
the function doc could state it can't be utilized like that or when the
lock is held.

> 
> I actually think phy_error() is broken here. The general pattern is
> that the mutex is locked before calling into the driver. So we
> actually want phy_error() to be safe to use with the lock already
> taken. The exceptions when the lock is not held is stuff outside of
> PHY operation, like HWMON, and suspend and resume, plus probe.
> 
> So i suggest you change phy_process_error() to remove the lock.

This doable.

> Maybe
> add a test to ensure the lock is actually held, and do a phydev_err()
> if not.

This can't be done since phy_state_machine() calls phy_error_precise()
which calls phy_process_error() with no phy_device.lock held. Printing the
error in that case would mean an error in the Networking PHY subsystem
itself.

Do you suggest to take the lock before calling phy_error_precise() then?

> 
> The comment about interrupt context is also probably bogus. phylib
> only uses threaded interrupts, and it is safe to block in this
> context.

Ok.

-Serge(y)

> 
> 	Andrew
Andrew Lunn Aug. 16, 2023, 8:13 p.m. UTC | #3
> > So i suggest you change phy_process_error() to remove the lock.
> 
> This doable.
> 
> > Maybe
> > add a test to ensure the lock is actually held, and do a phydev_err()
> > if not.
> 
> This can't be done since phy_state_machine() calls phy_error_precise()
> which calls phy_process_error() with no phy_device.lock held. Printing the
> error in that case would mean an error in the Networking PHY subsystem
> itself.
> 
> Do you suggest to take the lock before calling phy_error_precise() then?

Thanks for digging into the details.

phy_error_precise() is used in exactly one place. So i would actually
put the lock inside it. And maybe move the comment about not using the
function with the lock already held here :-)

	 Andrew
Serge Semin Aug. 16, 2023, 8:24 p.m. UTC | #4
On Wed, Aug 16, 2023 at 10:13:10PM +0200, Andrew Lunn wrote:
> > > So i suggest you change phy_process_error() to remove the lock.
> > 
> > This doable.
> > 
> > > Maybe
> > > add a test to ensure the lock is actually held, and do a phydev_err()
> > > if not.
> > 
> > This can't be done since phy_state_machine() calls phy_error_precise()
> > which calls phy_process_error() with no phy_device.lock held. Printing the
> > error in that case would mean an error in the Networking PHY subsystem
> > itself.
> > 
> > Do you suggest to take the lock before calling phy_error_precise() then?
> 
> Thanks for digging into the details.
> 
> phy_error_precise() is used in exactly one place. So i would actually
> put the lock inside it. And maybe move the comment about not using the
> function with the lock already held here :-)

Ok. I'll resubmit the patch tomorrow with the RFC status dropped.

-Serge(y)

> 
> 	 Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index bdf00b2b2c1d..9483bd57158e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1235,7 +1235,6 @@  static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
 	struct phy_device *phydev = phy_dat;
 	struct phy_driver *drv = phydev->drv;
-	irqreturn_t ret;
 
 	/* Wakeup interrupts may occur during a system sleep transition.
 	 * Postpone handling until the PHY has resumed.
@@ -1259,11 +1258,7 @@  static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 		return IRQ_HANDLED;
 	}
 
-	mutex_lock(&phydev->lock);
-	ret = drv->handle_interrupt(phydev);
-	mutex_unlock(&phydev->lock);
-
-	return ret;
+	return drv->handle_interrupt(phydev);
 }
 
 /**