Message ID | 20230816180944.19262-1-fancer.lancer@gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC,net] Revert "net: phy: Fix race condition on link status change" | expand |
On Wed, Aug 16, 2023 at 09:09:40PM +0300, Serge Semin wrote: > Protecting the phy_driver.drv->handle_interrupt() callback invocation by > the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock > the mutex twice thus deadlocking on the next calls thread: > IRQ: phy_interrupt() > +-> mutex_lock(&phydev->lock); <-------------+ > drv->handle_interrupt() | Deadlock due to the > +-> phy_error() + nested PHY-device > +-> phy_process_error() | mutex lock > +-> mutex_lock(&phydev->lock); <-+ > phydev->state = PHY_ERROR; > mutex_unlock(&phydev->lock); > mutex_unlock(&phydev->lock); > > The problem can be easily reproduced just by calling phy_error() from the > any PHY-device interrupt handler. https://elixir.bootlin.com/linux/v6.5-rc6/source/drivers/net/phy/phy.c#L1201 /** * phy_error - enter ERROR state for this PHY device * @phydev: target phy_device struct * * Moves the PHY to the ERROR state in response to a read * or write error, and tells the controller the link is down. * Must not be called from interrupt context, or while the * phydev->lock is held. */ void phy_error(struct phy_device *phydev) { WARN_ON(1); phy_process_error(phydev); } EXPORT_SYMBOL(phy_error); It is clearly documented you should not do this. [Goes and looks] Ah, there are lots of examples of micrel.c- irq_status = phy_read(phydev, LAN8814_INTS); micrel.c- if (irq_status < 0) { micrel.c: phy_error(phydev); micrel.c- return IRQ_NONE; micrel.c- } I actually think phy_error() is broken here. The general pattern is that the mutex is locked before calling into the driver. So we actually want phy_error() to be safe to use with the lock already taken. The exceptions when the lock is not held is stuff outside of PHY operation, like HWMON, and suspend and resume, plus probe. So i suggest you change phy_process_error() to remove the lock. Maybe add a test to ensure the lock is actually held, and do a phydev_err() if not. The comment about interrupt context is also probably bogus. phylib only uses threaded interrupts, and it is safe to block in this context. Andrew
On Wed, Aug 16, 2023 at 09:35:57PM +0200, Andrew Lunn wrote: > On Wed, Aug 16, 2023 at 09:09:40PM +0300, Serge Semin wrote: > > Protecting the phy_driver.drv->handle_interrupt() callback invocation by > > the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock > > the mutex twice thus deadlocking on the next calls thread: > > IRQ: phy_interrupt() > > +-> mutex_lock(&phydev->lock); <-------------+ > > drv->handle_interrupt() | Deadlock due to the > > +-> phy_error() + nested PHY-device > > +-> phy_process_error() | mutex lock > > +-> mutex_lock(&phydev->lock); <-+ > > phydev->state = PHY_ERROR; > > mutex_unlock(&phydev->lock); > > mutex_unlock(&phydev->lock); > > > > The problem can be easily reproduced just by calling phy_error() from the > > any PHY-device interrupt handler. > > https://elixir.bootlin.com/linux/v6.5-rc6/source/drivers/net/phy/phy.c#L1201 > > /** > * phy_error - enter ERROR state for this PHY device > * @phydev: target phy_device struct > * > * Moves the PHY to the ERROR state in response to a read > * or write error, and tells the controller the link is down. > * Must not be called from interrupt context, or while the > * phydev->lock is held. > */ > void phy_error(struct phy_device *phydev) > { > WARN_ON(1); > phy_process_error(phydev); > } > EXPORT_SYMBOL(phy_error); > > It is clearly documented you should not do this. > > [Goes and looks] > > Ah, there are lots of examples of > > micrel.c- irq_status = phy_read(phydev, LAN8814_INTS); > micrel.c- if (irq_status < 0) { > micrel.c: phy_error(phydev); > micrel.c- return IRQ_NONE; > micrel.c- } It's literally the only pattern where the phy_error() method is utilized but it can be found in all IRQ-capable network PHY drivers. Seeing how widespread the function usage is I didn't even dared to think that the function doc could state it can't be utilized like that or when the lock is held. > > I actually think phy_error() is broken here. The general pattern is > that the mutex is locked before calling into the driver. So we > actually want phy_error() to be safe to use with the lock already > taken. The exceptions when the lock is not held is stuff outside of > PHY operation, like HWMON, and suspend and resume, plus probe. > > So i suggest you change phy_process_error() to remove the lock. This doable. > Maybe > add a test to ensure the lock is actually held, and do a phydev_err() > if not. This can't be done since phy_state_machine() calls phy_error_precise() which calls phy_process_error() with no phy_device.lock held. Printing the error in that case would mean an error in the Networking PHY subsystem itself. Do you suggest to take the lock before calling phy_error_precise() then? > > The comment about interrupt context is also probably bogus. phylib > only uses threaded interrupts, and it is safe to block in this > context. Ok. -Serge(y) > > Andrew
> > So i suggest you change phy_process_error() to remove the lock. > > This doable. > > > Maybe > > add a test to ensure the lock is actually held, and do a phydev_err() > > if not. > > This can't be done since phy_state_machine() calls phy_error_precise() > which calls phy_process_error() with no phy_device.lock held. Printing the > error in that case would mean an error in the Networking PHY subsystem > itself. > > Do you suggest to take the lock before calling phy_error_precise() then? Thanks for digging into the details. phy_error_precise() is used in exactly one place. So i would actually put the lock inside it. And maybe move the comment about not using the function with the lock already held here :-) Andrew
On Wed, Aug 16, 2023 at 10:13:10PM +0200, Andrew Lunn wrote: > > > So i suggest you change phy_process_error() to remove the lock. > > > > This doable. > > > > > Maybe > > > add a test to ensure the lock is actually held, and do a phydev_err() > > > if not. > > > > This can't be done since phy_state_machine() calls phy_error_precise() > > which calls phy_process_error() with no phy_device.lock held. Printing the > > error in that case would mean an error in the Networking PHY subsystem > > itself. > > > > Do you suggest to take the lock before calling phy_error_precise() then? > > Thanks for digging into the details. > > phy_error_precise() is used in exactly one place. So i would actually > put the lock inside it. And maybe move the comment about not using the > function with the lock already held here :-) Ok. I'll resubmit the patch tomorrow with the RFC status dropped. -Serge(y) > > Andrew
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index bdf00b2b2c1d..9483bd57158e 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -1235,7 +1235,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) { struct phy_device *phydev = phy_dat; struct phy_driver *drv = phydev->drv; - irqreturn_t ret; /* Wakeup interrupts may occur during a system sleep transition. * Postpone handling until the PHY has resumed. @@ -1259,11 +1258,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) return IRQ_HANDLED; } - mutex_lock(&phydev->lock); - ret = drv->handle_interrupt(phydev); - mutex_unlock(&phydev->lock); - - return ret; + return drv->handle_interrupt(phydev); } /**
Protecting the phy_driver.drv->handle_interrupt() callback invocation by the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock the mutex twice thus deadlocking on the next calls thread: IRQ: phy_interrupt() +-> mutex_lock(&phydev->lock); <-------------+ drv->handle_interrupt() | Deadlock due to the +-> phy_error() + nested PHY-device +-> phy_process_error() | mutex lock +-> mutex_lock(&phydev->lock); <-+ phydev->state = PHY_ERROR; mutex_unlock(&phydev->lock); mutex_unlock(&phydev->lock); The problem can be easily reproduced just by calling phy_error() from the any PHY-device interrupt handler. Reverting the commit 91a7cda1f4b8 ("net: phy: Fix race condition on link status change") fixes the deadlock. This reverts commit 91a7cda1f4b8bdf770000a3b60640576dafe0cec. Fixes: 91a7cda1f4b8 ("net: phy: Fix race condition on link status change") Signed-off-by: Serge Semin <fancer.lancer@gmail.com> --- Since obviously it would be better to fix both the deadlock and the problem described in the blamed commit the patch is marked as RFC. I am not aware of a better solution for now than to revert the commit caused the regression. So let's discuss to find out whether it's possible to have a better fix here. --- drivers/net/phy/phy.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)