Message ID | 85f2b5e5-ea85-3a84-1a5e-c4f84897ac04@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] r8169: reset bus if NIC isn't accessible after tx timeout |
On Thu, Jan 12, 2023 at 10:32 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> ASPM issues may result in the NIC not being accessible any longer.
> In this case disabling ASPM may not work. Therefore detect this case
> by checking whether register reads return ~0, and try to make the
> NIC accessible again by resetting the secondary bus.
>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 49c124d8e..b79ccde70 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4535,6 +4535,10 @@ static void rtl_task(struct work_struct *work)
>  		goto out_unlock;
>
>  	if (test_and_clear_bit(RTL_FLAG_TASK_TX_TIMEOUT, tp->wk.flags)) {
> +		/* if NIC isn't accessible, reset secondary bus to revive it */
> +		if (RTL_R32(tp, TxConfig) == ~0)
> +			pci_reset_bus(tp->pci_dev);
> +
>  		/* ASPM compatibility issues are a typical reason for tx timeouts */
>  		ret = pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_L1 |
>  							  PCIE_LINK_STATE_L0S);

So this is headed in the right direction, but you should probably have some exception handling in place for the pci_reset_bus in the event that it fails. It is possible that the device is just gone and that in turn is triggering this. If I recall correctly, when the device doesn't come back the pci_reset_bus should return -ENOTTY. In such a case you can probably report that the device has failed and wait for the PCIe subsystem to notice and notify you that the device is gone and remove it.
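The review suggestion (check the result of the bus reset and give up cleanly if the device does not come back) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct fake_nic`, `fake_read_txconfig()`, `fake_reset_bus()` and `handle_tx_timeout()` are hypothetical stand-ins for `RTL_R32(tp, TxConfig)`, `pci_reset_bus()` and the tx-timeout branch of `rtl_task()`. Following the reviewer's recollection, it assumes a reset of a device that is really gone returns -ENOTTY. On failure the handler reports the error and returns early, leaving removal of the device to the PCI core.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's private state (NOT the real
 * struct rtl8169_private). */
struct fake_nic {
	bool present;       /* false models a NIC that has dropped off the bus */
	bool wedged;        /* true models a NIC stuck after an ASPM problem */
	bool marked_failed; /* set when the driver gives up on the device */
};

/* Models RTL_R32(tp, TxConfig): reads from an inaccessible PCIe device
 * return all ones. The non-error value is arbitrary. */
static uint32_t fake_read_txconfig(const struct fake_nic *nic)
{
	return (!nic->present || nic->wedged) ? UINT32_MAX : 0x03000700u;
}

/* Models pci_reset_bus(): 0 on success, negative errno if the device does
 * not come back. Using -ENOTTY here is an assumption based on the review. */
static int fake_reset_bus(struct fake_nic *nic)
{
	if (!nic->present)
		return -ENOTTY;
	nic->wedged = false; /* a successful reset revives a wedged NIC */
	return 0;
}

/* Sketch of the tx-timeout branch with the suggested error handling. */
static int handle_tx_timeout(struct fake_nic *nic)
{
	assert(nic != NULL);

	if (fake_read_txconfig(nic) == UINT32_MAX) {
		int ret = fake_reset_bus(nic);

		if (ret < 0) {
			/* Report the failure; do not touch the device any further
			 * and let the PCI core notice and remove it. */
			fprintf(stderr, "NIC not accessible, bus reset failed: %d\n", ret);
			nic->marked_failed = true;
			return ret;
		}
	}

	/* The real driver would continue here (disable ASPM, reset the chip). */
	return 0;
}
```

Returning early on failure keeps the recovery path from disabling ASPM or poking chip registers on a device that is no longer there; the rest of the real recovery sequence is deliberately elided.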