[net-next] r8169: reset bus if NIC isn't accessible after tx timeout

Message ID 85f2b5e5-ea85-3a84-1a5e-c4f84897ac04@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net-next] r8169: reset bus if NIC isn't accessible after tx timeout

Checks

Context | Check | Description
netdev/tree_selection | success | Clearly marked for net-next
netdev/fixes_present | success | Fixes tag not required for -next series
netdev/subject_prefix | success | Link
netdev/cover_letter | success | Single patches do not need cover letters
netdev/patch_count | success | Link
netdev/header_inline | success | No static functions without inline keyword in header files
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers | success | CCed 7 of 7 maintainers
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0
netdev/module_param | success | Was 0 now: 0
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer
netdev/check_selftest | success | No net selftest shell script
netdev/verify_fixes | success | No Fixes tag
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0
netdev/source_inline | success | Was 0 now: 0

Commit Message

Heiner Kallweit Jan. 13, 2023, 6:32 a.m. UTC
ASPM issues may result in the NIC not being accessible any longer.
In this case disabling ASPM may not work. Therefore detect this case
by checking whether register reads return ~0, and try to make the
NIC accessible again by resetting the secondary bus.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
 1 file changed, 4 insertions(+)
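
A note on the "register reads return ~0" check: the sketch below is a user-space illustration, not driver code, of why comparing a 32-bit read against ~0 works and why the same spelling would not work for a narrower read. The helper names are hypothetical, and a 32-bit int is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 32-bit read: in `val == ~0` the int constant -1 is converted to
 * unsigned, i.e. 0xffffffff (32-bit int assumed), so the comparison
 * is equivalent to the explicit form below.
 */
static bool read32_all_ones(uint32_t val)
{
	return val == UINT32_MAX;
}

/*
 * 16-bit read written as `val == ~0`: val is promoted to int first,
 * so 0xffff becomes 65535 and is compared against -1. The cast below
 * spells out that promotion; the result is never true.
 */
static bool read16_all_ones_naive(uint16_t val)
{
	return (int)val == -1;
}

/* 16-bit read compared against a constant of matching width. */
static bool read16_all_ones(uint16_t val)
{
	return val == UINT16_MAX;
}
```

The patch only uses a 32-bit read of TxConfig, so the narrower cases are shown purely for contrast.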

Comments

Alexander Duyck Jan. 13, 2023, 3:39 p.m. UTC | #1
On Thu, Jan 12, 2023 at 10:32 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> ASPM issues may result in the NIC not being accessible any longer.
> In this case disabling ASPM may not work. Therefore detect this case
> by checking whether register reads return ~0, and try to make the
> NIC accessible again by resetting the secondary bus.
>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 49c124d8e..b79ccde70 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4535,6 +4535,10 @@ static void rtl_task(struct work_struct *work)
>                 goto out_unlock;
>
>         if (test_and_clear_bit(RTL_FLAG_TASK_TX_TIMEOUT, tp->wk.flags)) {
> +               /* if NIC isn't accessible, reset secondary bus to revive it */
> +               if (RTL_R32(tp, TxConfig) == ~0)
> +                       pci_reset_bus(tp->pci_dev);
> +
>                 /* ASPM compatibility issues are a typical reason for tx timeouts */
>                 ret = pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_L1 |
>                                                           PCIE_LINK_STATE_L0S);

This is headed in the right direction, but you should probably add
error handling for pci_reset_bus() in case it fails. It is possible
that the device is simply gone, which in turn triggers this path. If
I recall correctly, pci_reset_bus() should return -ENOTTY when the
device doesn't come back. In that case you can probably report that
the device has failed and wait for the PCIe subsystem to notice that
the device is gone and remove it.
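
The control flow suggested here could look roughly like the following. This is a user-space sketch with mocked stand-ins, not a proposed driver change: struct mock_nic and every mock_* function are hypothetical, and only the idea of checking the reset's return value comes from the comment above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; not a kernel structure. */
struct mock_nic {
	uint32_t tx_config;	/* value a TxConfig read would return */
	int reset_result;	/* what the mocked bus reset returns */
	int marked_failed;	/* set once the device is given up on */
};

/* Mocked register read: an inaccessible PCI device reads as all ones. */
static uint32_t mock_read_txconfig(const struct mock_nic *nic)
{
	return nic->tx_config;
}

/* Mocked bus reset: on success the register becomes readable again. */
static int mock_reset_bus(struct mock_nic *nic)
{
	if (nic->reset_result == 0)
		nic->tx_config = 0x2f900d00u;	/* arbitrary value other than ~0 */
	return nic->reset_result;
}

/*
 * Returns 0 if tx timeout recovery may continue, or the negative errno
 * from the reset if the device did not come back.
 */
static int mock_handle_tx_timeout(struct mock_nic *nic)
{
	int ret;

	/* Device still answers register reads: nothing to revive. */
	if (mock_read_txconfig(nic) != UINT32_MAX)
		return 0;

	ret = mock_reset_bus(nic);
	if (ret) {
		/* Report the failure and leave removal to the PCI core. */
		fprintf(stderr, "bus reset failed (%d), device considered lost\n", ret);
		nic->marked_failed = 1;
		return ret;
	}

	return 0;
}
```

In the real rtl_task() the caller would presumably skip the ASPM handling and the chip reset when this returns an error, since further register access would be pointless.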

Patch

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 49c124d8e..b79ccde70 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4535,6 +4535,10 @@  static void rtl_task(struct work_struct *work)
 		goto out_unlock;
 
 	if (test_and_clear_bit(RTL_FLAG_TASK_TX_TIMEOUT, tp->wk.flags)) {
+		/* if NIC isn't accessible, reset secondary bus to revive it */
+		if (RTL_R32(tp, TxConfig) == ~0)
+			pci_reset_bus(tp->pci_dev);
+
 		/* ASPM compatibility issues are a typical reason for tx timeouts */
 		ret = pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_L1 |
 							  PCIE_LINK_STATE_L0S);