diff mbox series

[net] Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already"

Message ID 9b5b6f4c-4f54-4b90-b0b3-8d8023c2e780@gmail.com (mailing list archive)
State Accepted
Commit eabb8a9be1e4a12f3bf37ceb7411083e3775672d
Delegated to: Netdev Maintainers
Headers show
Series [net] Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already" | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 925 this patch: 925
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 936 this patch: 936
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 936 this patch: 936
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-21--06-00 (tests: 1038)

Commit Message

Heiner Kallweit May 15, 2024, 6:18 a.m. UTC
This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.

Ken reported that RTL8125b can lock up if gro_flush_timeout has the
default value of 20000 and napi_defer_hard_irqs is set to 0.
In this scenario device interrupts aren't disabled, what seems to
trigger some silicon bug under heavy load. I was able to reproduce this
behavior on RTL8168h. Fix this by reverting 7274c4147afb.

Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
Cc: stable@vger.kernel.org
Reported-by: Ken Milmore <ken.milmore@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Eric Dumazet May 15, 2024, 12:23 p.m. UTC | #1
On Wed, May 15, 2024 at 8:18 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.
>
> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> default value of 20000 and napi_defer_hard_irqs is set to 0.
> In this scenario device interrupts aren't disabled, what seems to
> trigger some silicon bug under heavy load. I was able to reproduce this
> behavior on RTL8168h. Fix this by reverting 7274c4147afb.
>
> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> Cc: stable@vger.kernel.org
> Reported-by: Ken Milmore <ken.milmore@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !
Ken Milmore May 16, 2024, 9:10 p.m. UTC | #2
On 15/05/2024 07:18, Heiner Kallweit wrote:
> This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.
> 
> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> default value of 20000 and napi_defer_hard_irqs is set to 0.
> In this scenario device interrupts aren't disabled, what seems to
> trigger some silicon bug under heavy load. I was able to reproduce this
> behavior on RTL8168h. Fix this by reverting 7274c4147afb.
> 
> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> Cc: stable@vger.kernel.org
> Reported-by: Ken Milmore <ken.milmore@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 0fc5fe564ae5..69606c8081a3 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4655,10 +4655,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  		rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>  	}
>  
> -	if (napi_schedule_prep(&tp->napi)) {
> -		rtl_irq_disable(tp);
> -		__napi_schedule(&tp->napi);
> -	}
> +	rtl_irq_disable(tp);
> +	napi_schedule(&tp->napi);
>  out:
>  	rtl_ack_events(tp, status);
>  

FYI, by now I am reasonably well convinced that the behaviour we've been seeing
is not in fact a silicon bug, but rather a very specific behaviour regarding how
these devices raise MSI interrupts.

This is largely due to the investigations by David Dillow described exhaustively
in the 2009 netdev thread linked below. I wish I had spotted this much sooner!
This information has been corroborated by my own testing on the RTL8125b:
https://lore.kernel.org/netdev/1242001754.4093.12.camel@obelisk.thedillows.org/T/

To summarise precisely what I think the behaviour is:

********
An interrupt is generated *only* when the device registers undergo a transition
from (status & mask) == 0 to (status & mask) != 0.
********

If the above holds, then calling rtl_irq_disable() will immediately force the
condition (status & mask) == 0, so we are ready to raise another interrupt when
interrupts are subsequently enabled again. 

To try and verify this, I tried the code below, which locks up the network
traffic immediately, regardless of the setting of napi_defer_hard_irqs:

diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
index 2ce4bff..add5bdd 100644
--- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
+++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
@@ -4607,10 +4607,13 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
 	}
 
-	rtl_irq_disable(tp);
 	napi_schedule(&tp->napi);
 out:
+	u32 status2 = rtl_get_events(tp);
 	rtl_ack_events(tp, status);
+	if(status2 & ~status)
+		printk_ratelimited("rtl8169_interrupt: status=%x status2=%x\n",
+				   status, status2);
 
 	return IRQ_HANDLED;
 }

Here's some typical dmesg output:

[11315.581136] rtl8169_interrupt: status=1 status2=85
[11324.142176] r8169 0000:07:00.0 eth0: ASPM disabled on Tx timeout
[11324.151765] rtl8169_interrupt: status=4 status2=84

We can see that when a new interrupt is flagged in the interval between reading
the status register and writing to it, we may never achieve the condition
(status & mask) == 0.

So, if we read 0x01 (RxOK) from the status register, we will then write
0x01 back to acknowledge the interrupt. But in the meantime, 0x04 (TxOK) has
been flagged, as well as 0x80 (TxDescUnavail), so the register now contains
0x85. We acknowledge by writing back 0x01, so the status register should now
contain 0x84. If interrupts are unmasked throughout, then (status & mask) != 0
throughout, so no interrupt will be raised for the missed TxOK event, unless
something else should occur to set one of the other status bits.

To test this hypothesis, I tried the code below, which never disables
interrupts but instead clears out the status register on every interrupt:

index 2ce4bff..dbda9ef 100644
--- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
+++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
@@ -4610,7 +4610,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	rtl_irq_disable(tp);
 	napi_schedule(&tp->napi);
 out:
-	rtl_ack_events(tp, status);
+	rtl_ack_events(tp, tp->irq_mask);
 
 	return IRQ_HANDLED;
 }

This passed my iperf3 test perfectly! It is likely to cause other problems
though: Specifically it opens the possibility that we will miss a SYSErr,
LinkChg or RxFIFOOver interrupt. Hence the rationale for achieving the required
(status & mask) == 0 condition by clearing the mask register instead.

I hope this information may prove useful in the future.
Ken Milmore May 16, 2024, 9:31 p.m. UTC | #3
On 16/05/2024 22:10, Ken Milmore wrote:
> To test this hypothesis, I tried the code below, which never disables
> interrupts but instead clears out the status register on every interrupt:
> 
> index 2ce4bff..dbda9ef 100644
> --- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
> +++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4610,7 +4610,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	rtl_irq_disable(tp);
>  	napi_schedule(&tp->napi);
>  out:
> -	rtl_ack_events(tp, status);
> +	rtl_ack_events(tp, tp->irq_mask);
>  
>  	return IRQ_HANDLED;
>  }
> 
> This passed my iperf3 test perfectly! It is likely to cause other problems
> though: Specifically it opens the possibility that we will miss a SYSErr,
> LinkChg or RxFIFOOver interrupt. Hence the rationale for achieving the required
> (status & mask) == 0 condition by clearing the mask register instead.

Sorry, that patch should have been:

diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
index 2ce4bff..e9757ff 100644
--- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
+++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
@@ -4607,10 +4607,9 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
                rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
        }
 
-       rtl_irq_disable(tp);
        napi_schedule(&tp->napi);
 out:
-       rtl_ack_events(tp, status);
+       rtl_ack_events(tp, tp->irq_mask);
 
        return IRQ_HANDLED;
 }
Heiner Kallweit May 20, 2024, 1:50 p.m. UTC | #4
On 15.05.2024 08:18, Heiner Kallweit wrote:
> This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.
> 
> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> default value of 20000 and napi_defer_hard_irqs is set to 0.
> In this scenario device interrupts aren't disabled, what seems to
> trigger some silicon bug under heavy load. I was able to reproduce this
> behavior on RTL8168h. Fix this by reverting 7274c4147afb.
> 
> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> Cc: stable@vger.kernel.org
> Reported-by: Ken Milmore <ken.milmore@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---

This patch was mistakenly set to "changes requested".
The replies from Ken provide additional details on how interrupt mask
and status register behave on these chips. However the patch itself
is correct and should be applied.
Ken Milmore May 20, 2024, 3:55 p.m. UTC | #5
I concur, Heiner's patch is correct. Sorry about the noise.

On 20/05/2024 14:50, Heiner Kallweit wrote:
> On 15.05.2024 08:18, Heiner Kallweit wrote:
>> This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.
>>
>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>> In this scenario device interrupts aren't disabled, what seems to
>> trigger some silicon bug under heavy load. I was able to reproduce this
>> behavior on RTL8168h. Fix this by reverting 7274c4147afb.
>>
>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>> Cc: stable@vger.kernel.org
>> Reported-by: Ken Milmore <ken.milmore@gmail.com>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
> 
> This patch was mistakenly set to "changes requested".
> The replies from Ken provide additional details on how interrupt mask
> and status register behave on these chips. However the patch itself
> is correct and should be applied.
>
Simon Horman May 20, 2024, 7:54 p.m. UTC | #6
On Mon, May 20, 2024 at 03:50:11PM +0200, Heiner Kallweit wrote:
> On 15.05.2024 08:18, Heiner Kallweit wrote:
> > This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.
> > 
> > Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> > default value of 20000 and napi_defer_hard_irqs is set to 0.
> > In this scenario device interrupts aren't disabled, what seems to
> > trigger some silicon bug under heavy load. I was able to reproduce this
> > behavior on RTL8168h. Fix this by reverting 7274c4147afb.
> > 
> > Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> > Cc: stable@vger.kernel.org
> > Reported-by: Ken Milmore <ken.milmore@gmail.com>
> > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> > ---
> 
> This patch was mistakenly set to "changes requested".
> The replies from Ken provide additional details on how interrupt mask
> and status register behave on these chips. However the patch itself
> is correct and should be applied.

I guess someone hit the wrong button by mistake.
Let's see if this puts things back on track.

pw-bot: under-review
patchwork-bot+netdevbpf@kernel.org May 21, 2024, 9:50 a.m. UTC | #7
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Wed, 15 May 2024 08:18:01 +0200 you wrote:
> This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.
> 
> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> default value of 20000 and napi_defer_hard_irqs is set to 0.
> In this scenario device interrupts aren't disabled, what seems to
> trigger some silicon bug under heavy load. I was able to reproduce this
> behavior on RTL8168h. Fix this by reverting 7274c4147afb.
> 
> [...]

Here is the summary with links:
  - [net] Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already"
    https://git.kernel.org/netdev/net/c/eabb8a9be1e4

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0fc5fe564ae5..69606c8081a3 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4655,10 +4655,8 @@  static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
 	}
 
-	if (napi_schedule_prep(&tp->napi)) {
-		rtl_irq_disable(tp);
-		__napi_schedule(&tp->napi);
-	}
+	rtl_irq_disable(tp);
+	napi_schedule(&tp->napi);
 out:
 	rtl_ack_events(tp, status);