diff mbox series

[net,v2] r8169: fix rtl8125b PAUSE frames blasting when suspended

Message ID 20231129155350.5843-1-hau@realtek.com (mailing list archive)
State Accepted
Commit 4b0768b6556af56ee9b7cf4e68452a2b6289ae45
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] r8169: fix rtl8125b PAUSE frames blasting when suspended | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/codegen success Generated files up to date
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1115 this patch: 1115
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1142 this patch: 1142
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1142 this patch: 1142
netdev/checkpatch warning CHECK: Prefer using the BIT macro
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

ChunHao Lin Nov. 29, 2023, 3:53 p.m. UTC
When FIFO reaches near full state, device will issue pause frame.
If pause slot is enabled(set to 1), in this time, device will issue
pause frame only once. But if pause slot is disabled(set to 0), device
will keep sending pause frames until FIFO reaches near empty state.

When pause slot is disabled, if there is no one to handle receive
packets, device FIFO will reach near full state and keep sending
pause frames. That will impact entire local area network.

This issue can be reproduced in Chromebox (not Chromebook) in
developer mode running a test image (and v5.10 kernel):
1) ping -f $CHROMEBOX (from workstation on same local network)
2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
3) ping $ROUTER (wait until ping fails from workstation)

Takes about ~20-30 seconds after step 2 for the local network to
stop working.

Fix this issue by enabling pause slot to only send pause frame once
when FIFO reaches near full state.

Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
Reported-by: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: ChunHao Lin <hau@realtek.com>
---
v2:
- update comment and title.
---
 drivers/net/ethernet/realtek/r8169_main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Jacob Keller Nov. 29, 2023, 11:04 p.m. UTC | #1
On 11/29/2023 7:53 AM, ChunHao Lin wrote:
> When FIFO reaches near full state, device will issue pause frame.
> If pause slot is enabled(set to 1), in this time, device will issue
> pause frame only once. But if pause slot is disabled(set to 0), device
> will keep sending pause frames until FIFO reaches near empty state.
> 
> When pause slot is disabled, if there is no one to handle receive
> packets, device FIFO will reach near full state and keep sending
> pause frames. That will impact entire local area network.
> 
> This issue can be reproduced in Chromebox (not Chromebook) in
> developer mode running a test image (and v5.10 kernel):
> 1) ping -f $CHROMEBOX (from workstation on same local network)
> 2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
> 3) ping $ROUTER (wait until ping fails from workstation)
> 
> Takes about ~20-30 seconds after step 2 for the local network to
> stop working.
> 
> Fix this issue by enabling pause slot to only send pause frame once
> when FIFO reaches near full state.
> 

Makes sense. Avoiding the spam is good. The naming is a bit confusing
but I guess that comes from realtek datasheet?

> Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
> Reported-by: Grant Grundler <grundler@chromium.org>
> Tested-by: Grant Grundler <grundler@chromium.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: ChunHao Lin <hau@realtek.com>
> ---
> v2:
> - update comment and title.
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 62cabeeb842a..bb787a52bc75 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -196,6 +196,7 @@ enum rtl_registers {
>  					/* No threshold before first PCI xfer */
>  #define	RX_FIFO_THRESH			(7 << RXCFG_FIFO_SHIFT)
>  #define	RX_EARLY_OFF			(1 << 11)
> +#define	RX_PAUSE_SLOT_ON		(1 << 11)	/* 8125b and later */

This confuses me though: RX_EARLY_OFF is (1 << 11) as well.. Is that
from a different set of devices? We're writing to the same register
RxConfig here I think in both cases?

Can you clarify if these are supposed to be the same bit?

>  #define	RXCFG_DMA_SHIFT			8
>  					/* Unlimited maximum PCI burst. */
>  #define	RX_DMA_BURST			(7 << RXCFG_DMA_SHIFT)
> @@ -2306,9 +2307,13 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
>  	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
>  		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
>  		break;
> -	case RTL_GIGA_MAC_VER_61 ... RTL_GIGA_MAC_VER_63:
> +	case RTL_GIGA_MAC_VER_61:
>  		RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST);
>  		break;

I assume there isn't a VER_62 between these?

> +	case RTL_GIGA_MAC_VER_63:
> +		RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST |
> +			RX_PAUSE_SLOT_ON);

We add RX_PAUSE_SLOT_ON now for RTL_GIGA_MAC_VER_63 in addition. Makes
sense.
> +		break;
>  	default:
>  		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
>  		break;
Grant Grundler Nov. 29, 2023, 11:40 p.m. UTC | #2
On Wed, Nov 29, 2023 at 3:05 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> On 11/29/2023 7:53 AM, ChunHao Lin wrote:
> > When FIFO reaches near full state, device will issue pause frame.
> > If pause slot is enabled(set to 1), in this time, device will issue
> > pause frame only once. But if pause slot is disabled(set to 0), device
> > will keep sending pause frames until FIFO reaches near empty state.
> >
> > When pause slot is disabled, if there is no one to handle receive
> > packets, device FIFO will reach near full state and keep sending
> > pause frames. That will impact entire local area network.
> >
> > This issue can be reproduced in Chromebox (not Chromebook) in
> > developer mode running a test image (and v5.10 kernel):
> > 1) ping -f $CHROMEBOX (from workstation on same local network)
> > 2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
> > 3) ping $ROUTER (wait until ping fails from workstation)
> >
> > Takes about ~20-30 seconds after step 2 for the local network to
> > stop working.
> >
> > Fix this issue by enabling pause slot to only send pause frame once
> > when FIFO reaches near full state.
> >
>
> Makes sense. Avoiding the spam is good.  The naming is a bit confusing
> but I guess that comes from realtek datasheet?

I don't know. It doesn't matter to me what it's called since I don't
have access to the data sheet anyway. :/

> > Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
> > Reported-by: Grant Grundler <grundler@chromium.org>
> > Tested-by: Grant Grundler <grundler@chromium.org>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: ChunHao Lin <hau@realtek.com>
> > ---
> > v2:
> > - update comment and title.
> > ---
> >  drivers/net/ethernet/realtek/r8169_main.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 62cabeeb842a..bb787a52bc75 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -196,6 +196,7 @@ enum rtl_registers {
> >                                       /* No threshold before first PCI xfer */
> >  #define      RX_FIFO_THRESH                  (7 << RXCFG_FIFO_SHIFT)
> >  #define      RX_EARLY_OFF                    (1 << 11)
> > +#define      RX_PAUSE_SLOT_ON                (1 << 11)       /* 8125b and later */
>
> This confuses me though: RX_EARLY_OFF is (1 << 11) as well.. Is that
> from a different set of devices?

Yes, for a different HW version of the device.

> We're writing to the same register
> RxConfig here I think in both cases?

Yes. But to different versions of the HW which use this bit
differently. Ergo the comment about "8125b and later".

> Can you clarify if these are supposed to be the same bit?

Yes, they are the same bit - but different versions of HW use BIT(11)
differently.

>
> >  #define      RXCFG_DMA_SHIFT                 8
> >                                       /* Unlimited maximum PCI burst. */
> >  #define      RX_DMA_BURST                    (7 << RXCFG_DMA_SHIFT)
> > @@ -2306,9 +2307,13 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
> >       case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
> >               RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
> >               break;
> > -     case RTL_GIGA_MAC_VER_61 ... RTL_GIGA_MAC_VER_63:
> > +     case RTL_GIGA_MAC_VER_61:
> >               RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST);
> >               break;
>
> I assume there isn't a VER_62 between these?

Correct. My clue is this code near the top of this file:

 149         [RTL_GIGA_MAC_VER_61] = {"RTL8125A",            FIRMWARE_8125A_3},
 150         /* reserve 62 for CFG_METHOD_4 in the vendor driver */
 151         [RTL_GIGA_MAC_VER_63] = {"RTL8125B",            FIRMWARE_8125B_2},

>
> > +     case RTL_GIGA_MAC_VER_63:
> > +             RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST |
> > +                     RX_PAUSE_SLOT_ON);
>
> We add RX_PAUSE_SLOT_ON now for RTL_GIGA_MAC_VER_63 in addition. Makes
> sense.

Exactly.

thanks for reviewing!

cheers,
grant

> > +             break;
> >       default:
> >               RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> >               break;
Jacob Keller Nov. 30, 2023, 10:25 p.m. UTC | #3
On 11/29/2023 3:40 PM, Grant Grundler wrote:
> On Wed, Nov 29, 2023 at 3:05 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>> On 11/29/2023 7:53 AM, ChunHao Lin wrote:
>>> When FIFO reaches near full state, device will issue pause frame.
>>> If pause slot is enabled(set to 1), in this time, device will issue
>>> pause frame only once. But if pause slot is disabled(set to 0), device
>>> will keep sending pause frames until FIFO reaches near empty state.
>>>
>>> When pause slot is disabled, if there is no one to handle receive
>>> packets, device FIFO will reach near full state and keep sending
>>> pause frames. That will impact entire local area network.
>>>
>>> This issue can be reproduced in Chromebox (not Chromebook) in
>>> developer mode running a test image (and v5.10 kernel):
>>> 1) ping -f $CHROMEBOX (from workstation on same local network)
>>> 2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
>>> 3) ping $ROUTER (wait until ping fails from workstation)
>>>
>>> Takes about ~20-30 seconds after step 2 for the local network to
>>> stop working.
>>>
>>> Fix this issue by enabling pause slot to only send pause frame once
>>> when FIFO reaches near full state.
>>>
>>
>> Makes sense. Avoiding the spam is good.  The naming is a bit confusing
>> but I guess that comes from realtek datasheet?
> 
> I don't know. It doesn't matter to me what it's called since I don't
> have access to the data sheet anyway. :/
> 

The name is fine, i just found it a bit hard to parse since its
effectively "PAUSE_SLOT_ON" makes us *not* send pause frames forever.

I think its fine as-is, since this is referring to the use of the pause
slot in hardware.

>>> Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
>>> Reported-by: Grant Grundler <grundler@chromium.org>
>>> Tested-by: Grant Grundler <grundler@chromium.org>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: ChunHao Lin <hau@realtek.com>
>>> ---
>>> v2:
>>> - update comment and title.
>>> ---
>>>  drivers/net/ethernet/realtek/r8169_main.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>> index 62cabeeb842a..bb787a52bc75 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -196,6 +196,7 @@ enum rtl_registers {
>>>                                       /* No threshold before first PCI xfer */
>>>  #define      RX_FIFO_THRESH                  (7 << RXCFG_FIFO_SHIFT)
>>>  #define      RX_EARLY_OFF                    (1 << 11)
>>> +#define      RX_PAUSE_SLOT_ON                (1 << 11)       /* 8125b and later */
>>
>> This confuses me though: RX_EARLY_OFF is (1 << 11) as well.. Is that
>> from a different set of devices?
> 
> Yes, for a different HW version of the device.
> 

Great.

>> We're writing to the same register
>> RxConfig here I think in both cases?
> 
> Yes. But to different versions of the HW which use this bit
> differently. Ergo the comment about "8125b and later".
> 
>> Can you clarify if these are supposed to be the same bit?
> 
> Yes, they are the same bit - but different versions of HW use BIT(11)
> differently.

Thanks for the clarification!

> 
>>
>>>  #define      RXCFG_DMA_SHIFT                 8
>>>                                       /* Unlimited maximum PCI burst. */
>>>  #define      RX_DMA_BURST                    (7 << RXCFG_DMA_SHIFT)
>>> @@ -2306,9 +2307,13 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
>>>       case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
>>>               RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
>>>               break;
>>> -     case RTL_GIGA_MAC_VER_61 ... RTL_GIGA_MAC_VER_63:
>>> +     case RTL_GIGA_MAC_VER_61:
>>>               RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST);
>>>               break;
>>
>> I assume there isn't a VER_62 between these?
> 
> Correct. My clue is this code near the top of this file:
> 
>  149         [RTL_GIGA_MAC_VER_61] = {"RTL8125A",            FIRMWARE_8125A_3},
>  150         /* reserve 62 for CFG_METHOD_4 in the vendor driver */
>  151         [RTL_GIGA_MAC_VER_63] = {"RTL8125B",            FIRMWARE_8125B_2},
> 
>>
>>> +     case RTL_GIGA_MAC_VER_63:
>>> +             RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST |
>>> +                     RX_PAUSE_SLOT_ON);
>>
>> We add RX_PAUSE_SLOT_ON now for RTL_GIGA_MAC_VER_63 in addition. Makes
>> sense.
> 
> Exactly.
> 
> thanks for reviewing!
> 

Great.

For the record:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Jakub Kicinski Dec. 2, 2023, 4:14 a.m. UTC | #4
On Wed, 29 Nov 2023 23:53:50 +0800 ChunHao Lin wrote:
> When FIFO reaches near full state, device will issue pause frame.
> If pause slot is enabled(set to 1), in this time, device will issue
> pause frame only once. But if pause slot is disabled(set to 0), device
> will keep sending pause frames until FIFO reaches near empty state.

Heiner, looks good?
Heiner Kallweit Dec. 2, 2023, 8:19 a.m. UTC | #5
On 29.11.2023 16:53, ChunHao Lin wrote:
> When FIFO reaches near full state, device will issue pause frame.
> If pause slot is enabled(set to 1), in this time, device will issue
> pause frame only once. But if pause slot is disabled(set to 0), device
> will keep sending pause frames until FIFO reaches near empty state.
> 
> When pause slot is disabled, if there is no one to handle receive
> packets, device FIFO will reach near full state and keep sending
> pause frames. That will impact entire local area network.
> 
> This issue can be reproduced in Chromebox (not Chromebook) in
> developer mode running a test image (and v5.10 kernel):
> 1) ping -f $CHROMEBOX (from workstation on same local network)
> 2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
> 3) ping $ROUTER (wait until ping fails from workstation)
> 
> Takes about ~20-30 seconds after step 2 for the local network to
> stop working.
> 
> Fix this issue by enabling pause slot to only send pause frame once
> when FIFO reaches near full state.
> 
> Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
> Reported-by: Grant Grundler <grundler@chromium.org>
> Tested-by: Grant Grundler <grundler@chromium.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: ChunHao Lin <hau@realtek.com>
> ---
> v2:
> - update comment and title.
> ---

Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
patchwork-bot+netdevbpf@kernel.org Dec. 2, 2023, 7:30 p.m. UTC | #6
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 29 Nov 2023 23:53:50 +0800 you wrote:
> When FIFO reaches near full state, device will issue pause frame.
> If pause slot is enabled(set to 1), in this time, device will issue
> pause frame only once. But if pause slot is disabled(set to 0), device
> will keep sending pause frames until FIFO reaches near empty state.
> 
> When pause slot is disabled, if there is no one to handle receive
> packets, device FIFO will reach near full state and keep sending
> pause frames. That will impact entire local area network.
> 
> [...]

Here is the summary with links:
  - [net,v2] r8169: fix rtl8125b PAUSE frames blasting when suspended
    https://git.kernel.org/netdev/net/c/4b0768b6556a

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 62cabeeb842a..bb787a52bc75 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -196,6 +196,7 @@  enum rtl_registers {
 					/* No threshold before first PCI xfer */
 #define	RX_FIFO_THRESH			(7 << RXCFG_FIFO_SHIFT)
 #define	RX_EARLY_OFF			(1 << 11)
+#define	RX_PAUSE_SLOT_ON		(1 << 11)	/* 8125b and later */
 #define	RXCFG_DMA_SHIFT			8
 					/* Unlimited maximum PCI burst. */
 #define	RX_DMA_BURST			(7 << RXCFG_DMA_SHIFT)
@@ -2306,9 +2307,13 @@  static void rtl_init_rxcfg(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
 		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
 		break;
-	case RTL_GIGA_MAC_VER_61 ... RTL_GIGA_MAC_VER_63:
+	case RTL_GIGA_MAC_VER_61:
 		RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST);
 		break;
+	case RTL_GIGA_MAC_VER_63:
+		RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_DMA_BURST |
+			RX_PAUSE_SLOT_ON);
+		break;
 	default:
 		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
 		break;