r8169: add support for 16K jumbo frames on RTL8125B

Message ID 20250228173505.3636-1-rsalvaterra@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series r8169: add support for 16K jumbo frames on RTL8125B

Checks

Context                         Check    Description
netdev/series_format            warning  Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           warning  4 maintainers not CCed: andrew+netdev@lunn.ch edumazet@google.com kuba@kernel.org pabeni@redhat.com
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-03-05--06-00 (tests: 894)

Commit Message

Rui Salvaterra Feb. 28, 2025, 5:30 p.m. UTC
It's supported, according to the specifications.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---

It's very likely that other RTL8125x devices also support 16K jumbo frames, but
I only have RTL8125B ones to test with. Additionally, I've only tested up to 12K
(my switch's limit).

 drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Heiner Kallweit Feb. 28, 2025, 8:23 p.m. UTC | #1
On 28.02.2025 18:30, Rui Salvaterra wrote:
> It's supported, according to the specifications.
> 
> Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
> ---
> 
> It's very likely that other RTL8125x devices also support 16K jumbo frames, but
> I only have RTL8125B ones to test with. Additionally, I've only tested up to 12K
> (my switch's limit).
> 
This has been proposed and discussed before. Decision was to not increase
the max jumbo packet size, as vendor drivers r8125/r8126 also support max 9k.
And in general it's not clear whether you would gain anything from jumbo packets,
because hw TSO and c'summing aren't supported for jumbo packets.
Rui Salvaterra March 1, 2025, 11:45 a.m. UTC | #2
Hi, Heiner,

On Fri, 28 Feb 2025 at 20:22, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> This has been proposed and discussed before. Decision was to not increase
> the max jumbo packet size, as vendor drivers r8125/r8126 also support max 9k.

I did a cursory search around the mailing list, but didn't find
anything specific. Maybe I didn't look hard enough. However…

> And in general it's not clear whether you would gain anything from jumbo packets,
> because hw TSO and c'summing aren't supported for jumbo packets.

… I actually have numbers to justify it. For my use case, jumbo frames
make a *huge* difference. I have an Atom 330-based file server, and this
CPU is too slow to saturate the link with an MTU of 1500 bytes. The
situation, however, changes dramatically when I use jumbo frames. Case
in point…


MTU = 1500 bytes:

Accepted connection from 192.168.17.20, port 55514
[  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 55524
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   241 MBytes  2.02 Gbits/sec
[  5]   1.00-2.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   2.00-3.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   3.00-4.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   4.00-5.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   5.00-6.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   6.00-7.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   7.00-8.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   8.00-9.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   9.00-10.00  sec   242 MBytes  2.03 Gbits/sec
[  5]  10.00-10.00  sec   128 KBytes  1.27 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.36 GBytes  2.03 Gbits/sec                  receiver


MTU = 9000 bytes:

Accepted connection from 192.168.17.20, port 53474
[  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 53490
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   2.00-3.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec
[  5]  10.00-10.00  sec   384 KBytes  2.38 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver


MTU = 12000 bytes (with my patch):

Accepted connection from 192.168.17.20, port 59378
[  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 59388
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   2.00-3.00   sec   295 MBytes  2.48 Gbits/sec
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec
[  5]   5.00-6.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   6.00-7.00   sec   295 MBytes  2.48 Gbits/sec
[  5]   7.00-8.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   9.00-10.00  sec   294 MBytes  2.47 Gbits/sec
[  5]  10.00-10.00  sec   512 KBytes  2.49 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec                  receiver


This demonstrates that the bottleneck is in the frame processing. With
a larger frame size, the number of checksum calculations is also
lower, for the same amount of payload data, and the CPU is able to
handle them.


Kind regards,

Rui Salvaterra
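
[Editorial illustration, not part of the original message.] The argument
above is that fewer, larger frames mean less per-frame work for the same
payload. A minimal sketch of that arithmetic follows. It assumes the iperf3
bitrate is TCP payload throughput and that each frame carries MTU minus 40
bytes of payload (20-byte IPv4 header plus 20-byte TCP header, no options);
frames_per_second() and TCPIP_HEADER_BYTES are hypothetical names introduced
only for this example.

```c
#include <assert.h>

/* Assumption: 20-byte IPv4 header + 20-byte TCP header, no TCP options. */
#define TCPIP_HEADER_BYTES 40u

/*
 * Hypothetical helper: approximate number of frames per second needed to
 * carry payload_bits_per_sec of TCP payload when every frame is filled up
 * to the given MTU.
 */
double frames_per_second(double payload_bits_per_sec, unsigned int mtu)
{
	/* An MTU that cannot even hold the headers makes no sense here. */
	assert(mtu > TCPIP_HEADER_BYTES);

	return payload_bits_per_sec / (8.0 * (mtu - TCPIP_HEADER_BYTES));
}
```

Under these assumptions, 2.03 Gbits/sec at MTU 1500 works out to roughly
174,000 frames per second, 2.47 Gbits/sec at MTU 9000 to roughly 34,000, and
2.48 Gbits/sec at MTU 12000 to roughly 26,000. The estimate ignores any
aggregation the stack may perform, so it is only a rough indication of the
per-frame load.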
Heiner Kallweit March 1, 2025, 2:14 p.m. UTC | #3
On 01.03.2025 12:45, Rui Salvaterra wrote:
> Hi, Heiner,
> 
> On Fri, 28 Feb 2025 at 20:22, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> This has been proposed and discussed before. Decision was to not increase
>> the max jumbo packet size, as vendor drivers r8125/r8126 also support max 9k.
> 
> I did a cursory search around the mailing list, but didn't find
> anything specific. Maybe I didn't look hard enough. However…
> 
>> And in general it's not clear whether you would gain anything from jumbo packets,
>> because hw TSO and c'summing aren't supported for jumbo packets.
> 
> … I actually have numbers to justify it. For my use case, jumbo frames
> make a *huge* difference. I have an Atom 330-based file server, and this
> CPU is too slow to saturate the link with an MTU of 1500 bytes. The
> situation, however, changes dramatically when I use jumbo frames. Case
> in point…
> 
> 
> MTU = 1500 bytes:
> 
> Accepted connection from 192.168.17.20, port 55514
> [  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 55524
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   241 MBytes  2.02 Gbits/sec
> [  5]   1.00-2.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   2.00-3.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   3.00-4.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   4.00-5.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   5.00-6.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   6.00-7.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   7.00-8.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   8.00-9.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   9.00-10.00  sec   242 MBytes  2.03 Gbits/sec
> [  5]  10.00-10.00  sec   128 KBytes  1.27 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-10.00  sec  2.36 GBytes  2.03 Gbits/sec                  receiver
> 
Depending on the kernel version, HW TSO may be off by default.
Use ethtool to check/enable HW TSO, and see whether speed improves.

> 
> MTU = 9000 bytes:
> 
> Accepted connection from 192.168.17.20, port 53474
> [  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 53490
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   2.00-3.00   sec   294 MBytes  2.47 Gbits/sec
> [  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec
> [  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec
> [  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec
> [  5]  10.00-10.00  sec   384 KBytes  2.38 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver
> 
> 
> MTU = 12000 bytes (with my patch):
> 
> Accepted connection from 192.168.17.20, port 59378
> [  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 59388
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   296 MBytes  2.48 Gbits/sec
> [  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec
> [  5]   2.00-3.00   sec   295 MBytes  2.48 Gbits/sec
> [  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec
> [  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec
> [  5]   5.00-6.00   sec   296 MBytes  2.48 Gbits/sec
> [  5]   6.00-7.00   sec   295 MBytes  2.48 Gbits/sec
> [  5]   7.00-8.00   sec   296 MBytes  2.48 Gbits/sec
> [  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec
> [  5]   9.00-10.00  sec   294 MBytes  2.47 Gbits/sec
> [  5]  10.00-10.00  sec   512 KBytes  2.49 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec                  receiver
> 
> 
> This demonstrates that the bottleneck is in the frame processing. With
> a larger frame size, the number of checksum calculations is also
> lower, for the same amount of payload data, and the CPU is able to
> handle them.
> 
> 
> Kind regards,
> 
> Rui Salvaterra
Rui Salvaterra March 1, 2025, 7:46 p.m. UTC | #4
Hi again, Heiner,


On Sat, 1 Mar 2025 at 14:12, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Depending on the kernel version, HW TSO may be off by default.
> Use ethtool to check/enable HW TSO, and see whether speed improves.

I'm running Linux 6.14-rc4 with my patch. Output from ethtool, when
the MTU is set to 1500:

tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: on


When the MTU is set to 12000:

tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: off [requested on]

Which means my test, with an MTU of 1500, was already done with
hardware TSO offloading enabled.


Kind regards,

Rui Salvaterra
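
[Editorial illustration, not part of the original message.] The ethtool
output above shows TSO reported as "off [requested on]" once the MTU is
raised, which matches the earlier remark that hardware TSO and checksumming
are not available for jumbo packets. The sketch below is a simplified model
of that behaviour, not the driver's actual code: the flag names, the
effective_offloads() helper, and the choice of 1500 bytes as the threshold
are all assumptions made for illustration.

```c
#include <assert.h>

/* Assumption: anything above the standard 1500-byte MTU counts as jumbo. */
#define STANDARD_MTU 1500u

/* Hypothetical offload flags, unrelated to the kernel's NETIF_F_* bits. */
#define OFFLOAD_TSO  (1u << 0)
#define OFFLOAD_CSUM (1u << 1)

/*
 * Hypothetical helper: given the offloads the user requested and the
 * current MTU, return the offloads that can actually stay enabled.
 */
unsigned int effective_offloads(unsigned int requested, unsigned int mtu)
{
	assert(mtu > 0);

	/* Jumbo MTU: mask out TSO and checksum offload, keep the rest. */
	if (mtu > STANDARD_MTU)
		requested &= ~(OFFLOAD_TSO | OFFLOAD_CSUM);

	return requested;
}
```

In this model the requested set is preserved by the caller and only the
effective set changes, which is one way to read the "off [requested on]"
state: the request stays recorded, and the feature would come back if the
MTU were lowered again.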
Jakub Kicinski March 6, 2025, 1:59 a.m. UTC | #5
On Fri, 28 Feb 2025 17:30:31 +0000 Rui Salvaterra wrote:
> It's supported, according to the specifications.

Hi Heiner ! Are you okay with this or do you prefer to stick to vendor
supported max?
Heiner Kallweit March 7, 2025, 7:09 a.m. UTC | #6
On 06.03.2025 02:59, Jakub Kicinski wrote:
> On Fri, 28 Feb 2025 17:30:31 +0000 Rui Salvaterra wrote:
>> It's supported, according to the specifications.
> 
> Hi Heiner ! Are you okay with this or do you prefer to stick to vendor
> supported max?

I got feedback from Realtek that 16k jumbo packets are supported on
all RTL8125/RTL8126 chip versions. They just didn't extend their vendor
drivers because there hasn't been a customer request yet.
I'll adjust the proposed patch accordingly.

--
pw-bot: cr

Patch

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 5a5eba49c651..2d9fd2b70735 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -89,6 +89,7 @@ 
 #define JUMBO_6K	(6 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
 #define JUMBO_7K	(7 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
 #define JUMBO_9K	(9 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
+#define JUMBO_16K	(16 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
 
 static const struct {
 	const char *name;
@@ -5326,6 +5327,9 @@  static int rtl_jumbo_max(struct rtl8169_private *tp)
 	/* RTL8168c */
 	case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
 		return JUMBO_6K;
+	/* RTL8125B */
+	case RTL_GIGA_MAC_VER_63:
+		return JUMBO_16K;
 	default:
 		return JUMBO_9K;
 	}
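
[Editorial illustration, not part of the patch.] The JUMBO_* macros convert
a total frame size into the largest MTU the driver will accept by
subtracting the VLAN-tagged Ethernet header and the frame check sequence.
The user-space sketch below works through that arithmetic. The constants are
redefined locally with the values the kernel headers give them (SZ_1K is
1024, VLAN_ETH_HLEN is 18, ETH_FCS_LEN is 4); jumbo_max_mtu() is a
hypothetical helper that does not exist in the driver.

```c
#include <assert.h>

/* Local copies of kernel constants so the sketch builds in user space. */
#define SZ_1K         1024
#define VLAN_ETH_HLEN 18	/* 14-byte Ethernet header + 4-byte VLAN tag */
#define ETH_FCS_LEN   4		/* frame check sequence */

/* Same shape as the macros in the patch. */
#define JUMBO_9K	(9 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)	/* 9194 */
#define JUMBO_16K	(16 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)	/* 16362 */

/*
 * Hypothetical helper: largest MTU that fits in a frame of frame_kib KiB
 * once the VLAN Ethernet header and FCS are accounted for.
 */
int jumbo_max_mtu(int frame_kib)
{
	assert(frame_kib > 0);

	return frame_kib * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN;
}
```

With these values the proposed limit is 16362 bytes, so the 12000-byte MTU
used in the test run earlier in the thread sits comfortably below it.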