Message ID | 20250228173505.3636-1-rsalvaterra@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | r8169: add support for 16K jumbo frames on RTL8125B |
On 28.02.2025 18:30, Rui Salvaterra wrote:
> It's supported, according to the specifications.
>
> Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
> ---
>
> It's very likely that other RTL8125x devices also support 16K jumbo frames, but
> I only have RTL8125B ones to test with. Additionally, I've only tested up to 12K
> (my switch's limit).
>
This has been proposed and discussed before. Decision was to not increase
the max jumbo packet size, as vendor drivers r8125/r8126 also support max 9k.
And in general it's not clear whether you would gain anything from jumbo packets,
because hw TSO and c'summing aren't supported for jumbo packets.
Hi, Heiner,

On Fri, 28 Feb 2025 at 20:22, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> This has been proposed and discussed before. Decision was to not increase
> the max jumbo packet size, as vendor drivers r8125/r8126 also support max 9k.

I did a cursory search around the mailing list, but didn't find
anything specific. Maybe I didn't look hard enough. However…

> And in general it's not clear whether you would gain anything from jumbo packets,
> because hw TSO and c'summing aren't supported for jumbo packets.

… I actually have numbers to justify it. For my use case, jumbo frames
make a *huge* difference. I have an Atom 330-based file server, this
CPU is too slow to saturate the link with a MTU of 1500 bytes. The
situation, however, changes dramatically when I use jumbo frames. Case
in point…


MTU = 1500 bytes:

Accepted connection from 192.168.17.20, port 55514
[  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 55524
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   241 MBytes  2.02 Gbits/sec
[  5]   1.00-2.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   2.00-3.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   3.00-4.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   4.00-5.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   5.00-6.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   6.00-7.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   7.00-8.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   8.00-9.00   sec   242 MBytes  2.03 Gbits/sec
[  5]   9.00-10.00  sec   242 MBytes  2.03 Gbits/sec
[  5]  10.00-10.00  sec   128 KBytes  1.27 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.36 GBytes  2.03 Gbits/sec                  receiver


MTU = 9000 bytes:

Accepted connection from 192.168.17.20, port 53474
[  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 53490
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   2.00-3.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec
[  5]  10.00-10.00  sec   384 KBytes  2.38 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver


MTU = 12000 bytes (with my patch):

Accepted connection from 192.168.17.20, port 59378
[  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 59388
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   2.00-3.00   sec   295 MBytes  2.48 Gbits/sec
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec
[  5]   5.00-6.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   6.00-7.00   sec   295 MBytes  2.48 Gbits/sec
[  5]   7.00-8.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec
[  5]   9.00-10.00  sec   294 MBytes  2.47 Gbits/sec
[  5]  10.00-10.00  sec   512 KBytes  2.49 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec                  receiver


This demonstrates that the bottleneck is in the frame processing. With
a larger frame size, the number of checksum calculations is also
lower, for the same amount of payload data, and the CPU is able to
handle them.


Kind regards,

Rui Salvaterra
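To put the frame-processing argument in numbers, the following is a rough sketch of how many full-sized TCP segments per second the receiver has to handle at a given payload bitrate and MTU. It assumes IPv4 and a TCP header without options (40 bytes of headers per segment); the helper name `segments_per_second` and the constant `IPV4_TCP_HDR_LEN` are made up for this illustration and do not come from the thread or the driver.

```c
#include <assert.h>

/* Assumption: IPv4 header (20 bytes) + TCP header without options (20 bytes). */
#define IPV4_TCP_HDR_LEN 40

/*
 * Approximate number of full-sized TCP segments per second needed to carry
 * payload_bps bits of payload per second at the given MTU.
 * Hypothetical helper, for illustration only.
 */
static double segments_per_second(double payload_bps, int mtu)
{
	assert(mtu > IPV4_TCP_HDR_LEN);
	return payload_bps / (8.0 * (mtu - IPV4_TCP_HDR_LEN));
}
```

Under these assumptions, calling `segments_per_second(2.03e9, 1500)` gives roughly 174,000 segments per second, while `segments_per_second(2.47e9, 9000)` gives roughly 34,500 and `segments_per_second(2.48e9, 12000)` roughly 25,900. The per-frame work drops by about a factor of five between MTU 1500 and MTU 9000, whereas the step from 9000 to 12000 changes the bitrate very little, which would be consistent with the 9000-byte run already being close to the line rate of a 2.5 Gbit/s link.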
On 01.03.2025 12:45, Rui Salvaterra wrote:
> Hi, Heiner,
>
> On Fri, 28 Feb 2025 at 20:22, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> This has been proposed and discussed before. Decision was to not increase
>> the max jumbo packet size, as vendor drivers r8125/r8126 also support max 9k.
>
> I did a cursory search around the mailing list, but didn't find
> anything specific. Maybe I didn't look hard enough. However…
>
>> And in general it's not clear whether you would gain anything from jumbo packets,
>> because hw TSO and c'summing aren't supported for jumbo packets.
>
> … I actually have numbers to justify it. For my use case, jumbo frames
> make a *huge* difference. I have an Atom 330-based file server, this
> CPU is too slow to saturate the link with a MTU of 1500 bytes. The
> situation, however, changes dramatically when I use jumbo frames. Case
> in point…
>
>
> MTU = 1500 bytes:
>
> Accepted connection from 192.168.17.20, port 55514
> [  5] local 192.168.17.16 port 5201 connected to 192.168.17.20 port 55524
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   241 MBytes  2.02 Gbits/sec
> [  5]   1.00-2.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   2.00-3.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   3.00-4.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   4.00-5.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   5.00-6.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   6.00-7.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   7.00-8.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   8.00-9.00   sec   242 MBytes  2.03 Gbits/sec
> [  5]   9.00-10.00  sec   242 MBytes  2.03 Gbits/sec
> [  5]  10.00-10.00  sec   128 KBytes  1.27 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-10.00  sec  2.36 GBytes  2.03 Gbits/sec                  receiver
>
Depending on the kernel version HW TSO may be off per default.
Use ethtool to check/enable HW TSO, and see whether speed improves.
Hi again, Heiner,

On Sat, 1 Mar 2025 at 14:12, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Depending on the kernel version HW TSO may be off per default.
> Use ethtool to check/enable HW TSO, and see whether speed improves.

I'm running Linux 6.14-rc4 with my patch. Output from ethtool, when
the MTU is set to 1500:

tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: on

When the MTU is set to 12000:

tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: off [requested on]

Which means my test, with a MTU of 1500, was already done with
hardware TSO offloading enabled.


Kind regards,

Rui Salvaterra
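The `off [requested on]` entries in the ethtool output above indicate that an offload was requested but is being masked while the jumbo MTU is active. The following is a simplified userspace model of that behaviour, based only on the statement in this thread that hardware TSO and checksumming are not supported for jumbo packets. The feature bits `FEAT_TSO`, `FEAT_TSO6`, `FEAT_CSUM` and the function `effective_features` are hypothetical names invented for this sketch; the actual driver logic and the kernel's feature flags may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch (not the kernel's flags). */
#define FEAT_TSO	(1u << 0)
#define FEAT_TSO6	(1u << 1)
#define FEAT_CSUM	(1u << 2)

/* Standard Ethernet payload size; anything larger counts as jumbo here. */
#define STD_MTU		1500

/*
 * Return the offloads that stay active for the given MTU. Requested
 * offloads are masked out once the MTU exceeds the standard size, which
 * is the situation ethtool reports as "off [requested on]".
 */
static uint32_t effective_features(uint32_t requested, int mtu)
{
	assert(mtu > 0);
	if (mtu > STD_MTU)
		requested &= ~(FEAT_TSO | FEAT_TSO6 | FEAT_CSUM);
	return requested;
}
```

In this model, requesting `FEAT_TSO | FEAT_TSO6` with an MTU of 1500 leaves both bits set, while the same request with an MTU of 12000 yields no active offloads, matching the two ethtool listings.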
On Fri, 28 Feb 2025 17:30:31 +0000 Rui Salvaterra wrote:
> It's supported, according to the specifications.
Hi Heiner ! Are you okay with this or do you prefer to stick to vendor
supported max?
On 06.03.2025 02:59, Jakub Kicinski wrote:
> On Fri, 28 Feb 2025 17:30:31 +0000 Rui Salvaterra wrote:
>> It's supported, according to the specifications.
>
> Hi Heiner ! Are you okay with this or do you prefer to stick to vendor
> supported max?

I got feedback from Realtek that 16k jumbo packets are supported on all
RTL8125/RTL8126 chip versions. They just didn't extend their vendor drivers
because there hasn't been a customer request yet.
I'll adjust the proposed patch accordingly.

--
pw-bot: cr
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 5a5eba49c651..2d9fd2b70735 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -89,6 +89,7 @@
 #define JUMBO_6K	(6 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
 #define JUMBO_7K	(7 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
 #define JUMBO_9K	(9 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
+#define JUMBO_16K	(16 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
 
 static const struct {
 	const char *name;
@@ -5326,6 +5327,9 @@ static int rtl_jumbo_max(struct rtl8169_private *tp)
 	/* RTL8168c */
 	case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
 		return JUMBO_6K;
+	/* RTL8125B */
+	case RTL_GIGA_MAC_VER_63:
+		return JUMBO_16K;
 	default:
 		return JUMBO_9K;
 	}
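For reference, the `JUMBO_*` macros in the diff express the maximum MTU as the buffer size minus the VLAN-tagged Ethernet header and the frame check sequence. The sketch below evaluates them in userspace, with the three constants redefined locally to the values they have in the kernel headers (`SZ_1K` = 1024, `VLAN_ETH_HLEN` = 18, `ETH_FCS_LEN` = 4). The helper `mtu_allowed` is a hypothetical stand-in for a range check against the value returned by `rtl_jumbo_max()`; it is not part of the driver.

```c
#include <assert.h>

/* Local copies of kernel constants, for a userspace sketch only. */
#define SZ_1K		1024	/* 1 KiB */
#define VLAN_ETH_HLEN	18	/* Ethernet header plus one VLAN tag */
#define ETH_FCS_LEN	4	/* frame check sequence */

#define JUMBO_9K	(9 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)
#define JUMBO_16K	(16 * SZ_1K - VLAN_ETH_HLEN - ETH_FCS_LEN)

/*
 * Hypothetical helper: return nonzero if the requested MTU fits within
 * the given per-chip limit.
 */
static int mtu_allowed(int mtu, int jumbo_max)
{
	assert(jumbo_max > 0);
	return mtu > 0 && mtu <= jumbo_max;
}
```

With these values `JUMBO_9K` evaluates to 9194 and `JUMBO_16K` to 16362, so the 12000-byte MTU used in the test above passes `mtu_allowed(12000, JUMBO_16K)` but not `mtu_allowed(12000, JUMBO_9K)`.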
It's supported, according to the specifications.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---

It's very likely that other RTL8125x devices also support 16K jumbo frames, but
I only have RTL8125B ones to test with. Additionally, I've only tested up to 12K
(my switch's limit).

 drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
 1 file changed, 4 insertions(+)