
[net,v4,0/3] Fix large frames in the Gemini ethernet driver

Message ID 20231109-gemini-largeframe-fix-v4-0-6e611528db08@linaro.org

Message

Linus Walleij Nov. 9, 2023, 9:03 a.m. UTC
This is the result of a bug hunt for a problem with the
RTL8366RB DSA switch leading me wrong all over the place.

I am indebted to Vladimir Oltean, who, as usual, pointed
out where the real problem was, many thanks!

Trying to actually use big ("jumbo") frames on this
hardware uncovered the real bugs. Then I tested it on
the DSA switch and it indeed fixes the issue.

To make sure it also works fine with big frames on
non-DSA devices, I copied a large video file over
scp to a device configured for the maximum frame size.
The data was transported in large TCP packets ending up
in 0x7ff-sized (2047-byte) frames, using software
checksumming, at ~2.0 MB/s.
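For reference, 0x7ff is 2047 bytes. How much of that is left for the MTU depends on which overheads the hardware's frame-length limit counts, which this cover letter does not spell out. A minimal sketch of the arithmetic, assuming the limit covers the 14-byte Ethernet header, one 4-byte VLAN tag and optionally the 4-byte FCS (the helper mtu_for_max_frame() is hypothetical, not taken from the driver):

```c
/* Standard Ethernet overheads in bytes. Which of them the Gemini
 * frame-length limit actually includes is an assumption here. */
enum {
    ETH_HDR_BYTES  = 14,  /* destination MAC + source MAC + EtherType */
    VLAN_TAG_BYTES = 4,   /* one 802.1Q tag */
    FCS_BYTES      = 4,   /* trailing frame check sequence */
};

/* Hypothetical helper: largest MTU (L3 payload) that fits in a frame of
 * max_frame bytes, assuming the limit covers the Ethernet header, one
 * VLAN tag and, if fcs_counted is nonzero, the FCS.
 * Returns -1 if not even an empty payload fits. */
static int mtu_for_max_frame(int max_frame, int fcs_counted)
{
    int overhead = ETH_HDR_BYTES + VLAN_TAG_BYTES +
                   (fcs_counted ? FCS_BYTES : 0);

    return max_frame > overhead ? max_frame - overhead : -1;
}
```

Under these assumptions a 0x7ff limit leaves an MTU of 2029 bytes when the FCS is not counted, or 2025 when it is, while the familiar 1522-byte VLAN-tagged frame (FCS included) gives the standard 1500.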

When I lowered the MTU to the standard 1500 bytes so
that hardware checksumming is used, the scp transfer
of the same file was slightly slower, ~1.8-1.9 MB/s.

While this is not a rigorous benchmark, it shows that
we can now stress the hardware with large frames
and that the software checksum fallback works fine.
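Software checksumming here means computing in software the same Internet checksum (RFC 1071) that the hardware engine would otherwise insert. Below is a minimal user-space sketch of that sum. It is not the kernel's implementation, and a real TCP/UDP checksum also covers the IP pseudo-header, which is left out here:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: the ones' complement of the ones'
 * complement sum of the data, read as big-endian 16-bit words.
 * An odd trailing byte is padded with a zero byte on the right.
 * A 32-bit accumulator is ample for Ethernet-sized buffers. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold carries out of the low 16 bits back in (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the sample bytes in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) this returns 0x220d, and running it over the data with that checksum appended yields 0, which is how a receiver verifies it.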

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v4:
- Strip stray v1-related comment from the commit message on patch 1
- Move the hunks deleting gmac_fix_features() from patch
  "net: ethernet: cortina: Handle large frames" to
  "net: ethernet: cortina: Fix MTU max setting" as it is
  perfectly motivated by the MTU change, then move this patch
  later in the series.
- Drop the last patch, which explicitly activated the checksum engine
  only for TCP and UDP. It's not fixing a regression,
  so let's reconsider it for net-next rather than net.
- Link to v3: https://lore.kernel.org/r/20231107-gemini-largeframe-fix-v3-0-e3803c080b75@linaro.org

Changes in v3:
- Do not reimplement the existing oversize check (sigh, what is
  wrong with me?). Drop that patch.
- Drop gmac_fix_features(), since we are better off falling
  back to software checksums dynamically per frame.
- Add a new patch to bypass the checksumming engine if we are not
  handling TCP or UDP.
- Link to v2: https://lore.kernel.org/r/20231105-gemini-largeframe-fix-v2-0-cd3a5aa6c496@linaro.org
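The v3 idea of bypassing the checksum engine for anything other than TCP or UDP can be sketched as a per-frame classification. This is a hypothetical user-space helper (wants_csum_engine() is not a driver function) that assumes untagged Ethernet frames and only inspects IPv4; the actual patch, which was dropped again in v4, may classify frames differently:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ETHERTYPE_IPV4 = 0x0800,  /* EtherType for IPv4 */
    PROTO_TCP      = 6,       /* IANA protocol number for TCP */
    PROTO_UDP      = 17,      /* IANA protocol number for UDP */
};

/* Hypothetical helper: should this untagged Ethernet frame go through
 * the hardware checksum engine? Only IPv4 TCP and UDP qualify; ARP,
 * ICMP, IPv6, VLAN-tagged and truncated frames all bypass it. */
static bool wants_csum_engine(const uint8_t *frame, size_t len)
{
    /* Need a 14-byte Ethernet header plus a minimal 20-byte IPv4 header. */
    if (len < 14 + 20)
        return false;

    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return false;

    /* The IPv4 "Protocol" field is byte 9 of the IP header. */
    uint8_t proto = frame[14 + 9];
    return proto == PROTO_TCP || proto == PROTO_UDP;
}
```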

Changes in v2:
- Don't check for oversized MTU request: the framework makes sure it doesn't
  happen.
- Drop unrelated BIT() macro cleanups (I might send these later for net-next)
- Use a special error code if the skbuff is too big and fail gracefully
  if this happens.
- Do a proper checksum of the frame using a software fallback when the frame
  is too long for hardware checksumming.
- Link to v1: https://lore.kernel.org/r/20231104-gemini-largeframe-fix-v1-0-9c5513f22f33@linaro.org
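The per-frame decision described in the v2 changes might look roughly like the sketch below. The thresholds are assumptions for illustration only: HYP_MAX_FRAME reuses the 0x7ff frame size from the test report, HYP_HW_CSUM_MAX assumes the engine copes with a full 1500-byte-MTU frame (1514 bytes without FCS), and -E2BIG merely stands in for the "special error code", which the cover letter does not name:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical limits, for illustration only. */
#define HYP_MAX_FRAME    0x7ff  /* largest frame assumed sendable at all */
#define HYP_HW_CSUM_MAX  1514   /* largest frame assumed safe for the engine */

enum tx_csum_path {
    TX_CSUM_HW,  /* let the hardware engine compute the checksum */
    TX_CSUM_SW,  /* checksum in software and bypass the engine */
};

/* Pick a checksum path for a frame of len bytes. Returns 0 and sets
 * *path on success, or a negative errno-style code (leaving *path
 * untouched) if the frame is too big to send at all, so the caller
 * can drop it gracefully instead of corrupting it. */
static int choose_tx_csum(size_t len, enum tx_csum_path *path)
{
    if (len > HYP_MAX_FRAME)
        return -E2BIG;

    *path = len > HYP_HW_CSUM_MAX ? TX_CSUM_SW : TX_CSUM_HW;
    return 0;
}
```

Making the choice per frame, rather than switching checksum offload off for the whole interface, is what lets standard-size frames keep using the hardware engine while large frames fall back to software.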

---
Linus Walleij (3):
      net: ethernet: cortina: Fix max RX frame define
      net: ethernet: cortina: Handle large frames
      net: ethernet: cortina: Fix MTU max setting

 drivers/net/ethernet/cortina/gemini.c | 45 ++++++++++++++++++++++-------------
 drivers/net/ethernet/cortina/gemini.h |  4 ++--
 2 files changed, 31 insertions(+), 18 deletions(-)
---
base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa
change-id: 20231104-gemini-largeframe-fix-c143d2c781b5

Best regards,

Comments

Vladimir Oltean Nov. 9, 2023, 10:50 a.m. UTC | #1
On Thu, Nov 09, 2023 at 10:03:11AM +0100, Linus Walleij wrote:
> [...]

Thanks for being persistent with this! I hope we didn't miss today's
"net" pull request :)
Paolo Abeni Nov. 9, 2023, 12:26 p.m. UTC | #2
On Thu, 2023-11-09 at 12:50 +0200, Vladimir Oltean wrote:
> On Thu, Nov 09, 2023 at 10:03:11AM +0100, Linus Walleij wrote:
> > [...]
> 
> Thanks for being persistent with this! I hope we didn't miss today's
> "net" pull request :)

I fear this is a bit too late for today's PR. I hope it will not be a
big problem, since we are very early in the release cycle.

Cheers,

Paolo
Linus Walleij Nov. 9, 2023, 12:46 p.m. UTC | #3
On Thu, Nov 9, 2023 at 11:50 AM Vladimir Oltean <olteanv@gmail.com> wrote:

> Thanks for being persistent with this! I hope we didn't miss today's
> "net" pull request :)

Hey, thanks for one of the best review cycles I've ever had, really
really appreciated.

It's more important to be correct than to be fast, so I don't worry
much about when the patch goes in.

Yours,
Linus Walleij
Vladimir Oltean Nov. 9, 2023, 5:13 p.m. UTC | #4
On Thu, Nov 09, 2023 at 01:26:17PM +0100, Paolo Abeni wrote:
> I fear this is a bit too late for today's PR. I hope it will not be a
> big problem, since we are very early in the release cycle.

No problem from my side.
patchwork-bot+netdevbpf@kernel.org Nov. 14, 2023, 5 a.m. UTC | #5
Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 09 Nov 2023 10:03:11 +0100 you wrote:
> This is the result of a bug hunt for a problem with the
> RTL8366RB DSA switch leading me wrong all over the place.
> 
> I am indebted to Vladimir Oltean, who, as usual, pointed
> out where the real problem was, many thanks!
> 
> Trying to actually use big ("jumbo") frames on this
> hardware uncovered the real bugs. Then I tested it on
> the DSA switch and it indeed fixes the issue.
> 
> [...]

Here is the summary with links:
  - [net,v4,1/3] net: ethernet: cortina: Fix max RX frame define
    https://git.kernel.org/netdev/net/c/510e35fb931f
  - [net,v4,2/3] net: ethernet: cortina: Handle large frames
    https://git.kernel.org/netdev/net/c/d4d0c5b4d279
  - [net,v4,3/3] net: ethernet: cortina: Fix MTU max setting
    https://git.kernel.org/netdev/net/c/dc6c0bfbaa94

You are awesome, thank you!