diff mbox series

[net,v3] net: ftgmac100: Disable hardware checksum on AST2600

Message ID 20220517092217.323060-1-joel@jms.id.au (mailing list archive)
State Accepted
Commit 6fd45e79e8b93b8d22fb8fe22c32fbad7e9190bd
Delegated to: Netdev Maintainers
Headers show
Series [net,v3] net: ftgmac100: Disable hardware checksum on AST2600 | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 3 maintainers not CCed: edumazet@google.com guoheyi@linux.alibaba.com pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Possible repeated word: 'this'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Joel Stanley May 17, 2022, 9:22 a.m. UTC
The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.

On a local network, the following test was performed using a 1MB file of
random data.

On the receiver run this script:

 #!/bin/bash
 while [ 1 ]; do
        # Zero the stats
        nstat -r  > /dev/null
        nc -l 9899 > test-file
        # Check for checksum errors
        TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
        if [ -z "$TcpInCsumErrors" ]; then
                echo No TcpInCsumErrors
        else
                echo TcpInCsumErrors = $TcpInCsumErrors
        fi
 done

On an AST2600 system:

 # nc <IP of  receiver host> 9899 < test-file

The test was repeated with various MTU values:

 # ip link set mtu 1410 dev eth0

The observed results:

 1500 - good
 1434 - bad
 1400 - good
 1410 - bad
 1420 - good

The test was repeated after disabling tx checksumming:

 # ethtool -K eth0 tx-checksumming off

And all MTU values tested resulted in transfers without error.

An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.

David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.

The vendor suspects this this is a hardware issue when using NC-SI. The
fixes line refers to the patch that introduced AST2600 support.

Reported-by: David Wilder <wilder@us.ibm.com>
Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
v3 modifies the wrapping of the commit message.

v2 updates the commit message with confirmation from the vendor that
this is a hardware issue, and clarifies why the commit used in the fixes

 drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

patchwork-bot+netdevbpf@kernel.org May 18, 2022, 1:10 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 17 May 2022 18:52:17 +0930 you wrote:
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was
> first observed when sending data across a long distance set of networks.
> 
> On a local network, the following test was performed using a 1MB file of
> random data.
> 
> [...]

Here is the summary with links:
  - [net,v3] net: ftgmac100: Disable hardware checksum on AST2600
    https://git.kernel.org/netdev/net/c/6fd45e79e8b9

You are awesome, thank you!
Benjamin Herrenschmidt May 21, 2022, 2:51 a.m. UTC | #2
On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote:
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was
> first observed when sending data across a long distance set of
> networks.
> 
> On a local network, the following test was performed using a 1MB file
> of random data.

Can you double check with Aspeed what's going on there and whether
there's a way to instead, identify the bad case in the TX path and do
on-demand SW checksuming only in those cases ?

Because disabling HW checksum will kill performances afaik... (doesn't
it also end up disabling zero-copy and SG ?)

Cheers,
Ben.

> On the receiver run this script:
> 
>  #!/bin/bash
>  while [ 1 ]; do
>         # Zero the stats
>         nstat -r  > /dev/null
>         nc -l 9899 > test-file
>         # Check for checksum errors
>         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
>         if [ -z "$TcpInCsumErrors" ]; then
>                 echo No TcpInCsumErrors
>         else
>                 echo TcpInCsumErrors = $TcpInCsumErrors
>         fi
>  done
> 
> On an AST2600 system:
> 
>  # nc <IP of  receiver host> 9899 < test-file
> 
> The test was repeated with various MTU values:
> 
>  # ip link set mtu 1410 dev eth0
> 
> The observed results:
> 
>  1500 - good
>  1434 - bad
>  1400 - good
>  1410 - bad
>  1420 - good
> 
> The test was repeated after disabling tx checksumming:
> 
>  # ethtool -K eth0 tx-checksumming off
> 
> And all MTU values tested resulted in transfers without error.
> 
> An issue with the driver cannot be ruled out, however there has been
> no
> bug discovered so far.
> 
> David has done the work to take the original bug report of slow data
> transfer between long distance connections and triaged it down to
> this
> test case.
> 
> The vendor suspects this this is a hardware issue when using NC-SI.
> The
> fixes line refers to the patch that introduced AST2600 support.
> 
> Reported-by: David Wilder <wilder@us.ibm.com>
> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
> v3 modifies the wrapping of the commit message.
> 
> v2 updates the commit message with confirmation from the vendor that
> this is a hardware issue, and clarifies why the commit used in the
> fixes
> 
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index caf48023f8ea..5231818943c6 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct
> platform_device *pdev)
>  	/* AST2400  doesn't have working HW checksum generation */
>  	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
>  		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
> +	/* AST2600 tx checksum with NCSI is broken */
> +	if (priv->use_ncsi && of_device_is_compatible(np,
> "aspeed,ast2600-mac"))
> +		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
>  	if (np && of_get_property(np, "no-hw-checksum", NULL))
>  		netdev->hw_features &= ~(NETIF_F_HW_CSUM |
> NETIF_F_RXCSUM);
>  	netdev->features |= netdev->hw_features;
Joel Stanley May 23, 2022, 10:25 p.m. UTC | #3
On Sat, 21 May 2022 at 02:53, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote:
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was
> > first observed when sending data across a long distance set of
> > networks.
> >
> > On a local network, the following test was performed using a 1MB file
> > of random data.
>
> Can you double check with Aspeed what's going on there and whether
> there's a way to instead, identify the bad case in the TX path and do
> on-demand SW checksuming only in those cases ?

Keep in mind this is only for the NC-SI case, where the link is
limited to 100Mbit anyway.

I did some tests with the openbmc kernel; a v5.15 tree with whatever
options we have enabled there.

Averaging a few iperf3 runs I see about 92Mbit/s with hardware
checksumming enabled, and 90Mbit/s with it disabled. So we can see the
difference, and it would be good if Aspeed could find the root cause
so this only needs to be disabled when hitting the problematic path as
you say.

> Because disabling HW checksum will kill performances afaik... (doesn't
> it also end up disabling zero-copy and SG ?)

Not sure?

>
> Cheers,
> Ben.
>
> > On the receiver run this script:
> >
> >  #!/bin/bash
> >  while [ 1 ]; do
> >         # Zero the stats
> >         nstat -r  > /dev/null
> >         nc -l 9899 > test-file
> >         # Check for checksum errors
> >         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >         if [ -z "$TcpInCsumErrors" ]; then
> >                 echo No TcpInCsumErrors
> >         else
> >                 echo TcpInCsumErrors = $TcpInCsumErrors
> >         fi
> >  done
> >
> > On an AST2600 system:
> >
> >  # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >  # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >  1500 - good
> >  1434 - bad
> >  1400 - good
> >  1410 - bad
> >  1420 - good
> >
> > The test was repeated after disabling tx checksumming:
> >
> >  # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been
> > no
> > bug discovered so far.
> >
> > David has done the work to take the original bug report of slow data
> > transfer between long distance connections and triaged it down to
> > this
> > test case.
> >
> > The vendor suspects this this is a hardware issue when using NC-SI.
> > The
> > fixes line refers to the patch that introduced AST2600 support.
> >
> > Reported-by: David Wilder <wilder@us.ibm.com>
> > Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
> > ---
> > v3 modifies the wrapping of the commit message.
> >
> > v2 updates the commit message with confirmation from the vendor that
> > this is a hardware issue, and clarifies why the commit used in the
> > fixes
> >
> >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> > b/drivers/net/ethernet/faraday/ftgmac100.c
> > index caf48023f8ea..5231818943c6 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct
> > platform_device *pdev)
> >       /* AST2400  doesn't have working HW checksum generation */
> >       if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> >               netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> > +     /* AST2600 tx checksum with NCSI is broken */
> > +     if (priv->use_ncsi && of_device_is_compatible(np,
> > "aspeed,ast2600-mac"))
> > +             netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> >       if (np && of_get_property(np, "no-hw-checksum", NULL))
> >               netdev->hw_features &= ~(NETIF_F_HW_CSUM |
> > NETIF_F_RXCSUM);
> >       netdev->features |= netdev->hw_features;
>
Andrew Lunn May 23, 2022, 11:44 p.m. UTC | #4
> > > The observed results:
> > >
> > >  1500 - good
> > >  1434 - bad
> > >  1400 - good
> > >  1410 - bad
> > >  1420 - good

Looking at these numbers, all the good cases a divisible by 4. All the
bad cases are not.

Could you extend the test to automatically test 64 through 1500?  Or
manually try 1499, 1498, 1497, 1496. Maybe the workaround is if the
packet length is divisible by 4 let the hardware do the checksum,
otherwise do it in software.

      Andrew
diff mbox series

Patch

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index caf48023f8ea..5231818943c6 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1928,6 +1928,11 @@  static int ftgmac100_probe(struct platform_device *pdev)
 	/* AST2400  doesn't have working HW checksum generation */
 	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
 		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
+	/* AST2600 tx checksum with NCSI is broken */
+	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
+		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
 	if (np && of_get_property(np, "no-hw-checksum", NULL))
 		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
 	netdev->features |= netdev->hw_features;