Message ID | 20220517092217.323060-1-joel@jms.id.au (mailing list archive) |
---|---|
State | Accepted |
Commit | 6fd45e79e8b93b8d22fb8fe22c32fbad7e9190bd |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v3] net: ftgmac100: Disable hardware checksum on AST2600 | expand |
Hello: This patch was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Tue, 17 May 2022 18:52:17 +0930 you wrote: > The AST2600 when using the i210 NIC over NC-SI has been observed to > produce incorrect checksum results with specific MTU values. This was > first observed when sending data across a long distance set of networks. > > On a local network, the following test was performed using a 1MB file of > random data. > > [...] Here is the summary with links: - [net,v3] net: ftgmac100: Disable hardware checksum on AST2600 https://git.kernel.org/netdev/net/c/6fd45e79e8b9 You are awesome, thank you!
On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote: > The AST2600 when using the i210 NIC over NC-SI has been observed to > produce incorrect checksum results with specific MTU values. This was > first observed when sending data across a long distance set of > networks. > > On a local network, the following test was performed using a 1MB file > of random data. Can you double check with Aspeed what's going on there and whether there's a way to instead, identify the bad case in the TX path and do on-demand SW checksuming only in those cases ? Because disabling HW checksum will kill performances afaik... (doesn't it also end up disabling zero-copy and SG ?) Cheers, Ben. > On the receiver run this script: > > #!/bin/bash > while [ 1 ]; do > # Zero the stats > nstat -r > /dev/null > nc -l 9899 > test-file > # Check for checksum errors > TcpInCsumErrors=$(nstat | grep TcpInCsumErrors) > if [ -z "$TcpInCsumErrors" ]; then > echo No TcpInCsumErrors > else > echo TcpInCsumErrors = $TcpInCsumErrors > fi > done > > On an AST2600 system: > > # nc <IP of receiver host> 9899 < test-file > > The test was repeated with various MTU values: > > # ip link set mtu 1410 dev eth0 > > The observed results: > > 1500 - good > 1434 - bad > 1400 - good > 1410 - bad > 1420 - good > > The test was repeated after disabling tx checksumming: > > # ethtool -K eth0 tx-checksumming off > > And all MTU values tested resulted in transfers without error. > > An issue with the driver cannot be ruled out, however there has been > no > bug discovered so far. > > David has done the work to take the original bug report of slow data > transfer between long distance connections and triaged it down to > this > test case. > > The vendor suspects this this is a hardware issue when using NC-SI. > The > fixes line refers to the patch that introduced AST2600 support. > > Reported-by: David Wilder <wilder@us.ibm.com> > Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com> > Signed-off-by: Joel Stanley <joel@jms.id.au> > --- > v3 modifies the wrapping of the commit message. > > v2 updates the commit message with confirmation from the vendor that > this is a hardware issue, and clarifies why the commit used in the > fixes > > drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c > b/drivers/net/ethernet/faraday/ftgmac100.c > index caf48023f8ea..5231818943c6 100644 > --- a/drivers/net/ethernet/faraday/ftgmac100.c > +++ b/drivers/net/ethernet/faraday/ftgmac100.c > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct > platform_device *pdev) > /* AST2400 doesn't have working HW checksum generation */ > if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac"))) > netdev->hw_features &= ~NETIF_F_HW_CSUM; > + > + /* AST2600 tx checksum with NCSI is broken */ > + if (priv->use_ncsi && of_device_is_compatible(np, > "aspeed,ast2600-mac")) > + netdev->hw_features &= ~NETIF_F_HW_CSUM; > + > if (np && of_get_property(np, "no-hw-checksum", NULL)) > netdev->hw_features &= ~(NETIF_F_HW_CSUM | > NETIF_F_RXCSUM); > netdev->features |= netdev->hw_features;
On Sat, 21 May 2022 at 02:53, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote: > > The AST2600 when using the i210 NIC over NC-SI has been observed to > > produce incorrect checksum results with specific MTU values. This was > > first observed when sending data across a long distance set of > > networks. > > > > On a local network, the following test was performed using a 1MB file > > of random data. > > Can you double check with Aspeed what's going on there and whether > there's a way to instead, identify the bad case in the TX path and do > on-demand SW checksuming only in those cases ? Keep in mind this is only for the NC-SI case, where the link is limited to 100Mbit anyway. I did some tests with the openbmc kernel; a v5.15 tree with whatever options we have enabled there. Averaging a few iperf3 runs I see about 92Mbit/s with hardware checksumming enabled, and 90Mbit/s with it disabled. So we can see the difference, and it would be good if Aspeed could find the root cause so this only needs to be disabled when hitting the problematic path as you say. > Because disabling HW checksum will kill performances afaik... (doesn't > it also end up disabling zero-copy and SG ?) Not sure? > > Cheers, > Ben. > > > On the receiver run this script: > > > > #!/bin/bash > > while [ 1 ]; do > > # Zero the stats > > nstat -r > /dev/null > > nc -l 9899 > test-file > > # Check for checksum errors > > TcpInCsumErrors=$(nstat | grep TcpInCsumErrors) > > if [ -z "$TcpInCsumErrors" ]; then > > echo No TcpInCsumErrors > > else > > echo TcpInCsumErrors = $TcpInCsumErrors > > fi > > done > > > > On an AST2600 system: > > > > # nc <IP of receiver host> 9899 < test-file > > > > The test was repeated with various MTU values: > > > > # ip link set mtu 1410 dev eth0 > > > > The observed results: > > > > 1500 - good > > 1434 - bad > > 1400 - good > > 1410 - bad > > 1420 - good > > > > The test was repeated after disabling tx checksumming: > > > > # ethtool -K eth0 tx-checksumming off > > > > And all MTU values tested resulted in transfers without error. > > > > An issue with the driver cannot be ruled out, however there has been > > no > > bug discovered so far. > > > > David has done the work to take the original bug report of slow data > > transfer between long distance connections and triaged it down to > > this > > test case. > > > > The vendor suspects this this is a hardware issue when using NC-SI. > > The > > fixes line refers to the patch that introduced AST2600 support. > > > > Reported-by: David Wilder <wilder@us.ibm.com> > > Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com> > > Signed-off-by: Joel Stanley <joel@jms.id.au> > > --- > > v3 modifies the wrapping of the commit message. > > > > v2 updates the commit message with confirmation from the vendor that > > this is a hardware issue, and clarifies why the commit used in the > > fixes > > > > drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c > > b/drivers/net/ethernet/faraday/ftgmac100.c > > index caf48023f8ea..5231818943c6 100644 > > --- a/drivers/net/ethernet/faraday/ftgmac100.c > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c > > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct > > platform_device *pdev) > > /* AST2400 doesn't have working HW checksum generation */ > > if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac"))) > > netdev->hw_features &= ~NETIF_F_HW_CSUM; > > + > > + /* AST2600 tx checksum with NCSI is broken */ > > + if (priv->use_ncsi && of_device_is_compatible(np, > > "aspeed,ast2600-mac")) > > + netdev->hw_features &= ~NETIF_F_HW_CSUM; > > + > > if (np && of_get_property(np, "no-hw-checksum", NULL)) > > netdev->hw_features &= ~(NETIF_F_HW_CSUM | > > NETIF_F_RXCSUM); > > netdev->features |= netdev->hw_features; >
> > > The observed results: > > > > > > 1500 - good > > > 1434 - bad > > > 1400 - good > > > 1410 - bad > > > 1420 - good Looking at these numbers, all the good cases a divisible by 4. All the bad cases are not. Could you extend the test to automatically test 64 through 1500? Or manually try 1499, 1498, 1497, 1496. Maybe the workaround is if the packet length is divisible by 4 let the hardware do the checksum, otherwise do it in software. Andrew
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index caf48023f8ea..5231818943c6 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev) /* AST2400 doesn't have working HW checksum generation */ if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac"))) netdev->hw_features &= ~NETIF_F_HW_CSUM; + + /* AST2600 tx checksum with NCSI is broken */ + if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac")) + netdev->hw_features &= ~NETIF_F_HW_CSUM; + if (np && of_get_property(np, "no-hw-checksum", NULL)) netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM); netdev->features |= netdev->hw_features;