mbox series

[net,0/1] Prevent DSA tags from breaking COE

Message ID 20231218162326.173127-1-romain.gantois@bootlin.com (mailing list archive)
Headers show
Series Prevent DSA tags from breaking COE | expand

Message

Romain Gantois Dec. 18, 2023, 4:23 p.m. UTC
Hello everyone,

This is a bugfix for an issue that was recently brought up in two
reports:

https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/

The Checksum Offloading Engine of some stmmac cores (e.g. DWMAC1000)
computes an incorrect checksum when presented with DSA-tagged packets. This
causes all TCP/UDP transfers to break when the stmmac device is connected
to the CPU port of a DSA switch.

The main change introduced by this series is a new stmmac dma feature that
stmmac_mac_link_up() can check to detect cores that have DSA-incompatible
COEs. If the flag is set and the netdevice uses DSA, stmmac_xmit() will
complete checksumming in software instead of offloading it.

I've run some iperf3 tests and the TX hotpath performance doesn't seem
to be degraded by the field added to dma_features.

Best Regards,

Romain

Romain Gantois (1):
  net: stmmac: Prevent DSA tags from breaking COE

 drivers/net/ethernet/stmicro/stmmac/common.h     |  1 +
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c  |  1 +
 .../net/ethernet/stmicro/stmmac/stmmac_main.c    | 16 +++++++++++++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

Comments

Andrew Lunn Dec. 18, 2023, 4:41 p.m. UTC | #1
On Mon, Dec 18, 2023 at 05:23:22PM +0100, Romain Gantois wrote:
> Hello everyone,
> 
> This is a bugfix for an issue that was recently brought up in two
> reports:
> 
> https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
> https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
> 
> The Checksum Offloading Engine of some stmmac cores (e.g. DWMAC1000)
> computes an incorrect checksum when presented with DSA-tagged packets. This
> causes all TCP/UDP transfers to break when the stmmac device is connected
> to the CPU port of a DSA switch.

Probably a dumb question.... Does this COE also perform checksum
validation on receive? Is it also getting confused by the DSA header?

You must of tested receive, so it works somehow, but i just wounder if
something needs to be done to be on the safe side?

	  Andrew
Romain Gantois Dec. 19, 2023, 9:50 a.m. UTC | #2
On Mon, 18 Dec 2023, Andrew Lunn wrote:
...
> Probably a dumb question.... Does this COE also perform checksum
> validation on receive? Is it also getting confused by the DSA header?
> 
> You must of tested receive, so it works somehow, but i just wounder if
> something needs to be done to be on the safe side?

That's a good point, I just investigated the RX path a bit more and the MAC 
indeed has IP/TCP/UDP RX checksum offloading enabled. However, the 
external switch in my setup uses EDSA tags, which displace the "true" ethertype 
field to the end of the DSA header and replaces the "normal" ethertype with 
ETH_P_EDSA (0xdada). So to the MAC controller, the ethernet frame has an unknown 
ethertype, and so it doesn't see it as an IP frame at all. All of the 
ethtool counters related to IP stuff are at 0, which supports this.

This explains why checksum offloading doesn't break the RX path in my case. 
However, other maybe other DSA switches using different frame formats could 
cause different behavior? Considering this, I think it would be safer to change 
the dsa_breaks_tx_coe flag to a general dsa_breaks_coe flag. It makes sense to 
me to assume that if DSA tags break TX COE, then RX COE will also not work.

I'll take this into account when I send a v2.

Best Regards,
Vladimir Oltean Dec. 19, 2023, 12:30 p.m. UTC | #3
On Mon, Dec 18, 2023 at 05:23:22PM +0100, Romain Gantois wrote:
> I've run some iperf3 tests and the TX hotpath performance doesn't seem
> to be degraded by the field added to dma_features.

I don't know what CPU cores you are using, but if the iperf3 performance
was line rate at gigabit before and is line rate at gigabit now, you
haven't effectively measured the impact of the change (and "doesn't seem
to be degraded" is a false conclusion). You need something more CPU
intensive to see the difference, like IP forwarding of 64 byte packets.

A very simplistic way to set up IP forwarding between 2 DSA user ports
is to do this on the router board (DUT):

echo 1 > /proc/sys/net/ipv4/ip_forward
ip link set swp0 up && ip addr add 192.168.100.2/24 dev swp0
ip link set swp1 up && ip addr add 192.168.101.2/24 dev swp1

and this on the system with 2 endpoints:

ip netns add ns0
ip link set $ETH1 netns ns0
ip link set $ETH0 up && ip addr add 192.168.100.1/24 dev $ETH0
ip -n ns0 link set $ETH1 up && ip -n ns0 addr add 192.168.101.1/24 dev $ETH1
ip route add 192.168.101.0/24 via 192.168.100.2
ip -n ns0 route add 192.168.100.0/24 via 192.168.101.2
./net-next/samples/pktgen/pktgen_sample03_burst_single_flow.sh -i $ETH0 -s 64 -m 00:04:9f:05:de:0a -d 192.168.101.1 -t 2 -f 13 -c 0 -p 400 -n 0 -b 4

I compiled the commands from some notes I had lying around, I didn't
retest them.
Household Cang Dec. 21, 2023, 7:40 a.m. UTC | #4
> On Dec 18, 2023, at 11:23 AM, Romain Gantois <romain.gantois@bootlin.com> wrote:
> 
> This is a bugfix for an issue that was recently brought up in two
> reports:
> 
> https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
> https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
> 
Add me in to be the 3rd report...
RK3568 GMAC0 (eth1) to MT7531BE (CPU port)
Current workaround for me is ethtool -K eth1 rx off tx off

https://lore.kernel.org/netdev/m3clft2k7umjtny546ot3ayebattksibty3yyttpffvdixl65p@7dpqsr5nisbk/T/#t

Question on the patch to be built: how would I know if my setup could take advantage of the HW checksum offload? RK3658’s eth0 on stmmac is doing fine, and eth0 is not on a DSA switch. Does this mean eth1 should be able to do hw checksum offload once the stmmac driver is fixed?
—Lucas
Vladimir Oltean Dec. 22, 2023, 12:30 p.m. UTC | #5
Hi Lucas,

On Thu, Dec 21, 2023 at 02:40:34AM -0500, Household Cang wrote:
> > On Dec 18, 2023, at 11:23 AM, Romain Gantois <romain.gantois@bootlin.com> wrote:
> > 
> > This is a bugfix for an issue that was recently brought up in two
> > reports:
> > 
> > https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
> > https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
> > 
> Add me in to be the 3rd report...
> RK3568 GMAC0 (eth1) to MT7531BE (CPU port)
> Current workaround for me is ethtool -K eth1 rx off tx off

Is "rx off" actually required, or just "tx off"?

> https://lore.kernel.org/netdev/m3clft2k7umjtny546ot3ayebattksibty3yyttpffvdixl65p@7dpqsr5nisbk/T/#t
> 
> Question on the patch to be built: how would I know if my setup could
> take advantage of the HW checksum offload? RK3658’s eth0 on stmmac is
> doing fine, and eth0 is not on a DSA switch. Does this mean eth1
> should be able to do hw checksum offload once the stmmac driver is
> fixed?

The MT7531BE switch requires transmitted packets to have an additional
header which indicates what switch port is targeted. So the packet
structure is not the same as what eth0 transmits.

Your GMAC datasheet should explain what packets it is able to offload
L4 checksumming for, quite plainly. Probably MAC + IP + TCP yes, but
MAC + MTK DSA + IP + TCP no.

The bug is that the network stack thinks that the GMAC is able to
offload TX checksums for these MTK DSA tagged packets, so it does not
calculate the checksum in software, leaving the task up to the hardware.
My guess is that the hardware doesn't recognize the packets as something
that is offloadable, so it doesn't calculate the checksum either, and
that's the story of how you end up with packets with bad checksums.

The patch to be built should analyze the packet before passing it to a
hardware offload engine which will do nothing. The driver still declares
the NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM features because it is able to
offload checksumming for _some_ packets, but it falls back to the
software checksum helper for the rest. This includes your MTK DSA tagged
packets. They can be checksummed in software even with the DSA tag added,
because that uses the more generic mechanism with skb->csum_start and
skb->csum_offset, which DSA is compatible with, just fine. The GMAC
driver, most likely because of the lack of hardware support, does not
look at skb->csum_start and skb->csum_offset (aka it does not declare
the NETIF_F_HW_CSUM feature), so it cannot offload checksumming for your
switch traffic unless that was specifically baked into the RTL.

More details in the "Switch tagging protocols" section of
Documentation/networking/dsa/dsa.rst.
Lucas Pereira Dec. 22, 2023, 1:22 p.m. UTC | #6
Dear community collaborators,

First of all, I would like to thank you for the prompt response and
the suggestions provided.

We conducted the tests as indicated, but unfortunately, the problem
persists. It seems to me that if it were a Checksum-related issue, the
behavior would be different, as the VPN and communication work
normally for several days before failing suddenly.

We have observed that the only effective ways to reestablish
communication, so far, are through a system reboot or by changing the
authentication cipher, such as switching from MD5 to SHA1.
Interestingly, when switching back to the MD5 cipher, the
communication fails to function again.

I am immensely grateful for the help received so far and would greatly
appreciate any further suggestions or recommendations that you might
offer to resolve this challenge.

Sincerely,
Lucas
Vladimir Oltean Dec. 22, 2023, 1:46 p.m. UTC | #7
On Fri, Dec 22, 2023 at 10:22:21AM -0300, Lucas Pereira wrote:
> Dear community collaborators,
> 
> First of all, I would like to thank you for the prompt response and
> the suggestions provided.
> 
> We conducted the tests as indicated, but unfortunately, the problem
> persists. It seems to me that if it were a Checksum-related issue, the
> behavior would be different, as the VPN and communication work
> normally for several days before failing suddenly.
> 
> We have observed that the only effective ways to reestablish
> communication, so far, are through a system reboot or by changing the
> authentication cipher, such as switching from MD5 to SHA1.
> Interestingly, when switching back to the MD5 cipher, the
> communication fails to function again.
> 
> I am immensely grateful for the help received so far and would greatly
> appreciate any further suggestions or recommendations that you might
> offer to resolve this challenge.
> 
> Sincerely,
> Lucas

Are you responding to the right thread? This is about on-board Ethernet
switch chips attached to Synopsys MAC hardware IPs.
Household Cang Dec. 22, 2023, 7:08 p.m. UTC | #8
> On Dec 22, 2023, at 7:30 AM, Vladimir Oltean <olteanv@gmail.com> wrote:
> 
> Is "rx off" actually required, or just "tx off”?

Before adjusted the ethtool -K, the client upload speed is 880Mbps (download speed 0.6Mbps). RK3568 is acting as a router, so client is sending to eth1 via DSA user port, rx used here, and then tx on eth0. So this might suggest only tx needs to be turned off on eth1.

> 
> The MT7531BE switch requires transmitted packets to have an additional
> header which indicates what switch port is targeted. So the packet
> structure is not the same as what eth0 transmits.
> 

I understand. How many bytes are the DSA header for MT, 8 bytes?

> Your GMAC datasheet should explain what packets it is able to offload
> L4 checksumming for, quite plainly. Probably MAC + IP + TCP yes, but
> MAC + MTK DSA + IP + TCP no.

I hardly could find this in the data sheet for RK3568. From the DMA mapping, it insinuates a correct ether type needs to be detected after the MAC header, besides an inner and an outer VLAN tag, if there are any.

>  The driver still declares
> the NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM features because it is able to
> offload checksumming for _some_ packets, but it falls back to the
> software checksum helper for the rest. This includes your MTK DSA tagged
> packets.

I guess the end verdict regarding whether the hardware supports checksum offloading on DSA frames is a NO. So this is going to use some precious CPU power I am looking to fully dedicate to ipsec. Though I am pursuing crypto hw offloading at the same time with baylibre.

Lucas.
Richard Tresidder Jan. 5, 2024, 4:37 a.m. UTC | #9
Richard Tresidder


Hi All
    Just wondering if from a custom platform point of view it might be better to use vlan tagging instead of DSA in the case where the MAC can't handle DSA tags in the header.
The marvel chip can use VLAN tagging to perform basically the same thing is my understanding, you just have to nominate a vlan id (untag outbound) to a port and trunk on cpu interface..
Just considering this as from my understanding the STMMAC does understand VLAN tags and will correctly generate the CRC.
I don't think this would stop us using VLAN for the ports in general, they'd just have a "default" tag for each port thats used for untagged packets on the port.
You could still apply additional VLAN's over those same ports..

I don't suppose anyone has some CPU usage info on software CRC vs this method?
I'm not sure if this is just an IP CRC issue or a TCP CRC issue also ( can the stmmac offload the TCP CRC also? )
If this is only an IP issue and the TCP CRC is never offloaded then it probably won't make much difference in reality..

Thanks
    Richard Tresidder

On 19/12/2023 5:50 pm, Romain Gantois wrote:

> On Mon, 18 Dec 2023, Andrew Lunn wrote:
> ...
>> Probably a dumb question.... Does this COE also perform checksum
>> validation on receive? Is it also getting confused by the DSA header?
>>
>> You must of tested receive, so it works somehow, but i just wounder if
>> something needs to be done to be on the safe side?
> That's a good point, I just investigated the RX path a bit more and the MAC
> indeed has IP/TCP/UDP RX checksum offloading enabled. However, the
> external switch in my setup uses EDSA tags, which displace the "true" ethertype
> field to the end of the DSA header and replaces the "normal" ethertype with
> ETH_P_EDSA (0xdada). So to the MAC controller, the ethernet frame has an unknown
> ethertype, and so it doesn't see it as an IP frame at all. All of the
> ethtool counters related to IP stuff are at 0, which supports this.
>
> This explains why checksum offloading doesn't break the RX path in my case.
> However, other maybe other DSA switches using different frame formats could
> cause different behavior? Considering this, I think it would be safer to change
> the dsa_breaks_tx_coe flag to a general dsa_breaks_coe flag. It makes sense to
> me to assume that if DSA tags break TX COE, then RX COE will also not work.
>
> I'll take this into account when I send a v2.
>
> Best Regards,
>