[net,1/1] net: stmmac: Prevent DSA tags from breaking COE

Message ID	20231218162326.173127-2-romain.gantois@bootlin.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org> From: Romain Gantois <romain.gantois@bootlin.com> To: Alexandre Torgue <alexandre.torgue@foss.st.com>, Jose Abreu <joabreu@synopsys.com> Cc: Romain Gantois <romain.gantois@bootlin.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Maxime Coquelin <mcoquelin.stm32@gmail.com>, Miquel Raynal <miquel.raynal@bootlin.com>, Maxime Chevallier <maxime.chevallier@bootlin.com>, Sylvain Girard <sylvain.girard@se.com>, Pascal EBERHARD <pascal.eberhard@se.com>, Richard Tresidder <rtresidd@electromag.com.au>, netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, stable@vger.kernel.org Subject: [PATCH net 1/1] net: stmmac: Prevent DSA tags from breaking COE Date: Mon, 18 Dec 2023 17:23:23 +0100 Message-ID: <20231218162326.173127-2-romain.gantois@bootlin.com> In-Reply-To: <20231218162326.173127-1-romain.gantois@bootlin.com> References: <20231218162326.173127-1-romain.gantois@bootlin.com> MIME-Version: 1.0 Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
Series	Prevent DSA tags from breaking COE \| expand [net,0/1] Prevent DSA tags from breaking COE [net,1/1] net: stmmac: Prevent DSA tags from breaking COE

Romain Gantois Dec. 18, 2023, 4:23 p.m. UTC

Some stmmac cores have Checksum Offload Engines that cannot handle DSA tags
properly. These cores find the IP/TCP headers on their own and end up
computing an incorrect checksum when a DSA tag is inserted between the MAC
header and IP header.

Add an additional check on stmmac link up so that COE is deactivated
when the stmmac device is used as a DSA conduit.

Add a new dma_feature flag to allow cores to signal that their COEs can't
handle DSA tags on TX.

Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
Cc: stable@vger.kernel.org
Reported-by: Richard Tresidder <rtresidd@electromag.com.au>
Closes: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
Reported-by: Romain Gantois <romain.gantois@bootlin.com>
Closes: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h     |  1 +
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c  |  1 +
 .../net/ethernet/stmicro/stmmac/stmmac_main.c    | 16 +++++++++++++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

Vladimir Oltean Dec. 19, 2023, 12:20 p.m. UTC | #1

Hi Romain,

On Mon, Dec 18, 2023 at 05:23:23PM +0100, Romain Gantois wrote:
> Some stmmac cores have Checksum Offload Engines that cannot handle DSA tags
> properly. These cores find the IP/TCP headers on their own and end up
> computing an incorrect checksum when a DSA tag is inserted between the MAC
> header and IP header.
> 
> Add an additional check on stmmac link up so that COE is deactivated
> when the stmmac device is used as a DSA conduit.
> 
> Add a new dma_feature flag to allow cores to signal that their COEs can't
> handle DSA tags on TX.
> 
> Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
> Cc: stable@vger.kernel.org
> Reported-by: Richard Tresidder <rtresidd@electromag.com.au>
> Closes: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
> Reported-by: Romain Gantois <romain.gantois@bootlin.com>
> Closes: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
> Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
> ---

DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_SJA1105 and DSA_TAG_PROTO_SJA1110
construct tags with ETH_P_8021Q as EtherType. Do you still think it
would be correct to say that all DSA tags break COE on the stmmac, as
this patch assumes?

The NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM convention is not about
statically checking whether the interface using DSA, but about looking
at each packet before deciding whether to use the offload engine or to
call skb_checksum_help().

You can experiment with any tagging protocol on the stmmac driver, and
thus with the controller's response to any kind of traffic, even if the
port is not attached to a hardware switch. You need to enable the
CONFIG_NET_DSA_LOOP option, edit the return value of dsa_loop_get_protocol()
and the "netdev" field of dsa_loop_pdata. The packets need to be
analyzed on the link partner with a packet analysis tool, since there is
no switch to strip the DSA tag.

>  drivers/net/ethernet/stmicro/stmmac/common.h     |  1 +
>  .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c  |  1 +
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c    | 16 +++++++++++++++-
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
> index daf79cdbd3ec..50686cdc3320 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
> @@ -264,6 +264,7 @@ static int dwmac1000_get_hw_feature(void __iomem *ioaddr,
>  	dma_cap->number_tx_channel = (hw_cap & DMA_HW_FEAT_TXCHCNT) >> 22;
>  	/* Alternate (enhanced) DESC mode */
>  	dma_cap->enh_desc = (hw_cap & DMA_HW_FEAT_ENHDESSEL) >> 24;
> +	dma_cap->dsa_breaks_tx_coe = 1;
>  
>  	return 0;
>  }
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index a9b6b383e863..733348c65e04 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -42,6 +42,7 @@
>  #include <net/page_pool/helpers.h>
>  #include <net/pkt_cls.h>
>  #include <net/xdp_sock_drv.h>
> +#include <net/dsa.h>
>  #include "stmmac_ptp.h"
>  #include "stmmac.h"
>  #include "stmmac_xdp.h"
> @@ -993,8 +994,11 @@ static void stmmac_mac_link_up(struct phylink_config *config,
>  			       int speed, int duplex,
>  			       bool tx_pause, bool rx_pause)
>  {
> -	struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
> +	struct net_device *ndev = to_net_dev(config->dev);
> +	struct stmmac_priv *priv = netdev_priv(ndev);
> +	unsigned int tx_queue_cnt;
>  	u32 old_ctrl, ctrl;
> +	int queue;
>  
>  	if ((priv->plat->flags & STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP) &&
>  	    priv->plat->serdes_powerup)
> @@ -1102,6 +1106,16 @@ static void stmmac_mac_link_up(struct phylink_config *config,
>  
>  	if (priv->plat->flags & STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY)
>  		stmmac_hwtstamp_correct_latency(priv, priv);
> +
> +	/* DSA tags break COEs on some cores. Disable TX checksum
> +	 * offloading on those cores if the netdevice is a DSA conduit.
> +	 */
> +	if (priv->dma_cap.dsa_breaks_tx_coe && netdev_uses_dsa(ndev)) {
> +		tx_queue_cnt = priv->plat->tx_queues_to_use;
> +		for (queue = 0; queue < tx_queue_cnt; queue++)
> +			priv->plat->tx_queues_cfg[queue].coe_unsupported = true;
> +	}
> +

The DSA switch driver can load after stmmac_mac_link_up() runs.
This implementation is racy anyway.

>  }
>  
>  static const struct phylink_mac_ops stmmac_phylink_mac_ops = {
> -- 
> 2.43.0
> 
> 

pw-bot: cr

Maxime Chevallier Dec. 19, 2023, 1:07 p.m. UTC | #2

Hi Vlad,

+ Linus Walleij

On Tue, 19 Dec 2023 14:20:34 +0200
Vladimir Oltean <olteanv@gmail.com> wrote:

> Hi Romain,
> 
> On Mon, Dec 18, 2023 at 05:23:23PM +0100, Romain Gantois wrote:
> > Some stmmac cores have Checksum Offload Engines that cannot handle DSA tags
> > properly. These cores find the IP/TCP headers on their own and end up
> > computing an incorrect checksum when a DSA tag is inserted between the MAC
> > header and IP header.
> > 
> > Add an additional check on stmmac link up so that COE is deactivated
> > when the stmmac device is used as a DSA conduit.
> > 
> > Add a new dma_feature flag to allow cores to signal that their COEs can't
> > handle DSA tags on TX.
> > 
> > Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
> > Cc: stable@vger.kernel.org
> > Reported-by: Richard Tresidder <rtresidd@electromag.com.au>
> > Closes: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
> > Reported-by: Romain Gantois <romain.gantois@bootlin.com>
> > Closes: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
> > Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
> > ---  
> 
> DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_SJA1105 and DSA_TAG_PROTO_SJA1110
> construct tags with ETH_P_8021Q as EtherType. Do you still think it
> would be correct to say that all DSA tags break COE on the stmmac, as
> this patch assumes?
> 
> The NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM convention is not about
> statically checking whether the interface using DSA, but about looking
> at each packet before deciding whether to use the offload engine or to
> call skb_checksum_help().

So it looks like an acceptable solution would be something along the
lines of what Linus is suggesting here :

https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-64c269413dfa@linaro.org/

If so, maybe it's worth adding a new helper for that check ?

Maxime

Linus Walleij Dec. 19, 2023, 2:19 p.m. UTC | #3

On Tue, Dec 19, 2023 at 2:07 PM Maxime Chevallier
<maxime.chevallier@bootlin.com> wrote:

> So it looks like an acceptable solution would be something along the
> lines of what Linus is suggesting here :
>
> https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-64c269413dfa@linaro.org/
>
> If so, maybe it's worth adding a new helper for that check ?

Yeah it's a bit annoying when skb->protocol is not == ethertype of buffer.

I can certainly add a helper such as skb_eth_raw_ethertype()
to <linux/if_ether.h> that will inspect the actual ethertype in
skb->data.

It's the most straight-forward approach.

We could also add something like bool custom_ethertype; to
struct sk_buff and set that to true if the tagger adds a custom
ethertype. But I don't know how the network developers feel about
that.

Yours,
Linus Walleij

Maxime Chevallier Dec. 19, 2023, 4:29 p.m. UTC | #4

Hi Linus,

On Tue, 19 Dec 2023 15:19:45 +0100
Linus Walleij <linus.walleij@linaro.org> wrote:

> On Tue, Dec 19, 2023 at 2:07 PM Maxime Chevallier
> <maxime.chevallier@bootlin.com> wrote:
> 
> > So it looks like an acceptable solution would be something along the
> > lines of what Linus is suggesting here :
> >
> > https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-64c269413dfa@linaro.org/
> >
> > If so, maybe it's worth adding a new helper for that check ?  
> 
> Yeah it's a bit annoying when skb->protocol is not == ethertype of buffer.
> 
> I can certainly add a helper such as skb_eth_raw_ethertype()
> to <linux/if_ether.h> that will inspect the actual ethertype in
> skb->data.
> 
> It's the most straight-forward approach.

Agreed :)

> We could also add something like bool custom_ethertype; to
> struct sk_buff and set that to true if the tagger adds a custom
> ethertype. But I don't know how the network developers feel about
> that.

I don't think this would be OK, first because sk_buff is pretty
sensitive when it comes to cache alignment, adding things for this kind
of use-cases isn't necessarily a good idea. Moreover, populating this
flag isn't going to be straightforward as well. I guess some ethertype
would be compatible with checksum engines, while other wouldn't, so
probably what 'custom_ethertype' means will depend on the MAC driver.

From my point of view the first approach would indeed be better.

Thanks,

Maxime

Vladimir Oltean Dec. 19, 2023, 10:46 p.m. UTC | #5

On Tue, Dec 19, 2023 at 05:29:32PM +0100, Maxime Chevallier wrote:
> Hi Linus,
> 
> On Tue, 19 Dec 2023 15:19:45 +0100
> Linus Walleij <linus.walleij@linaro.org> wrote:
> 
> > On Tue, Dec 19, 2023 at 2:07 PM Maxime Chevallier
> > <maxime.chevallier@bootlin.com> wrote:
> > 
> > > So it looks like an acceptable solution would be something along the
> > > lines of what Linus is suggesting here :
> > >
> > > https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-64c269413dfa@linaro.org/
> > >
> > > If so, maybe it's worth adding a new helper for that check ?  
> > 
> > Yeah it's a bit annoying when skb->protocol is not == ethertype of buffer.
> > 
> > I can certainly add a helper such as skb_eth_raw_ethertype()
> > to <linux/if_ether.h> that will inspect the actual ethertype in
> > skb->data.
> > 
> > It's the most straight-forward approach.
> 
> Agreed :)

If you rewrite that patch to use skb_vlan_eth_hdr() to get a struct
vlan_ethhdr pointer through which h_vlan_proto and h_vlan_encapsulated_proto
are accessible, I don't see much value in writing that helper. It is
going to beg the question how generic should it be - should it also
treat ETH_P_8021AD, should it treat nested VLANs?

At the end of the day, you are trying to cover in software the cases for
which the hardware engine can perform TX checksum offloading. That is
going to be hardware specific.

> > We could also add something like bool custom_ethertype; to
> > struct sk_buff and set that to true if the tagger adds a custom
> > ethertype. But I don't know how the network developers feel about
> > that.
> 
> I don't think this would be OK, first because sk_buff is pretty
> sensitive when it comes to cache alignment, adding things for this kind
> of use-cases isn't necessarily a good idea. Moreover, populating this
> flag isn't going to be straightforward as well. I guess some ethertype
> would be compatible with checksum engines, while other wouldn't, so
> probably what 'custom_ethertype' means will depend on the MAC driver.
> 
> From my point of view the first approach would indeed be better.

I guess we should first try to answer the questions "what does
skb->protocol represent?" and "does DSA use it correctly?" before
even thinking about adding yet another fuzzy layer on top it.

Linus Walleij Dec. 20, 2023, 12:43 a.m. UTC | #6

On Tue, Dec 19, 2023 at 11:46 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Dec 19, 2023 at 05:29:32PM +0100, Maxime Chevallier wrote:

> > > I can certainly add a helper such as skb_eth_raw_ethertype()
> > > to <linux/if_ether.h> that will inspect the actual ethertype in
> > > skb->data.
> > >
> > > It's the most straight-forward approach.
> >
> > Agreed :)
>
> If you rewrite that patch to use skb_vlan_eth_hdr() to get a struct
> vlan_ethhdr pointer through which h_vlan_proto and h_vlan_encapsulated_proto
> are accessible, I don't see much value in writing that helper. It is
> going to beg the question how generic should it be - should it also
> treat ETH_P_8021AD, should it treat nested VLANs?

I guess I should just post the patches inline. (It came from both
Erics and Maximes suggestion really.)

Actually I wrote two helpers, one to get the ethertype from the
ethernet frame which is pretty straight-forward.

include/linux/if_ether.h

+/* This determines the ethertype incoded into the skb data without
+ * relying on skb->protocol which is not always identical.
+ */
+static inline u16 skb_eth_raw_ethertype(const struct sk_buff *skb)
+{
+       struct ethhdr *hdr;
+
+       /* If we can't extract a header, return invalid type */
+       if (!skb_pointer_if_linear(skb, 0, ETH_HLEN))
+               return 0x0000U;
+
+       hdr = skb_eth_hdr(skb);
+
+       return ntohs(hdr->h_proto);
+}

Then for *this* driver I need to check for the ethertype
ETH_P_8021Q what is inside it, one level down, and that is a
separate helper. And I named it skb_vlan_raw_inner_ethertype()
It will retrieve the inner type no matter

include/linux/if_vlan.h

+/* This determines the inner ethertype incoded into the skb data without
+ * relying on skb->protocol which is not always identical.
+ */
+static inline u16 skb_vlan_raw_inner_ethertype(const struct sk_buff *skb)
+{
+       struct vlan_ethhdr *vhdr;
+
+       if (!skb_pointer_if_linear(skb, 0, VLAN_ETH_HLEN))
+               return 0x0000U;
+
+       vhdr = vlan_eth_hdr(skb);
+       return ntohs(vhdr->h_vlan_encapsulated_proto);
+}

(We can bikeshed the name of the function. *_inner_protocol maybe.)

It does not handle nested VLANs and I don't see why it should since
the immediate siblings in if_vlan.h does not, i.e.
vlan_eth_hdr(), skb_vlan_eth_hdr(). It's pretty clear these helpers
all go just one level down. (We can add a *_descend_*()
helper the day someone needs that.)

> At the end of the day, you are trying to cover in software the cases for
> which the hardware engine can perform TX checksum offloading. That is
> going to be hardware specific.

Yeps and I am happy to fold these helpers inside of my driver if
they are not helpful to anyone else, or if that is the best idea for something
intended for a fix, i.e. an -rc kernel.

> I guess we should first try to answer the questions "what does
> skb->protocol represent?" and "does DSA use it correctly?" before
> even thinking about adding yet another fuzzy layer on top it.

Fair point! Let's take a step back. The kerneldoc says:

 *      @protocol: Packet protocol from driver

That's a bit vague and it was in the first commit in git history :/

But Eric probably knows the right way to use protocol.

But we know for sure that VLAN uses this for the outermost protocol
ETH_P_8021Q (etc).

I wonder how the network stack reacts if we set the skb->protocol
to whatever DSA taggers put at the position of the ethertype.

For RTL taggers probably this works because they use an elaborate
custom ethertype, but e.g. net/dsa/tag_mtk.c will just put in
"ethertype" 0x0000, 0x0001 or 0x0002, the two latter which are
formally ETH_P_802_3 and ETH_P_AX25 which I think is maybe
not so good to put into skb->protocol.

Another option is to set it to the ETH_P_DSA ethertype, currently
unused in the kernel.

Now this kind of thinking makes me insecure because:
git grep '\->protocol' net/

There is just sooooo much code inspecting ->protocol in the generic
network stack that this seems like inviting disaster.

Yours,
Linus Walleij

Linus Walleij Dec. 20, 2023, 11 p.m. UTC | #7

On Wed, Dec 20, 2023 at 1:43 AM Linus Walleij <linus.walleij@linaro.org> wrote:

> Then for *this* driver I need to check for the ethertype
> ETH_P_8021Q what is inside it, one level down, and that is a
> separate helper. And I named it skb_vlan_raw_inner_ethertype()
> It will retrieve the inner type no matter
>
> include/linux/if_vlan.h
>
> +/* This determines the inner ethertype incoded into the skb data without
> + * relying on skb->protocol which is not always identical.
> + */
> +static inline u16 skb_vlan_raw_inner_ethertype(const struct sk_buff *skb)
> +{
> +       struct vlan_ethhdr *vhdr;
> +
> +       if (!skb_pointer_if_linear(skb, 0, VLAN_ETH_HLEN))
> +               return 0x0000U;
> +
> +       vhdr = vlan_eth_hdr(skb);
> +       return ntohs(vhdr->h_vlan_encapsulated_proto);
> +}
>
> (We can bikeshed the name of the function. *_inner_protocol maybe.)
>
> It does not handle nested VLANs and I don't see why it should since
> the immediate siblings in if_vlan.h does not, i.e.
> vlan_eth_hdr(), skb_vlan_eth_hdr(). It's pretty clear these helpers
> all go just one level down. (We can add a *_descend_*()
> helper the day someone needs that.)

Forget this whole discussion because in <linux/if_vlan.h>
there is already vlan_get_protocol() and vlan_get_protocol_and_depth()
so this problem is already solved, just better.

Yours,
Linus Walleij

Romain Gantois Dec. 29, 2023, 4:11 p.m. UTC | #8

On Tue, 19 Dec 2023, Vladimir Oltean wrote:
> DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_SJA1105 and DSA_TAG_PROTO_SJA1110
> construct tags with ETH_P_8021Q as EtherType. Do you still think it
> would be correct to say that all DSA tags break COE on the stmmac, as
> this patch assumes?
> 
> The NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM convention is not about
> statically checking whether the interface using DSA, but about looking
> at each packet before deciding whether to use the offload engine or to
> call skb_checksum_help().
> 
> You can experiment with any tagging protocol on the stmmac driver, and
> thus with the controller's response to any kind of traffic, even if the
> port is not attached to a hardware switch. You need to enable the
Thanks for telling me about DSA_LOOP, I've tested several DSA tagging protocols 
with the RZN1 GMAC1 hardware using this method. Here's what I found in a 
nutshell:

For tagging protocols that change the EtherType field in the MAC header (e.g. 
DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105): On TX the tagged frames are 
almost always ignored by the checksum offload engine and IP header checker of 
the MAC device. I say "almost always" because there is an 
unlikely but nasty corner case where a DSA tag can be identical to an IP 
EtherType value. In these cases, the frame will likely fail IP header checks 
and be dropped by the MAC.

Ignoring these corner cases, the DSA frames will egress with a partial 
checksum and be dropped by the recipient. On RX, these frames will, once again, 
not be detected as IP frames by the MAC. So they will be transmitted to the CPU. 
However, the stmmac driver will assume (wrongly in this case) that
these frames' checksums have been verified by the MAC. So it will set 
CHECKSUM_UNECESSARY:

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5493

And so the IP/TCP checksums will not be checked at all, 
which is not ideal.

There are other DSA tagging protocols which cause different issues. For example 
DSA_TAG_PROTO_BRCM_PREPEND, which seems to offset the whole MAC header, and 
DSA_TAG_PROTO_LAN9303 which sets ETH_P_8021Q as its EtherType. I haven't dug too 
deeply on these issues yet, since I'd rather deal with the checksumming issue 
before getting distracted by VLAN offloading and other stuff.

Among the tagging protocols I tested, the only one that didn't cause any issues 
was DSA_TAG_PROTO_TRAILER, which only appends stuff to the frame.

TLDR: The simplest solution seems to be to modify the stmmac TX and RX paths to 
disable checksum offloading for frames that have a non-IP ethertype in 
their MAC header. This will fix the checksum situation for DSA tagging protocols 
that set non-IP and non-8021Q EtherTypes. Some edge cases like 
DSA_TAG_PROTO_BRCM_PREPEND and DSA_TAG_PROTO_LAN9303 will require a completely 
different solution if we want these MAC devices to handle them properly.
Please share any thoughts you might have on this suggestion.

Best Regards,

Vladimir Oltean Dec. 30, 2023, 2:17 p.m. UTC | #9

On Fri, Dec 29, 2023 at 05:11:48PM +0100, Romain Gantois wrote:
> Thanks for telling me about DSA_LOOP, I've tested several DSA tagging protocols 
> with the RZN1 GMAC1 hardware using this method. Here's what I found in a 
> nutshell:

Good job exploring the complexity of the problem in depth.

> For tagging protocols that change the EtherType field in the MAC header (e.g. 
> DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105): On TX the tagged frames are 
> almost always ignored by the checksum offload engine and IP header checker of 
> the MAC device. I say "almost always" because there is an 
> unlikely but nasty corner case where a DSA tag can be identical to an IP 
> EtherType value. In these cases, the frame will likely fail IP header checks 
> and be dropped by the MAC.

Yes, there are a few poorly designed DSA tagging formats where arbitrary
fields overlap with what the conduit interface sees as the EtherType field.
We don't design the tagging formats, as they are proprietary (except for those
derived from tag_8021q), we just support them. In some cases where the
switch has permitted that, we have implemented dynamic changing of
tagging protocols (like 'echo edsa > /sys/class/net/eth0/dsa/tagging')
in order to increase the compatibility between a particular switch and
its conduit interface. And where the compatibility with the default
tagging protocol was beyond broken, we accepted an alternative one
through the 'dsa-tag-protocol' device tree property.

> Ignoring these corner cases, the DSA frames will egress with a partial 
> checksum and be dropped by the recipient. On RX, these frames will, once again, 
> not be detected as IP frames by the MAC. So they will be transmitted to the CPU. 
> However, the stmmac driver will assume (wrongly in this case) that
> these frames' checksums have been verified by the MAC. So it will set 
> CHECKSUM_UNECESSARY:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5493
>  
> And so the IP/TCP checksums will not be checked at all, which is not
> ideal.

Yup, this all stems from the fact that DSA inherits the checksum offload
features of the conduit (stmmac) from its vlan_features. People think
that vlan_features are inherited only by VLAN upper interfaces, but that
is not the case. Confusingly, in some cases, offloading NETIF_F_IP_CSUM |
NETIF_F_IPV6_CSUM really does work (Broadcom conduit + Broadcom switch,
Marvell conduit + Marvell switch, etc), so we can't remove this mechanism.
But it uncovers lack of API compliance in drivers such as the stmmac,
which is why it is a fragile mechanism.

> There are other DSA tagging protocols which cause different issues. For example 
> DSA_TAG_PROTO_BRCM_PREPEND, which seems to offset the whole MAC header, and 
> DSA_TAG_PROTO_LAN9303 which sets ETH_P_8021Q as its EtherType. I haven't dug too 
> deeply on these issues yet, since I'd rather deal with the checksumming issue 
> before getting distracted by VLAN offloading and other stuff.

I agree that what brcm-prepend does - shifting the entire frame to the
right by 4 octets - sounds problematic in general (making the conduit
see the EtherType as octets [3:2] of the original MAC SA). But you also
need to take a look at where those protocols are used, and if that is
relevant in any way to the stmmac.

	/* Broadcom BCM58xx chips have a flow accelerator on Port 8
	 * which requires us to use the prepended Broadcom tag type
	 */
	if (dev->chip_id == BCM58XX_DEVICE_ID && port == B53_CPU_PORT) {
		dev->tag_protocol = DSA_TAG_PROTO_BRCM_PREPEND;
		goto out;
	}

From what I understand, DSA_TAG_PROTO_BRCM_PREPEND is only used
internally within Broadcom SoCs, so it seems likely that it's not
designed with generic compatibility in mind.

As for DSA_TAG_PROTO_LAN9303, let me guess what the problem was. TX was
fine, but on RX, the packets got dropped in hardware before they even
reached the stmmac driver, because it declares NETIF_F_HW_VLAN_CTAG_FILTER |
NETIF_F_HW_VLAN_STAG_FILTER as features, and the DSA tags effectively
look like unregistered VLAN traffic.

That is certainly an area where the lan9303 support can be improved.
Other VLAN-based taggers like tag_8021q perform vlan_vid_add() calls on
the conduit interface so that it won't drop the traffic even when it
uses hardware VLAN filtering.

> Among the tagging protocols I tested, the only one that didn't cause any issues 
> was DSA_TAG_PROTO_TRAILER, which only appends stuff to the frame.

It's very curious that you say this. Tail taggers are notoriously
problematic, because while the conduit will perform the checksum offload
function on the packets, the checksum calculation goes until the very end
of the frame. Thus, that checksum will be wrong after the switch consumes
the tail tag (and does not update the L4 checksum).

There is no way to overcome that except to not inherit any checksum
offload features for tail taggers. But that would break some other thing,
so we opted for having this line in the xmit procedure of tail taggers:

	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
		return NULL;

But apparently we have been inconsistent in applying this to trailer_xmit()
as well. So DSA_TAG_PROTO_TRAILER should actually be a case of "checksum
is computed, but is incorrect after tag stripping", but you say that it
was the only one that worked fine.

> TLDR: The simplest solution seems to be to modify the stmmac TX and RX paths to 
> disable checksum offloading for frames that have a non-IP ethertype in 
> their MAC header. This will fix the checksum situation for DSA tagging protocols 
> that set non-IP and non-8021Q EtherTypes. Some edge cases like 
> DSA_TAG_PROTO_BRCM_PREPEND and DSA_TAG_PROTO_LAN9303 will require a completely 
> different solution if we want these MAC devices to handle them properly.
> Please share any thoughts you might have on this suggestion.

I think the overall idea is correct, with the small mentions of "let's
ignore brcm-prepend" and "lan9303 should work, maybe it's just a case of
disabling the VLAN filtering features through ethtool and testing again?".

[net,1/1] net: stmmac: Prevent DSA tags from breaking COE

Commit Message

Comments

Patch