diff mbox series

[net,v5,2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes

Message ID 20240102-new-gemini-ethernet-regression-v5-2-cf61ab3aa8cd@linaro.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Fix a regression in the Gemini ethernet controller. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1113 this patch: 1113
netdev/cc_maintainers warning 1 maintainers not CCed: linux-arm-kernel@lists.infradead.org
netdev/build_clang success Errors and warnings before: 1140 this patch: 1140
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1140 this patch: 1140
netdev/checkpatch warning WARNING: Possible repeated word: 'the' WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Linus Walleij Jan. 2, 2024, 8:34 p.m. UTC
We had workarounds were the ethernet checksumming engine would be bypassed
for larger frames, this fixed devices using DSA, but regressed devices
where the ethernet was connected directly to a PHY.

The devices with a PHY connected directly can't handle large frames
either way, with or without bypass. Looking at the size of the frame
is probably just wrong.

Rework the workaround such that we don't activate the checksumming engine if
the ethertype inside the actual frame is something else than 0x0800
(IPv4) or 0x86dd (IPv6). These are the only frames the checksumming engine
can actually handle. VLAN framing (0x8100) also works fine.

We can't inspect skb->protocol because DSA frames will sometimes have a
custom ethertype despite skb->protocol is e.g. 0x0800.

If the frame is ALSO over the size of an ordinary ethernet frame,
we will actively bypass the checksumming engine. (Always doing this
makes the hardware unstable.)

After this both devices with direct ethernet attached such as D-Link
DNS-313 and devices with a DSA switch with a custom ethertype such as
D-Link DIR-685 work fine.

Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

Comments

Vladimir Oltean Jan. 4, 2024, 12:53 a.m. UTC | #1
On Tue, Jan 02, 2024 at 09:34:26PM +0100, Linus Walleij wrote:
> We had workarounds were the ethernet checksumming engine would be bypassed

s/were/where/

> for larger frames, this fixed devices using DSA, but regressed devices
> where the ethernet was connected directly to a PHY.
> 
> The devices with a PHY connected directly can't handle large frames
> either way, with or without bypass. Looking at the size of the frame
> is probably just wrong.

"Looking at the size of the frame is probably just wrong." yet you keep it.

Not only is this confusing for you to say this, but I believe that the
skb->len check is the _only_ thing that is needed. Explanation below.

> Rework the workaround such that we don't activate the checksumming engine if
> the ethertype inside the actual frame is something else than 0x0800
> (IPv4) or 0x86dd (IPv6). These are the only frames the checksumming engine
> can actually handle. VLAN framing (0x8100) also works fine.

Premise:

This driver does not set NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM (or anything)
in dev->vlan_features. Upper interface drivers which look at dev->vlan_features
in order to determine their own features are 8021q and DSA.

Packets transmitted through stacked interfaces have 3 checksumming points.
Two in software, during validate_xmit_skb() on the respective netdev,
depending on its features and skb->ip_summed, and one in the xmit
procedure of the hardware driver - gmac_start_xmit().

In short, I believe that the code which you have added to inspect the
ethertype - and based on that to avoid the "if (skb->ip_summed == CHECKSUM_PARTIAL)"
test - is bogus (a cost you are paying for nothing).

I'm saying this because I think that those
"(ethertype != ETH_P_IP && ethertype != ETH_P_IPV6)" frames wouldn't
have entered the "skb->ip_summed == CHECKSUM_PARTIAL" test anyway.

DSA-tagged frames should come with CHECKSUM_NONE, having been checksummed
in software already, by the first validate_xmit_skb() - DSA not having
inherited the checksum offload feature, because it's not in dev->vlan_features.

Coincidentally, this is also the reason why in your tests, DSA-tagged
TCP/UDP traffic still has a proper checksum, despite you bypassing the
hardware offload, and no longer calling skb_checksum_help() from the
driver. It was never needed, because the checksum was always already
calculated.

And VLAN traffic should also come with CHECKSUM_NONE, for the same reason.

The one difference between DSA and VLAN is that for DSA, you sometimes
set TSS_BYPASS_BIT (for large frames) and for VLAN you never do.

> 
> We can't inspect skb->protocol because DSA frames will sometimes have a
> custom ethertype despite skb->protocol is e.g. 0x0800.
> 
> If the frame is ALSO over the size of an ordinary ethernet frame,
> we will actively bypass the checksumming engine. (Always doing this
> makes the hardware unstable.)
> 
> After this both devices with direct ethernet attached such as D-Link
> DNS-313 and devices with a DSA switch with a custom ethertype such as
> D-Link DIR-685 work fine.
> 
> Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 33 +++++++++++++++++++++++++--------
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index 5e399c6e095b..68da4ae26248 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -1142,22 +1143,38 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	struct gmac_txdesc *txd;
>  	skb_frag_t *skb_frag;
>  	dma_addr_t mapping;
> +	u16 ethertype;
>  	void *buffer;
>  
>  	/* TODO: implement proper TSO using MTU in word3 */
>  	word1 = skb->len;
>  	word3 = SOF_BIT | skb->len;
>  
> -	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +	/* Dig out the the ethertype actually in the buffer and not what the
> +	 * protocol claims to be. This is the raw data that the checksumming
> +	 * offload engine will have to deal with.
> +	 */
> +	ethertype = ntohs(eth_header_parse_protocol(skb));
> +	/* This is the only VLAN type supported by this hardware so check for
> +	 * that: the checksumming engine can handle IP and IPv6 inside 802.1Q.
> +	 */
> +	if (ethertype == ETH_P_8021Q)
> +		ethertype = ntohs(__vlan_get_protocol(skb, htons(ethertype), NULL));

Random fact: if you store "ethertype" as __be16 and perform htons() on the
constant value instead, the htons() operation will be performed at compile
time and should result in fewer instructions per packet in the fast path.

> +
> +	if (ethertype != ETH_P_IP && ethertype != ETH_P_IPV6) {
> +		/* Hardware offloaded checksumming isn't working on non-IP frames.
> +		 * This happens for example on some DSA switches using a custom
> +		 * ethertype. When a frame gets bigger than a standard ethernet
> +		 * frame, it also needs to actively bypass the checksumming engine.
> +		 * There is no clear explanation to why it is like this, the
> +		 * reference manual has left the TSS completely undocumented.
> +		 */
> +		if (skb->len > ETH_FRAME_LEN)
> +			word1 |= TSS_BYPASS_BIT;

Do you know what "TSS_BYPASS_BIT" does, exactly?

> +	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
>  		int tcp = 0;
>  
> -		/* We do not switch off the checksumming on non TCP/UDP
> -		 * frames: as is shown from tests, the checksumming engine
> -		 * is smart enough to see that a frame is not actually TCP
> -		 * or UDP and then just pass it through without any changes
> -		 * to the frame.
> -		 */
> -		if (skb->protocol == htons(ETH_P_IP)) {
> +		if (ethertype == ETH_P_IP) {
>  			word1 |= TSS_IP_CHKSUM_BIT;
>  			tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
>  		} else { /* IPv6 */
> 
> -- 
> 2.34.1
>
Linus Walleij Jan. 6, 2024, 12:17 a.m. UTC | #2
On Thu, Jan 4, 2024 at 1:53 AM Vladimir Oltean <olteanv@gmail.com> wrote:

> "Looking at the size of the frame is probably just wrong." yet you keep it.
>
> Not only is this confusing for you to say this, but I believe that the
> skb->len check is the _only_ thing that is needed. Explanation below.

You are right (as usual).

And the analysis you write make perfect sense.

I dropped the entire patch, and send only 1/2 in v6.

> The one difference between DSA and VLAN is that for DSA, you sometimes
> set TSS_BYPASS_BIT (for large frames) and for VLAN you never do.
(...)
> Do you know what "TSS_BYPASS_BIT" does, exactly?

No.

The datasheet very annoyingly omits all details on the TSS
(the checksumming engine), and the documentation of the bits
in "word1" and "word3" only say it is a way to pass configuration
to the checksumming engine. I think it is a genuine oversight
by the document author actually.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5e399c6e095b..68da4ae26248 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -29,6 +29,7 @@ 
 #include <linux/of_net.h>
 #include <linux/of_platform.h>
 #include <linux/etherdevice.h>
+#include <linux/if_ether.h>
 #include <linux/if_vlan.h>
 #include <linux/skbuff.h>
 #include <linux/phy.h>
@@ -1142,22 +1143,38 @@  static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	struct gmac_txdesc *txd;
 	skb_frag_t *skb_frag;
 	dma_addr_t mapping;
+	u16 ethertype;
 	void *buffer;
 
 	/* TODO: implement proper TSO using MTU in word3 */
 	word1 = skb->len;
 	word3 = SOF_BIT | skb->len;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+	/* Dig out the the ethertype actually in the buffer and not what the
+	 * protocol claims to be. This is the raw data that the checksumming
+	 * offload engine will have to deal with.
+	 */
+	ethertype = ntohs(eth_header_parse_protocol(skb));
+	/* This is the only VLAN type supported by this hardware so check for
+	 * that: the checksumming engine can handle IP and IPv6 inside 802.1Q.
+	 */
+	if (ethertype == ETH_P_8021Q)
+		ethertype = ntohs(__vlan_get_protocol(skb, htons(ethertype), NULL));
+
+	if (ethertype != ETH_P_IP && ethertype != ETH_P_IPV6) {
+		/* Hardware offloaded checksumming isn't working on non-IP frames.
+		 * This happens for example on some DSA switches using a custom
+		 * ethertype. When a frame gets bigger than a standard ethernet
+		 * frame, it also needs to actively bypass the checksumming engine.
+		 * There is no clear explanation to why it is like this, the
+		 * reference manual has left the TSS completely undocumented.
+		 */
+		if (skb->len > ETH_FRAME_LEN)
+			word1 |= TSS_BYPASS_BIT;
+	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		int tcp = 0;
 
-		/* We do not switch off the checksumming on non TCP/UDP
-		 * frames: as is shown from tests, the checksumming engine
-		 * is smart enough to see that a frame is not actually TCP
-		 * or UDP and then just pass it through without any changes
-		 * to the frame.
-		 */
-		if (skb->protocol == htons(ETH_P_IP)) {
+		if (ethertype == ETH_P_IP) {
 			word1 |= TSS_IP_CHKSUM_BIT;
 			tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
 		} else { /* IPv6 */