
[net,v5,1/2] net: ethernet: cortina: Drop software checksum and TSO

Message ID 20240102-new-gemini-ethernet-regression-v5-1-cf61ab3aa8cd@linaro.org
State Changes Requested
Delegated to: Netdev Maintainers
Series Fix a regression in the Gemini ethernet controller.

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1113 this patch: 1113
netdev/cc_maintainers warning 1 maintainers not CCed: linux-arm-kernel@lists.infradead.org
netdev/build_clang success Errors and warnings before: 1140 this patch: 1140
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1140 this patch: 1140
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 51 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Linus Walleij Jan. 2, 2024, 8:34 p.m. UTC
The recent change to allow large frames without hardware checksumming
slotted in software checksumming in the driver if hardware could not
do it.

This will however upset TSO (TCP Segmentation Offload). Typical
error dumps include this:

skb len=2961 headroom=222 headlen=66 tailroom=0
(...)
WARNING: CPU: 0 PID: 956 at net/core/dev.c:3259 skb_warn_bad_offload+0x7c/0x108
gemini-ethernet-port: caps=(0x0000010000154813, 0x00002007ffdd7889)

And the packets do not go through.

After investigating, I narrowed it down to the introduction of the
software checksumming in the driver.

Since the hardware will be doing the segmentation of packets, this
makes some sense: in that case the hardware also needs to keep
track of the checksumming.

That raises the question of why large TCP or UDP packets also have to
bypass the checksumming (as e.g. ICMP does). If the hardware is
splitting them into smaller packets according to the MTU setting, and
checksumming them, why is this happening? I don't know. I do know
from tests that it is needed: the OpenWrt web server uhttpd starts
sending big skbs (up to 2047 bytes, the max MTU), and above 1514
bytes they fail and hang unless the bypass bit is set; the frames
do not get through.

Drop the size check and the TSO offloading features for now: this
needs to be fixed up properly.

Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 35 ++++-------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)
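The software fallback being discussed is skb_checksum_help(). Below is a minimal userspace model of what it does for a CHECKSUM_PARTIAL packet, assuming a simplified, hypothetical `fake_skb` struct in place of the real `struct sk_buff`: it computes the RFC 1071 ones' complement checksum from `csum_start` to the end of the packet and stores it at `csum_start + csum_offset`. It also models the guard that, on our reading of net/core/dev.c, makes skb_checksum_help() call skb_warn_bad_offload() and return -EINVAL for GSO skbs, which is consistent with the warning in the dump above when a TSO-sized skb reaches the software path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sk_buff fields involved. */
struct fake_skb {
	uint8_t *data;      /* start of the packet */
	size_t len;         /* total packet length */
	size_t csum_start;  /* offset where checksumming begins */
	size_t csum_offset; /* offset of the checksum field from csum_start */
	bool is_gso;        /* true if the stack still expects segmentation */
};

/* RFC 1071 ones' complement sum over buf[0..len), folded and complemented.
 * The 32-bit accumulator is adequate for packet-sized buffers. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)buf[0] << 8 | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)buf[0] << 8;	/* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Model of the software fallback for one CHECKSUM_PARTIAL packet. */
static int fake_checksum_help(struct fake_skb *skb)
{
	size_t field;
	uint16_t csum;

	/* GSO skbs are refused: this is where the kernel warns. */
	if (skb->is_gso)
		return -EINVAL;

	field = skb->csum_start + skb->csum_offset;
	if (field + 2 > skb->len)
		return -EINVAL;
	/* Keep the 16-bit field aligned with the summed words. */
	assert((skb->csum_offset & 1) == 0);

	/* The field already holds whatever seed the stack put there
	 * (the pseudo-header sum for TCP/UDP), so it is summed as-is. */
	csum = inet_csum(skb->data + skb->csum_start,
			 skb->len - skb->csum_start);
	skb->data[field] = csum >> 8;
	skb->data[field + 1] = csum & 0xff;
	return 0;
}
```

A receiver that sums the same region, including the stored checksum and the same seed, gets a folded result of 0xffff, so inet_csum() over it returns 0.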

Comments

Vladimir Oltean Jan. 4, 2024, 12:24 a.m. UTC | #1
Hi Linus,

On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:
> That raises the question of why large TCP or UDP packets also have to
> bypass the checksumming (as e.g. ICMP does). If the hardware is
> splitting them into smaller packets according to the MTU setting, and
> checksumming them, why is this happening? I don't know. I do know
> from tests that it is needed: the OpenWrt web server uhttpd starts
> sending big skbs (up to 2047 bytes, the max MTU), and above 1514
> bytes they fail and hang unless the bypass bit is set; the frames
> do not get through.

Is this uhttpd traffic plain TCP, or TCP wrapped in DSA?
Linus Walleij Jan. 5, 2024, midnight UTC | #2
On Thu, Jan 4, 2024 at 1:24 AM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:

> > That raises the question of why large TCP or UDP packets also have to
> > bypass the checksumming (as e.g. ICMP does). If the hardware is
> > splitting them into smaller packets according to the MTU setting, and
> > checksumming them, why is this happening? I don't know. I do know
> > from tests that it is needed: the OpenWrt web server uhttpd starts
> > sending big skbs (up to 2047 bytes, the max MTU), and above 1514
> > bytes they fail and hang unless the bypass bit is set; the frames
> > do not get through.
>
> Is this uhttpd traffic plain TCP, or TCP wrapped in DSA?

Wrapped in DSA, rtl4_a.

Yours,
Linus Walleij
Vladimir Oltean Jan. 5, 2024, 11:32 a.m. UTC | #3
On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:
> @@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	struct gmac_txdesc *txd;
>  	skb_frag_t *skb_frag;
>  	dma_addr_t mapping;
> -	unsigned short mtu;
>  	void *buffer;
> -	int ret;
> -
> -	mtu  = ETH_HLEN;
> -	mtu += netdev->mtu;
> -	if (skb->protocol == htons(ETH_P_8021Q))
> -		mtu += VLAN_HLEN;
>  
> +	/* TODO: implement proper TSO using MTU in word3 */
>  	word1 = skb->len;
> -	word3 = SOF_BIT;
> -
> -	if (word1 > mtu) {
> -		word1 |= TSS_MTU_ENABLE_BIT;
> -		word3 |= mtu;
> -	}
> +	word3 = SOF_BIT | skb->len;
>  
> -	if (skb->len >= ETH_FRAME_LEN) {
> -		/* Hardware offloaded checksumming isn't working on frames
> -		 * bigger than 1514 bytes. A hypothesis about this is that the
> -		 * checksum buffer is only 1518 bytes, so when the frames get
> -		 * bigger they get truncated, or the last few bytes get
> -		 * overwritten by the FCS.
> -		 *
> -		 * Just use software checksumming and bypass on bigger frames.
> -		 */
> -		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -			ret = skb_checksum_help(skb);
> -			if (ret)
> -				return ret;
> -		}
> -		word1 |= TSS_BYPASS_BIT;
> -	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {

So are you taking back the statement that "Hardware offloaded
checksumming isn't working on frames bigger than 1514 bytes"?

Have you increased the interface MTU beyond 1500, and tested with plain
TCP (no DSA) on top of it? Who will provide the TCP checksum for them now?

I don't understand why you remove the skb_checksum_help() call.
It doesn't play nice with skb_is_gso() packets, agreed, but you removed
the TSO netdev feature.

> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
>  		int tcp = 0;
>  
>  		/* We do not switch off the checksumming on non TCP/UDP
> 
> -- 
> 2.34.1
>
Eric Dumazet Jan. 5, 2024, 2:36 p.m. UTC | #4
On Fri, Jan 5, 2024 at 12:32 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:
> > @@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
> >       struct gmac_txdesc *txd;
> >       skb_frag_t *skb_frag;
> >       dma_addr_t mapping;
> > -     unsigned short mtu;
> >       void *buffer;
> > -     int ret;
> > -
> > -     mtu  = ETH_HLEN;
> > -     mtu += netdev->mtu;
> > -     if (skb->protocol == htons(ETH_P_8021Q))
> > -             mtu += VLAN_HLEN;
> >
> > +     /* TODO: implement proper TSO using MTU in word3 */
> >       word1 = skb->len;
> > -     word3 = SOF_BIT;
> > -
> > -     if (word1 > mtu) {
> > -             word1 |= TSS_MTU_ENABLE_BIT;
> > -             word3 |= mtu;
> > -     }
> > +     word3 = SOF_BIT | skb->len;
> >
> > -     if (skb->len >= ETH_FRAME_LEN) {
> > -             /* Hardware offloaded checksumming isn't working on frames
> > -              * bigger than 1514 bytes. A hypothesis about this is that the
> > -              * checksum buffer is only 1518 bytes, so when the frames get
> > -              * bigger they get truncated, or the last few bytes get
> > -              * overwritten by the FCS.
> > -              *
> > -              * Just use software checksumming and bypass on bigger frames.
> > -              */
> > -             if (skb->ip_summed == CHECKSUM_PARTIAL) {
> > -                     ret = skb_checksum_help(skb);
> > -                     if (ret)
> > -                             return ret;
> > -             }
> > -             word1 |= TSS_BYPASS_BIT;
> > -     } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
>
> So are you taking back the statement that "Hardware offloaded
> checksumming isn't working on frames bigger than 1514 bytes"?
>
> Have you increased the interface MTU beyond 1500, and tested with plain
> TCP (no DSA) on top of it? Who will provide the TCP checksum for them now?
>
> I don't understand why you remove the skb_checksum_help() call.
> It doesn't play nice with skb_is_gso() packets, agreed, but you removed
> the TSO netdev feature.

This TSO feature could never have worked.

This was probably hidden because TCP eventually retransmits the data as non-TSO packets.

A TSO-enabled driver must use/propagate the skb_shinfo(skb)->gso_size
value to the TSO engine on the NIC.
Otherwise, it is absolutely broken.

Please look at my original suggestion. I think the plan is to try to
add back TSO in the next release, with proper testing (i.e. not relying
on TCP resilience).

https://lore.kernel.org/netdev/CANn89iJLfxng1sYL5Zk0mknXpyYQPCp83m3KgD2KJ2_hKCpEUg@mail.gmail.com/
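Eric's point is that the segment size has to come from gso_size, which TCP derives from the flow MSS, not from the interface MTU. A userspace sketch of that arithmetic follows; the helper names are hypothetical, and the 66-byte header (14 Ethernet + 20 IPv4 + 32 TCP with timestamps) and 1448-byte MSS are illustrative assumptions. The last check shows one way an MTU-derived value could go wrong if an engine treated the 1514 written by the removed code as a payload size; how the Gemini engine actually interprets word3 is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14	/* same value as the kernel's ETH_HLEN */

/* Number of wire segments when `payload` bytes of TCP payload are cut into
 * chunks of at most `mss` bytes (the role of skb_shinfo(skb)->gso_size). */
static size_t tso_segments(size_t payload, size_t mss)
{
	assert(mss > 0);
	return (payload + mss - 1) / mss;
}

/* Payload size of the final, possibly short, segment (0 if no payload). */
static size_t tso_last_segment(size_t payload, size_t mss)
{
	assert(mss > 0);
	if (payload == 0)
		return 0;
	return payload % mss ? payload % mss : mss;
}

/* Does each segment fit on the wire? hdr_len covers the Ethernet, IP and
 * TCP headers that are replicated in front of every segment. */
static bool tso_frame_ok(size_t hdr_len, size_t mss, size_t mtu)
{
	return hdr_len + mss <= ETH_HLEN + mtu;
}
```

For example, 4000 bytes of payload with a 1448-byte MSS yields three segments, the last carrying 1104 bytes, and each full segment plus 66 bytes of headers is exactly a 1514-byte frame.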
Eric Dumazet Jan. 5, 2024, 2:40 p.m. UTC | #5
On Tue, Jan 2, 2024 at 9:34 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> The recent change to allow large frames without hardware checksumming
> slotted in software checksumming in the driver if hardware could not
> do it.
>
> This will however upset TSO (TCP Segmentation Offload). Typical
> error dumps include this:
>
> skb len=2961 headroom=222 headlen=66 tailroom=0
> (...)
> WARNING: CPU: 0 PID: 956 at net/core/dev.c:3259 skb_warn_bad_offload+0x7c/0x108
> gemini-ethernet-port: caps=(0x0000010000154813, 0x00002007ffdd7889)
>
> And the packets do not go through.
>
> After investigating, I narrowed it down to the introduction of the
> software checksumming in the driver.
>
> Since the hardware will be doing the segmentation of packets, this
> makes some sense: in that case the hardware also needs to keep
> track of the checksumming.
>
> That raises the question of why large TCP or UDP packets also have to
> bypass the checksumming (as e.g. ICMP does). If the hardware is
> splitting them into smaller packets according to the MTU setting, and
> checksumming them, why is this happening? I don't know. I do know
> from tests that it is needed: the OpenWrt web server uhttpd starts
> sending big skbs (up to 2047 bytes, the max MTU), and above 1514
> bytes they fail and hang unless the bypass bit is set; the frames
> do not get through.
>
> Drop the size check and the TSO offloading features for now: this
> needs to be fixed up properly.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 35 ++++-------------------------------
>  1 file changed, 4 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index 78287cfcbf63..5e399c6e095b 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -79,8 +79,7 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>  #define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
>
>  #define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
> -               NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
> -               NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
> +                              NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
>
>  /**
>   * struct gmac_queue_page - page buffer per-page info
> @@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>         struct gmac_txdesc *txd;
>         skb_frag_t *skb_frag;
>         dma_addr_t mapping;
> -       unsigned short mtu;
>         void *buffer;
> -       int ret;
> -
> -       mtu  = ETH_HLEN;
> -       mtu += netdev->mtu;
> -       if (skb->protocol == htons(ETH_P_8021Q))
> -               mtu += VLAN_HLEN;
>
> +       /* TODO: implement proper TSO using MTU in word3 */

I would not mention MTU in this comment; use gso_size (or flow MSS) instead.

>         word1 = skb->len;
> -       word3 = SOF_BIT;
> -
> -       if (word1 > mtu) {
> -               word1 |= TSS_MTU_ENABLE_BIT;
> -               word3 |= mtu;
> -       }
> +       word3 = SOF_BIT | skb->len;

Could word3 probably be left with just SOF_BIT?
I am guessing the 'length' would only be used by the NIC if TSO is requested.

>
> -       if (skb->len >= ETH_FRAME_LEN) {
> -               /* Hardware offloaded checksumming isn't working on frames
> -                * bigger than 1514 bytes. A hypothesis about this is that the
> -                * checksum buffer is only 1518 bytes, so when the frames get
> -                * bigger they get truncated, or the last few bytes get
> -                * overwritten by the FCS.
> -                *
> -                * Just use software checksumming and bypass on bigger frames.
> -                */
> -               if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -                       ret = skb_checksum_help(skb);
> -                       if (ret)
> -                               return ret;
> -               }
> -               word1 |= TSS_BYPASS_BIT;
> -       } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +       if (skb->ip_summed == CHECKSUM_PARTIAL) {
>                 int tcp = 0;
>
>                 /* We do not switch off the checksumming on non TCP/UDP
>
> --
> 2.34.1
>
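Eric's two inline remarks point at a descriptor setup where word3 carries only SOF_BIT unless TSO is requested, and where the segment size written there is gso_size rather than an MTU. A sketch of that idea follows, under loud assumptions: the MODEL_* bit values are placeholders, not the real definitions from the driver's header; build_tx_words() is a hypothetical helper; and the premise that the NIC reads the length field only when TSS_MTU_ENABLE_BIT is set is Eric's guess, not a confirmed hardware fact. With this patch TSO is no longer advertised, so the is_gso branch would only matter once TSO is re-added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; the real driver
 * definitions may differ. */
#define MODEL_SOF_BIT            (1u << 31)
#define MODEL_TSS_MTU_ENABLE_BIT (1u << 30)

struct tx_words {
	uint32_t word1;
	uint32_t word3;
};

/* Hypothetical helper: compose the first descriptor's word1/word3. */
static struct tx_words build_tx_words(uint32_t skb_len, bool is_gso,
				      uint16_t gso_size)
{
	struct tx_words w;

	w.word1 = skb_len;
	w.word3 = MODEL_SOF_BIT;	/* SOF only unless TSO is requested */

	if (is_gso) {
		/* Tell the engine to segment, and cut by the flow MSS
		 * (gso_size), not by the interface MTU. */
		w.word1 |= MODEL_TSS_MTU_ENABLE_BIT;
		w.word3 |= gso_size;
	}
	return w;
}
```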
Linus Walleij Jan. 5, 2024, 11:35 p.m. UTC | #6
On Fri, Jan 5, 2024 at 12:32 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> So are you taking back the statement that "Hardware offloaded
> checksumming isn't working on frames bigger than 1514 bytes"?

Yes. The correct statement is that it isn't working on frames
bigger than 1514 bytes if they have a custom DSA Ethernet
tag.

The previous workaround made the driver work fine with the
device that has a Realtek DSA switch with a custom ethertype,
but it broke the driver for devices that have a PHY connected
directly to the Ethernet block.

(I blame manual testing...)
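The length arithmetic behind this can be sketched as follows, assuming (not confirmed in this thread) that the custom Realtek DSA tag adds 4 bytes per frame. With a 1500-byte MTU on the switch port, a full-sized frame handed to the Gemini port then grows from 1514 to 1518 bytes, past ETH_FRAME_LEN. Note that the removed check used `skb->len >= ETH_FRAME_LEN`, so it also caught untagged 1514-byte frames. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN      14	/* Ethernet header, as in the kernel */
#define ETH_FRAME_LEN 1514	/* max untagged frame length without FCS */

/* Assumption: the custom Realtek DSA tag adds 4 bytes. */
#define MODEL_DSA_TAG_LEN 4

/* Length of a full-sized frame as the Gemini (conduit) port sees it. */
static unsigned int conduit_frame_len(unsigned int user_mtu,
				      unsigned int tag_len)
{
	return ETH_HLEN + tag_len + user_mtu;
}

/* True if the frame is longer than a standard untagged Ethernet frame. */
static bool beyond_standard_frame(unsigned int len)
{
	return len > ETH_FRAME_LEN;
}
```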

> Have you increased the interface MTU beyond 1500, and tested with plain
> TCP (no DSA) on top of it? Who will provide the TCP checksum for them now?
>
> I don't understand why you remove the skb_checksum_help() call.
> It doesn't play nice with skb_is_gso() packets, agreed, but you removed
> the TSO netdev feature.

You're right: I was stuck there, and a larger MTU would not work.

Simply dropping TSO and leaving the SW checksum in place
makes it all work nicely!

Thank you so much Vladimir for pointing this out!

Yours,
Linus Walleij
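Linus's reply suggests the next revision keeps the pre-patch checksum logic and drops only the TSO features; that is an inference from the reply, not taken from a posted follow-up. A userspace model of that transmit-side decision, as it appears in the lines this patch removes, is sketched below; the enum names are hypothetical labels for the driver's branches. Keeping skb_checksum_help() is only safe because, with NETIF_F_TSO* no longer advertised, the stack segments GSO skbs in software before they reach the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_FRAME_LEN 1514	/* same value as the kernel's ETH_FRAME_LEN */

/* Hypothetical names for the checksum paths in gmac_map_tx_bufs(). */
enum tx_csum_path {
	TX_PLAIN,		/* nothing requested: no offload bits set */
	TX_HW_CSUM,		/* CHECKSUM_PARTIAL, small frame: hardware */
	TX_BYPASS,		/* big frame, already complete: bypass bit */
	TX_SW_CSUM_BYPASS,	/* big frame, CHECKSUM_PARTIAL:
				 * skb_checksum_help(), then bypass bit */
};

static enum tx_csum_path pick_csum_path(unsigned int len, bool csum_partial)
{
	/* Mirrors the removed `skb->len >= ETH_FRAME_LEN` branch. */
	if (len >= ETH_FRAME_LEN)
		return csum_partial ? TX_SW_CSUM_BYPASS : TX_BYPASS;
	return csum_partial ? TX_HW_CSUM : TX_PLAIN;
}
```

A DSA-tagged full-sized frame (1518 bytes under the 4-byte tag assumption) that still needs its checksum takes the software path with the bypass bit set, while ordinary small frames keep the hardware checksum.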

Patch

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 78287cfcbf63..5e399c6e095b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -79,8 +79,7 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 #define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
 
 #define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
-		NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
-		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
+			       NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
 
 /**
  * struct gmac_queue_page - page buffer per-page info
@@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	struct gmac_txdesc *txd;
 	skb_frag_t *skb_frag;
 	dma_addr_t mapping;
-	unsigned short mtu;
 	void *buffer;
-	int ret;
-
-	mtu  = ETH_HLEN;
-	mtu += netdev->mtu;
-	if (skb->protocol == htons(ETH_P_8021Q))
-		mtu += VLAN_HLEN;
 
+	/* TODO: implement proper TSO using MTU in word3 */
 	word1 = skb->len;
-	word3 = SOF_BIT;
-
-	if (word1 > mtu) {
-		word1 |= TSS_MTU_ENABLE_BIT;
-		word3 |= mtu;
-	}
+	word3 = SOF_BIT | skb->len;
 
-	if (skb->len >= ETH_FRAME_LEN) {
-		/* Hardware offloaded checksumming isn't working on frames
-		 * bigger than 1514 bytes. A hypothesis about this is that the
-		 * checksum buffer is only 1518 bytes, so when the frames get
-		 * bigger they get truncated, or the last few bytes get
-		 * overwritten by the FCS.
-		 *
-		 * Just use software checksumming and bypass on bigger frames.
-		 */
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			ret = skb_checksum_help(skb);
-			if (ret)
-				return ret;
-		}
-		word1 |= TSS_BYPASS_BIT;
-	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		int tcp = 0;
 
 		/* We do not switch off the checksumming on non TCP/UDP