diff mbox series

[net,4/4] net: ethernet: cortina: Handle large frames

Message ID 20231104-gemini-largeframe-fix-v1-4-9c5513f22f33@linaro.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Fix large frames in the Gemini ethernet driver | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1312 this patch: 1313
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang fail Errors and warnings before: 1340 this patch: 1342
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 1340 this patch: 1341
netdev/checkpatch warning WARNING: 'inteface' may be misspelled - perhaps 'interface'?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Linus Walleij Nov. 4, 2023, 12:43 p.m. UTC
The Gemini ethernet controller provides hardware checksumming
for frames up to 1514 bytes including ethernet headers but not
FCS.

If we start sending bigger frames (after first bumping up the MTU
on both interfaces sending and receiveing the frames), truncated
packets start to appear on the target such as in this tcpdump
resulting from ping -s 1474:

23:34:17.241983 14:d6:4d:a8:3c:4f (oui Unknown) > bc:ae:c5:6b:a8:3d (oui Unknown),
ethertype IPv4 (0x0800), length 1514: truncated-ip - 2 bytes missing!
(tos 0x0, ttl 64, id 32653, offset 0, flags [DF], proto ICMP (1), length 1502)
OpenWrt.lan > Fecusia: ICMP echo request, id 1672, seq 50, length 1482

If we bypass the hardware checksumming and provide a software
fallback, everything starts working fine up to the max TX MTU
of 2047 bytes, for example ping -s2000 192.168.1.2:

00:44:29.587598 bc:ae:c5:6b:a8:3d (oui Unknown) > 14:d6:4d:a8:3c:4f (oui Unknown),
ethertype IPv4 (0x0800), length 2042:
(tos 0x0, ttl 64, id 51828, offset 0, flags [none], proto ICMP (1), length 2028)
Fecusia > OpenWrt.lan: ICMP echo reply, id 1683, seq 4, length 2008

The bit enabling to bypass hardware checksum (or any of the
"TSS" bits) are undocumented in the hardware reference manual.
The entire hardware checksum unit appears undocumented. The
conclusion that we need to use the "bypass" bit was found by
trial-and-error.

On the D-Link DIR-685 router this fixes a bug on the conduit
interface to the RTL8366RB DSA switch: as the switch needs to add
space for its tag it increases the MTU on the conduit interface
to 1504 and that means that when the router sends packages
of 1500 bytes these get an extra 4 bytes of DSA tag and the
transfer fails because of the erroneous hardware checksumming,
affecting such basic functionality as the LuCI web inteface.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Andrew Lunn Nov. 4, 2023, 2:56 p.m. UTC | #1
> @@ -1170,7 +1171,14 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  		word3 |= mtu;
>  	}
>  
> -	if (skb->ip_summed != CHECKSUM_NONE) {
> +	if (skb->len >= ETH_FRAME_LEN) {
> +		/* Hardware offloaded checksumming isn't working on frames
> +		 * bigger than 1514 bytes. Perhaps the buffer is only 1518
> +		 * bytes fitting mach a normal frame and a checksum?

mach ?

> +		 * Just bypass on bigger frames.
> +		 */
> +		word1 |= TSS_BYPASS_BIT;
> +	} else if (skb->ip_summed != CHECKSUM_NONE) {

I've never looked at how the network stack does checksums. But looking
at this patch, it made me wounder, how do you tell the stack it needs
to do a software checksum because the hardware cannot? Or for this
driver, is it always calculating a checksum, which is then ignored?
Maybe you can improve performance a little but disabling software
checksum when it is not needed?

	 Andrew
kernel test robot Nov. 4, 2023, 3:18 p.m. UTC | #2
Hi Linus,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 90b0c2b2edd1adff742c621e246562fbefa11b70]

url:    https://github.com/intel-lab-lkp/linux/commits/Linus-Walleij/net-ethernet-cortina-Fix-MTU-max-setting/20231104-204432
base:   90b0c2b2edd1adff742c621e246562fbefa11b70
patch link:    https://lore.kernel.org/r/20231104-gemini-largeframe-fix-v1-4-9c5513f22f33%40linaro.org
patch subject: [PATCH net 4/4] net: ethernet: cortina: Handle large frames
config: arc-randconfig-001-20231104 (https://download.01.org/0day-ci/archive/20231104/202311042310.Kl043sv2-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231104/202311042310.Kl043sv2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311042310.Kl043sv2-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/cortina/gemini.c: In function 'gmac_map_tx_bufs':
>> drivers/net/ethernet/cortina/gemini.c:1148:13: warning: unused variable 'ret' [-Wunused-variable]
    1148 |         int ret;
         |             ^~~


vim +/ret +1148 drivers/net/ethernet/cortina/gemini.c

  1132	
  1133	static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
  1134				    struct gmac_txq *txq, unsigned short *desc)
  1135	{
  1136		struct gemini_ethernet_port *port = netdev_priv(netdev);
  1137		struct skb_shared_info *skb_si =  skb_shinfo(skb);
  1138		unsigned short m = (1 << port->txq_order) - 1;
  1139		short frag, last_frag = skb_si->nr_frags - 1;
  1140		struct gemini_ethernet *geth = port->geth;
  1141		unsigned int word1, word3, buflen;
  1142		unsigned short w = *desc;
  1143		struct gmac_txdesc *txd;
  1144		skb_frag_t *skb_frag;
  1145		dma_addr_t mapping;
  1146		unsigned short mtu;
  1147		void *buffer;
> 1148		int ret;
  1149	
  1150		mtu  = ETH_HLEN;
  1151		mtu += netdev->mtu;
  1152		if (skb->protocol == htons(ETH_P_8021Q))
  1153			mtu += VLAN_HLEN;
  1154	
  1155		if (mtu > MTU_SIZE_BIT_MASK) {
  1156			netdev_err(netdev, "%s: MTU too big, max size 2047 bytes, capped\n", __func__);
  1157			mtu = MTU_SIZE_BIT_MASK;
  1158		}
  1159	
  1160		if (skb->len > 65535) {
  1161			/* The field for length is only 16 bits */
  1162			netdev_err(netdev, "%s: frame too big, max size 65535 bytes\n", __func__);
  1163			return -EINVAL;
  1164		}
  1165	
  1166		word1 = skb->len;
  1167		word3 = SOF_BIT;
  1168	
  1169		if (word1 > mtu) {
  1170			word1 |= TSS_MTU_ENABLE_BIT;
  1171			word3 |= mtu;
  1172		}
  1173	
  1174		if (skb->len >= ETH_FRAME_LEN) {
  1175			/* Hardware offloaded checksumming isn't working on frames
  1176			 * bigger than 1514 bytes. Perhaps the buffer is only 1518
  1177			 * bytes fitting mach a normal frame and a checksum?
  1178			 * Just bypass on bigger frames.
  1179			 */
  1180			word1 |= TSS_BYPASS_BIT;
  1181		} else if (skb->ip_summed != CHECKSUM_NONE) {
  1182			int tcp = 0;
  1183	
  1184			if (skb->protocol == htons(ETH_P_IP)) {
  1185				word1 |= TSS_IP_CHKSUM_BIT;
  1186				tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
  1187			} else { /* IPv6 */
  1188				word1 |= TSS_IPV6_ENABLE_BIT;
  1189				tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
  1190			}
  1191	
  1192			word1 |= tcp ? TSS_TCP_CHKSUM_BIT : TSS_UDP_CHKSUM_BIT;
  1193		}
  1194	
  1195		frag = -1;
  1196		while (frag <= last_frag) {
  1197			if (frag == -1) {
  1198				buffer = skb->data;
  1199				buflen = skb_headlen(skb);
  1200			} else {
  1201				skb_frag = skb_si->frags + frag;
  1202				buffer = skb_frag_address(skb_frag);
  1203				buflen = skb_frag_size(skb_frag);
  1204			}
  1205	
  1206			if (frag == last_frag) {
  1207				word3 |= EOF_BIT;
  1208				txq->skb[w] = skb;
  1209			}
  1210	
  1211			mapping = dma_map_single(geth->dev, buffer, buflen,
  1212						 DMA_TO_DEVICE);
  1213			if (dma_mapping_error(geth->dev, mapping))
  1214				goto map_error;
  1215	
  1216			txd = txq->ring + w;
  1217			txd->word0.bits32 = buflen;
  1218			txd->word1.bits32 = word1;
  1219			txd->word2.buf_adr = mapping;
  1220			txd->word3.bits32 = word3;
  1221	
  1222			word3 &= MTU_SIZE_BIT_MASK;
  1223			w++;
  1224			w &= m;
  1225			frag++;
  1226		}
  1227	
  1228		*desc = w;
  1229		return 0;
  1230	
  1231	map_error:
  1232		while (w != *desc) {
  1233			w--;
  1234			w &= m;
  1235	
  1236			dma_unmap_page(geth->dev, txq->ring[w].word2.buf_adr,
  1237				       txq->ring[w].word0.bits.buffer_size,
  1238				       DMA_TO_DEVICE);
  1239		}
  1240		return -ENOMEM;
  1241	}
  1242
Linus Walleij Nov. 5, 2023, 8:56 p.m. UTC | #3
On Sat, Nov 4, 2023 at 3:57 PM Andrew Lunn <andrew@lunn.ch> wrote:

> > +              * Just bypass on bigger frames.
> > +              */
> > +             word1 |= TSS_BYPASS_BIT;
> > +     } else if (skb->ip_summed != CHECKSUM_NONE) {
>
> I've never looked at how the network stack does checksums. But looking
> at this patch, it made me wounder, how do you tell the stack it needs
> to do a software checksum because the hardware cannot?

I read up on it: the documentation is in
Documentation/networking/checksum-offloads.rst
and in the header for skbuff, include/linux/skbuff.h

Actually we should check for == CHECKSUM_PARTIAL which means
we need to do the checksum (!= CHECKSUM_NONE is not inclusive)
then I call a software fallback directly from the driver if I need to.

> Or for this
> driver, is it always calculating a checksum, which is then ignored?
> Maybe you can improve performance a little but disabling software
> checksum when it is not needed?

The ping was somehow working without proper checksum
before, but I think I'm doing the right thing now, also tested with
HTTP traffic, check out v2.

Thanks for pointing it out, the patch looks way better now.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 23723c9c0f93..063e58639379 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1145,6 +1145,7 @@  static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	dma_addr_t mapping;
 	unsigned short mtu;
 	void *buffer;
+	int ret;
 
 	mtu  = ETH_HLEN;
 	mtu += netdev->mtu;
@@ -1170,7 +1171,14 @@  static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 		word3 |= mtu;
 	}
 
-	if (skb->ip_summed != CHECKSUM_NONE) {
+	if (skb->len >= ETH_FRAME_LEN) {
+		/* Hardware offloaded checksumming isn't working on frames
+		 * bigger than 1514 bytes. Perhaps the buffer is only 1518
+		 * bytes fitting mach a normal frame and a checksum?
+		 * Just bypass on bigger frames.
+		 */
+		word1 |= TSS_BYPASS_BIT;
+	} else if (skb->ip_summed != CHECKSUM_NONE) {
 		int tcp = 0;
 
 		if (skb->protocol == htons(ETH_P_IP)) {