diff mbox series

[net,v3] net: dsa: tag_rtl4_a: Bump min packet size

Message ID 20231031-fix-rtl8366rb-v3-1-04dfc4e7d90e@linaro.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v3] net: dsa: tag_rtl4_a: Bump min packet size | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1344 this patch: 1344
netdev/cc_maintainers fail 1 blamed authors not CCed: dqfext@gmail.com; 1 maintainers not CCed: dqfext@gmail.com
netdev/build_clang success Errors and warnings before: 1369 this patch: 1369
netdev/verify_signedoff fail author Signed-off-by missing
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1369 this patch: 1369
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Linus Walleij Oct. 31, 2023, 10:45 p.m. UTC
It was reported that the "LuCI" web UI was not working properly
with a device using the RTL8366RB switch. Disabling the egress
port tagging code made the switch work again, but this is not
a good solution as we want to be able to direct traffic to a
certain port.

It turns out that packets between 1496 and 1500 bytes are
dropped unless 4 extra blank bytes are added to the tail.

1496 is the size of a normal data frame minus the internal DSA
tag. The number is intuitive, the explanation evades me.
This is not a tail tag since frames below 1496 bytes does
not need the extra bytes.

As we completely lack datasheet or any insight into how this
switch works, this trial-and-error solution is the best we
can do. FWIW this bug may very well be the reason why Realteks
code drops are not using CPU tagging. The vendor routers using
this switch are all hardwired to disable CPU tagging and all
packets are pushed to all ports on TX. This is also the case
in the old OpenWrt driver, derived from the vendor code drops.

Modifying the MTU on the switch (one setting affecting all
ports) has no effect.

Before this patch:

PING 192.168.1.1 (192.168.1.1) 1470(1498) bytes of data.
^C
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1048ms

PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
^C
--- 192.168.1.1 ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 11267ms

After this patch:

PING 192.168.1.1 (192.168.1.1) 1470(1498) bytes of data.
1478 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.533 ms
1478 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.630 ms

PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
1480 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.527 ms
1480 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.562 ms

Also LuCI starts working with the 1500 bytes frames it produces
from the HTTP server.

Fixes: 9eb8bc593a5e ("net: dsa: tag_rtl4_a: fix egress tags")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v3:
- Do not pad out to 1518 bytes, just add 4 bytes to the
  tail consistently when the package is bigger than 1496
  bytes. Use a combination of __skb_pad() and __skb_put().
  This works fine.
- Add tail in the first if-clause and pad to 60 bytes
  in the else-clause since they are mutually exclusive.
- Edit comments and commit text.
- Link to v2: https://lore.kernel.org/r/20231030-fix-rtl8366rb-v2-1-e66e1ef7dbd2@linaro.org

Changes in v2:
- Pad packages >= 1496 bytes after some further tests
- Use more to-the-point description
- Provide ping results in the commit message
- Link to v1: https://lore.kernel.org/r/20231027-fix-rtl8366rb-v1-1-d565d905535a@linaro.org
---
 net/dsa/tag_rtl4_a.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)


---
base-commit: d9e164e4199bc465b3540d5fe3ffc8a9a4fc6157
change-id: 20231027-fix-rtl8366rb-e752bd5901ca

Best regards,

Comments

Linus Walleij Nov. 1, 2023, 8:18 p.m. UTC | #1
On Tue, Oct 31, 2023 at 11:45 PM Linus Walleij <linus.walleij@linaro.org> wrote:

> It was reported that the "LuCI" web UI was not working properly
> with a device using the RTL8366RB switch. Disabling the egress
> port tagging code made the switch work again, but this is not
> a good solution as we want to be able to direct traffic to a
> certain port.

Luiz is not seeing this on his ethernet controller so:

pw-bot: cr

(I've seen Vladmir do this, I don't know what it means, but seems
to be how to hold back patches.)

Yours,
Linus Walleij
Florian Fainelli Nov. 2, 2023, 6:43 p.m. UTC | #2
On 11/1/23 13:18, Linus Walleij wrote:
> On Tue, Oct 31, 2023 at 11:45 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> 
>> It was reported that the "LuCI" web UI was not working properly
>> with a device using the RTL8366RB switch. Disabling the egress
>> port tagging code made the switch work again, but this is not
>> a good solution as we want to be able to direct traffic to a
>> certain port.
> 
> Luiz is not seeing this on his ethernet controller so:
> 
> pw-bot: cr
> 
> (I've seen Vladmir do this, I don't know what it means, but seems
> to be how to hold back patches.)

Looking at drivers/net/ethernet/cortina/gemini.c, should not we account 
for when the MAC is used as a conduit and include the right amount of 
"MTU" bytes? Something like this (compile tested only):

diff --git a/drivers/net/ethernet/cortina/gemini.c 
b/drivers/net/ethernet/cortina/gemini.c
index 5423fe26b4ef..5143f3734c3b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -36,6 +36,7 @@
  #include <linux/ethtool.h>
  #include <linux/tcp.h>
  #include <linux/u64_stats_sync.h>
+#include <net/dsa.h>

  #include <linux/in.h>
  #include <linux/ip.h>
@@ -1151,6 +1152,13 @@ static int gmac_map_tx_bufs(struct net_device 
*netdev, struct sk_buff *skb,
         if (skb->protocol == htons(ETH_P_8021Q))
                 mtu += VLAN_HLEN;

+#if IS_ENABLED(CONFIG_NET_DSA)
+       if (netdev_uses_dsa(netdev)) {
+               const struct dsa_device_ops *ops = 
skb->dev->dsa_ptr->tag_ops;
+               mtu += ops->needed_headroom;
+       }
+#endif
+
         word1 = skb->len;
         word3 = SOF_BIT;

Also, as a separate check, might be worth annotating the various 
descriptor words with __le32 and appropriate le32_to_cpu() and 
cpu_to_le32() accessors for each of those fields.
Linus Walleij Nov. 2, 2023, 10:09 p.m. UTC | #3
On Thu, Nov 2, 2023 at 7:43 PM Florian Fainelli <f.fainelli@gmail.com> wrote:

> Looking at drivers/net/ethernet/cortina/gemini.c, should not we account
> for when the MAC is used as a conduit and include the right amount of
> "MTU" bytes? Something like this (compile tested only):

The DSA core already fixes this by adding the tag size to the MTU
of the conduit interface, so netdev->mtu is already 1504 for this
switch.

I found other oddities though so I'm digging into the driver!

Thanks,
Linus Walleij
Florian Fainelli Nov. 2, 2023, 10:24 p.m. UTC | #4
On 11/2/23 15:09, Linus Walleij wrote:
> On Thu, Nov 2, 2023 at 7:43 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> Looking at drivers/net/ethernet/cortina/gemini.c, should not we account
>> for when the MAC is used as a conduit and include the right amount of
>> "MTU" bytes? Something like this (compile tested only):
> 
> The DSA core already fixes this by adding the tag size to the MTU
> of the conduit interface, so netdev->mtu is already 1504 for this
> switch.
> 
> I found other oddities though so I'm digging into the driver!

Yes indeed, I forgot about that, never mind :)
Simon Horman Nov. 4, 2023, 2:10 p.m. UTC | #5
On Wed, Nov 01, 2023 at 09:18:47PM +0100, Linus Walleij wrote:
> On Tue, Oct 31, 2023 at 11:45 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> 
> > It was reported that the "LuCI" web UI was not working properly
> > with a device using the RTL8366RB switch. Disabling the egress
> > port tagging code made the switch work again, but this is not
> > a good solution as we want to be able to direct traffic to a
> > certain port.
> 
> Luiz is not seeing this on his ethernet controller so:
> 
> pw-bot: cr
> 
> (I've seen Vladmir do this, I don't know what it means, but seems
> to be how to hold back patches.)

Hi Linus,

In this case it may not have activated the automation, but
I do see that the patch is now marked as "Changes Requested"
in patchwork, so all is well.

  https://patchwork.kernel.org/project/netdevbpf/list/?series=798030&state=%2A

FWIIW, pw-bot is (slightly) documented here:

  https://docs.kernel.org/process/maintainer-netdev.html#updating-patch-status
Linus Walleij Nov. 4, 2023, 10:17 p.m. UTC | #6
On Sat, Nov 4, 2023 at 3:10 PM Simon Horman <horms@kernel.org> wrote:

> In this case it may not have activated the automation, but
> I do see that the patch is now marked as "Changes Requested"
> in patchwork, so all is well.

Yeah, in this case it should even be

pw-bot: reject

because I found the real problem elsewhere.

> FWIIW, pw-bot is (slightly) documented here:
>
>   https://docs.kernel.org/process/maintainer-netdev.html#updating-patch-status

Thanks, I'm getting better at it!

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
index c327314b95e3..9c7f8a89cb82 100644
--- a/net/dsa/tag_rtl4_a.c
+++ b/net/dsa/tag_rtl4_a.c
@@ -41,9 +41,21 @@  static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
 	u8 *tag;
 	u16 out;
 
-	/* Pad out to at least 60 bytes */
-	if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
-		return NULL;
+	/* Packets over 1496 bytes get dropped unless 4 bytes are added
+	 * on the tail. 1496 is ETH_DATA_LEN - tag which is hardly
+	 * a coinicidence, and the 4 bytes correspond to the tag length
+	 * and this is hardly a coinicidence so we use these defines
+	 * here.
+	 */
+	if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
+		if (unlikely(__skb_pad(skb, RTL4_A_HDR_LEN, false)))
+			return NULL;
+		__skb_put(skb, RTL4_A_HDR_LEN);
+	} else {
+		/* Pad out to at least 60 bytes */
+		if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
+			return NULL;
+	}
 
 	netdev_dbg(dev, "add realtek tag to package to port %d\n",
 		   dp->index);