diff mbox series

[net,v6,3/3] net: dsa: microchip: monitor potential faults in half-duplex mode

Message ID 20240614094642.122464-4-enguerrand.de-ribaucourt@savoirfairelinux.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Handle new Microchip KSZ 9897 Errata | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 859 this patch: 859
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: f.fainelli@gmail.com; 5 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com f.fainelli@gmail.com olteanv@gmail.com
netdev/build_clang success Errors and warnings before: 863 this patch: 863
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 863 this patch: 863
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-16--18-00 (tests: 659)

Commit Message

Enguerrand de Ribaucourt June 14, 2024, 9:46 a.m. UTC
The errata DS80000754 recommends monitoring potential faults in
half-duplex mode for the KSZ9477 family.

half-duplex is not very common so I just added a critical message
when the fault conditions are detected. The switch can be expected
to be unable to communicate anymore in these states and a software
reset of the switch would be required which I did not implement.

Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
---
v6:
 - use macros for PORT_INTF_SPEED_MASK check
 - add VLAN condition before checking the resources
v5: https://lore.kernel.org/all/20240604092304.314636-5-enguerrand.de-ribaucourt@savoirfairelinux.com/
 - use macros for bitmasks
 - check for return values on ksz_pread*
v4: https://lore.kernel.org/all/20240531142430.678198-6-enguerrand.de-ribaucourt@savoirfairelinux.com/
 - rebase on net/main
 - add Fixes tag
 - reverse x-mas tree
v3: https://lore.kernel.org/all/20240530102436.226189-6-enguerrand.de-ribaucourt@savoirfairelinux.com/
---
 drivers/net/dsa/microchip/ksz9477.c     | 47 +++++++++++++++++++++++++
 drivers/net/dsa/microchip/ksz9477.h     |  2 ++
 drivers/net/dsa/microchip/ksz9477_reg.h | 10 ++++--
 drivers/net/dsa/microchip/ksz_common.c  | 11 ++++++
 drivers/net/dsa/microchip/ksz_common.h  |  1 +
 5 files changed, 69 insertions(+), 2 deletions(-)

Comments

Andrew Lunn June 14, 2024, 1:37 p.m. UTC | #1
On Fri, Jun 14, 2024 at 09:46:42AM +0000, Enguerrand de Ribaucourt wrote:
> The errata DS80000754 recommends monitoring potential faults in
> half-duplex mode for the KSZ9477 family.
> 
> half-duplex is not very common so I just added a critical message
> when the fault conditions are detected. The switch can be expected
> to be unable to communicate anymore in these states and a software
> reset of the switch would be required which I did not implement.

If i'm reading this code correctly, every 30 seconds it will test to
see if the link is half duplex, and is so, log onetime this could lead
to problems. Also, every 30 seconds, if the statistics counts indicate
there has been a late collision, it will log a rate limited
message. Given the 30 second poll interval, rate limiting is probably
pointless, and every one will get logged. The last print, i have no
idea what resource you are talking about. Will it also likely print
once every 30 seconds?

Is the idea here, we want to notice when this is happening, and get an
idea if it is worth implementing the software reset? Do we want to add
a "Please report if you see this" to the commit message or the log
messages themselves?

	Andrew
Enguerrand de Ribaucourt June 14, 2024, 3:14 p.m. UTC | #2
On 14/06/2024 15:37, Andrew Lunn wrote:
> On Fri, Jun 14, 2024 at 09:46:42AM +0000, Enguerrand de Ribaucourt wrote:
>> The errata DS80000754 recommends monitoring potential faults in
>> half-duplex mode for the KSZ9477 family.
>>
>> half-duplex is not very common so I just added a critical message
>> when the fault conditions are detected. The switch can be expected
>> to be unable to communicate anymore in these states and a software
>> reset of the switch would be required which I did not implement.
> 
> If i'm reading this code correctly, every 30 seconds it will test to
> see if the link is half duplex, and is so, log onetime this could lead
> to problems. Also, every 30 seconds, if the statistics counts indicate
> there has been a late collision, it will log a rate limited
> message. Given the 30 second poll interval, rate limiting is probably
> pointless, and every one will get logged. The last print, i have no
> idea what resource you are talking about. Will it also likely print
> once every 30 seconds?

The MIB statistics are read every 5 seconds. It is defined in:
	dev->mib_read_interval = msecs_to_jiffies(5000);
So indeed, _ratelimit having a 5 second back-off, it is redundant.

The second print is about two resources:
  - Packet Memory Available Block Count (PMAVBC)
  - TX Queue Blocks Used Count (TXQBU/QM_TX_CNT)
According to the errata, they are leaky in half duplex mode. Once 
depleted, packets can't be exchanged.

Like the late collision counter, these are not going to go back to 
normal values without a reset. So yes, the critical message will keeping 
being displayed. Do you think a dev_crit_once would be more appropriate?

> 
> Is the idea here, we want to notice when this is happening, and get an
> idea if it is worth implementing the software reset? Do we want to add
> a "Please report if you see this" to the commit message or the log
> messages themselves?

I'm fine with just monitoring and printing in my use case. My devices 
are unlikely to encounter half-duplex peers and the app can recover. I 
simply need to guarantee I can detect this condition. For my use case, 
implementing the software reset wasn't worth the effort. One of the main 
obstacles is that the VLAN table would need to be reconfigured, so it 
would require remembering it from the driver. Also transmission would 
inevitably be stopped for a few seconds.

For generic users, I think the warning is necessary since it indicates 
the switch will stop functioning after prolonged use in half-duplex. We 
could elaborate the warning message to point to the datasheet. That way 
users could understand why their configuration is affected, what are the 
risks, and possible avoidance or recovery mechanisms. If this errata is 
critical to someone who can't avoid it, then we could implement the 
software reset.

Proposal for a v7:
  - dev_crit_ratelimit -> dev_crit_once
  - dev_warn_once -> point to datasheet + add "report if needed"

What is your opinion on this?

Thanks,

> 
> 	Andrew
diff mbox series

Patch

diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index c2878dd0ad7e..d1499a9318f8 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -429,6 +429,53 @@  void ksz9477_freeze_mib(struct ksz_device *dev, int port, bool freeze)
 	mutex_unlock(&p->mib.cnt_mutex);
 }
 
+int ksz9477_errata_monitor(struct ksz_device *dev, int port,
+			   u64 tx_late_col)
+{
+	u32 pmavbc;
+	u8 status;
+	u16 pqm;
+	int ret;
+
+	ret = ksz_pread8(dev, port, REG_PORT_STATUS_0, &status);
+	if (ret)
+		return ret;
+	if (!(FIELD_GET(PORT_INTF_SPEED_MASK, status) == PORT_INTF_SPEED_NONE) &&
+	    !(status & PORT_INTF_FULL_DUPLEX)) {
+		dev_warn_once(dev->dev,
+			      "Half-duplex detected on port %d, transmission halt may occur\n",
+			      port);
+		/* Errata DS80000754 recommends monitoring potential faults in
+		 * half-duplex mode. The switch might not be able to communicate anymore
+		 * in these states.
+		 */
+		if (tx_late_col != 0) {
+			/* Transmission halt with late collisions */
+			dev_crit_ratelimited(dev->dev,
+					     "TX late collisions detected, transmission may be halted on port %d\n",
+					     port);
+		}
+		ret = ksz_read8(dev, REG_SW_LUE_CTRL_0, &status);
+		if (ret)
+			return ret;
+		if (status & SW_VLAN_ENABLE) {
+			ret = ksz_pread16(dev, port, REG_PORT_QM_TX_CNT_0__4, &pqm);
+			if (ret)
+				return ret;
+			ret = ksz_read32(dev, REG_PMAVBC, &pmavbc);
+			if (ret)
+				return ret;
+			if ((FIELD_GET(PMAVBC_MASK, pmavbc) <= PMAVBC_MIN) ||
+			    (FIELD_GET(PORT_QM_TX_CNT_M, pqm) >= PORT_QM_TX_CNT_MAX)) {
+				/* Transmission halt with Half-Duplex and VLAN */
+				dev_crit_ratelimited(dev->dev,
+						     "resources out of limits, transmission may be halted\n");
+			}
+		}
+	}
+	return ret;
+}
+
 void ksz9477_port_init_cnt(struct ksz_device *dev, int port)
 {
 	struct ksz_port_mib *mib = &dev->ports[port].mib;
diff --git a/drivers/net/dsa/microchip/ksz9477.h b/drivers/net/dsa/microchip/ksz9477.h
index ce1e656b800b..239a281da10b 100644
--- a/drivers/net/dsa/microchip/ksz9477.h
+++ b/drivers/net/dsa/microchip/ksz9477.h
@@ -36,6 +36,8 @@  int ksz9477_port_mirror_add(struct ksz_device *dev, int port,
 			    bool ingress, struct netlink_ext_ack *extack);
 void ksz9477_port_mirror_del(struct ksz_device *dev, int port,
 			     struct dsa_mall_mirror_tc_entry *mirror);
+int ksz9477_errata_monitor(struct ksz_device *dev, int port,
+			   u64 tx_late_col);
 void ksz9477_get_caps(struct ksz_device *dev, int port,
 		      struct phylink_config *config);
 int ksz9477_fdb_dump(struct ksz_device *dev, int port,
diff --git a/drivers/net/dsa/microchip/ksz9477_reg.h b/drivers/net/dsa/microchip/ksz9477_reg.h
index fb124be8edd3..d5354c600ea1 100644
--- a/drivers/net/dsa/microchip/ksz9477_reg.h
+++ b/drivers/net/dsa/microchip/ksz9477_reg.h
@@ -843,8 +843,8 @@ 
 
 #define REG_PORT_STATUS_0		0x0030
 
-#define PORT_INTF_SPEED_M		0x3
-#define PORT_INTF_SPEED_S		3
+#define PORT_INTF_SPEED_MASK		GENMASK(4, 3)
+#define PORT_INTF_SPEED_NONE		GENMASK(1, 0)
 #define PORT_INTF_FULL_DUPLEX		BIT(2)
 #define PORT_TX_FLOW_CTRL		BIT(1)
 #define PORT_RX_FLOW_CTRL		BIT(0)
@@ -1168,6 +1168,11 @@ 
 #define PORT_RMII_CLK_SEL		BIT(7)
 #define PORT_MII_SEL_EDGE		BIT(5)
 
+#define REG_PMAVBC			0x03AC
+
+#define PMAVBC_MASK			GENMASK(26, 16)
+#define PMAVBC_MIN			0x580
+
 /* 4 - MAC */
 #define REG_PORT_MAC_CTRL_0		0x0400
 
@@ -1495,6 +1500,7 @@ 
 
 #define PORT_QM_TX_CNT_USED_S		0
 #define PORT_QM_TX_CNT_M		(BIT(11) - 1)
+#define PORT_QM_TX_CNT_MAX		0x200
 
 #define REG_PORT_QM_TX_CNT_1__4		0x0A14
 
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 2818e24e2a51..0433109b42e5 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -1382,6 +1382,7 @@  const struct ksz_chip_data ksz_switch_chips[] = {
 		.tc_cbs_supported = true,
 		.ops = &ksz9477_dev_ops,
 		.phylink_mac_ops = &ksz9477_phylink_mac_ops,
+		.phy_errata_9477 = true,
 		.mib_names = ksz9477_mib_names,
 		.mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
 		.reg_mib_cnt = MIB_COUNTER_NUM,
@@ -1416,6 +1417,7 @@  const struct ksz_chip_data ksz_switch_chips[] = {
 		.num_ipms = 8,
 		.ops = &ksz9477_dev_ops,
 		.phylink_mac_ops = &ksz9477_phylink_mac_ops,
+		.phy_errata_9477 = true,
 		.mib_names = ksz9477_mib_names,
 		.mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
 		.reg_mib_cnt = MIB_COUNTER_NUM,
@@ -1450,6 +1452,7 @@  const struct ksz_chip_data ksz_switch_chips[] = {
 		.num_ipms = 8,
 		.ops = &ksz9477_dev_ops,
 		.phylink_mac_ops = &ksz9477_phylink_mac_ops,
+		.phy_errata_9477 = true,
 		.mib_names = ksz9477_mib_names,
 		.mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
 		.reg_mib_cnt = MIB_COUNTER_NUM,
@@ -1540,6 +1543,7 @@  const struct ksz_chip_data ksz_switch_chips[] = {
 		.tc_cbs_supported = true,
 		.ops = &ksz9477_dev_ops,
 		.phylink_mac_ops = &ksz9477_phylink_mac_ops,
+		.phy_errata_9477 = true,
 		.mib_names = ksz9477_mib_names,
 		.mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
 		.reg_mib_cnt = MIB_COUNTER_NUM,
@@ -1820,6 +1824,7 @@  void ksz_r_mib_stats64(struct ksz_device *dev, int port)
 	struct rtnl_link_stats64 *stats;
 	struct ksz_stats_raw *raw;
 	struct ksz_port_mib *mib;
+	int ret;
 
 	mib = &dev->ports[port].mib;
 	stats = &mib->stats64;
@@ -1861,6 +1866,12 @@  void ksz_r_mib_stats64(struct ksz_device *dev, int port)
 	pstats->rx_pause_frames = raw->rx_pause;
 
 	spin_unlock(&mib->stats64_lock);
+
+	if (dev->info->phy_errata_9477) {
+		ret = ksz9477_errata_monitor(dev, port, raw->tx_late_col);
+		if (ret)
+			dev_err(dev->dev, "Failed to monitor transmission halt\n");
+	}
 }
 
 void ksz88xx_r_mib_stats64(struct ksz_device *dev, int port)
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h
index c784fd23a993..ee7db46e469d 100644
--- a/drivers/net/dsa/microchip/ksz_common.h
+++ b/drivers/net/dsa/microchip/ksz_common.h
@@ -66,6 +66,7 @@  struct ksz_chip_data {
 	bool tc_cbs_supported;
 	const struct ksz_dev_ops *ops;
 	const struct phylink_mac_ops *phylink_mac_ops;
+	bool phy_errata_9477;
 	bool ksz87xx_eee_link_erratum;
 	const struct ksz_mib_names *mib_names;
 	int mib_cnt;