Message ID | 20240614094642.122464-4-enguerrand.de-ribaucourt@savoirfairelinux.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Handle new Microchip KSZ 9897 Errata | expand |
On Fri, Jun 14, 2024 at 09:46:42AM +0000, Enguerrand de Ribaucourt wrote: > The errata DS80000754 recommends monitoring potential faults in > half-duplex mode for the KSZ9477 family. > > half-duplex is not very common so I just added a critical message > when the fault conditions are detected. The switch can be expected > to be unable to communicate anymore in these states and a software > reset of the switch would be required which I did not implement. If i'm reading this code correctly, every 30 seconds it will test to see if the link is half duplex, and is so, log onetime this could lead to problems. Also, every 30 seconds, if the statistics counts indicate there has been a late collision, it will log a rate limited message. Given the 30 second poll interval, rate limiting is probably pointless, and every one will get logged. The last print, i have no idea what resource you are talking about. Will it also likely print once every 30 seconds? Is the idea here, we want to notice when this is happening, and get an idea if it is worth implementing the software reset? Do we want to add a "Please report if you see this" to the commit message or the log messages themselves? Andrew
On 14/06/2024 15:37, Andrew Lunn wrote: > On Fri, Jun 14, 2024 at 09:46:42AM +0000, Enguerrand de Ribaucourt wrote: >> The errata DS80000754 recommends monitoring potential faults in >> half-duplex mode for the KSZ9477 family. >> >> half-duplex is not very common so I just added a critical message >> when the fault conditions are detected. The switch can be expected >> to be unable to communicate anymore in these states and a software >> reset of the switch would be required which I did not implement. > > If i'm reading this code correctly, every 30 seconds it will test to > see if the link is half duplex, and is so, log onetime this could lead > to problems. Also, every 30 seconds, if the statistics counts indicate > there has been a late collision, it will log a rate limited > message. Given the 30 second poll interval, rate limiting is probably > pointless, and every one will get logged. The last print, i have no > idea what resource you are talking about. Will it also likely print > once every 30 seconds? The MIB statistics are read every 5 seconds. It is defined in: dev->mib_read_interval = msecs_to_jiffies(5000); So indeed, _ratelimit having a 5 second back-off, it is redundant. The second print is about two resources: - Packet Memory Available Block Count (PMAVBC) - TX Queue Blocks Used Count (TXQBU/QM_TX_CNT) According to the errata, they are leaky in half duplex mode. Once depleted, packets can't be exchanged. Like the late collision counter, these are not going to go back to normal values without a reset. So yes, the critical message will keeping being displayed. Do you think a dev_crit_once would be more appropriate? > > Is the idea here, we want to notice when this is happening, and get an > idea if it is worth implementing the software reset? Do we want to add > a "Please report if you see this" to the commit message or the log > messages themselves? I'm fine with just monitoring and printing in my use case. My devices are unlikely to encounter half-duplex peers and the app can recover. I simply need to guarantee I can detect this condition. For my use case, implementing the software reset wasn't worth the effort. One of the main obstacles is that the VLAN table would need to be reconfigured, so it would require remembering it from the driver. Also transmission would inevitably be stopped for a few seconds. For generic users, I think the warning is necessary since it indicates the switch will stop functioning after prolonged use in half-duplex. We could elaborate the warning message to point to the datasheet. That way users could understand why their configuration is affected, what are the risks, and possible avoidance or recovery mechanisms. If this errata is critical to someone who can't avoid it, then we could implement the software reset. Proposal for a v7: - dev_crit_ratelimit -> dev_crit_once - dev_warn_once -> point to datasheet + add "report if needed" What is your opinion on this? Thanks, > > Andrew
diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c index c2878dd0ad7e..d1499a9318f8 100644 --- a/drivers/net/dsa/microchip/ksz9477.c +++ b/drivers/net/dsa/microchip/ksz9477.c @@ -429,6 +429,53 @@ void ksz9477_freeze_mib(struct ksz_device *dev, int port, bool freeze) mutex_unlock(&p->mib.cnt_mutex); } +int ksz9477_errata_monitor(struct ksz_device *dev, int port, + u64 tx_late_col) +{ + u32 pmavbc; + u8 status; + u16 pqm; + int ret; + + ret = ksz_pread8(dev, port, REG_PORT_STATUS_0, &status); + if (ret) + return ret; + if (!(FIELD_GET(PORT_INTF_SPEED_MASK, status) == PORT_INTF_SPEED_NONE) && + !(status & PORT_INTF_FULL_DUPLEX)) { + dev_warn_once(dev->dev, + "Half-duplex detected on port %d, transmission halt may occur\n", + port); + /* Errata DS80000754 recommends monitoring potential faults in + * half-duplex mode. The switch might not be able to communicate anymore + * in these states. + */ + if (tx_late_col != 0) { + /* Transmission halt with late collisions */ + dev_crit_ratelimited(dev->dev, + "TX late collisions detected, transmission may be halted on port %d\n", + port); + } + ret = ksz_read8(dev, REG_SW_LUE_CTRL_0, &status); + if (ret) + return ret; + if (status & SW_VLAN_ENABLE) { + ret = ksz_pread16(dev, port, REG_PORT_QM_TX_CNT_0__4, &pqm); + if (ret) + return ret; + ret = ksz_read32(dev, REG_PMAVBC, &pmavbc); + if (ret) + return ret; + if ((FIELD_GET(PMAVBC_MASK, pmavbc) <= PMAVBC_MIN) || + (FIELD_GET(PORT_QM_TX_CNT_M, pqm) >= PORT_QM_TX_CNT_MAX)) { + /* Transmission halt with Half-Duplex and VLAN */ + dev_crit_ratelimited(dev->dev, + "resources out of limits, transmission may be halted\n"); + } + } + } + return ret; +} + void ksz9477_port_init_cnt(struct ksz_device *dev, int port) { struct ksz_port_mib *mib = &dev->ports[port].mib; diff --git a/drivers/net/dsa/microchip/ksz9477.h b/drivers/net/dsa/microchip/ksz9477.h index ce1e656b800b..239a281da10b 100644 --- a/drivers/net/dsa/microchip/ksz9477.h +++ b/drivers/net/dsa/microchip/ksz9477.h @@ -36,6 +36,8 @@ int ksz9477_port_mirror_add(struct ksz_device *dev, int port, bool ingress, struct netlink_ext_ack *extack); void ksz9477_port_mirror_del(struct ksz_device *dev, int port, struct dsa_mall_mirror_tc_entry *mirror); +int ksz9477_errata_monitor(struct ksz_device *dev, int port, + u64 tx_late_col); void ksz9477_get_caps(struct ksz_device *dev, int port, struct phylink_config *config); int ksz9477_fdb_dump(struct ksz_device *dev, int port, diff --git a/drivers/net/dsa/microchip/ksz9477_reg.h b/drivers/net/dsa/microchip/ksz9477_reg.h index fb124be8edd3..d5354c600ea1 100644 --- a/drivers/net/dsa/microchip/ksz9477_reg.h +++ b/drivers/net/dsa/microchip/ksz9477_reg.h @@ -843,8 +843,8 @@ #define REG_PORT_STATUS_0 0x0030 -#define PORT_INTF_SPEED_M 0x3 -#define PORT_INTF_SPEED_S 3 +#define PORT_INTF_SPEED_MASK GENMASK(4, 3) +#define PORT_INTF_SPEED_NONE GENMASK(1, 0) #define PORT_INTF_FULL_DUPLEX BIT(2) #define PORT_TX_FLOW_CTRL BIT(1) #define PORT_RX_FLOW_CTRL BIT(0) @@ -1168,6 +1168,11 @@ #define PORT_RMII_CLK_SEL BIT(7) #define PORT_MII_SEL_EDGE BIT(5) +#define REG_PMAVBC 0x03AC + +#define PMAVBC_MASK GENMASK(26, 16) +#define PMAVBC_MIN 0x580 + /* 4 - MAC */ #define REG_PORT_MAC_CTRL_0 0x0400 @@ -1495,6 +1500,7 @@ #define PORT_QM_TX_CNT_USED_S 0 #define PORT_QM_TX_CNT_M (BIT(11) - 1) +#define PORT_QM_TX_CNT_MAX 0x200 #define REG_PORT_QM_TX_CNT_1__4 0x0A14 diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 2818e24e2a51..0433109b42e5 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -1382,6 +1382,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .tc_cbs_supported = true, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1416,6 +1417,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_ipms = 8, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1450,6 +1452,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_ipms = 8, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1540,6 +1543,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .tc_cbs_supported = true, .ops = &ksz9477_dev_ops, .phylink_mac_ops = &ksz9477_phylink_mac_ops, + .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), .reg_mib_cnt = MIB_COUNTER_NUM, @@ -1820,6 +1824,7 @@ void ksz_r_mib_stats64(struct ksz_device *dev, int port) struct rtnl_link_stats64 *stats; struct ksz_stats_raw *raw; struct ksz_port_mib *mib; + int ret; mib = &dev->ports[port].mib; stats = &mib->stats64; @@ -1861,6 +1866,12 @@ void ksz_r_mib_stats64(struct ksz_device *dev, int port) pstats->rx_pause_frames = raw->rx_pause; spin_unlock(&mib->stats64_lock); + + if (dev->info->phy_errata_9477) { + ret = ksz9477_errata_monitor(dev, port, raw->tx_late_col); + if (ret) + dev_err(dev->dev, "Failed to monitor transmission halt\n"); + } } void ksz88xx_r_mib_stats64(struct ksz_device *dev, int port) diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index c784fd23a993..ee7db46e469d 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -66,6 +66,7 @@ struct ksz_chip_data { bool tc_cbs_supported; const struct ksz_dev_ops *ops; const struct phylink_mac_ops *phylink_mac_ops; + bool phy_errata_9477; bool ksz87xx_eee_link_erratum; const struct ksz_mib_names *mib_names; int mib_cnt;
The errata DS80000754 recommends monitoring potential faults in half-duplex mode for the KSZ9477 family. half-duplex is not very common so I just added a critical message when the fault conditions are detected. The switch can be expected to be unable to communicate anymore in these states and a software reset of the switch would be required which I did not implement. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> --- v6: - use macros for PORT_INTF_SPEED_MASK check - add VLAN condition before checking the resources v5: https://lore.kernel.org/all/20240604092304.314636-5-enguerrand.de-ribaucourt@savoirfairelinux.com/ - use macros for bitmasks - check for return values on ksz_pread* v4: https://lore.kernel.org/all/20240531142430.678198-6-enguerrand.de-ribaucourt@savoirfairelinux.com/ - rebase on net/main - add Fixes tag - reverse x-mas tree v3: https://lore.kernel.org/all/20240530102436.226189-6-enguerrand.de-ribaucourt@savoirfairelinux.com/ --- drivers/net/dsa/microchip/ksz9477.c | 47 +++++++++++++++++++++++++ drivers/net/dsa/microchip/ksz9477.h | 2 ++ drivers/net/dsa/microchip/ksz9477_reg.h | 10 ++++-- drivers/net/dsa/microchip/ksz_common.c | 11 ++++++ drivers/net/dsa/microchip/ksz_common.h | 1 + 5 files changed, 69 insertions(+), 2 deletions(-)