diff mbox

[net-next,v2,5/8] net: phy: meson-gxl: detect LPA corruption

Message ID 20171207142715.32578-6-jbrunet@baylibre.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jerome Brunet Dec. 7, 2017, 2:27 p.m. UTC
The purpose of this change is to fix the incorrect detection of the link
partner (LP) advertised capabilities which sometimes happens with this PHY
(roughly 1 time in a dozen)

This issue may cause the link to be negotiated at 10Mbps/Full or
10Mbps/Half when 100MBps/Full is actually possible. In some case, the link
is even completely broken and no communication is possible.

To detect the corruption, we must look for a magic undocumented bit in the
WOL bank (hint given by the SoC vendor kernel) but this is not enough to
cover all cases. We also have to look at the LPA ack. If the LP supports
Aneg but did not ack our base code when aneg is completed, we assume
something went wrong.

The detection of a corrupted LPA triggers a restart of the aneg process.
This solves the problem but may take up to 6 retries to complete.

Fixes: 7334b3e47aee ("net: phy: Add Meson GXL Internal PHY driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
---

I suppose this patch probably seems a bit hacky, especially the part
about the link partner acknowledge. I'm trying to figure out if the
value in MII_LPA makes sense but I don't have such a deep knowledge
of the ethernet spec.

To me, it does not makes sense for the LP to support ANEG (Bit 1 in
MII_EXPENSION), the aneg to have successfully complete and, at the
same time, LP does not ACK our base code word, which we should have
sent during this aneg.

If you think this may have unintended consequences or if you have
an idea to this differently, feel free to let me know.

 drivers/net/phy/meson-gxl.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

Comments

Andrew Lunn Dec. 7, 2017, 3:34 p.m. UTC | #1
On Thu, Dec 07, 2017 at 03:27:12PM +0100, Jerome Brunet wrote:
> The purpose of this change is to fix the incorrect detection of the link
> partner (LP) advertised capabilities which sometimes happens with this PHY
> (roughly 1 time in a dozen)

Hi Jerome

Since this is a real fix, please send it alone. Base it on net, not
net-next.  David will then get it applied to stable, so it will be
backported.

Thanks
	Andrew
Jerome Brunet Dec. 7, 2017, 3:42 p.m. UTC | #2
On Thu, 2017-12-07 at 16:34 +0100, Andrew Lunn wrote:
> On Thu, Dec 07, 2017 at 03:27:12PM +0100, Jerome Brunet wrote:
> > The purpose of this change is to fix the incorrect detection of the link
> > partner (LP) advertised capabilities which sometimes happens with this PHY
> > (roughly 1 time in a dozen)
> 
> Hi Jerome
> 
> Since this is a real fix, please send it alone. Base it on net, not
> net-next.  David will then get it applied to stable, so it will be
> backported.

I hesitated on this. For this fix to be "not too ugly" I need at least the
helper functions (patch 1 to 3) ... which are not really fixes.

Would it be Ok if send patches 1 to 5 to net ?
and 6 to 8 separately on net-next ?

> 
> Thanks
> 	Andrew
Andrew Lunn Dec. 7, 2017, 4:12 p.m. UTC | #3
> Would it be Ok if send patches 1 to 5 to net ?
> and 6 to 8 separately on net-next ?

No. The rules for stable is that a patch must really fix something and
be minimal.

Documentation/process/stable-kernel-rules.rst 

What might be best is to develop a minimal, but ugly patch for stable.
Get it applied. Around a week later, net will be merged into
net-next. You can then have a 'revert' patch, followed by this series
making it nice and clean.

       Andrew
Jerome Brunet Dec. 7, 2017, 4:22 p.m. UTC | #4
On Thu, 2017-12-07 at 17:12 +0100, Andrew Lunn wrote:
> > Would it be Ok if send patches 1 to 5 to net ?
> > and 6 to 8 separately on net-next ?
> 
> No. The rules for stable is that a patch must really fix something and
> be minimal.
> 
> Documentation/process/stable-kernel-rules.rst 
> 
> What might be best is to develop a minimal, but ugly patch for stable.
> Get it applied. Around a week later, net will be merged into
> net-next. You can then have a 'revert' patch, followed by this series
> making it nice and clean.

Looks like a plan. Will do.
Thanks Andrew.

> 
>        Andrew
diff mbox

Patch

diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index 2e8c40df33c2..726e0eeed475 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -35,11 +35,16 @@ 
 #define TSTWRITE	23
 
 #define BANK_ANALOG_DSP		0
+#define BANK_WOL		1
 #define BANK_BIST		3
 
 /* Analog/DSP Registers */
 #define A6_CONFIG_REG	0x17
 
+/* WOL Registers */
+#define LPI_STATUS	0xc
+#define  LPI_STATUS_RSV12	BIT(12)
+
 /* BIST Registers */
 #define FR_PLL_CONTROL	0x1b
 #define FR_PLL_DIV0	0x1c
@@ -145,6 +150,58 @@  static int meson_gxl_config_init(struct phy_device *phydev)
 	return genphy_config_init(phydev);
 }
 
+/* This specific function is provided to cope with the possible failures of
+ * this phy during aneg process. When aneg fails, the PHY reports that aneg
+ * is done but the value found in MII_LPA is wrong:
+ *  - Early failures: MII_LPA is just 0x0001. if MII_EXPANSION reports that
+ *    the link partner (LP) supports aneg but the LP never acked our base
+ *    code word, it is likely that we never sent it to begin with.
+ *  - Late failures: MII_LPA is filled with a value which seems to make sense
+ *    but it actually is not what the LP is advertising. It seems that we
+ *    can detect this using a magic bit in the WOL bank (reg 12 - bit 12).
+ *    If this particular bit is not set when aneg is reported being done,
+ *    it means MII_LPA is likely to be wrong.
+ *
+ * In both case, forcing a restart of the aneg process solve the problem.
+ * When this failure happens, the first retry is usually successful but,
+ * in some cases, it may take up to 6 retries to get a decent result
+ */
+int meson_gxl_read_status(struct phy_device *phydev)
+{
+	int ret, wol, lpa, exp;
+
+	if (phydev->autoneg == AUTONEG_ENABLE) {
+		ret = genphy_aneg_done(phydev);
+		if (ret < 0)
+			return ret;
+		else if (!ret)
+			goto read_status_continue;
+
+		/* Aneg is done, let's check everything is fine */
+		wol = meson_gxl_read_reg(phydev, BANK_WOL, LPI_STATUS);
+		if (wol < 0)
+			return wol;
+
+		lpa = phy_read(phydev, MII_LPA);
+		if (lpa < 0)
+			return lpa;
+
+		exp = phy_read(phydev, MII_EXPANSION);
+		if (exp < 0)
+			return exp;
+
+		if (!(wol & LPI_STATUS_RSV12) ||
+		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
+			/* Looks like aneg failed after all */
+			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			return genphy_restart_aneg(phydev);
+		}
+	}
+
+read_status_continue:
+	return genphy_read_status(phydev);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		.phy_id		= 0x01814400,
@@ -155,7 +212,7 @@  static struct phy_driver meson_gxl_phy[] = {
 		.config_init	= meson_gxl_config_init,
 		.config_aneg	= genphy_config_aneg,
 		.aneg_done      = genphy_aneg_done,
-		.read_status	= genphy_read_status,
+		.read_status	= meson_gxl_read_status,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	},