
[net] net: phy: marvell10g: fix 88x3310 power up

Message ID 20230712062634.21288-1-jiawenwu@trustnetic.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1341 this patch: 1341
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 1364 this patch: 1364
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1364 this patch: 1364
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 22 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jiawen Wu July 12, 2023, 6:26 a.m. UTC
Clearing the MV_V2_PORT_CTRL_PWRDOWN bit to power up the 88x3310 PHY
sometimes does not take effect immediately. This causes mv3310_reset()
to time out, which makes the config initialization fail. So poll until
the PHY has powered up.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
 drivers/net/phy/marvell10g.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Simon Horman July 13, 2023, 10:26 a.m. UTC | #1
On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> it sometimes does not take effect immediately. This will cause
> mv3310_reset() to time out, which will fail the config initialization.
> So add to poll PHY power up.
> 
> Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>

Hi Jiawen Wu,

should this have the following?

Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")

Russell King (Oracle) July 13, 2023, 10:35 a.m. UTC | #2
On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > it sometimes does not take effect immediately. This will cause
> > mv3310_reset() to time out, which will fail the config initialization.
> > So add to poll PHY power up.
> > 
> > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> 
> Hi Jiawen Wu,
> 
> should this have the following?
> 
> Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")

What is that commit? It doesn't appear to be in Linus' tree, it doesn't
appear to be in the net tree, nor the net-next tree.

Simon Horman July 13, 2023, 10:45 a.m. UTC | #3
On Thu, Jul 13, 2023 at 11:35:05AM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> > On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > > it sometimes does not take effect immediately. This will cause
> > > mv3310_reset() to time out, which will fail the config initialization.
> > > So add to poll PHY power up.
> > > 
> > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > 
> > Hi Jiawen Wu,
> > 
> > should this have the following?
> > 
> > Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")
> 
> What is that commit? It doesn't appear to be in Linus' tree, it doesn't
> appear to be in the net tree, nor the net-next tree.

Hi Russell,

Sorry, it is bogus. Some sort of cut and paste error on my side
that pulled in the local commit of an unrelated patch.

What I should have said is:

Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")

Russell King (Oracle) July 13, 2023, 10:46 a.m. UTC | #4
On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> it sometimes does not take effect immediately. This will cause
> mv3310_reset() to time out, which will fail the config initialization.
> So add to poll PHY power up.

Can you check how long it takes for the PWRDOWN bit to clear? The
datasheet says that hardware reset or an MDIO write to this register
can clear this bit. It doesn't say that it needs to be polled or
that it takes time to clear before reset is possible.

So, I think a little more explanation and investigation would be
useful.

Thanks.

Russell King (Oracle) July 13, 2023, 10:53 a.m. UTC | #5
On Thu, Jul 13, 2023 at 11:45:59AM +0100, Simon Horman wrote:
> On Thu, Jul 13, 2023 at 11:35:05AM +0100, Russell King (Oracle) wrote:
> > On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> > > On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > > > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > > > it sometimes does not take effect immediately. This will cause
> > > > mv3310_reset() to time out, which will fail the config initialization.
> > > > So add to poll PHY power up.
> > > > 
> > > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > > 
> > > Hi Jiawen Wu,
> > > 
> > > should this have the following?
> > > 
> > > Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")
> > 
> > What is that commit? It doesn't appear to be in Linus' tree, it doesn't
> > appear to be in the net tree, nor the net-next tree.
> 
> Hi Russell,
> 
> Sorry, it is bogus. Some sort of cut and paste error on my side
> that pulled in the local commit of an unrelated patch.
> 
> What I should have said is:
> 
> Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")

Thanks, but I don't think that's appropriate either.

The commit adds a software reset after clearing the power down bit, but
that doesn't have anything to do with mv3310_reset().

There are two places that mv3310_reset() is called, mv3310_config_mdix()
and mv3310_set_edpd(). One of them is in the probe function, after we
have powered up the PHY.

I think we need much more information from the reporter before we can
guess which commit is a problem, if any.

When does the reset time out?
What is the code path that we see mv3310_reset() timing out?
Does the problem happen while resuming or probing?
How soon after clearing the power down bit is mv3310_reset() called?

Jiawen Wu July 13, 2023, 11:30 a.m. UTC | #6
On Thursday, July 13, 2023 6:54 PM, Russell King (Oracle) wrote:
> On Thu, Jul 13, 2023 at 11:45:59AM +0100, Simon Horman wrote:
> > On Thu, Jul 13, 2023 at 11:35:05AM +0100, Russell King (Oracle) wrote:
> > > On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> > > > On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > > > > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > > > > it sometimes does not take effect immediately. This will cause
> > > > > mv3310_reset() to time out, which will fail the config initialization.
> > > > > So add to poll PHY power up.
> > > > >
> > > > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > > >
> > > > Hi Jiawen Wu,
> > > >
> > > > should this have the following?
> > > >
> > > > Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")
> > >
> > > What is that commit? It doesn't appear to be in Linus' tree, it doesn't
> > > appear to be in the net tree, nor the net-next tree.
> >
> > Hi Russell,
> >
> > Sorry, it is bogus. Some sort of cut and paste error on my side
> > that pulled in the local commit of an unrelated patch.
> >
> > What I should have said is:
> >
> > Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")
> 
> Thanks, but I don't think that's appropriate either.
> 
> The commit adds a software reset after clearing the power down bit, but
> that doesn't have anything to do with mv3310_reset().
> 
> There are two places that mv3310_reset() is called, mv3310_config_mdix()
> and mv3310_set_edpd(). One of them is in the probe function, after we
> have powered up the PHY.
> 
> I think we need much more information from the reporter before we can
> guess which commit is a problem, if any.
> 
> When does the reset time out?
> What is the code path that we see mv3310_reset() timing out?
> Does the problem happen while resuming or probing?
> How soon after clearing the power down bit is mv3310_reset() called?

I need to test it more times for more information.

As far as I know, the reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
in mv3310_config_init().

What confuses me now is that there were sometimes weird values while probing, for example
a weird firmware version was read out, which caused the test to fail.

As for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
Otherwise, it never succeeds in clearing the power down bit. Currently it looks like clearing
the bit takes about 1 ms.
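
To make the sleep_before_read point concrete, here is a minimal userspace sketch of the polling order. It is not the patch and not phylib code: fake_phy, fake_read_port_ctrl(), fake_sleep_us() and poll_power_up() are hypothetical names, the clock is simulated, and the BIT(11) position of the power down bit is an assumption. The only behaviour it models is a bit that clears about 1 ms after the write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror MV_V2_PORT_CTRL_PWRDOWN; the bit position is an assumption. */
#define PORT_CTRL_PWRDOWN (1u << 11)

/* Hypothetical stand-in for the PHY: PWRDOWN reads as set until clear_at_us. */
struct fake_phy {
	uint64_t now_us;       /* simulated clock */
	uint64_t clear_at_us;  /* simulated time at which PWRDOWN drops */
	unsigned int reads;    /* number of register reads performed */
};

static uint16_t fake_read_port_ctrl(struct fake_phy *phy)
{
	phy->reads++;
	return phy->now_us >= phy->clear_at_us ? 0 : PORT_CTRL_PWRDOWN;
}

static void fake_sleep_us(struct fake_phy *phy, uint64_t us)
{
	phy->now_us += us;  /* advance the simulated clock, no real sleeping */
}

/*
 * Poll until PWRDOWN reads back as clear.
 * Returns 0 on success, -1 once the deadline has passed.
 */
static int poll_power_up(struct fake_phy *phy, uint64_t sleep_us,
			 uint64_t timeout_us, bool sleep_before_read)
{
	uint64_t deadline = phy->now_us + timeout_us;

	/* With the flag set, the very first read is delayed by sleep_us. */
	if (sleep_before_read)
		fake_sleep_us(phy, sleep_us);

	for (;;) {
		uint16_t val = fake_read_port_ctrl(phy);

		if (!(val & PORT_CTRL_PWRDOWN))
			return 0;
		if (phy->now_us > deadline)
			return -1;
		fake_sleep_us(phy, sleep_us);
	}
}
```

With a 1000 us sleep and a bit that clears at 1000 us, sleep_before_read = true succeeds on the first read, while false sees the bit still set on the first read and succeeds on the second. In this simple latency model both calls return 0, so latency alone would not explain a poll that never succeeds without the initial sleep.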

Russell King (Oracle) July 13, 2023, 11:41 a.m. UTC | #7
On Thu, Jul 13, 2023 at 07:30:17PM +0800, Jiawen Wu wrote:
> On Thursday, July 13, 2023 6:54 PM, Russell King (Oracle) wrote:
> > On Thu, Jul 13, 2023 at 11:45:59AM +0100, Simon Horman wrote:
> > > On Thu, Jul 13, 2023 at 11:35:05AM +0100, Russell King (Oracle) wrote:
> > > > On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> > > > > On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > > > > > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > > > > > it sometimes does not take effect immediately. This will cause
> > > > > > mv3310_reset() to time out, which will fail the config initialization.
> > > > > > So add to poll PHY power up.
> > > > > >
> > > > > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > > > >
> > > > > Hi Jiawen Wu,
> > > > >
> > > > > should this have the following?
> > > > >
> > > > > Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")
> > > >
> > > > What is that commit? It doesn't appear to be in Linus' tree, it doesn't
> > > > appear to be in the net tree, nor the net-next tree.
> > >
> > > Hi Russell,
> > >
> > > Sorry, it is bogus. Some sort of cut and paste error on my side
> > > that pulled in the local commit of an unrelated patch.
> > >
> > > What I should have said is:
> > >
> > > Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")
> > 
> > Thanks, but I don't think that's appropriate either.
> > 
> > The commit adds a software reset after clearing the power down bit, but
> > that doesn't have anything to do with mv3310_reset().
> > 
> > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > and mv3310_set_edpd(). One of them is in the probe function, after we
> > have powered up the PHY.
> > 
> > I think we need much more information from the reporter before we can
> > guess which commit is a problem, if any.
> > 
> > When does the reset time out?
> > What is the code path that we see mv3310_reset() timing out?
> > Does the problem happen while resuming or probing?
> > How soon after clearing the power down bit is mv3310_reset() called?
> 
> I need to test it more times for more information.
> 
> As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> in mv3310_config_init().
> 
> Now what I'm confused about is, sometimes there was weird values while probing, just
> to read out a weird firmware version, that caused the test to fail.
> 
> And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> the bit takes about 1ms.

So, reading the bit before the first delay period results in the bit not
clearing, despite having written it to be zero?

Jiawen Wu July 13, 2023, 11:50 a.m. UTC | #8
On Thursday, July 13, 2023 7:41 PM, Russell King (Oracle) wrote:
> On Thu, Jul 13, 2023 at 07:30:17PM +0800, Jiawen Wu wrote:
> > On Thursday, July 13, 2023 6:54 PM, Russell King (Oracle) wrote:
> > > On Thu, Jul 13, 2023 at 11:45:59AM +0100, Simon Horman wrote:
> > > > On Thu, Jul 13, 2023 at 11:35:05AM +0100, Russell King (Oracle) wrote:
> > > > > On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> > > > > > On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > > > > > > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > > > > > > it sometimes does not take effect immediately. This will cause
> > > > > > > mv3310_reset() to time out, which will fail the config initialization.
> > > > > > > So add to poll PHY power up.
> > > > > > >
> > > > > > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > > > > >
> > > > > > Hi Jiawen Wu,
> > > > > >
> > > > > > should this have the following?
> > > > > >
> > > > > > Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")
> > > > >
> > > > > What is that commit? It doesn't appear to be in Linus' tree, it doesn't
> > > > > appear to be in the net tree, nor the net-next tree.
> > > >
> > > > Hi Russell,
> > > >
> > > > Sorry, it is bogus. Some sort of cut and paste error on my side
> > > > that pulled in the local commit of an unrelated patch.
> > > >
> > > > What I should have said is:
> > > >
> > > > Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")
> > >
> > > Thanks, but I don't think that's appropriate either.
> > >
> > > The commit adds a software reset after clearing the power down bit, but
> > > that doesn't have anything to do with mv3310_reset().
> > >
> > > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > > and mv3310_set_edpd(). One of them is in the probe function, after we
> > > have powered up the PHY.
> > >
> > > I think we need much more information from the reporter before we can
> > > guess which commit is a problem, if any.
> > >
> > > When does the reset time out?
> > > What is the code path that we see mv3310_reset() timing out?
> > > Does the problem happen while resuming or probing?
> > > How soon after clearing the power down bit is mv3310_reset() called?
> >
> > I need to test it more times for more information.
> >
> > As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> > in mv3310_config_init().
> >
> > Now what I'm confused about is, sometimes there was weird values while probing, just
> > to read out a weird firmware version, that caused the test to fail.
> >
> > And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> > Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> > the bit takes about 1ms.
> 
> So, reading the bit before the first delay period results in the bit not
> clearing, despite having written it to be zero?

Yes. In the original code, there is no delay before the register is read again to
set the software reset bit. I think the power down bit is not actually cleared
in my test.
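
One way to read the hypothesis above is as a read-modify-write hazard, sketched below as a toy model. This is an illustration only, not a description of how the 88x3310 behaves: fake_reg, reg_settle() and power_up() are hypothetical, both bit positions are assumptions, and the self-clearing of the software reset bit is not modelled. The model assumes a write only becomes visible to reads after some settling time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions made for this illustration. */
#define PORT_CTRL_SWRST   (1u << 15)
#define PORT_CTRL_PWRDOWN (1u << 11)

/* Toy register: a write lands in 'pending' and only shows up in
 * 'visible' once reg_settle() is called. */
struct fake_reg {
	uint16_t visible;
	uint16_t pending;
};

static uint16_t reg_read(const struct fake_reg *r)
{
	return r->visible;
}

static void reg_write(struct fake_reg *r, uint16_t val)
{
	r->pending = val;
}

static void reg_settle(struct fake_reg *r)
{
	r->visible = r->pending;
}

/* Read-modify-write: clear some bits, set others. */
static void reg_modify(struct fake_reg *r, uint16_t clear, uint16_t set)
{
	reg_write(r, (uint16_t)((reg_read(r) & ~clear) | set));
}

/* Clear PWRDOWN, then set SWRST, optionally waiting in between.
 * Returns the register value seen once everything has settled. */
static uint16_t power_up(struct fake_reg *r, bool wait_before_reset)
{
	reg_modify(r, PORT_CTRL_PWRDOWN, 0);
	if (wait_before_reset)
		reg_settle(r);
	/* Without the wait, this read still sees PWRDOWN set and
	 * writes it straight back together with SWRST. */
	reg_modify(r, 0, PORT_CTRL_SWRST);
	reg_settle(r);
	return reg_read(r);
}
```

In this toy model, power_up() without the wait ends with PWRDOWN still set, and with the wait ends with PWRDOWN clear and SWRST set. Whether the real PHY behaves this way is exactly the open question in the thread.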

Simon Horman July 13, 2023, 12:18 p.m. UTC | #9
On Thu, Jul 13, 2023 at 11:53:42AM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 13, 2023 at 11:45:59AM +0100, Simon Horman wrote:
> > On Thu, Jul 13, 2023 at 11:35:05AM +0100, Russell King (Oracle) wrote:
> > > On Thu, Jul 13, 2023 at 11:26:40AM +0100, Simon Horman wrote:
> > > > On Wed, Jul 12, 2023 at 02:26:34PM +0800, Jiawen Wu wrote:
> > > > > Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY,
> > > > > it sometimes does not take effect immediately. This will cause
> > > > > mv3310_reset() to time out, which will fail the config initialization.
> > > > > So add to poll PHY power up.
> > > > > 
> > > > > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > > > 
> > > > Hi Jiawen Wu,
> > > > 
> > > > should this have the following?
> > > > 
> > > > Fixes: 0a5550b1165c ("bpftool: Use "fallthrough;" keyword instead of comments")
> > > 
> > > What is that commit? It doesn't appear to be in Linus' tree, it doesn't
> > > appear to be in the net tree, nor the net-next tree.
> > 
> > Hi Russell,
> > 
> > Sorry, it is bogus. Some sort of cut and paste error on my side
> > that pulled in the local commit of an unrelated patch.
> > 
> > What I should have said is:
> > 
> > Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")
> 
> Thanks, but I don't think that's appropriate either.
> 
> The commit adds a software reset after clearing the power down bit, but
> that doesn't have anything to do with mv3310_reset().
> 
> There are two places that mv3310_reset() is called, mv3310_config_mdix()
> and mv3310_set_edpd(). One of them is in the probe function, after we
> have powered up the PHY.
> 
> I think we need much more information from the reporter before we can
> guess which commit is a problem, if any.

Sure, it was just a suggestion from my side.

> When does the reset time out?
> What is the code path that we see mv3310_reset() timing out?
> Does the problem happen while resuming or probing?
> How soon after clearing the power down bit is mv3310_reset() called?

Jiawen Wu July 17, 2023, 10:51 a.m. UTC | #10
> > > > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > > > and mv3310_set_edpd(). One of them is in the probe function, after we
> > > > have powered up the PHY.
> > > >
> > > > I think we need much more information from the reporter before we can
> > > > guess which commit is a problem, if any.
> > > >
> > > > When does the reset time out?
> > > > What is the code path that we see mv3310_reset() timing out?
> > > > Does the problem happen while resuming or probing?
> > > > How soon after clearing the power down bit is mv3310_reset() called?
> > >
> > > I need to test it more times for more information.
> > >
> > > As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> > > in mv3310_config_init().
> > >
> > > Now what I'm confused about is, sometimes there was weird values while probing, just
> > > to read out a weird firmware version, that caused the test to fail.
> > >
> > > And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> > > Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> > > the bit takes about 1ms.
> >
> > So, reading the bit before the first delay period results in the bit not
> > clearing, despite having written it to be zero?
> 
> Yes. So in the original code, there is no delay to read the register again for
> setting software reset bit. I think the power down bit is not actually cleared
> in my test.

Hi Russell,

I confirmed last week that this change makes mv3310_reset() succeed.
But now the reset fails again, only on port 0. The reset timeout still appears in
mv3310_config_init() -> mv3310_set_edpd() -> mv3310_reset(). I removed this
change and tested again, and the result shows that this change is effective for port 1.

So I'm a little confused, since I don't have programming guidelines for this PHY,
only a datasheet. Could you please help check for any possible problems
with it?

Thanks.

Russell King (Oracle) July 17, 2023, 12:22 p.m. UTC | #11
On Mon, Jul 17, 2023 at 06:51:38PM +0800, Jiawen Wu wrote:
> > > > > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > > > > and mv3310_set_edpd(). One of them is in the probe function, after we
> > > > > have powered up the PHY.
> > > > >
> > > > > I think we need much more information from the reporter before we can
> > > > > guess which commit is a problem, if any.
> > > > >
> > > > > When does the reset time out?
> > > > > What is the code path that we see mv3310_reset() timing out?
> > > > > Does the problem happen while resuming or probing?
> > > > > How soon after clearing the power down bit is mv3310_reset() called?
> > > >
> > > > I need to test it more times for more information.
> > > >
> > > > As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> > > > in mv3310_config_init().
> > > >
> > > > Now what I'm confused about is, sometimes there was weird values while probing, just
> > > > to read out a weird firmware version, that caused the test to fail.
> > > >
> > > > And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> > > > Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> > > > the bit takes about 1ms.
> > >
> > > So, reading the bit before the first delay period results in the bit not
> > > clearing, despite having written it to be zero?
> > 
> > Yes. So in the original code, there is no delay to read the register again for
> > setting software reset bit. I think the power down bit is not actually cleared
> > in my test.
> 
> Hi Russell,
> 
> I confirmed last week that this change is valid to make mv3310_reset() success.
> But now reset fails again, only on port 0. Reset timeout still appears in
> mv3310_config_init() -> mv3310_set_edpd() -> mv3310_reset(). I deleted this
> change to test again, and the result shows that this change is valid for port 1.
> 
> So I'm a little confused. Since I don't have programming guidelines for this PHY,
> but only a datasheet. Could you please help to check for any possible problems
> with it?

I think the question that's missing is... why do other 88x3310 users not
see this problem - what is special about your port 0?

Maybe there's a clue with the hardware schematics? Do you have access to
those?

Jiawen Wu July 18, 2023, 9:12 a.m. UTC | #12
On Monday, July 17, 2023 8:23 PM, Russell King (Oracle) wrote:
> On Mon, Jul 17, 2023 at 06:51:38PM +0800, Jiawen Wu wrote:
> > > > > > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > > > > > and mv3310_set_edpd(). One of them is in the probe function, after we
> > > > > > have powered up the PHY.
> > > > > >
> > > > > > I think we need much more information from the reporter before we can
> > > > > > guess which commit is a problem, if any.
> > > > > >
> > > > > > When does the reset time out?
> > > > > > What is the code path that we see mv3310_reset() timing out?
> > > > > > Does the problem happen while resuming or probing?
> > > > > > How soon after clearing the power down bit is mv3310_reset() called?
> > > > >
> > > > > I need to test it more times for more information.
> > > > >
> > > > > As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> > > > > in mv3310_config_init().
> > > > >
> > > > > Now what I'm confused about is, sometimes there was weird values while probing, just
> > > > > to read out a weird firmware version, that caused the test to fail.
> > > > >
> > > > > And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> > > > > Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> > > > > the bit takes about 1ms.
> > > >
> > > > So, reading the bit before the first delay period results in the bit not
> > > > clearing, despite having written it to be zero?
> > >
> > > Yes. So in the original code, there is no delay to read the register again for
> > > setting software reset bit. I think the power down bit is not actually cleared
> > > in my test.
> >
> > Hi Russell,
> >
> > I confirmed last week that this change is valid to make mv3310_reset() success.
> > But now reset fails again, only on port 0. Reset timeout still appears in
> > mv3310_config_init() -> mv3310_set_edpd() -> mv3310_reset(). I deleted this
> > change to test again, and the result shows that this change is valid for port 1.
> >
> > So I'm a little confused. Since I don't have programming guidelines for this PHY,
> > but only a datasheet. Could you please help to check for any possible problems
> > with it?
> 
> I think the question that's missing is... why do other 88x3310 users not
> see this problem - what is special about your port 0?
> 
> Maybe there's a clue with the hardware schematics? Do you have access to
> those?

This problem never happened again after I powered off and restarted the machine.
However, this patch is still required to successfully probe the PHY.

One thing I've noticed is that there is a restriction in mv3310_power_up(): the software
reset is not performed when priv->firmware_ver < 0x00030000. And my 88x3310's
firmware version happens to be 0x20200. Will this restriction cause the subsequent reset
timeout (without this patch)?

Russell King (Oracle) July 18, 2023, 9:49 a.m. UTC | #13
On Tue, Jul 18, 2023 at 05:12:33PM +0800, Jiawen Wu wrote:
> On Monday, July 17, 2023 8:23 PM, Russell King (Oracle) wrote:
> > On Mon, Jul 17, 2023 at 06:51:38PM +0800, Jiawen Wu wrote:
> > > > > > > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > > > > > > and mv3310_set_edpd(). One of them is in the probe function, after we
> > > > > > > have powered up the PHY.
> > > > > > >
> > > > > > > I think we need much more information from the reporter before we can
> > > > > > > guess which commit is a problem, if any.
> > > > > > >
> > > > > > > When does the reset time out?
> > > > > > > What is the code path that we see mv3310_reset() timing out?
> > > > > > > Does the problem happen while resuming or probing?
> > > > > > > How soon after clearing the power down bit is mv3310_reset() called?
> > > > > >
> > > > > > I need to test it more times for more information.
> > > > > >
> > > > > > As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> > > > > > in mv3310_config_init().
> > > > > >
> > > > > > Now what I'm confused about is, sometimes there was weird values while probing, just
> > > > > > to read out a weird firmware version, that caused the test to fail.
> > > > > >
> > > > > > And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> > > > > > Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> > > > > > the bit takes about 1ms.
> > > > >
> > > > > So, reading the bit before the first delay period results in the bit not
> > > > > clearing, despite having written it to be zero?
> > > >
> > > > Yes. So in the original code, there is no delay to read the register again for
> > > > setting software reset bit. I think the power down bit is not actually cleared
> > > > in my test.
> > >
> > > Hi Russell,
> > >
> > > I confirmed last week that this change is valid to make mv3310_reset() success.
> > > But now reset fails again, only on port 0. Reset timeout still appears in
> > > mv3310_config_init() -> mv3310_set_edpd() -> mv3310_reset(). I deleted this
> > > change to test again, and the result shows that this change is valid for port 1.
> > >
> > > So I'm a little confused. Since I don't have programming guidelines for this PHY,
> > > but only a datasheet. Could you please help to check for any possible problems
> > > with it?
> > 
> > I think the question that's missing is... why do other 88x3310 users not
> > see this problem - what is special about your port 0?
> > 
> > Maybe there's a clue with the hardware schematics? Do you have access to
> > those?
> 
> This problem never happened again after I poweroff and restart the machine.
> However, this patch is still required to successfully probe the PHY.
> 
> One thing I've noticed is that there is restriction in mv3310_power_up(), software
> reset not performed when priv->firmware_ver < 0x00030000. And my 88x3310's
> firmware version happens to 0x20200. Will this restriction cause subsequent reset
> timeout(without this patch)?

We (Matteo and I) discovered the need for software reset by
experimentation on his Macchiatobin and trying different firmware
versions. Essentially, I had 0.2.1.0 which didn't need the software
reset, Matteo had 0.3.3.0 which did seem to need it.

I also upgraded my firmware to 0.3.3.0 and even 0.3.10.0 and confirmed
that the software reset works on the two PHYs on my boards.
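
For reference, the version numbers in this exchange can be lined up with the threshold quoted earlier. The sketch below is a standalone illustration, not driver code: fw_gets_soft_reset() and fw_format() are hypothetical helpers, and the mapping of the 32-bit value to a dotted version (one component per byte, most significant first) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Threshold quoted in the thread: no software reset below this value. */
#define FW_SWRST_MIN 0x00030000u

/* True when the quoted firmware_ver check would allow the software reset. */
static bool fw_gets_soft_reset(uint32_t firmware_ver)
{
	return firmware_ver >= FW_SWRST_MIN;
}

/* Format the version as a.b.c.d, assuming one component per byte. */
static void fw_format(uint32_t ver, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u",
		 (unsigned int)(ver >> 24),
		 (unsigned int)((ver >> 16) & 0xff),
		 (unsigned int)((ver >> 8) & 0xff),
		 (unsigned int)(ver & 0xff));
}
```

Under that assumption, 0x20200 formats as 0.2.2.0 and falls below the threshold, as does 0.2.1.0 (0x00020100), while 0.3.3.0 (0x00030300) is above it.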

What I don't understand is "this patch is still required to successfully
probe the PHY". The power-up path is not called during probe - nor is
the EDPD path. By "probe" I'm assuming we're talking about the driver
probe, in other words, mv3310_probe(), not the config_init - it may be
that your terminology is not matching phylib's terminology. Please
can you clarify.

Thanks.

Jiawen Wu July 18, 2023, 9:58 a.m. UTC | #14
On Tuesday, July 18, 2023 5:49 PM, Russell King (Oracle) wrote:
> On Tue, Jul 18, 2023 at 05:12:33PM +0800, Jiawen Wu wrote:
> > On Monday, July 17, 2023 8:23 PM, Russell King (Oracle) wrote:
> > > On Mon, Jul 17, 2023 at 06:51:38PM +0800, Jiawen Wu wrote:
> > > > > > > > There are two places that mv3310_reset() is called, mv3310_config_mdix()
> > > > > > > > and mv3310_set_edpd(). One of them is in the probe function, after we
> > > > > > > > have powered up the PHY.
> > > > > > > >
> > > > > > > > I think we need much more information from the reporter before we can
> > > > > > > > guess which commit is a problem, if any.
> > > > > > > >
> > > > > > > > When does the reset time out?
> > > > > > > > What is the code path that we see mv3310_reset() timing out?
> > > > > > > > Does the problem happen while resuming or probing?
> > > > > > > > How soon after clearing the power down bit is mv3310_reset() called?
> > > > > > >
> > > > > > > I need to test it more times for more information.
> > > > > > >
> > > > > > > As far as I know, reset timeout appears in mv3310_set_edpd(), after mv3310_power_up()
> > > > > > > in mv3310_config_init().
> > > > > > >
> > > > > > > Now what I'm confused about is, sometimes there was weird values while probing, just
> > > > > > > to read out a weird firmware version, that caused the test to fail.
> > > > > > >
> > > > > > > And for this phy_read_mmd_poll_timeout(), it only succeeds when sleep_before_read = true.
> > > > > > > Otherwise, it would never succeed to clear the power down bit. Currently it looks like clearing
> > > > > > > the bit takes about 1ms.
> > > > > >
> > > > > > So, reading the bit before the first delay period results in the bit not
> > > > > > clearing, despite having written it to be zero?
> > > > >
> > > > > Yes. So in the original code, there is no delay to read the register again for
> > > > > setting software reset bit. I think the power down bit is not actually cleared
> > > > > in my test.
> > > >
> > > > Hi Russell,
> > > >
> > > > I confirmed last week that this change is valid to make mv3310_reset() success.
> > > > But now reset fails again, only on port 0. Reset timeout still appears in
> > > > mv3310_config_init() -> mv3310_set_edpd() -> mv3310_reset(). I deleted this
> > > > change to test again, and the result shows that this change is valid for port 1.
> > > >
> > > > So I'm a little confused. Since I don't have programming guidelines for this PHY,
> > > > but only a datasheet. Could you please help to check for any possible problems
> > > > with it?
> > >
> > > I think the question that's missing is... why do other 88x3310 users not
> > > see this problem - what is special about your port 0?
> > >
> > > Maybe there's a clue with the hardware schematics? Do you have access to
> > > those?
> >
> > This problem never happened again after I poweroff and restart the machine.
> > However, this patch is still required to successfully probe the PHY.
> >
> > One thing I've noticed is that there is restriction in mv3310_power_up(), software
> > reset not performed when priv->firmware_ver < 0x00030000. And my 88x3310's
> > firmware version happens to 0x20200. Will this restriction cause subsequent reset
> > timeout(without this patch)?
> 
> We (Matteo and I) discovered the need for software reset by
> experimentation on his Macchiatobin and trying different firmware
> versions. Essentially, I had 0.2.1.0 which didn't need the software
> reset, Matteo had 0.3.3.0 which did seem to need it.
> 
> I also upgraded my firmware to 0.3.3.0 and even 0.3.10.0 and confirmed
> that the software reset works on the two PHYs on my boards.
> 
> What I don't understand is "this patch is still required to successfully
> probe the PHY". The power-up path is not called during probe - nor is
> the EDPD path. By "probe" I'm assuming we're talking about the driver
> probe, in other words, mv3310_probe(), not the config_init - it may be
> that you're terminology is not matching phylib's terminology. Please
> can you clarify.

I'm sorry for the mistake in my description. I meant the MAC driver probe: it
calls phy_connect_direct(), which in turn calls mv3310_config_init().
Russell King (Oracle) July 18, 2023, 11:47 a.m. UTC | #15
On Tue, Jul 18, 2023 at 05:58:28PM +0800, Jiawen Wu wrote:
> On Tuesday, July 18, 2023 5:49 PM, Russell King (Oracle) wrote:
> > On Tue, Jul 18, 2023 at 05:12:33PM +0800, Jiawen Wu wrote:
> > > This problem never happened again after I poweroff and restart the machine.
> > > However, this patch is still required to successfully probe the PHY.
> > >
> > > One thing I've noticed is that there is restriction in mv3310_power_up(), software
> > > reset not performed when priv->firmware_ver < 0x00030000. And my 88x3310's
> > > firmware version happens to 0x20200. Will this restriction cause subsequent reset
> > > timeout(without this patch)?
> > 
> > We (Matteo and I) discovered the need for software reset by
> > experimentation on his Macchiatobin and trying different firmware
> > versions. Essentially, I had 0.2.1.0 which didn't need the software
> > reset, Matteo had 0.3.3.0 which did seem to need it.
> > 
> > I also upgraded my firmware to 0.3.3.0 and even 0.3.10.0 and confirmed
> > that the software reset works on the two PHYs on my boards.
> > 
> > What I don't understand is "this patch is still required to successfully
> > probe the PHY". The power-up path is not called during probe - nor is
> > the EDPD path. By "probe" I'm assuming we're talking about the driver
> > probe, in other words, mv3310_probe(), not the config_init - it may be
> > that you're terminology is not matching phylib's terminology. Please
> > can you clarify.
> 
> I'm sorry for the mistake in my description. I mean MAC driver probe, in fact
> it is in phy_connect_direct(), to call mv3310_config_init().

Okay, so how about this for an alternative theory.

The PHY is being probed, which places the PHY in power down mode.
Then your network driver (which?) gets probed, connects immediately
to the PHY, which attempts to power up the PHY - but maybe the PHY
hasn't finished powering down yet, and thus delays the powering up.

However, according to the functional spec, placing the device in
power-down mode as we do is immediate.

Please can you try experimenting with a delay in mv3310_config_init()
before the call to mv3310_power_up() to see whether that has any
beneficial effect?

Thanks.
Jiawen Wu July 19, 2023, 2:29 a.m. UTC | #16
On Tuesday, July 18, 2023 7:47 PM, Russell King (Oracle) wrote:
> On Tue, Jul 18, 2023 at 05:58:28PM +0800, Jiawen Wu wrote:
> > On Tuesday, July 18, 2023 5:49 PM, Russell King (Oracle) wrote:
> > > On Tue, Jul 18, 2023 at 05:12:33PM +0800, Jiawen Wu wrote:
> > > > This problem never happened again after I poweroff and restart the machine.
> > > > However, this patch is still required to successfully probe the PHY.
> > > >
> > > > One thing I've noticed is that there is restriction in mv3310_power_up(), software
> > > > reset not performed when priv->firmware_ver < 0x00030000. And my 88x3310's
> > > > firmware version happens to 0x20200. Will this restriction cause subsequent reset
> > > > timeout(without this patch)?
> > >
> > > We (Matteo and I) discovered the need for software reset by
> > > experimentation on his Macchiatobin and trying different firmware
> > > versions. Essentially, I had 0.2.1.0 which didn't need the software
> > > reset, Matteo had 0.3.3.0 which did seem to need it.
> > >
> > > I also upgraded my firmware to 0.3.3.0 and even 0.3.10.0 and confirmed
> > > that the software reset works on the two PHYs on my boards.
> > >
> > > What I don't understand is "this patch is still required to successfully
> > > probe the PHY". The power-up path is not called during probe - nor is
> > > the EDPD path. By "probe" I'm assuming we're talking about the driver
> > > probe, in other words, mv3310_probe(), not the config_init - it may be
> > > that you're terminology is not matching phylib's terminology. Please
> > > can you clarify.
> >
> > I'm sorry for the mistake in my description. I mean MAC driver probe, in fact
> > it is in phy_connect_direct(), to call mv3310_config_init().
> 
> Okay, so how about this for an alternative theory.
> 
> The PHY is being probed, which places the PHY in power down mode.
> Then your network driver (which?) gets probed, connects immediately
> to the PHY, which attempts to power up the PHY - but maybe the PHY
> hasn't finished powering down yet, and thus delays the powering up.
> 
> However, according to the functional spec, placing the device in
> power-down mode as we do is immediate.
> 
> Please can you try experimenting with a delay in mv3310_config_init()
> before the call to mv3310_power_up() to see whether that has any
> beneficial effect?

I experimented with delays from 100ms to 1s; the reset timed out every time.
Unfortunately, the theory doesn't seem to hold. :(

Here is a log dump from the attempt with a 200ms delay.

[59697.591809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=6, val=c000
[59697.592811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=5, val=9a
[59697.593814] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=2, val=2b
[59697.594817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=3, val=9ab
[59697.595811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=2, val=2b
[59697.596811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=3, val=9ab
[59697.597811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=4, regnum=2, val=141
[59697.598809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=4, regnum=3, val=dab
[59697.599809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=2, val=2b
[59697.600810] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=3, val=9ab
[59697.601815] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1e, regnum=8, val=0
[59697.602930] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=8, val=fffe
[59697.608811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=d00d, val=680b
[59697.609823] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=c050, val=7e
[59697.610814] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=c011, val=2
[59697.611817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=c012, val=200
[59697.611820] mv88x3310 txgbe-400:00: Firmware version 0.2.2.0
[59697.612817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f001, val=803
[59697.612820] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=1f, regnum=f08c, val=9600
[59697.613819] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f08a, val=cd9a
[59697.613822] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=1f, regnum=f08a, val=d9a
[59697.614818] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=1, val=9ab
[59697.615816] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=8, val=9701
[59697.616817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=b, val=1a4
[59697.617814] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=14, val=e
[59697.618809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=15, val=3
[59697.619811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=3c, val=0
[59697.619831] mv88x3310 txgbe-400:00: attached PHY driver (mii_bus:phy_addr=txgbe-400:00, irq=POLL)
[59697.830169] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f001, val=803
[59697.830179] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=1f, regnum=f001, val=3
[59697.830926] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f001, val=803
[59697.831926] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=8000, val=60
[59697.831932] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=3, regnum=8000, val=360
[59697.832926] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.838922] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.844815] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.850812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.856813] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.862812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.868812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.874812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.880812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.886812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.892812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.898812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.904812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.910812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.916812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.922812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.928812] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.934813] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.935813] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=0, val=a040
[59697.935815] mv88x3310 txgbe-400:00: mv3310_reset failed: -110
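Two values in the dump above can be decoded by hand. The sketch below is an illustration only: the function names are hypothetical, the high-word-first packing of the two version registers is an assumption inferred from the "Firmware version 0.2.2.0" line, and the reset flag is assumed to be bit 15 as in the standard Clause 45 control register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack the two 16-bit firmware version words (regnum c011 and c012 in the
 * dump) into one 32-bit value, high word first. This packing is an
 * assumption inferred from the "0.2.2.0" line in the log. */
uint32_t fw_version(uint16_t ver0, uint16_t ver1)
{
	return ((uint32_t)ver0 << 16) | ver1;
}

/* Extract byte n (0 = most significant) of the packed version. */
unsigned int fw_byte(uint32_t ver, unsigned int n)
{
	return (ver >> (24 - 8 * n)) & 0xff;
}

/* Reset flag of control register 1 (devnum=3, regnum=0 in the dump);
 * assumed to be bit 15, as in the standard Clause 45 layout. */
#define CTRL1_RESET (1u << 15)

/* True while the PHY still reports a reset in progress. */
bool reset_pending(uint16_t ctrl1)
{
	return (ctrl1 & CTRL1_RESET) != 0;
}
```

With the logged values, fw_version(0x2, 0x200) gives 0x00020200, which is below the 0x00030000 threshold mentioned earlier in the thread, and reset_pending(0xa040) is true for every poll, which is consistent with the -110 (ETIMEDOUT) result after roughly 100ms of polling.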
Jiawen Wu July 19, 2023, 3:53 a.m. UTC | #17
> > Okay, so how about this for an alternative theory.
> >
> > The PHY is being probed, which places the PHY in power down mode.
> > Then your network driver (which?) gets probed, connects immediately
> > to the PHY, which attempts to power up the PHY - but maybe the PHY
> > hasn't finished powering down yet, and thus delays the powering up.
> >
> > However, according to the functional spec, placing the device in
> > power-down mode as we do is immediate.
> >
> > Please can you try experimenting with a delay in mv3310_config_init()
> > before the call to mv3310_power_up() to see whether that has any
> > beneficial effect?
> 
> I experimented with delays of 100ms to 1s, all reset timed out. Unfortunately,
> the theory doesn't seem to be true. :(

I also tried adding a 100ms delay after mv3310_power_up() and before
chip->get_mactype(phydev). With that delay, the register read in
mv3310_get_mactype() showed the power down bit cleared, and the reset then
completed successfully.
Russell King (Oracle) July 19, 2023, 6:50 a.m. UTC | #18
On Wed, Jul 19, 2023 at 10:29:38AM +0800, Jiawen Wu wrote:
> [59697.591809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=6, val=c000
> [59697.592811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=5, val=9a
> [59697.593814] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=2, val=2b
> [59697.594817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=3, val=9ab
> [59697.595811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=2, val=2b
> [59697.596811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=3, val=9ab
> [59697.597811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=4, regnum=2, val=141
> [59697.598809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=4, regnum=3, val=dab
> [59697.599809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=2, val=2b
> [59697.600810] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=3, val=9ab
> [59697.601815] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1e, regnum=8, val=0
> [59697.602930] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=8, val=fffe
> [59697.608811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=d00d, val=680b
> [59697.609823] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=c050, val=7e
> [59697.610814] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=c011, val=2
> [59697.611817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=c012, val=200
> [59697.611820] mv88x3310 txgbe-400:00: Firmware version 0.2.2.0
> [59697.612817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f001, val=803

So here we can see the PHY is already in low-power mode, so presumably
it's configured to do that from power-up?

> [59697.612820] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=1f, regnum=f08c, val=9600
> [59697.613819] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f08a, val=cd9a
> [59697.613822] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=1f, regnum=f08a, val=d9a
> [59697.614818] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=1, val=9ab
> [59697.615816] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=8, val=9701
> [59697.616817] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=b, val=1a4
> [59697.617814] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=3, regnum=14, val=e
> [59697.618809] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1, regnum=15, val=3
> [59697.619811] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=7, regnum=3c, val=0
> [59697.619831] mv88x3310 txgbe-400:00: attached PHY driver (mii_bus:phy_addr=txgbe-400:00, irq=POLL)

The following is where we attempt to power up the PHY:

> [59697.830169] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f001, val=803
> [59697.830179] txgbe 0000:04:00.0: [W]phy_addr=0, devnum=1f, regnum=f001, val=3

The above is our attempt to clear the low power bit.

> [59697.830926] txgbe 0000:04:00.0: [R]phy_addr=0, devnum=1f, regnum=f001, val=803

According to this read though (which is in get_mactype), the write
didn't take effect.

If you place a delay of 1ms after phy_clear_bits_mmd() in
mv3310_power_up(), does it then work?
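The three f001 accesses discussed above can be checked with a little bit arithmetic. This is a minimal sketch, assuming the power-down flag is bit 11 of that register (the bit the driver names MV_V2_PORT_CTRL_PWRDOWN); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the power-down flag in vendor register 0xf001. */
#define PORT_CTRL_PWRDOWN (1u << 11)

/* Value a read-modify-write would send to clear the given bits. */
uint16_t clear_bits(uint16_t reg, uint16_t mask)
{
	return (uint16_t)(reg & ~mask);
}

/* True if the register value still reports power-down mode. */
bool is_powered_down(uint16_t reg)
{
	return (reg & PORT_CTRL_PWRDOWN) != 0;
}
```

Starting from the logged read of 0x803, clear_bits() yields 0x003, which matches the logged write. The following read returns 0x803 again, so is_powered_down() is still true: the write had not taken effect by the time of that read.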
Jiawen Wu July 19, 2023, 7:57 a.m. UTC | #19
On Wednesday, July 19, 2023 2:51 PM, Russell King (Oracle) wrote:
> According to this read though (which is in get_mactype), the write
> didn't take effect.
> 
> If you place a delay of 1ms after phy_clear_bits_mmd() in
> mv3310_power_up(), does it then work?

Yes, I just tried it, and it works well.
Russell King (Oracle) July 19, 2023, 8:27 a.m. UTC | #20
On Wed, Jul 19, 2023 at 03:57:30PM +0800, Jiawen Wu wrote:
> > According to this read though (which is in get_mactype), the write
> > didn't take effect.
> > 
> > If you place a delay of 1ms after phy_clear_bits_mmd() in
> > mv3310_power_up(), does it then work?
> 
> Yes, I just experimented, it works well.

Please send a patch adding it, with a comment along the lines of:

	/* Sometimes, the power down bit doesn't clear immediately, and
	 * a read of this register causes the bit not to clear. Delay
	 * 1ms to allow the PHY to come out of power down mode before
	 * the next access.
	 */

Thanks.
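The behaviour described in that comment can be illustrated with a toy model in plain C. Everything here is hypothetical: the struct, the function names, and the rule that an early read cancels the pending clear are only a simplified reading of the symptom reported in this thread, and the 55us settle time is the value measured later in the thread, not a datasheet figure.

```c
#include <stdbool.h>
#include <stdint.h>

#define PWRDOWN   (1u << 11)  /* power-down flag (assumed bit 11) */
#define SETTLE_US 55          /* assumed settle time, from the thread */

struct toy_phy {
	uint16_t reg;        /* value a read would return */
	bool clear_pending;  /* a write asked for PWRDOWN to be cleared */
	uint64_t now_us;     /* simulated clock */
	uint64_t ready_us;   /* time at which the pending clear lands */
};

/* Advance the simulated clock. */
void toy_delay(struct toy_phy *p, uint64_t us)
{
	p->now_us += us;
}

void toy_write(struct toy_phy *p, uint16_t val)
{
	if ((p->reg & PWRDOWN) && !(val & PWRDOWN)) {
		/* Clearing PWRDOWN is deferred: keep the bit set for now. */
		p->clear_pending = true;
		p->ready_us = p->now_us + SETTLE_US;
		p->reg = val | PWRDOWN;
	} else {
		p->clear_pending = false;
		p->reg = val;
	}
}

uint16_t toy_read(struct toy_phy *p)
{
	if (p->clear_pending) {
		if (p->now_us >= p->ready_us)
			p->reg &= (uint16_t)~PWRDOWN; /* clear has landed */
		/* Otherwise the early read cancels the clear: bit sticks. */
		p->clear_pending = false;
	}
	return p->reg;
}

/* Write the cleared value, wait delay_us, then read back the flag. */
bool toy_power_up(uint64_t delay_us)
{
	struct toy_phy p = { .reg = 0x0803 };

	toy_write(&p, p.reg & (uint16_t)~PWRDOWN);
	toy_delay(&p, delay_us);
	return !(toy_read(&p) & PWRDOWN);
}
```

In this model toy_power_up(0) fails because the immediate read cancels the clear, while any delay of at least SETTLE_US succeeds. That is the same reason a poll that reads before its first sleep would never see the bit clear.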
Jiawen Wu July 19, 2023, 8:38 a.m. UTC | #21
On Wednesday, July 19, 2023 4:27 PM, Russell King (Oracle) wrote:
> On Wed, Jul 19, 2023 at 03:57:30PM +0800, Jiawen Wu wrote:
> > > According to this read though (which is in get_mactype), the write
> > > didn't take effect.
> > >
> > > If you place a delay of 1ms after phy_clear_bits_mmd() in
> > > mv3310_power_up(), does it then work?
> >
> > Yes, I just experimented, it works well.
> 
> Please send a patch adding it, with a comment along the lines of:
> 
> 	/* Sometimes, the power down bit doesn't clear immediately, and
> 	 * a read of this register causes the bit not to clear. Delay
> 	 * 1ms to allow the PHY to come out of power down mode before
> 	 * the next access.
> 	 */

After multiple experiments, I found that the minimum delay required is 55us.
Should the delay be reduced from 1ms? I'm not sure whether this value depends
on the system, though. I used udelay(55) in the test.
Russell King (Oracle) July 19, 2023, 8:52 a.m. UTC | #22
On Wed, Jul 19, 2023 at 04:38:36PM +0800, Jiawen Wu wrote:
> On Wednesday, July 19, 2023 4:27 PM, Russell King (Oracle) wrote:
> > On Wed, Jul 19, 2023 at 03:57:30PM +0800, Jiawen Wu wrote:
> > > > According to this read though (which is in get_mactype), the write
> > > > didn't take effect.
> > > >
> > > > If you place a delay of 1ms after phy_clear_bits_mmd() in
> > > > mv3310_power_up(), does it then work?
> > >
> > > Yes, I just experimented, it works well.
> > 
> > Please send a patch adding it, with a comment along the lines of:
> > 
> > 	/* Sometimes, the power down bit doesn't clear immediately, and
> > 	 * a read of this register causes the bit not to clear. Delay
> > 	 * 1ms to allow the PHY to come out of power down mode before
> > 	 * the next access.
> > 	 */
> 
> After multiple experiments, I determined that the minimum delay it required
> is 55us. Does the delay need to be reduced? But I'm not sure whether it is
> related to the system. I use udelay(55) in the test.

55us is slightly longer than one access-time to C45 registers with 32
bits of preamble on the bus before each mdio frame. I'd suggest we go
with 100us in that case.
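The access-time comparison can be worked through numerically. This sketch assumes an MDC clock of 2.5 MHz, which the thread does not state, and counts a Clause 45 access as two frames (address, then data), each made of the preamble plus 32 frame bits. The function name is hypothetical.

```c
#include <stdint.h>

/* Duration in nanoseconds of one Clause 45 register access: an address
 * frame followed by a read or write frame, each preceded by a preamble. */
uint64_t c45_access_ns(uint64_t mdc_hz, unsigned int preamble_bits)
{
	const uint64_t frame_bits = 32;  /* ST, OP, PRTAD, DEVAD, TA, data */
	uint64_t total_bits = 2 * (preamble_bits + frame_bits);

	return total_bits * 1000000000ull / mdc_hz;
}
```

Under those assumptions c45_access_ns(2500000, 32) is 128 bits at 400ns each, or 51200ns, so the measured 55us minimum is indeed slightly longer than one access, and 100us leaves roughly a factor of two of margin.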

Patch

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index 55d9d7acc32e..2bed654b7c33 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -323,13 +323,20 @@  static int mv3310_power_down(struct phy_device *phydev)
 static int mv3310_power_up(struct phy_device *phydev)
 {
 	struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev);
-	int ret;
+	int ret, val;
 
 	ret = phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL,
 				 MV_V2_PORT_CTRL_PWRDOWN);
+	if (ret < 0)
+		return ret;
+
+	ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND2,
+					MV_V2_PORT_CTRL, val,
+					!(val & MV_V2_PORT_CTRL_PWRDOWN),
+					1000, 100000, true);
 
 	if (phydev->drv->phy_id != MARVELL_PHY_ID_88X3310 ||
-	    priv->firmware_ver < 0x00030000)
+	    priv->firmware_ver < 0x00030000 || ret < 0)
 		return ret;
 
 	return phy_set_bits_mmd(phydev, MDIO_MMD_VEND2, MV_V2_PORT_CTRL,