diff mbox series

[net-next,v2,7/9] net: phy: introduce ethtool_phy_ops to get and set phy configuration

Message ID 20241004161601.2932901-8-maxime.chevallier@bootlin.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Allow isolating PHY devices | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 102 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 11 this patch: 11
netdev/build_tools success Errors and warnings before: 0 (+1) this patch: 0 (+1)
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 58 this patch: 58
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1705 this patch: 1705
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 112 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc fail Errors and warnings before: 109 this patch: 111
netdev/source_inline success Was 0 now: 0

Commit Message

Maxime Chevallier Oct. 4, 2024, 4:15 p.m. UTC
Expose phy-specific configuration hooks to get and set the state of an
ethernet PHY's internal configuration.

So far, these parameters only include the isolation state of the PHY.

The .get_config() ethtool_phy_ops gets these status information from the
phy_device's internal flags, while the .set_config() operation allows
changing these configuration parameters.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V2 : Dropped loopback mode

 drivers/net/phy/phy.c        | 44 ++++++++++++++++++++++++++++++++++++
 drivers/net/phy/phy_device.c |  2 ++
 include/linux/ethtool.h      |  8 +++++++
 include/linux/phy.h          | 19 ++++++++++++++++
 4 files changed, 73 insertions(+)

Comments

Andrew Lunn Oct. 4, 2024, 6:42 p.m. UTC | #1
> +int phy_set_config(struct phy_device *phydev,
> +		   const struct phy_device_config *phy_cfg,
> +		   struct netlink_ext_ack *extack)
> +{
> +	bool isolate_change;
> +	int ret;
> +
> +	mutex_lock(&phydev->lock);
> +	isolate_change = (phy_cfg->isolate != phydev->isolated);
> +	mutex_unlock(&phydev->lock);
> +
> +	if (!isolate_change)
> +		return 0;
> +
> +	ret = phy_isolate(phydev, phy_cfg->isolate);
> +	if (ret)
> +		NL_SET_ERR_MSG(extack, "Error while configuring PHY isolation");

This seems overly simplistic to me. Don't you need to iterate over all
the other PHYs attached to this MAC and ensure they are isolated? Only
one can be unisolated at once.

It is also not clear to me how this is going to work from a MAC
perspective. Does the MAC call phy_connect() multiple times? How does
ndev->phydev work? Who is responsible for the initial configuration,
such that all but one PHY is isolated?

I assume you have a real board that needs this. So i think we need to
see a bit more of the complete solution, including the MAC changes and
the device tree for the board, so we can see the big picture.

	Andrew
Russell King (Oracle) Oct. 4, 2024, 7:02 p.m. UTC | #2
On Fri, Oct 04, 2024 at 08:42:42PM +0200, Andrew Lunn wrote:
> > +int phy_set_config(struct phy_device *phydev,
> > +		   const struct phy_device_config *phy_cfg,
> > +		   struct netlink_ext_ack *extack)
> > +{
> > +	bool isolate_change;
> > +	int ret;
> > +
> > +	mutex_lock(&phydev->lock);
> > +	isolate_change = (phy_cfg->isolate != phydev->isolated);
> > +	mutex_unlock(&phydev->lock);
> > +
> > +	if (!isolate_change)
> > +		return 0;
> > +
> > +	ret = phy_isolate(phydev, phy_cfg->isolate);
> > +	if (ret)
> > +		NL_SET_ERR_MSG(extack, "Error while configuring PHY isolation");
> 
> This seems overly simplistic to me. Don't you need to iterate over all
> the other PHYs attached to this MAC and ensure they are isolated? Only
> one can be unisolated at once.
> 
> It is also not clear to me how this is going to work from a MAC
> perspective. Does the MAC call phy_connect() multiple times? How does
> ndev->phydev work? Who is responsible for the initial configuration,
> such that all but one PHY is isolated?
> 
> I assume you have a real board that needs this. So i think we need to
> see a bit more of the complete solution, including the MAC changes and
> the device tree for the board, so we can see the big picture.

Also what the ethernet driver looks like too!

One way around the ndev->phydev problem, assuming that we decide that
isolate is a good idea, would be to isolate the current PHY, disconnect
it from the net_device, connect the new PHY, and then clear the isolate
on the new PHY. Essentially, ndev->phydev becomes the currently-active
PHY.

However, I still want to hear whether multiple PHYs can be on the same
MII bus from a functional electrical perspective.

I know that on the Macchiatobin, where the 10G serdes signals go to the
88X3310 on doubleshot boards and to the SFP cage on singleshot boards,
this is controlled by the placement of zero ohm resistors to route the
serdes signals to the appropriate device, thus minimising the unused
stub lengths.
Maxime Chevallier Oct. 7, 2024, 10:37 a.m. UTC | #3
Hello Andrew, Russell,

On Fri, 4 Oct 2024 20:02:05 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

[...]

> > This seems overly simplistic to me. Don't you need to iterate over all
> > the other PHYs attached to this MAC and ensure they are isolated? Only
> > one can be unisolated at once.
> > 
> > It is also not clear to me how this is going to work from a MAC
> > perspective. Does the MAC call phy_connect() multiple times? How does
> > ndev->phydev work? Who is responsible for the initial configuration,
> > such that all but one PHY is isolated?
> > 
> > I assume you have a real board that needs this. So i think we need to
> > see a bit more of the complete solution, including the MAC changes and
> > the device tree for the board, so we can see the big picture.  
> 
> Also what the ethernet driver looks like too!
> 
> One way around the ndev->phydev problem, assuming that we decide that
> isolate is a good idea, would be to isolate the current PHY, disconnect
> it from the net_device, connect the new PHY, and then clear the isolate
> on the new PHY. Essentially, ndev->phydev becomes the currently-active
> PHY.

It seems I am missing details in my cover and the overall work I'm
trying to achieve.

This series focuses on isolating the PHY in the case where only one
PHY is attached to the MAC. I have followup work to support multi-PHY
interfaces. I will do my best to send the RFC this week so that you can
take a look. I'm definitely not saying the current code supports that.

To tell you some details, it indeed works as Russell says, I
detach/re-attach the PHYs, ndev->phydev is the "currently active" PHY.

I'm using a new dedicated "struct phy_mux" for that, which has :

 - Parent ops (that would be filled either by the MAC, or by phylink,
in the same spirit as phylink can be an sfp_upstream), which manages
PHY attach / detach to the netdev, but also the state-machine or the
currently inactive PHY.

 - multiplexer ops, that implement the switching logic, if any (drive a
GPIO, write a register, this is in the case of real multiplexers like
we have on some of the Turris Omnia boards, which the phy_mux framework
would support)

 - child ops, that would be hooks to activate/deactivate a PHY itself
(isoalte/unisolate, power-up/power-down).

I'll send the RFC ASAP, I still have a few rough edges that I will
mention in the cover.

> However, I still want to hear whether multiple PHYs can be on the same
> MII bus from a functional electrical perspective.

Yup, I have that hardware.

Thanks,

Maxime
Andrew Lunn Oct. 7, 2024, 1:01 p.m. UTC | #4
> It seems I am missing details in my cover and the overall work I'm
> trying to achieve.
> 
> This series focuses on isolating the PHY in the case where only one
> PHY is attached to the MAC.

I can understand implementing building blocks, but this patchset seems
to be more than that, it seems to be a use case of its own. But is
isolating a single PHY a useful use case? Do we want a kAPI for this?

> I have followup work to support multi-PHY
> interfaces. I will do my best to send the RFC this week so that you can
> take a look. I'm definitely not saying the current code supports that.
> 
> To tell you some details, it indeed works as Russell says, I
> detach/re-attach the PHYs, ndev->phydev is the "currently active" PHY.
> 
> I'm using a new dedicated "struct phy_mux" for that, which has :
> 
>  - Parent ops (that would be filled either by the MAC, or by phylink,
> in the same spirit as phylink can be an sfp_upstream), which manages
> PHY attach / detach to the netdev, but also the state-machine or the
> currently inactive PHY.
> 
>  - multiplexer ops, that implement the switching logic, if any (drive a
> GPIO, write a register, this is in the case of real multiplexers like
> we have on some of the Turris Omnia boards, which the phy_mux framework
> would support)
> 
>  - child ops, that would be hooks to activate/deactivate a PHY itself
> (isoalte/unisolate, power-up/power-down).

Does the kAPI for a single PHY get used, and extended, in this setup?

> I'll send the RFC ASAP, I still have a few rough edges that I will
> mention in the cover.
> 
> > However, I still want to hear whether multiple PHYs can be on the same
> > MII bus from a functional electrical perspective.
> 
> Yup, I have that hardware.

Can you talk a bit more about that hardware? What PHYs do you have?
What interface modes are they using?

	Andrew
Maxime Chevallier Oct. 7, 2024, 1:48 p.m. UTC | #5
Hi Andrew,

On Mon, 7 Oct 2024 15:01:50 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > It seems I am missing details in my cover and the overall work I'm
> > trying to achieve.
> > 
> > This series focuses on isolating the PHY in the case where only one
> > PHY is attached to the MAC.  
> 
> I can understand implementing building blocks, but this patchset seems
> to be more than that, it seems to be a use case of its own. But is
> isolating a single PHY a useful use case? Do we want a kAPI for this?

That's a legit point. I mentioned in the cover for V1 that this in
itself doesn't really bring anything useful. The only point being that
it makes it easy to test if a PHY has a working isolation mode, but
given that we'll assume that it doesn't by default, that whole point
is moot.

I would therefore understand if you consider that having a kAPI for
that isn't very interesting and that I shall include this work as part
of the multi-PHY support.

> > I have followup work to support multi-PHY
> > interfaces. I will do my best to send the RFC this week so that you can
> > take a look. I'm definitely not saying the current code supports that.
> > 
> > To tell you some details, it indeed works as Russell says, I
> > detach/re-attach the PHYs, ndev->phydev is the "currently active" PHY.
> > 
> > I'm using a new dedicated "struct phy_mux" for that, which has :
> > 
> >  - Parent ops (that would be filled either by the MAC, or by phylink,
> > in the same spirit as phylink can be an sfp_upstream), which manages
> > PHY attach / detach to the netdev, but also the state-machine or the
> > currently inactive PHY.
> > 
> >  - multiplexer ops, that implement the switching logic, if any (drive a
> > GPIO, write a register, this is in the case of real multiplexers like
> > we have on some of the Turris Omnia boards, which the phy_mux framework
> > would support)
> > 
> >  - child ops, that would be hooks to activate/deactivate a PHY itself
> > (isoalte/unisolate, power-up/power-down).  
> 
> Does the kAPI for a single PHY get used, and extended, in this setup?

For isolation, no.

> 
> > I'll send the RFC ASAP, I still have a few rough edges that I will
> > mention in the cover.
> >   
> > > However, I still want to hear whether multiple PHYs can be on the same
> > > MII bus from a functional electrical perspective.  
> > 
> > Yup, I have that hardware.  
> 
> Can you talk a bit more about that hardware? What PHYs do you have?
> What interface modes are they using?

Sure thing. There are multiple devices out-there that may have multiple
PHYs accessible from the MAC, through muxers (I'm trying to be generic
enough to address all cases, gpio muxers, mmio-controlled muxers, etc.),
but let me describe the HW I'm working on that's a bit more problematic.

The first such platform I have has an fs_enet MAC, a pair of LXT973
PHYs for which the isolate mode doesn't work, and no on-board circuitry to
perform the isolation. Here, we have to power one PHY down when unused :

                /--- LXT973
fs_enet -- MII--|
                \--- LXT973


The second board has a fs_enet MAC and a pair of KSZ8041 PHYs connected
in MII.

The third one has a pair of KSZ8041 PHYs connected to a
ucc_geth MAC in RMII.

On both these boards, we isolate the PHYs when unused, and we also
drive a GPIO to toggle some on-board circuitry to disconnect the MII
lines as well for the unused PHY. I'd have to run some tests to see if
this circuitry could be enough, without relying at all on PHY
isolation :

                   /--- KSZ8041
                   |
      MAC ------ MUX
                 | | 
  to SoC <-gpio--/ \--- KSZ8041


One point is, if you look at the first case (no mux), we need to know
if the PHYs are able to isolate or not in order to use the proper
switching strategy (isolate or power-down).

I hope this clarifies the approach a little bit ?

Thanks,

Maxime
Russell King (Oracle) Oct. 7, 2024, 4:10 p.m. UTC | #6
On Mon, Oct 07, 2024 at 03:48:39PM +0200, Maxime Chevallier wrote:
> Sure thing. There are multiple devices out-there that may have multiple
> PHYs accessible from the MAC, through muxers (I'm trying to be generic
> enough to address all cases, gpio muxers, mmio-controlled muxers, etc.),
> but let me describe the HW I'm working on that's a bit more problematic.
> 
> The first such platform I have has an fs_enet MAC, a pair of LXT973
> PHYs for which the isolate mode doesn't work, and no on-board circuitry to
> perform the isolation. Here, we have to power one PHY down when unused :
> 
>                 /--- LXT973
> fs_enet -- MII--|
>                 \--- LXT973
> 
> 
> The second board has a fs_enet MAC and a pair of KSZ8041 PHYs connected
> in MII.
> 
> The third one has a pair of KSZ8041 PHYs connected to a
> ucc_geth MAC in RMII.
> 
> On both these boards, we isolate the PHYs when unused, and we also
> drive a GPIO to toggle some on-board circuitry to disconnect the MII
> lines as well for the unused PHY. I'd have to run some tests to see if
> this circuitry could be enough, without relying at all on PHY
> isolation :
> 
>                    /--- KSZ8041
>                    |
>       MAC ------ MUX
>                  | | 
>   to SoC <-gpio--/ \--- KSZ8041
> 
> 
> One point is, if you look at the first case (no mux), we need to know
> if the PHYs are able to isolate or not in order to use the proper
> switching strategy (isolate or power-down).
> 
> I hope this clarifies the approach a little bit ?

What I gather from the above is you have these scenarios:

1) two LXT973 on a MII bus (not RMII, RGMII etc but the 802.3 defined
   MII bus with four data lines in each direction, a bunch of control
   signals, clocked at a maximum of 25MHz). In this case, you need to
   power down each PHY so it doesn't interfere on the MII bus as the
   PHY doesn't support isolate mode.

2) two KSZ8041 on a MII bus to a multiplexer who's exact behaviour is
   not yet known which may require the use of the PHYs isolate bit.

I would suggest that spending time adding infrastructure for a rare
scenario, and when it is uncertain whether it needs to be used in
these scenarios is premature.

Please validate on the two KSZ8041 setups whether isolate is
necessary.

Presumably on those two KSZ88041 setups, the idea is to see which PHY
ends up with media link first, and then switch between the two PHYs?

Lastly, I'm a little confused why someone would layout a platform
where there are two identical PHYs connected to one MAC on the same
board. I can see the use case given in 802.3 - where one plugs in
the media specific attachment unit depending on the media being
used - Wikipedia has a photo of the connector on a Sun Ultra 1 -
but to have two PHYs on the same board doesn't make much sense to
me. What is trying to be achieved with these two PHYs on the same
board?

It's got me wondering whether the platform you have is some kind of
development board, and the manufacturer feels the need to provide a
network socket on either end of the board because folk don't have a
long enough ethernet cable to reach the other end! :D
Andrew Lunn Oct. 7, 2024, 4:37 p.m. UTC | #7
> That's a legit point. I mentioned in the cover for V1 that this in
> itself doesn't really bring anything useful. The only point being that
> it makes it easy to test if a PHY has a working isolation mode, but
> given that we'll assume that it doesn't by default, that whole point
> is moot.
> 
> I would therefore understand if you consider that having a kAPI for
> that isn't very interesting and that I shall include this work as part
> of the multi-PHY support.

kAPI add a lot of Maintenance burden. So we should not add them unless
they are justified. to me, there is not a good justification for this.

> Sure thing. There are multiple devices out-there that may have multiple
> PHYs accessible from the MAC, through muxers (I'm trying to be generic
> enough to address all cases, gpio muxers, mmio-controlled muxers, etc.),
> but let me describe the HW I'm working on that's a bit more problematic.
> 
> The first such platform I have has an fs_enet MAC, a pair of LXT973
> PHYs for which the isolate mode doesn't work, and no on-board circuitry to
> perform the isolation. Here, we have to power one PHY down when unused :
> 
>                 /--- LXT973
> fs_enet -- MII--|
>                 \--- LXT973

So you have at least regulators under Linux control? Is that what you
mean by power down? Pulling the plug and putting it back again is
somewhat different to isolation. All its state is going to be lost,
meaning phylib needs to completely initialise it again. Or can you
hide this using PM? Just suspend/resume it?

> The second board has a fs_enet MAC and a pair of KSZ8041 PHYs connected
> in MII.
> 
> The third one has a pair of KSZ8041 PHYs connected to a
> ucc_geth MAC in RMII.
> 
> On both these boards, we isolate the PHYs when unused, and we also
> drive a GPIO to toggle some on-board circuitry to disconnect the MII
> lines as well for the unused PHY. I'd have to run some tests to see if
> this circuitry could be enough, without relying at all on PHY
> isolation :
> 
>                    /--- KSZ8041
>                    |
>       MAC ------ MUX
>                  | | 
>   to SoC <-gpio--/ \--- KSZ8041
> 
> 
> One point is, if you look at the first case (no mux), we need to know
> if the PHYs are able to isolate or not in order to use the proper
> switching strategy (isolate or power-down).

That explains the hardware, but what are the use cases? How did the
hardware designer envision this hardware being used?

If you need to power the PHY off, you cannot have dynamic behaviour
where the first to have link wins. But if you can have the media side
functional, you can do some dynamic behaviours. Although, is it wise
for the link to come up, yet to be functionally dead because it has no
MAC connected?

There are some Marvell Switches which support both internal Copper
PHYs and a SERDES port. The hardware allows first to get link to have
a functional MAC. But in Linux we have not supported that, and we
leave the unused part down so it does not get link.

Maybe we actually want energy detect, not link, to decide which PHY
should get the MAC?  But i have no real idea what you can do with
energy detect, and it would also mean building out the read_status()
call to report additional things, etc.

	Andrew
Maxime Chevallier Oct. 8, 2024, 7:07 a.m. UTC | #8
On Mon, 7 Oct 2024 17:10:56 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Mon, Oct 07, 2024 at 03:48:39PM +0200, Maxime Chevallier wrote:
> > Sure thing. There are multiple devices out-there that may have multiple
> > PHYs accessible from the MAC, through muxers (I'm trying to be generic
> > enough to address all cases, gpio muxers, mmio-controlled muxers, etc.),
> > but let me describe the HW I'm working on that's a bit more problematic.
> > 
> > The first such platform I have has an fs_enet MAC, a pair of LXT973
> > PHYs for which the isolate mode doesn't work, and no on-board circuitry to
> > perform the isolation. Here, we have to power one PHY down when unused :
> > 
> >                 /--- LXT973
> > fs_enet -- MII--|
> >                 \--- LXT973
> > 
> > 
> > The second board has a fs_enet MAC and a pair of KSZ8041 PHYs connected
> > in MII.
> > 
> > The third one has a pair of KSZ8041 PHYs connected to a
> > ucc_geth MAC in RMII.
> > 
> > On both these boards, we isolate the PHYs when unused, and we also
> > drive a GPIO to toggle some on-board circuitry to disconnect the MII
> > lines as well for the unused PHY. I'd have to run some tests to see if
> > this circuitry could be enough, without relying at all on PHY
> > isolation :
> > 
> >                    /--- KSZ8041
> >                    |
> >       MAC ------ MUX
> >                  | | 
> >   to SoC <-gpio--/ \--- KSZ8041
> > 
> > 
> > One point is, if you look at the first case (no mux), we need to know
> > if the PHYs are able to isolate or not in order to use the proper
> > switching strategy (isolate or power-down).
> > 
> > I hope this clarifies the approach a little bit ?  
> 
> What I gather from the above is you have these scenarios:
> 
> 1) two LXT973 on a MII bus (not RMII, RGMII etc but the 802.3 defined
>    MII bus with four data lines in each direction, a bunch of control
>    signals, clocked at a maximum of 25MHz). In this case, you need to
>    power down each PHY so it doesn't interfere on the MII bus as the
>    PHY doesn't support isolate mode.

Correct

> 
> 2) two KSZ8041 on a MII bus to a multiplexer who's exact behaviour is
>    not yet known which may require the use of the PHYs isolate bit.

Correct as well

> 
> I would suggest that spending time adding infrastructure for a rare
> scenario, and when it is uncertain whether it needs to be used in
> these scenarios is premature.
> 
> Please validate on the two KSZ8041 setups whether isolate is
> necessary.

I'll do

> Presumably on those two KSZ88041 setups, the idea is to see which PHY
> ends up with media link first, and then switch between the two PHYs?

Indeed. I already have code for that (I was expecting that whole
discussion to happen in the RFC for said code :D )

> Lastly, I'm a little confused why someone would layout a platform
> where there are two identical PHYs connected to one MAC on the same
> board. I can see the use case given in 802.3 - where one plugs in
> the media specific attachment unit depending on the media being
> used - Wikipedia has a photo of the connector on a Sun Ultra 1 -
> but to have two PHYs on the same board doesn't make much sense to
> me. What is trying to be achieved with these two PHYs on the same
> board?

The use-case is redundancy, to switch between the PHYs when the link
goes down on one side. I don't know why bonding isn't used, I suspect
this is because there's not enough MACs on the device to do that. These
are pretty old hardware platforms that have been in use in the field
for quite some time and will continue to be for a while.

I've been trying to decompose support for what is a niche use-case into
something that can benefit other devices :

 - Turris omnia for example uses a MII mux at the serdes level to
switch between a PHY and an SFP bus. We could leverage a phy_mux infra
here to support these related use-cases

 - I've always considered that this is similar enough ( from the
end-user perspective ) to cases like MCBin or other switches that have
2 ports connected to the same MAC. 

In the end, we have 1 netdev, 2 ports, regardless of wether there are 2
PHYs or 1. Of course, the capabilities are not the same, we can't
detect link/power simultaneously in all situations, but I'm keeping
that in mind in the design, and I've talked about this a few weeks ago
at LPC [1].

Thanks,

Maxime

[1] : https://bootlin.com/pub/conferences/2024/lpc/chevallier-phy-port/chevallier-phy-port.pdf
Maxime Chevallier Oct. 8, 2024, 7:25 a.m. UTC | #9
On Mon, 7 Oct 2024 18:37:29 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > That's a legit point. I mentioned in the cover for V1 that this in
> > itself doesn't really bring anything useful. The only point being that
> > it makes it easy to test if a PHY has a working isolation mode, but
> > given that we'll assume that it doesn't by default, that whole point
> > is moot.
> > 
> > I would therefore understand if you consider that having a kAPI for
> > that isn't very interesting and that I shall include this work as part
> > of the multi-PHY support.  
> 
> kAPI add a lot of Maintenance burden. So we should not add them unless
> they are justified. to me, there is not a good justification for this.

That's fine by me.

> 
> > Sure thing. There are multiple devices out-there that may have multiple
> > PHYs accessible from the MAC, through muxers (I'm trying to be generic
> > enough to address all cases, gpio muxers, mmio-controlled muxers, etc.),
> > but let me describe the HW I'm working on that's a bit more problematic.
> > 
> > The first such platform I have has an fs_enet MAC, a pair of LXT973
> > PHYs for which the isolate mode doesn't work, and no on-board circuitry to
> > perform the isolation. Here, we have to power one PHY down when unused :
> > 
> >                 /--- LXT973
> > fs_enet -- MII--|
> >                 \--- LXT973  
> 
> So you have at least regulators under Linux control? Is that what you
> mean by power down? Pulling the plug and putting it back again is
> somewhat different to isolation. All its state is going to be lost,
> meaning phylib needs to completely initialise it again. Or can you
> hide this using PM? Just suspend/resume it?

Ah no, I wasn't referring to regulators but rather the BMCR PDOWN bit to
just shut the PHY down, as in suspend.

Indeed the state is lost. The way I'm supporting this is :

 - If one PHY has the link, it keeps it until link-down
 - When link-down, I round-robin between the 2 phys: 

  - Attach the PHY to the netdev
  - See if it can establish link and negotiate with LP
  - If there's nothing after a given period ( 2 seconds default ), then
I detach the PHY, attach the other one, and start again, until one of
them has link.

That's very limited indeed, we have no way of saying "first that has
link wins".


> > The second board has a fs_enet MAC and a pair of KSZ8041 PHYs connected
> > in MII.
> > 
> > The third one has a pair of KSZ8041 PHYs connected to a
> > ucc_geth MAC in RMII.
> > 
> > On both these boards, we isolate the PHYs when unused, and we also
> > drive a GPIO to toggle some on-board circuitry to disconnect the MII
> > lines as well for the unused PHY. I'd have to run some tests to see if
> > this circuitry could be enough, without relying at all on PHY
> > isolation :
> > 
> >                    /--- KSZ8041
> >                    |
> >       MAC ------ MUX
> >                  | | 
> >   to SoC <-gpio--/ \--- KSZ8041
> > 
> > 
> > One point is, if you look at the first case (no mux), we need to know
> > if the PHYs are able to isolate or not in order to use the proper
> > switching strategy (isolate or power-down).  
> 
> That explains the hardware, but what are the use cases? How did the
> hardware designer envision this hardware being used?

The use-case is link redundancy, if one PHY loses the link, we hope
that we still have link on the other one and switchover. This is one of
the things I discussed at netdev 0x17.

> If you need to power the PHY off, you cannot have dynamic behaviour
> where the first to have link wins. But if you can have the media side
> functional, you can do some dynamic behaviours.

True.

> Although, is it wise
> for the link to come up, yet to be functionally dead because it has no
> MAC connected?

Good point. What would you think ? I already deal with the identified
issue which is that both PHYs are link-up with LP, both connected to
the same switch. When we switch between the active PHYs, we send a
gratuitous ARP on the new PHY to refresh the switch's FDB.

Do you see that as being an issue, having the LP see link-up when the
link cannot actually convey data ? Besides the energy detect feature
you mention, I don't see what other options we can have unfortunately :(

> There are some Marvell Switches which support both internal Copper
> PHYs and a SERDES port. The hardware allows first to get link to have
> a functional MAC. But in Linux we have not supported that, and we
> leave the unused part down so it does not get link.

My plan is to support these as well. For the end-user, it makes no
difference wether the HW internally has 2 PHYs each with one port, or 1
phy with 2 ports. So to me, if we want to support phy_mux, we should
also support the case you mention above. I have some code to support
this, but that's the part where I'm still getting things ironed-out,
this is pretty tricky to represent that properly, especially in DT.

>
> Maybe we actually want energy detect, not link, to decide which PHY
> should get the MAC?  But i have no real idea what you can do with
> energy detect, and it would also mean building out the read_status()
> call to report additional things, etc.

Note that I'm trying to support a bigger set of use-cases besides the
pure 2-PHY setup. One being that we have a MUX within the SoC on the
SERDES lanes, allowing to steer the MII interface between a PHY and an
SFP bus (Turris Omnia has such a setup). Is it possible to have an
equivalent "energy detect" on all kinds of SFPs ?

As a note, I do see that both Russell and you may think you're being
"drip-fed" (I learned that term today) information, that's not my
intent at all, I wasn't expecting this discussion now, sorry about that.

I was saying to Russell that I would start a new thread, but we already
have a discussion going here, let me know if we shall continue the
discussion here or on a new thread.

Thanks,

Maxime
Andrew Lunn Oct. 8, 2024, 1 p.m. UTC | #10
> > So you have at least regulators under Linux control? Is that what you
> > mean by power down? Pulling the plug and putting it back again is
> > somewhat different to isolation. All its state is going to be lost,
> > meaning phylib needs to completely initialise it again. Or can you
> > hide this using PM? Just suspend/resume it?
> 
> Ah no, I wasn't referring to regulators but rather the BMCR PDOWN bit to
> just shut the PHY down, as in suspend.

Ah! I wounder what 802.3 says about PDOWN? Does it say anything about
it being equivalent to ISOLATE? That the pins go HI-Z? Are we talking
about something semi-reliable, or something which just happens to work
for this PHY?

> Indeed the state is lost. The way I'm supporting this is :
> 
>  - If one PHY has the link, it keeps it until link-down
>  - When link-down, I round-robin between the 2 phys: 
> 
>   - Attach the PHY to the netdev
>   - See if it can establish link and negotiate with LP
>   - If there's nothing after a given period ( 2 seconds default ), then
> I detach the PHY, attach the other one, and start again, until one of
> them has link.

This sounds pretty invasive to the MAC driver. I don't think you need
to attach/detach each cycle, since you don't need to send/receive any
packets. You could hide this all in phylib. But that should be
considered as part of the bigger picture.

I assume it is not actually 2 seconds, but some random number in the
range 1-3 seconds, so when both ends are searching they do eventually
find each other?

> > That explains the hardware, but what are the use cases? How did the
> > hardware designer envision this hardware being used?
> 
> The use-case is link redundancy, if one PHY loses the link, we hope
> that we still have link on the other one and switchover. This is one of
> the things I discussed at netdev 0x17.

> > If you need to power the PHY off, you cannot have dynamic behaviour
> > where the first to have link wins. But if you can have the media side
> > functional, you can do some dynamic behaviours.
> 
> True.
> 
> > Although, is it wise
> > for the link to come up, yet to be functionally dead because it has no
> > MAC connected?
> 
> Good point. What would you think ? I already deal with the identified
> issue which is that both PHYs are link-up with LP, both connected to
> the same switch. When we switch between the active PHYs, we send a
> gratuitous ARP on the new PHY to refresh the switch's FDB.

It seems odd to me you have redundant cables going to one switch? I
would have the cables going in opposite directions, to two different
switches, and have the switches in at a minimum a ring, or ideally a
mesh.

I don't think the ARP is necessary. The link peer switch should flush
its tables when the link goes down. But switches further away don't
see such link events, yet they learn about the new location of the
host. I would also expect the host sees a loss of carrier and then the
carrier restored, which probably flushes all its tables, so it is
going to ARP anyway.

> 
> Do you see that as being an issue, having the LP see link-up when the
> link cannot actually convey data ? Besides the energy detect feature
> you mention, I don't see what other options we can have unfortunately :(

Maybe see what 802.3 says about advertising with no link
modes. Autoneg should complete, in that the peers exchange messages,
but the result of the autoneg is that they have no common modes, so
the link won't come up. Is it clearly defined what should happen in
this case? But we are in a corner case, similar to ISOLATE, which i
guess rarely gets tested, so is often broken. I would guess power
detection would be more reliable when implemented. 

> > There are some Marvell Switches which support both internal Copper
> > PHYs and a SERDES port. The hardware allows first to get link to have
> > a functional MAC. But in Linux we have not supported that, and we
> > leave the unused part down so it does not get link.
> 
> My plan is to support these as well. For the end-user, it makes no
> difference wether the HW internally has 2 PHYs each with one port, or 1
> phy with 2 ports. So to me, if we want to support phy_mux, we should
> also support the case you mention above. I have some code to support
> this, but that's the part where I'm still getting things ironed-out,
> this is pretty tricky to represent that properly, especially in DT.
> 
> >
> > Maybe we actually want energy detect, not link, to decide which PHY
> > should get the MAC?  But i have no real idea what you can do with
> > energy detect, and it would also mean building out the read_status()
> > call to report additional things, etc.
> 
> Note that I'm trying to support a bigger set of use-cases besides the
> pure 2-PHY setup. One being that we have a MUX within the SoC on the
> SERDES lanes, allowing to steer the MII interface between a PHY and an
> SFP bus (Turris Omnia has such a setup). Is it possible to have an
> equivalent "energy detect" on all kinds of SFPs ?

The LOS pin, which indicates if there is light entering the SFP.

> As a note, I do see that both Russell and you may think you're being
> "drip-fed" (I learned that term today) information, that's not my
> intent at all, I wasn't expecting this discussion now, sorry about that.

It is a difficult set of problems, and you are addressing it from the
very niche end first using mechanisms which i expect are not reliably
implemented. So we are going to ask lots of questions.

You probably would of got less questions if you have started with the
use cases for the Turris Omnia and Marvell Ethernet switch, which are
more mainstream, and then extended it with your niche device. But i
can understand this order, you probably have a customer with this
niche device...

	Andrew
Russell King (Oracle) Oct. 8, 2024, 1:22 p.m. UTC | #11
On Tue, Oct 08, 2024 at 03:00:53PM +0200, Andrew Lunn wrote:
> > > So you have at least regulators under Linux control? Is that what you
> > > mean by power down? Pulling the plug and putting it back again is
> > > somewhat different to isolation. All its state is going to be lost,
> > > meaning phylib needs to completely initialise it again. Or can you
> > > hide this using PM? Just suspend/resume it?
> > 
> > Ah no, I wasn't referring to regulators but rather the BMCR PDOWN bit to
> > just shut the PHY down, as in suspend.
> 
> Ah! I wounder what 802.3 says about PDOWN? Does it say anything about
> it being equivalent to ISOLATE? That the pins go HI-Z? Are we talking
> about something semi-reliable, or something which just happens to work
> for this PHY?

"The specific behavior of a PHY in the power-down state is
implementation specific. While in the power-down state, the PHY shall
respond to management transactions. During the transition to the
power-down state and while in the power-down state, the PHY shall not
generate spurious signals on the MII or GMII."

So no, there is no requirement in 802.3 for the MII bus to go into
HI-Z state, the only requirement is to avoid creating spurious
signals. One way to achieve that would be to go into Hi-Z state,
but another way would be to drive the signals to an inactive state.
Thus, as it's not defined, setting 0.11 can't be relied upon to
allow two PHYs to be on the same MII bus.

It seems we're into implementation specifics and not generalities
here.

> > Indeed the state is lost. The way I'm supporting this is :
> > 
> >  - If one PHY has the link, it keeps it until link-down
> >  - When link-down, I round-robin between the 2 phys: 
> > 
> >   - Attach the PHY to the netdev
> >   - See if it can establish link and negotiate with LP
> >   - If there's nothing after a given period ( 2 seconds default ), then
> > I detach the PHY, attach the other one, and start again, until one of
> > them has link.
> 
> This sounds pretty invasive to the MAC driver. I don't think you need
> to attach/detach each cycle, since you don't need to send/receive any
> packets. You could hide this all in phylib. But that should be
> considered as part of the bigger picture.

Given that management transactions are permitted while PDOWN is set,
it seems we're again into an implementation specific behaviour where
setting this bit results in the PHY losing its brains. :(

> > > Although, is it wise
> > > for the link to come up, yet to be functionally dead because it has no
> > > MAC connected?
> > 
> > Good point. What would you think ? I already deal with the identified
> > issue which is that both PHYs are link-up with LP, both connected to
> > the same switch. When we switch between the active PHYs, we send a
> > gratuitous ARP on the new PHY to refresh the switch's FDB.
> 
> It seems odd to me you have redundant cables going to one switch? I
> would have the cables going in opposite directions, to two different
> switches, and have the switches in at a minimum a ring, or ideally a
> mesh.
> 
> I don't think the ARP is necessary. The link peer switch should flush
> its tables when the link goes down. But switches further away don't
> see such link events, yet they learn about the new location of the
> host. I would also expect the host sees a loss of carrier and then the
> carrier restored, which probably flushes all its tables, so it is
> going to ARP anyway.

The ARP will be necessary if you want to have the two links going to two
different switches - otherwise how does the switches upstream of those
two switches know to route packets to the MAC's ethernet address to the
different path... you'd have to wait for the higher level switches to
age their tables.

> > Note that I'm trying to support a bigger set of use-cases besides the
> > pure 2-PHY setup. One being that we have a MUX within the SoC on the
> > SERDES lanes, allowing to steer the MII interface between a PHY and an
> > SFP bus (Turris Omnia has such a setup). Is it possible to have an
> > equivalent "energy detect" on all kinds of SFPs ?
> 
> The LOS pin, which indicates if there is light entering the SFP.

Basically... no. You can't trust anything that SFPs give you. True fibre
SFPs as per the original design are way better at giving a RXLOS signal,
but everything else (including GPON) are a game.

GPON SFPs may even use the RXLOS as their UART transmit pin, wiggling it
at random times.

SFPs are a mess.

(Sorry, for what I feel is another incomplete reply...)
Maxime Chevallier Oct. 8, 2024, 2:57 p.m. UTC | #12
On Tue, 8 Oct 2024 15:00:53 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > > So you have at least regulators under Linux control? Is that what you
> > > mean by power down? Pulling the plug and putting it back again is
> > > somewhat different to isolation. All its state is going to be lost,
> > > meaning phylib needs to completely initialise it again. Or can you
> > > hide this using PM? Just suspend/resume it?  
> > 
> > Ah no, I wasn't referring to regulators but rather the BMCR PDOWN bit to
> > just shut the PHY down, as in suspend.  
> 
> Ah! I wounder what 802.3 says about PDOWN? Does it say anything about
> it being equivalent to ISOLATE? That the pins go HI-Z? Are we talking
> about something semi-reliable, or something which just happens to work
> for this PHY?

The spec doesn't say anything about hi-z on the MII for power-down, it
simply says (22.2.4.1.5 Power down) :

"During the transition to the power-down state and while
in the power-down state, the PHY shall not generate spurious
signals on the MII or GMII"

So my best guess is that it just happens to work for this PHY. It won't
work for serdes links for the same reasons as the isolate mode I guess,
reflections would make it too unstable ?

> > Indeed the state is lost. The way I'm supporting this is :
> > 
> >  - If one PHY has the link, it keeps it until link-down
> >  - When link-down, I round-robin between the 2 phys: 
> > 
> >   - Attach the PHY to the netdev
> >   - See if it can establish link and negotiate with LP
> >   - If there's nothing after a given period ( 2 seconds default ), then
> > I detach the PHY, attach the other one, and start again, until one of
> > them has link.  
> 
> This sounds pretty invasive to the MAC driver. I don't think you need
> to attach/detach each cycle, since you don't need to send/receive any
> packets. You could hide this all in phylib. But that should be
> considered as part of the bigger picture.

Sure, that's what I came-up with so far but that's indeed an implem
problem.

> 
> I assume it is not actually 2 seconds, but some random number in the
> range 1-3 seconds, so when both ends are searching they do eventually
> find each other?

Oleksji pointed that out to me at LPC, that makes sense indeed.

> 
> > > That explains the hardware, but what are the use cases? How did the
> > > hardware designer envision this hardware being used?  
> > 
> > The use-case is link redundancy, if one PHY loses the link, we hope
> > that we still have link on the other one and switchover. This is one of
> > the things I discussed at netdev 0x17.  
> 
> > > If you need to power the PHY off, you cannot have dynamic behaviour
> > > where the first to have link wins. But if you can have the media side
> > > functional, you can do some dynamic behaviours.  
> > 
> > True.
> >   
> > > Although, is it wise
> > > for the link to come up, yet to be functionally dead because it has no
> > > MAC connected?  
> > 
> > Good point. What would you think ? I already deal with the identified
> > issue which is that both PHYs are link-up with LP, both connected to
> > the same switch. When we switch between the active PHYs, we send a
> > gratuitous ARP on the new PHY to refresh the switch's FDB.  
> 
> It seems odd to me you have redundant cables going to one switch? I
> would have the cables going in opposite directions, to two different
> switches, and have the switches in at a minimum a ring, or ideally a
> mesh.
> 
> I don't think the ARP is necessary. The link peer switch should flush
> its tables when the link goes down. But switches further away don't
> see such link events, yet they learn about the new location of the
> host. I would also expect the host sees a loss of carrier and then the
> carrier restored, which probably flushes all its tables, so it is
> going to ARP anyway.

While I would agree with you on the theory, while testing we discovered
that sending that ARP was necessary to reliably update the switch's
tables :/

This is also what bonding does in active-backup mode.

> > 
> > Do you see that as being an issue, having the LP see link-up when the
> > link cannot actually convey data ? Besides the energy detect feature
> > you mention, I don't see what other options we can have unfortunately :(  
> 
> Maybe see what 802.3 says about advertising with no link
> modes. Autoneg should complete, in that the peers exchange messages,
> but the result of the autoneg is that they have no common modes, so
> the link won't come up. Is it clearly defined what should happen in
> this case? But we are in a corner case, similar to ISOLATE, which i
> guess rarely gets tested, so is often broken. I would guess power
> detection would be more reliable when implemented. 

I'll need to perform further tests on that, I haven't looked into
energy detect. Let me take a look :)

> > > There are some Marvell Switches which support both internal Copper
> > > PHYs and a SERDES port. The hardware allows first to get link to have
> > > a functional MAC. But in Linux we have not supported that, and we
> > > leave the unused part down so it does not get link.  
> > 
> > My plan is to support these as well. For the end-user, it makes no
> > difference wether the HW internally has 2 PHYs each with one port, or 1
> > phy with 2 ports. So to me, if we want to support phy_mux, we should
> > also support the case you mention above. I have some code to support
> > this, but that's the part where I'm still getting things ironed-out,
> > this is pretty tricky to represent that properly, especially in DT.
> >   
> > >
> > > Maybe we actually want energy detect, not link, to decide which PHY
> > > should get the MAC?  But i have no real idea what you can do with
> > > energy detect, and it would also mean building out the read_status()
> > > call to report additional things, etc.  
> > 
> > Note that I'm trying to support a bigger set of use-cases besides the
> > pure 2-PHY setup. One being that we have a MUX within the SoC on the
> > SERDES lanes, allowing to steer the MII interface between a PHY and an
> > SFP bus (Turris Omnia has such a setup). Is it possible to have an
> > equivalent "energy detect" on all kinds of SFPs ?  
> 
> The LOS pin, which indicates if there is light entering the SFP.
> 
> > As a note, I do see that both Russell and you may think you're being
> > "drip-fed" (I learned that term today) information, that's not my
> > intent at all, I wasn't expecting this discussion now, sorry about that.  
> 
> It is a difficult set of problems, and you are addressing it from the
> very niche end first using mechanisms which i expect are not reliably
> implemented. So we are going to ask lots of questions.

There's absolutely no problem with that :)

> You probably would of got less questions if you have started with the
> use cases for the Turris Omnia and Marvell Ethernet switch, which are
> more mainstream, and then extended it with your niche device. But i
> can understand this order, you probably have a customer with this
> niche device...

Oh but I plan to add support for the marvell switch, mcbin, and turris
first, as these boards are somewhat easily accessible and allows
converging towards a proper kAPI for that without relying on the boards
only I and a few other folks have.

That's another can of worms though :)

Maxime
Russell King (Oracle) Oct. 8, 2024, 3:27 p.m. UTC | #13
On Tue, Oct 08, 2024 at 04:57:42PM +0200, Maxime Chevallier wrote:
> Oh but I plan to add support for the marvell switch, mcbin, and turris
> first,

What do you think needs adding for the mcbin?

For the single-shot version, the serdes lines are hard-wired to the
SFP cages, so it's a MAC with a SFP cage directly connected.

For the double-shot, the switching happens dynamically within the
88x3310 PHY, so there's no need to fiddle with any isolate modes.

The only thing that is missing is switching the 88x3310's fibre
interface from the default 10gbase-r to 1000base-X and/or SGMII, and
allowing PHYs to be stacked on top. The former I have untested
patches for but the latter is something that's waiting for
networking/phylib to gain support for stacked PHY.

Switching the interface mode is very disruptive as it needs the PHY
to be software-reset, and if the RJ45 has link but one is simply
plugging in a SFP, hitting the PHY with a software reset will
disrupt that link.

Given that the mcbin has one SFP cage that is capable of 2500base-X,
1000base-X and SGMII, and two SFP cages that can do 10gbase-r, with
a PHY that can do 10/100/1G/2.5G/5G/10G over the RJ45, I'm not sure
adding more complexity really gains us very much other than...
additional complexity.
Maxime Chevallier Oct. 8, 2024, 4:41 p.m. UTC | #14
On Tue, 8 Oct 2024 16:27:43 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Oct 08, 2024 at 04:57:42PM +0200, Maxime Chevallier wrote:
> > Oh but I plan to add support for the marvell switch, mcbin, and turris
> > first,  
> 
> What do you think needs adding for the mcbin?
> 
> For the single-shot version, the serdes lines are hard-wired to the
> SFP cages, so it's a MAC with a SFP cage directly connected.
> 
> For the double-shot, the switching happens dynamically within the
> 88x3310 PHY, so there's no need to fiddle with any isolate modes.

Nothing related to isolate mode regarding the mcbin :) They aren't
even implemented on the 3310 PHYs anyway :)

> 
> The only thing that is missing is switching the 88x3310's fibre
> interface from the default 10gbase-r to 1000base-X and/or SGMII, and
> allowing PHYs to be stacked on top. The former I have untested
> patches for but the latter is something that's waiting for
> networking/phylib to gain support for stacked PHY.

That's one part of it indeed

> Switching the interface mode is very disruptive as it needs the PHY
> to be software-reset, and if the RJ45 has link but one is simply
> plugging in a SFP, hitting the PHY with a software reset will
> disrupt that link.
> 
> Given that the mcbin has one SFP cage that is capable of 2500base-X,
> 1000base-X and SGMII, and two SFP cages that can do 10gbase-r, with
> a PHY that can do 10/100/1G/2.5G/5G/10G over the RJ45, I'm not sure
> adding more complexity really gains us very much other than...
> additional complexity.

What I mean is the ability for users to see, from tools like ethtool,
that the MCBin doubleshot's eth0 and eth1 interfaces have 2 ports
(copper + sfp), and potentially allow picking which one to use in case
both ports are connected.

There are mutliple devices out-there with such configurations (some
marvell switches for example). Do you not see some value in this ?

This isn't related at all to isolate, but rather the bigger picture of
the type of topology I'm trying to improve support for.

Setups with 2 PHYs connected to the same MAC are similar to the eth0/1
interfaces in the sense that they offer 2 front-facing ports for one
single MAC. IMO it's easier to first deal with the MCBin setup first.

Again that's a whole other topic, but my idea would be to be able, from
ethtool, to see that mcbin eth0 has one port capable of
10/100/1000/2500/5000/10000BaseT, and another capable of
1000BaseX/2500BaseX/5GBaseR/10GBaseR or whatever the plugged SFP module
offers.

I do know that the MCBin has a large variety of interfaces easily
accessible, but it also looks like a good board to introduce such
multi-port support. Many people have it, it works well with an upstream
kernel, making testing and review effort easier IMO.

I know that this whole thing of dealing with 2 PHYs attached to the
same MAC has lots of ramifications (the 1 PHY 2 ports setup I just
mentionned, the phy_link_topology that has been added, the isolate
mode, muxing support, etc.), that's why I tried to cover all the angles
at netdevconf + LPC. I have code for most of that pretty-much ready to
send.

Thanks,

Maxime
Russell King (Oracle) Oct. 8, 2024, 5:05 p.m. UTC | #15
On Tue, Oct 08, 2024 at 06:41:02PM +0200, Maxime Chevallier wrote:
> On Tue, 8 Oct 2024 16:27:43 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Tue, Oct 08, 2024 at 04:57:42PM +0200, Maxime Chevallier wrote:
> > > Oh but I plan to add support for the marvell switch, mcbin, and turris
> > > first,  
> > 
> > What do you think needs adding for the mcbin?
> > 
> > For the single-shot version, the serdes lines are hard-wired to the
> > SFP cages, so it's a MAC with a SFP cage directly connected.
> > 
> > For the double-shot, the switching happens dynamically within the
> > 88x3310 PHY, so there's no need to fiddle with any isolate modes.
> 
> Nothing related to isolate mode regarding the mcbin :) They aren't
> even implemented on the 3310 PHYs anyway :)
> 
> > 
> > The only thing that is missing is switching the 88x3310's fibre
> > interface from the default 10gbase-r to 1000base-X and/or SGMII, and
> > allowing PHYs to be stacked on top. The former I have untested
> > patches for but the latter is something that's waiting for
> > networking/phylib to gain support for stacked PHY.
> 
> That's one part of it indeed
> 
> > Switching the interface mode is very disruptive as it needs the PHY
> > to be software-reset, and if the RJ45 has link but one is simply
> > plugging in a SFP, hitting the PHY with a software reset will
> > disrupt that link.
> > 
> > Given that the mcbin has one SFP cage that is capable of 2500base-X,
> > 1000base-X and SGMII, and two SFP cages that can do 10gbase-r, with
> > a PHY that can do 10/100/1G/2.5G/5G/10G over the RJ45, I'm not sure
> > adding more complexity really gains us very much other than...
> > additional complexity.
> 
> What I mean is the ability for users to see, from tools like ethtool,
> that the MCBin doubleshot's eth0 and eth1 interfaces have 2 ports
> (copper + sfp), and potentially allow picking which one to use in case
> both ports are connected.
> 
> There are mutliple devices out-there with such configurations (some
> marvell switches for example). Do you not see some value in this ?

Many PHYs that have two media facing ports give configuration of the
priority between the two interfaces, and yes, there would definitely be
value in exposing that to userspace, thereby allowing userspace to
configure the policy there.

This would probably be more common than the two-PHY issue that we're
starting with - as I believe the 88e151x PHYs support exactly the same
thing when used with a RGMII host interface. The serdes port becomes
available for "fiber" and it is only 1000base-X there.

I was trying to work out what the motivation was for this platform.

Sorry if you mentioned it at NetdevConf and I've forgotten it all,
it was quite a while ago now!

Thanks!
Maxime Chevallier Oct. 8, 2024, 5:19 p.m. UTC | #16
On Tue, 8 Oct 2024 18:05:05 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> > What I mean is the ability for users to see, from tools like ethtool,
> > that the MCBin doubleshot's eth0 and eth1 interfaces have 2 ports
> > (copper + sfp), and potentially allow picking which one to use in case
> > both ports are connected.
> > 
> > There are mutliple devices out-there with such configurations (some
> > marvell switches for example). Do you not see some value in this ?  
> 
> Many PHYs that have two media facing ports give configuration of the
> priority between the two interfaces, and yes, there would definitely be
> value in exposing that to userspace, thereby allowing userspace to
> configure the policy there.

Great !

> This would probably be more common than the two-PHY issue that we're
> starting with - as I believe the 88e151x PHYs support exactly the same
> thing when used with a RGMII host interface. The serdes port becomes
> available for "fiber" and it is only 1000base-X there.

True, I've seen several setups with this so far indeed, as well as with
PHYs from other vendors.

> I was trying to work out what the motivation was for this platform.

It also turns out that the MCBin is one of the only boards that has a
permanent spot on my desk, as it's a pretty nice platform to experiment
with various PHY aspects.

> 
> Sorry if you mentioned it at NetdevConf and I've forgotten it all,
> it was quite a while ago now!

No worries :)

> 
> Thanks!

Thanks for your feedback on that whole topic,

Maxime
diff mbox series

Patch

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 4f3e742907cb..5a689da060d1 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1811,3 +1811,47 @@  int phy_ethtool_nway_reset(struct net_device *ndev)
 	return ret;
 }
 EXPORT_SYMBOL(phy_ethtool_nway_reset);
+
+/**
+ * phy_get_config - Get PHY configuration parameters
+ * @phydev: the PHY device to act upon
+ * @phy_cfg:  The configuration to apply
+ */
+
+int phy_get_config(struct phy_device *phydev,
+		   struct phy_device_config *phy_cfg)
+{
+	mutex_lock(&phydev->lock);
+	phy_cfg->isolate = phydev->isolated;
+	mutex_unlock(&phydev->lock);
+
+	return 0;
+}
+
+/**
+ * phy_set_config - Set PHY configuration parameters
+ * @phydev: the PHY device to act upon
+ * @phy_cfg: the configuration to apply
+ * @extack: a netlink extack for useful error reporting
+ */
+
+int phy_set_config(struct phy_device *phydev,
+		   const struct phy_device_config *phy_cfg,
+		   struct netlink_ext_ack *extack)
+{
+	bool isolate_change;
+	int ret;
+
+	mutex_lock(&phydev->lock);
+	isolate_change = (phy_cfg->isolate != phydev->isolated);
+	mutex_unlock(&phydev->lock);
+
+	if (!isolate_change)
+		return 0;
+
+	ret = phy_isolate(phydev, phy_cfg->isolate);
+	if (ret)
+		NL_SET_ERR_MSG(extack, "Error while configuring PHY isolation");
+
+	return ret;
+}
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 9294b73c391a..1111a3cbcb02 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3857,6 +3857,8 @@  static const struct ethtool_phy_ops phy_ethtool_phy_ops = {
 	.get_plca_status	= phy_ethtool_get_plca_status,
 	.start_cable_test	= phy_start_cable_test,
 	.start_cable_test_tdr	= phy_start_cable_test_tdr,
+	.get_config		= phy_get_config,
+	.set_config		= phy_set_config,
 };
 
 static const struct phylib_stubs __phylib_stubs = {
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 12f6dc567598..480ee99a69a5 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -1140,6 +1140,7 @@  struct phy_device;
 struct phy_tdr_config;
 struct phy_plca_cfg;
 struct phy_plca_status;
+struct phy_device_config;
 
 /**
  * struct ethtool_phy_ops - Optional PHY device options
@@ -1151,6 +1152,8 @@  struct phy_plca_status;
  * @get_plca_status: Get PLCA configuration.
  * @start_cable_test: Start a cable test
  * @start_cable_test_tdr: Start a Time Domain Reflectometry cable test
+ * @get_config: Retrieve phy device configuration parameters
+ * @set_config: Set phy device configuration parameters
  *
  * All operations are optional (i.e. the function pointer may be set to %NULL)
  * and callers must take this into account. Callers must hold the RTNL lock.
@@ -1172,6 +1175,11 @@  struct ethtool_phy_ops {
 	int (*start_cable_test_tdr)(struct phy_device *phydev,
 				    struct netlink_ext_ack *extack,
 				    const struct phy_tdr_config *config);
+	int (*get_config)(struct phy_device *phydev,
+			  struct phy_device_config *phy_cfg);
+	int (*set_config)(struct phy_device *phydev,
+			  const struct phy_device_config *phy_cfg,
+			  struct netlink_ext_ack *extack);
 };
 
 /**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index e43f7169c57d..662ba57cd0de 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -886,6 +886,20 @@  enum phy_led_modes {
 	__PHY_LED_MODES_NUM,
 };
 
+/**
+ * struct phy_device_config - General PHY device configuration parameters for
+ * status reporting and bulk configuration
+ *
+ * A structure containing generic PHY device information, allowing to expose
+ * internal status to userspace, and perform PHY configuration in a controlled
+ * manner.
+ *
+ * @isolate: The MII-side isolation status of the PHY
+ */
+struct phy_device_config {
+	bool isolate;
+};
+
 /**
  * struct phy_led: An LED driven by the PHY
  *
@@ -2085,6 +2099,11 @@  int phy_ethtool_set_plca_cfg(struct phy_device *phydev,
 			     struct netlink_ext_ack *extack);
 int phy_ethtool_get_plca_status(struct phy_device *phydev,
 				struct phy_plca_status *plca_st);
+int phy_get_config(struct phy_device *phydev,
+		   struct phy_device_config *phy_cfg);
+int phy_set_config(struct phy_device *phydev,
+		   const struct phy_device_config *phy_cfg,
+		   struct netlink_ext_ack *extack);
 
 int __phy_hwtstamp_get(struct phy_device *phydev,
 		       struct kernel_hwtstamp_config *config);