[RFC,net-next,0/7] net: phy: introduce phy numbering

Message ID 20230907092407.647139-1-maxime.chevallier@bootlin.com (mailing list archive)

Message

Maxime Chevallier Sept. 7, 2023, 9:23 a.m. UTC
Hello everyone,

This is the first RFC series introducing ethernet PHY numbering, in an
effort to better represent the link components and allow userspace to
configure these.

As of today, PHY devices are hidden behind the struct net_device from
userspace, but there exist commands, such as PLCA configuration,
cable-testing and soon timestamping, that actually target the
phy_device.

These commands rely on the ndev->phydev pointer to find the phy_device.

However, there exist use cases where we have multiple PHY devices
between the MAC and the front-facing port. The most common case right
now is when a PHY acts as a media converter and is wired to an SFP
port:

[MAC] - [PHY] - [SFP][PHY]


Modules plugged into that port may contain a PHY too, and this is
where discrepancies start to happen.

In this case, ndev->phydev will point to the innermost PHY. Users
wanting to use the SFP PHY for cable-testing, for example, would get
unexpected results, as the middle PHY will be reached instead.

This is made worse by the fact that in a scenario like this:

[MAC] - [SFP][PHY]

the ndev->phydev pointer does point to the SFP PHY.

This is only the tip of the iceberg: such scenarios can also happen
with other designs that include an MII mux, which isn't supported yet
but would require PHY enumeration to work.

This series therefore tries to add the ability to enumerate the PHYs
sitting behind a MAC, and assign them a unique number.

I've used the term "phy namespace" to emphasize the fact that the PHY
numbering really is specific to an interface: each interface maintains
its own numbering scheme, starting from 0 and wrapping after all u32
values have been exhausted.
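
To illustrate the numbering scheme, here is a minimal userspace C sketch
of a per-interface namespace. It is not the code from this series: the
sketch_* names, the fixed-size array and the choice to skip indices that
are still in use after a wrap are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NS_MAX 8 /* arbitrary capacity, only for this sketch */

/* Stand-in for struct phy_device; only the fields the sketch needs. */
struct sketch_phy {
	const char *name;
	uint32_t index;
};

/* Hypothetical per-interface namespace: each interface owns one. */
struct sketch_phy_ns {
	struct sketch_phy *phys[SKETCH_NS_MAX];
	size_t count;
	uint32_t next_index; /* starts at 0, wraps as a u32 */
};

/* Return the PHY registered under 'index', or NULL if there is none. */
static struct sketch_phy *sketch_phy_ns_find(const struct sketch_phy_ns *ns,
					     uint32_t index)
{
	for (size_t i = 0; i < ns->count; i++)
		if (ns->phys[i]->index == index)
			return ns->phys[i];
	return NULL;
}

/* Register a PHY and give it the next free index. Returns 0, or -1 if full. */
static int sketch_phy_ns_add(struct sketch_phy_ns *ns, struct sketch_phy *phy)
{
	assert(ns && phy);
	if (ns->count == SKETCH_NS_MAX)
		return -1;
	/* Assumption: after a wrap, indices still in use are skipped. */
	while (sketch_phy_ns_find(ns, ns->next_index))
		ns->next_index++; /* unsigned overflow wraps to 0 */
	phy->index = ns->next_index++;
	ns->phys[ns->count++] = phy;
	return 0;
}
```

With two zero-initialized namespaces, the first PHY added to each one
gets index 0, which shows that an index only identifies a PHY together
with its interface.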

The PHY namespace is for now contained within struct net_device, meaning
that PHYs that aren't related at all to any net_device wouldn't be
numbered as of right now. The only case I identified is when a PHY sits
between 2 DSA switches, but I don't know how relevant this is.

The phy_ns is its own struct, for now owned by net_device, but it could
be shared with struct dsa_port for example to make a MAC and the DSA CPU
port share the same phy ns.

This is early work, and it has its shortcomings:

 - I didn't include netlink notifications on PHY insertion/removal, but
   I think this could definitely be useful

 - the netlink API would need polishing; I struggle a bit with finding
   the correct netlink design pattern to return a variable-length list
   of u32.

 - I would like to port netlink commands such as cable-test and plca to
   this new model, by adding an optional PHYINDEX field in the request.
   The idea would be that if the PHYINDEX is passed in the netlink
   request, we look up the corresponding phy_device, and if not, we
   fall back to ndev->phydev.

 - Naming is hard, feel free to suggest any correction
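
The optional-PHYINDEX fallback from the list above could look roughly
like the following userspace C sketch. All sketch_* types and names are
hypothetical stand-ins, not the structures from this series, and
returning NULL for an unknown index is an assumption about how a bad
index would be handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct phy_device. */
struct sketch_phy {
	uint32_t index;
	const char *name;
};

/* Stand-in for struct net_device with its enumerated PHYs. */
struct sketch_netdev {
	struct sketch_phy *phydev;  /* mirrors ndev->phydev */
	struct sketch_phy *phys[4]; /* every PHY numbered on this interface */
	size_t nphys;
};

/* Parsed request: the PHYINDEX attribute is optional. */
struct sketch_req {
	bool has_phy_index;
	uint32_t phy_index;
};

/* Pick the PHY a command such as cable-test should act on. */
static struct sketch_phy *sketch_req_to_phy(const struct sketch_netdev *ndev,
					    const struct sketch_req *req)
{
	assert(ndev && req);

	/* No PHYINDEX given: keep today's behaviour. */
	if (!req->has_phy_index)
		return ndev->phydev;

	for (size_t i = 0; i < ndev->nphys; i++)
		if (ndev->phys[i]->index == req->phy_index)
			return ndev->phys[i];

	/* Assumption: an unknown index is an error, not a silent fallback. */
	return NULL;
}
```

In the [MAC] - [PHY] - [SFP][PHY] case, a request without PHYINDEX
would still reach the media-converter PHY, while a request carrying the
module PHY's index would reach the SFP PHY.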

Let me know what you think of this approach,

Best regards,

Maxime

Maxime Chevallier (7):
  net: phy: introduce phy numbering and phy namespaces
  net: sfp: pass the phy_device when disconnecting an sfp module's PHY
  net: phy: add helpers to handle sfp phy connect/disconnect
  net: ethtool: add a netlink command to list PHYs
  netlink: specs: add phy_list command
  net: ethtool: add a netlink command to get PHY information
  netlink: specs: add command to show individual phy information

 Documentation/netlink/specs/ethtool.yaml |  65 ++++++++++++
 drivers/net/phy/Makefile                 |   2 +-
 drivers/net/phy/at803x.c                 |   2 +
 drivers/net/phy/marvell-88x2222.c        |   2 +
 drivers/net/phy/marvell.c                |   2 +
 drivers/net/phy/marvell10g.c             |   2 +
 drivers/net/phy/phy_device.c             |  53 ++++++++++
 drivers/net/phy/phy_ns.c                 |  65 ++++++++++++
 drivers/net/phy/phylink.c                |   3 +-
 drivers/net/phy/sfp-bus.c                |   4 +-
 include/linux/netdevice.h                |   2 +
 include/linux/phy.h                      |   6 ++
 include/linux/phy_ns.h                   |  30 ++++++
 include/linux/sfp.h                      |   2 +-
 include/uapi/linux/ethtool.h             |   7 ++
 include/uapi/linux/ethtool_netlink.h     |  27 +++++
 net/core/dev.c                           |   3 +
 net/ethtool/Makefile                     |   2 +-
 net/ethtool/netlink.c                    |  20 ++++
 net/ethtool/netlink.h                    |   4 +
 net/ethtool/phy.c                        | 124 +++++++++++++++++++++++
 net/ethtool/phy_list.c                   |  99 ++++++++++++++++++
 22 files changed, 520 insertions(+), 6 deletions(-)
 create mode 100644 drivers/net/phy/phy_ns.c
 create mode 100644 include/linux/phy_ns.h
 create mode 100644 net/ethtool/phy.c
 create mode 100644 net/ethtool/phy_list.c

Comments

Jakub Kicinski Sept. 8, 2023, 3:41 p.m. UTC | #1
On Thu,  7 Sep 2023 11:23:58 +0200 Maxime Chevallier wrote:
>  - the netlink API would need polishing, I struggle a bit with finding
>    the correct netlink design pattern to return variable-length list of u32.

Think of them as a list, not an array.

Dump them one by one, don't try to wrap them in any way:
https://docs.kernel.org/next/userspace-api/netlink/specs.html#multi-attr-arrays
People have tried other things in the past:
https://docs.kernel.org/next/userspace-api/netlink/genetlink-legacy.html#attribute-type-nests
but in the end they add constraints and pain for little benefit.
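
As an illustration of the "list, not array" advice, the following
userspace C sketch writes each u32 as its own attribute with the same
type, with no wrapping nest, and reads them back by walking the buffer.
The header only imitates the length/type layout of a netlink attribute;
the sketch_* names and the attribute type value are hypothetical. In
kernel code the writing side would presumably be one nla_put_u32() call
per value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ATTR_PHY_INDEX 1 /* hypothetical attribute type */

/* Length/type header, imitating the layout of a netlink attribute. */
struct sketch_attr {
	uint16_t len; /* header plus payload, in bytes */
	uint16_t type;
};

/* Append one u32 attribute. Returns the new offset, or 0 if it won't fit. */
static size_t sketch_put_u32(uint8_t *buf, size_t off, size_t cap,
			     uint16_t type, uint32_t val)
{
	struct sketch_attr hdr;

	assert(buf);
	hdr.len = (uint16_t)(sizeof(hdr) + sizeof(val));
	hdr.type = type;
	if (off > cap || cap - off < hdr.len)
		return 0;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + hdr.len;
}

/* Collect every u32 attribute of 'type'. Returns how many were stored. */
static size_t sketch_get_all_u32(const uint8_t *buf, size_t len,
				 uint16_t type, uint32_t *out, size_t max)
{
	size_t off = 0, n = 0;

	while (off <= len && len - off >= sizeof(struct sketch_attr)) {
		struct sketch_attr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			break; /* malformed or truncated attribute */
		if (hdr.type == type &&
		    hdr.len == sizeof(hdr) + sizeof(uint32_t) && n < max)
			memcpy(&out[n++], buf + off + sizeof(hdr),
			       sizeof(uint32_t));
		off += (hdr.len + 3u) & ~3u; /* attributes are 4-byte aligned */
	}
	return n;
}
```

The reader never needs a count or an enclosing container: it simply
keeps every attribute whose type matches, which is what makes the flat
layout easy to extend.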
Maxime Chevallier Sept. 11, 2023, 1:09 p.m. UTC | #2
Hello Jakub

On Fri, 8 Sep 2023 08:41:08 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Thu,  7 Sep 2023 11:23:58 +0200 Maxime Chevallier wrote:
> >  - the netlink API would need polishing, I struggle a bit with finding
> >    the correct netlink design pattern to return variable-length list of u32.
> 
> Think of them as a list, not an array.
> 
> Dump them one by one, don't try to wrap them in any way:
> https://docs.kernel.org/next/userspace-api/netlink/specs.html#multi-attr-arrays
> People have tried other things in the past:
> https://docs.kernel.org/next/userspace-api/netlink/genetlink-legacy.html#attribute-type-nests
> but in the end they add constraints and pain for little benefit.

Thanks for the pointers, this makes much more sense than my attempt at
creating an array.

This and your other comment on .do vs .dump are exactly what I was
missing in my understanding of netlink.

Maxime
Andrew Lunn Sept. 12, 2023, 3:36 p.m. UTC | #3
> The PHY namespace is for now contained within struct net_device, meaning
> that PHYs that aren't related at all to any net_device wouldn't be
> numbered as of right now. The only case I identified is when a PHY sits
> between 2 DSA switches, but I don't know how relevant this is.

It might be relevant for the CPU port of the switch. The SoC ethernet
with a PHY has its PHY associated to a netdev, and so it can be
managed. However, the CPU port does not have a netdev, so the PHY is a
bit homeless. Phylink gained the ability to manage PHYs which are not
associated to a netdev, so I think it can manage such a PHY. If not,
we assume the PHY is strapped to perform link up and autoneg on power
on, and otherwise leave it alone.

	Andrew
Maxime Chevallier Sept. 12, 2023, 3:51 p.m. UTC | #4
Hello Andrew,

On Tue, 12 Sep 2023 17:36:56 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > The PHY namespace is for now contained within struct net_device, meaning
> > that PHYs that aren't related at all to any net_device wouldn't be
> > numbered as of right now. The only case I identified is when a PHY sits
> > between 2 DSA switches, but I don't know how relevant this is.  
> 
> It might be relevant for the CPU port of the switch. The SoC ethernet
> with a PHY has its PHY associated to a netdev, and so it can be
> managed. However, the CPU port does not have a netdev, so the PHY is a
> bit homeless. Phylink gained the ability to manage PHYs which are not
> associated to a netdev, so I think it can manage such a PHY. If not,
> we assume the PHY is strapped to perform link up and autoneg on power
> on, and otherwise leave it alone.

I agree and my plan, although still a bit hazy, is to share the phy_ns
between the netdev associated to the Ethernet MAC and the CPU dsa_port
of the switch, as they are on the same link. We could get information
about the PHYs connected to the port that way. Although the PHY isn't connected
to the same MAC, it's part of the same link, so I think it would be OK
to share the phy_ns.

We already do something in that direction, which is the stats gathering
on the CPU dsa port, which are reported alongside stats from the
ethernet MAC.

Would that be OK? I haven't started the DSA part; I was waiting for
review of the overall idea, but I tried to take this into consideration,
hence the phy_ns notion :)

Thanks,

Maxime
Christophe Leroy Sept. 14, 2023, 10:06 a.m. UTC | #5
On 07/09/2023 at 11:23, Maxime Chevallier wrote:
> Hello everyone,
> 
> This is the first RFC series introducing ethernet PHY numbering, in an
> effort to better represent the link components and allow userspace to
> configure these.
> 
> As of today, PHY devices are hidden behind the struct net_device from
> userspace, but there exist commands, such as PLCA configuration,
> cable-testing and soon timestamping, that actually target the
> phy_device.
> 
> These commands rely on the ndev->phydev pointer to find the phy_device.
> 
> However, there exist use cases where we have multiple PHY devices
> between the MAC and the front-facing port. The most common case right
> now is when a PHY acts as a media converter and is wired to an SFP
> port:
> 
> [MAC] - [PHY] - [SFP][PHY]

FWIW, when thinking about multiple PHYs on a single MAC, what comes to
my mind is the SIS 900 board, and its driver net/ethernet/sis/sis900.c

It has a function sis900_default_phy() that loops over all PHYs to find
one with link up, then puts all but that one in ISOLATE mode. When the
link goes down, it loops again to find another PHY with link up.

I guess your series would also help in that case, wouldn't it?

Christophe
Andrew Lunn Sept. 14, 2023, 12:47 p.m. UTC | #6
> FWIW, when thinking about multiple PHYs on a single MAC, what comes to
> my mind is the SIS 900 board, and its driver net/ethernet/sis/sis900.c
>
> It has a function sis900_default_phy() that loops over all PHYs to find
> one with link up, then puts all but that one in ISOLATE mode. When the
> link goes down, it loops again to find another PHY with link up.
>
> I guess your series would also help in that case, wouldn't it?

Yes, it would. However, that driver would need its PHY handling
re-written because it is using the old MII code, not phylib.

	Andrew