[RFC,net-next,1/7] net: phy: introduce phy numbering and phy namespaces

Message ID 20230907092407.647139-2-maxime.chevallier@bootlin.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series net: phy: introduce phy numbering

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              fail     Errors and warnings before: 5479 this patch: 5480
netdev/cc_maintainers           warning  2 maintainers not CCed: daniel@iogearbox.net sd@queasysnail.net
netdev/build_clang              fail     Errors and warnings before: 2228 this patch: 2230
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  fail     Errors and warnings before: 5718 this patch: 5719
netdev/checkpatch               warning  WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc                     fail     Errors and warnings before: 0 this patch: 2
netdev/source_inline            success  Was 0 now: 0

Commit Message

Maxime Chevallier Sept. 7, 2023, 9:23 a.m. UTC
Link topologies containing multiple network PHYs attached to the same
net_device can be found when using a PHY as a media converter for use
with an SFP connector, on which an SFP transceiver containing a PHY can
be used.

With the current model, the transceiver's PHY can't be used for
operations such as cable testing, timestamping, macsec offload, etc.

The reason is that most of the logic for this configuration, coming
from either ethtool netlink or ioctls, tends to use netdev->phydev, which
in multi-PHY systems references the PHY closest to the MAC.

Introduce a numbering scheme that makes it possible to enumerate the PHY
devices belonging to a given netdev, which in turn allows userspace to
make more precise decisions about each PHY's configuration.

The numbering is maintained per-netdev, hence the notion of PHY
namespaces. The numbering works similarly to a netdevice's ifindex, with
identifiers that are only recycled once INT_MAX has been reached.

This prevents races that could occur between PHY listing and SFP
transceiver removal/insertion.

The identifiers are assigned at phy_attach time, as the numbering
depends on the netdevice the phy is attached to.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
 drivers/net/phy/Makefile     |  2 +-
 drivers/net/phy/phy_device.c | 13 ++++++++
 drivers/net/phy/phy_ns.c     | 65 ++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h    |  2 ++
 include/linux/phy.h          |  4 +++
 include/linux/phy_ns.h       | 30 +++++++++++++++++
 net/core/dev.c               |  3 ++
 7 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/phy/phy_ns.c
 create mode 100644 include/linux/phy_ns.h
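
For readers following the numbering description in the commit message,
here is a minimal userspace sketch of an ifindex-like allocator. It is not
the patch's code: struct phy_index_space, index_alloc(), index_free() and
the fixed MAX_PHYS capacity are hypothetical names chosen for illustration.
It only aims to show the two properties the message relies on: a freed
index is not handed out again right away, and the counter wraps back to 1
only after INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PHYS 8 /* arbitrary capacity for this sketch */

/* Userspace stand-in for the per-netdev state; not the kernel struct. */
struct phy_index_space {
	int in_use[MAX_PHYS];	/* indices currently attached, 0 = free slot */
	int last_index;		/* last index handed out */
};

static bool index_in_use(const struct phy_index_space *ns, int idx)
{
	for (size_t i = 0; i < MAX_PHYS; i++)
		if (ns->in_use[i] == idx)
			return true;
	return false;
}

/* Hand out the next free index after last_index, wrapping from INT_MAX
 * back to 1. Returns 0 if the table has no free slot.
 */
static int index_alloc(struct phy_index_space *ns)
{
	size_t slot = MAX_PHYS;
	int idx;

	assert(ns != NULL);

	for (size_t i = 0; i < MAX_PHYS; i++) {
		if (ns->in_use[i] == 0) {
			slot = i;
			break;
		}
	}
	if (slot == MAX_PHYS)
		return 0;

	idx = ns->last_index;
	do {
		/* Explicit wrap test, so no signed overflow is involved. */
		idx = (idx >= INT_MAX || idx < 1) ? 1 : idx + 1;
	} while (index_in_use(ns, idx));

	ns->in_use[slot] = idx;
	ns->last_index = idx;
	return idx;
}

/* Release an index; it stays unused until the counter wraps around. */
static void index_free(struct phy_index_space *ns, int idx)
{
	for (size_t i = 0; i < MAX_PHYS; i++)
		if (ns->in_use[i] == idx)
			ns->in_use[i] = 0;
}
```

Starting from a zero-initialised space, two allocations give 1 and 2; after
freeing 1, the next allocation gives 3 rather than 1, so a PHY listing taken
before an SFP module swap cannot silently end up naming the new PHY. The
wrap is tested explicitly here instead of using "++phyindex <= 0", which
depends on signed-overflow behaviour that standard C leaves undefined.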

Comments

Russell King (Oracle) Sept. 7, 2023, 9:32 a.m. UTC | #1
On Thu, Sep 07, 2023 at 11:23:59AM +0200, Maxime Chevallier wrote:
> @@ -640,6 +642,7 @@ struct phy_device {
>  
>  	struct device_link *devlink;
>  
> +	int phyindex;
>  	u32 phy_id;
>  
>  	struct phy_c45_device_ids c45_ids;
> @@ -761,6 +764,7 @@ struct phy_device {
>  	/* MACsec management functions */
>  	const struct macsec_ops *macsec_ops;
>  #endif
> +	struct list_head node;

I haven't yet fully looked at this, but the one thing that did stand out
was this - please name it "phy_ns_node" so that the purpose of this node
is clear.

Thanks.
Russell King (Oracle) Sept. 7, 2023, 10:14 a.m. UTC | #2
On Thu, Sep 07, 2023 at 11:23:59AM +0200, Maxime Chevallier wrote:
> Link topologies containing multiple network PHYs attached to the same
> net_device can be found when using a PHY as a media converter for use
> with an SFP connector, on which an SFP transceiver containing a PHY can
> be used.
> 
> With the current model, the transceiver's PHY can't be used for
> operations such as cable testing, timestamping, macsec offload, etc.
> 
> The reason being that most of the logic for these configuration, coming
> from either ethtool netlink or ioctls tend to use netdev->phydev, which
> in multi-phy systems will reference the PHY closest to the MAC.
> 
> Introduce a numbering scheme allowing to enumerate PHY devices that
> belong to any netdev, which can in turn allow userspace to take more
> precise decisions with regard to each PHY's configuration.
> 
> The numbering is maintained per-netdev, hence the notion of PHY
> namespaces. The numbering works similarly to a netdevice's ifindex, with
> identifiers that are only recycled once INT_MAX has been reached.
> 
> This prevents races that could occur between PHY listing and SFP
> transceiver removal/insertion.
> 
> The identifiers are assigned at phy_attach time, as the numbering
> depends on the netdevice the phy is attached to.

I think you can simplify this code quite a bit by using idr.
idr_alloc_cyclic() looks like it will do the allocation you want,
plus the IDR subsystem will store the pointer to the object (in
this case the phy device) and allow you to look that up. That
probably gets rid of quite a bit of code.

You will need to handle the locking around IDR however.
Maxime Chevallier Sept. 7, 2023, 12:19 p.m. UTC | #3
On Thu, 7 Sep 2023 11:14:08 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Thu, Sep 07, 2023 at 11:23:59AM +0200, Maxime Chevallier wrote:
> > Link topologies containing multiple network PHYs attached to the same
> > net_device can be found when using a PHY as a media converter for use
> > with an SFP connector, on which an SFP transceiver containing a PHY can
> > be used.
> > 
> > With the current model, the transceiver's PHY can't be used for
> > operations such as cable testing, timestamping, macsec offload, etc.
> > 
> > The reason being that most of the logic for these configuration, coming
> > from either ethtool netlink or ioctls tend to use netdev->phydev, which
> > in multi-phy systems will reference the PHY closest to the MAC.
> > 
> > Introduce a numbering scheme allowing to enumerate PHY devices that
> > belong to any netdev, which can in turn allow userspace to take more
> > precise decisions with regard to each PHY's configuration.
> > 
> > The numbering is maintained per-netdev, hence the notion of PHY
> > namespaces. The numbering works similarly to a netdevice's ifindex, with
> > identifiers that are only recycled once INT_MAX has been reached.
> > 
> > This prevents races that could occur between PHY listing and SFP
> > transceiver removal/insertion.
> > 
> > The identifiers are assigned at phy_attach time, as the numbering
> > depends on the netdevice the phy is attached to.  
> 
> I think you can simplify this code quite a bit by using idr.
> idr_alloc_cyclic() looks like it will do the allocation you want,
> plus the IDR subsystem will store the pointer to the object (in
> this case the phy device) and allow you to look that up. That
> probably gets rid of quite a bit of code.
> 
> You will need to handle the locking around IDR however.

Oh thanks for pointing this out. I had considered idr but I didn't spot
the _cyclic() helper, and I had ruled that out thinking it would re-use
ids directly after freeing them. I'll be more than happy to use that.

Thanks,

Maxime
Jakub Kicinski Sept. 8, 2023, 3:36 p.m. UTC | #4
On Thu, 7 Sep 2023 14:19:04 +0200 Maxime Chevallier wrote:
> > I think you can simplify this code quite a bit by using idr.
> > idr_alloc_cyclic() looks like it will do the allocation you want,
> > plus the IDR subsystem will store the pointer to the object (in
> > this case the phy device) and allow you to look that up. That
> > probably gets rid of quite a bit of code.
> > 
> > You will need to handle the locking around IDR however.  
> 
> Oh thanks for pointing this out. I had considered idr but I didn't spot
> the _cyclic() helper, and I had ruled that out thinking it would re-use
> ids directly after freeing them. I'll be more than happy to use that.

Perhaps use xarray directly, I don't think we need the @base offset or
quick access to @next which AFAICT is the only reason one would prefer
IDR?
Maxime Chevallier Sept. 11, 2023, 1:05 p.m. UTC | #5
Hello Jakub,

On Fri, 8 Sep 2023 08:36:08 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Thu, 7 Sep 2023 14:19:04 +0200 Maxime Chevallier wrote:
> > > I think you can simplify this code quite a bit by using idr.
> > > idr_alloc_cyclic() looks like it will do the allocation you want,
> > > plus the IDR subsystem will store the pointer to the object (in
> > > this case the phy device) and allow you to look that up. That
> > > probably gets rid of quite a bit of code.
> > > 
> > > You will need to handle the locking around IDR however.    
> > 
> > Oh thanks for pointing this out. I had considered idr but I didn't spot
> > the _cyclic() helper, and I had ruled that out thinking it would re-use
> > ids directly after freeing them. I'll be more than happy to use that.  
> 
> Perhaps use xarray directly, I don't think we need the @base offset or
> quick access to @next which AFAICT is the only reason one would prefer
> IDR?

Oh indeed xa_alloc_cyclic looks to fit perfectly, thanks !

Maxime
Andrew Lunn Sept. 12, 2023, 3:41 p.m. UTC | #6
> Introduce a numbering scheme allowing to enumerate PHY devices that
> belong to any netdev, which can in turn allow userspace to take more
> precise decisions with regard to each PHY's configuration.

A minor point, and i know naming is hard, but i keep reading _ns_ and
think namespace, as in ip netns. Maybe we should think of something
other than ns.

      Andrew
Maxime Chevallier Sept. 12, 2023, 4:10 p.m. UTC | #7
Hello,

On Tue, 12 Sep 2023 17:41:31 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > Introduce a numbering scheme allowing to enumerate PHY devices that
> > belong to any netdev, which can in turn allow userspace to take more
> > precise decisions with regard to each PHY's configuration.  
> 
> A minor point, and i know naming is hard, but i keep reading _ns_ and
> think namespace, as in ip netns. Maybe we should think of something
> other than ns.

Yeah that was the initial idea, to imply that the numbering is
independent between netdevices... I thought about "phy_list", "phys",
"phy_devices" but none of that felt correct :(

Any idea here would be welcome :D

Maxime

>       Andrew
Andrew Lunn Sept. 12, 2023, 4:15 p.m. UTC | #8
On Thu, Sep 07, 2023 at 11:23:59AM +0200, Maxime Chevallier wrote:
> Link topologies containing multiple network PHYs attached to the same
> net_device can be found when using a PHY as a media converter for use
> with an SFP connector, on which an SFP transceiver containing a PHY can
> be used.
> 
> With the current model, the transceiver's PHY can't be used for
> operations such as cable testing, timestamping, macsec offload, etc.
> 
> The reason being that most of the logic for these configuration, coming
> from either ethtool netlink or ioctls tend to use netdev->phydev, which
> in multi-phy systems will reference the PHY closest to the MAC.
> 
> Introduce a numbering scheme allowing to enumerate PHY devices that
> belong to any netdev, which can in turn allow userspace to take more
> precise decisions with regard to each PHY's configuration.

I think we need more than a number. Topology needs to be a core
concept here, otherwise how is the user supposed to know which PHY to
use cable test on, etc.

However, it is not a simple problem. An SFP PHY should be the last in
a chain. So you can infer something from that. When we start adding
MII muxes, they will need to be part of the model.

    Andrew
Maxime Chevallier Sept. 12, 2023, 5:01 p.m. UTC | #9
Hello Andrew,

On Tue, 12 Sep 2023 18:15:52 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Thu, Sep 07, 2023 at 11:23:59AM +0200, Maxime Chevallier wrote:
> > Link topologies containing multiple network PHYs attached to the same
> > net_device can be found when using a PHY as a media converter for use
> > with an SFP connector, on which an SFP transceiver containing a PHY can
> > be used.
> > 
> > With the current model, the transceiver's PHY can't be used for
> > operations such as cable testing, timestamping, macsec offload, etc.
> > 
> > The reason being that most of the logic for these configuration, coming
> > from either ethtool netlink or ioctls tend to use netdev->phydev, which
> > in multi-phy systems will reference the PHY closest to the MAC.
> > 
> > Introduce a numbering scheme allowing to enumerate PHY devices that
> > belong to any netdev, which can in turn allow userspace to take more
> > precise decisions with regard to each PHY's configuration.  
> 
> I think we need more than a number. Topology needs to be a core
> concept here, otherwise how is the user supposed to know which PHY to
> use cable test on, etc.
> 
> However, it is not a simple problem. An SFP PHY should be the last in
> a chain. So you can infer something from that. When we start adding
> MII muxes, they will need to be part of the model.

You raise a good point: we need to decide on the level of detail
we want when describing the topology.

I do have a patch that adds a notion of topology by keeping track of
the upstream device of each link component (either the ethernet
controller, another PHY, a mux, or an SFP cage), but I got carried away
trying to find the correct granularity.

For example, say we have a PCS with a dedicated driver in the chain,
should it be part of the topology ? or do we stick to MAC, PHY, MUX,
SFP ?

To address the topology and more specifically cable-testing, I relied
on adding support for a phy_port, that would represent front-facing
ports, each PHY would have zero, one or more phy_ports, and from
userspace perspective, we would let user pick which port to use, then
have kernel-side logic to either deal with PHYs that have 2 ports, or
an actual mii mux with two single-port PHYs.

All in all for cable-testing, this solves the problem, as we could
include a way for users to know which PHY is attached to a port, and
therefore users could know which PHY is the outermost one.

However, it's not sufficient for things like timestamping. I think you
mentioned in another thread that there can be up to 7 devices that
could do the timestamping, and here it could be interesting to know
which is where, so that user can for example pick a PHY that has a
precise timestamping unit but that is also close-enough to the physical
port.

In that case, I will include what I have for topology description in
the next RFC.

Thanks for the insightful feedback,

Maxime

>     Andrew
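
The upstream-tracking idea described in the reply above can be sketched in
userspace C as follows. This is only an illustration under stated
assumptions, not the unposted topology patch: enum link_kind,
struct link_node, link_depth() and outermost_phy() are hypothetical names,
and the component kinds are simply the ones listed in the message (MAC,
PHY, mux, SFP cage). Each component records the device upstream of it, so
the distance from the MAC can be recovered by walking the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component kinds, taken from the list in the message. */
enum link_kind { LINK_MAC, LINK_PHY, LINK_MUX, LINK_SFP_CAGE };

/* One element of a netdev's link chain; upstream points toward the MAC. */
struct link_node {
	enum link_kind kind;
	int phyindex;				/* meaningful for LINK_PHY only */
	const struct link_node *upstream;	/* NULL for the MAC */
};

/* Number of hops between a component and the MAC (the MAC itself is 0). */
static int link_depth(const struct link_node *n)
{
	int depth = 0;

	assert(n != NULL);
	while (n->upstream) {
		n = n->upstream;
		depth++;
	}
	return depth;
}

/* Among 'count' nodes, return the PHY farthest from the MAC, or NULL
 * if the array contains no PHY.
 */
static const struct link_node *outermost_phy(const struct link_node *nodes,
					     size_t count)
{
	const struct link_node *best = NULL;
	int best_depth = -1;

	for (size_t i = 0; i < count; i++) {
		int d;

		if (nodes[i].kind != LINK_PHY)
			continue;
		d = link_depth(&nodes[i]);
		if (d > best_depth) {
			best = &nodes[i];
			best_depth = d;
		}
	}
	return best;
}
```

For a chain MAC -> PHY (index 1) -> SFP cage -> PHY (index 2),
outermost_phy() picks index 2, which is the PHY a cable test would want.
The same depth information could help a user choose a timestamping device
that is close enough to the physical port, though how much of the chain
(a PCS, for instance) belongs in the model is the question left open in
the thread.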
Florian Fainelli Sept. 12, 2023, 5:08 p.m. UTC | #10
On 9/12/23 09:10, Maxime Chevallier wrote:
> Hello,
> 
> On Tue, 12 Sep 2023 17:41:31 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
>>> Introduce a numbering scheme allowing to enumerate PHY devices that
>>> belong to any netdev, which can in turn allow userspace to take more
>>> precise decisions with regard to each PHY's configuration.
>>
>> A minor point, and i know naming is hard, but i keep reading _ns_ and
>> think namespace, as in ip netns. Maybe we should think of something
>> other than ns.
> 
> Yeah that was the initial idea, to imply that the numbering is
> independent between netdevices... I thought about "phy_list", "phys",
> "phy_devices" but none of that felt correct :(

How about phy_devices_list?

Patch

diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index c945ed9bd14b..baa95d9f24e4 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -2,7 +2,7 @@ 
 # Makefile for Linux PHY drivers
 
 libphy-y			:= phy.o phy-c45.o phy-core.o phy_device.o \
-				   linkmode.o
+				   linkmode.o phy_ns.o
 mdio-bus-y			+= mdio_bus.o mdio_device.o
 
 ifdef CONFIG_MDIO_DEVICE
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 2ce74593d6e4..0c029ae5130a 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -29,6 +29,7 @@ 
 #include <linux/phy.h>
 #include <linux/phylib_stubs.h>
 #include <linux/phy_led_triggers.h>
+#include <linux/phy_ns.h>
 #include <linux/pse-pd/pse.h>
 #include <linux/property.h>
 #include <linux/rtnetlink.h>
@@ -265,6 +266,14 @@  static void phy_mdio_device_remove(struct mdio_device *mdiodev)
 
 static struct phy_driver genphy_driver;
 
+static struct phy_namespace *phy_get_ns(struct phy_device *phydev)
+{
+	if (phydev->attached_dev)
+		return &phydev->attached_dev->phy_ns;
+
+	return NULL;
+}
+
 static LIST_HEAD(phy_fixup_list);
 static DEFINE_MUTEX(phy_fixup_lock);
 
@@ -677,6 +686,7 @@  struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 phy_id,
 
 	dev->state = PHY_DOWN;
 	INIT_LIST_HEAD(&dev->leds);
+	INIT_LIST_HEAD(&dev->node);
 
 	mutex_init(&dev->lock);
 	INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine);
@@ -1489,6 +1499,8 @@  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 		if (phydev->sfp_bus_attached)
 			dev->sfp_bus = phydev->sfp_bus;
+
+		phy_ns_add_phy(&dev->phy_ns, phydev);
 	}
 
 	/* Some Ethernet drivers try to connect to a PHY device before
@@ -1814,6 +1826,7 @@  void phy_detach(struct phy_device *phydev)
 	if (dev) {
 		phydev->attached_dev->phydev = NULL;
 		phydev->attached_dev = NULL;
+		phy_ns_del_phy(&dev->phy_ns, phydev);
 	}
 	phydev->phylink = NULL;
 
diff --git a/drivers/net/phy/phy_ns.c b/drivers/net/phy/phy_ns.c
new file mode 100644
index 000000000000..d7865028ab20
--- /dev/null
+++ b/drivers/net/phy/phy_ns.c
@@ -0,0 +1,65 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Infrastructure to handle all PHY devices connected to a given netdev,
+ * either directly or indirectly attached.
+ *
+ * Copyright (c) 2023 Maxime Chevallier<maxime.chevallier@bootlin.com>
+ */
+
+#include <linux/phy.h>
+#include <linux/phy_ns.h>
+
+static int phy_ns_next_phyindex(struct phy_namespace *phy_ns)
+{
+	int phyindex = phy_ns->last_attributed_index;
+
+	for (;;) {
+		if (++phyindex <= 0)
+			phyindex = 1;
+		if (!phy_ns_get_by_index(phy_ns, phyindex))
+			return phy_ns->last_attributed_index = phyindex;
+	}
+}
+
+struct phy_device *phy_ns_get_by_index(struct phy_namespace *phy_ns,
+				       int phyindex)
+{
+	struct phy_device *phy;
+
+	mutex_lock(&phy_ns->ns_lock);
+	list_for_each_entry(phy, &phy_ns->phys, node)
+		if (phy->phyindex == phyindex)
+			goto unlock;
+
+	phy = NULL;
+unlock:
+	mutex_unlock(&phy_ns->ns_lock);
+	return phy;
+}
+EXPORT_SYMBOL_GPL(phy_ns_get_by_index);
+
+void phy_ns_add_phy(struct phy_namespace *phy_ns, struct phy_device *phy)
+{
+	/* PHYs can be attached and detached, they will keep their id */
+	if (!phy->phyindex)
+		phy->phyindex = phy_ns_next_phyindex(phy_ns);
+
+	mutex_lock(&phy_ns->ns_lock);
+	list_add(&phy->node, &phy_ns->phys);
+	mutex_unlock(&phy_ns->ns_lock);
+}
+EXPORT_SYMBOL_GPL(phy_ns_add_phy);
+
+void phy_ns_del_phy(struct phy_namespace *phy_ns, struct phy_device *phy)
+{
+	mutex_lock(&phy_ns->ns_lock);
+	list_del(&phy->node);
+	mutex_unlock(&phy_ns->ns_lock);
+}
+EXPORT_SYMBOL_GPL(phy_ns_del_phy);
+
+void phy_ns_init(struct phy_namespace *phy_ns)
+{
+	INIT_LIST_HEAD(&phy_ns->phys);
+	mutex_init(&phy_ns->ns_lock);
+}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0896aaa91dd7..ef86cb87a38a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -43,6 +43,7 @@ 
 
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
+#include <linux/phy_ns.h>
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
@@ -2380,6 +2381,7 @@  struct net_device {
 	struct netprio_map __rcu *priomap;
 #endif
 	struct phy_device	*phydev;
+	struct phy_namespace	phy_ns;
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
 	bool			proto_down;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 1351b802ffcf..b12fd33aa84a 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -543,6 +543,8 @@  struct macsec_ops;
  * @drv: Pointer to the driver for this PHY instance
  * @devlink: Create a link between phy dev and mac dev, if the external phy
  *           used by current mac interface is managed by another mac interface.
+ * @phyindex: Unique id across the phy's parent tree of phys to address the PHY
+ *	      from userspace, similar to ifindex. Only recycled after INT_MAX.
  * @phy_id: UID for this device found during discovery
  * @c45_ids: 802.3-c45 Device Identifiers if is_c45.
  * @is_c45:  Set to true if this PHY uses clause 45 addressing.
@@ -640,6 +642,7 @@  struct phy_device {
 
 	struct device_link *devlink;
 
+	int phyindex;
 	u32 phy_id;
 
 	struct phy_c45_device_ids c45_ids;
@@ -761,6 +764,7 @@  struct phy_device {
 	/* MACsec management functions */
 	const struct macsec_ops *macsec_ops;
 #endif
+	struct list_head node;
 };
 
 /* Generic phy_device::dev_flags */
diff --git a/include/linux/phy_ns.h b/include/linux/phy_ns.h
new file mode 100644
index 000000000000..ae173e637c62
--- /dev/null
+++ b/include/linux/phy_ns.h
@@ -0,0 +1,30 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PHY device namespaces allow maintaining a list of PHY devices that are
+ * part of a netdevice's link topology. PHYs can for example be chained,
+ * as is the case when using a PHY that exposes an SFP module, on which an
+ * SFP transceiver that embeds a PHY is connected.
+ *
+ * This list can then be used by userspace to leverage individual PHY
+ * capabilities.
+ */
+#ifndef __PHY_NS_H
+#define __PHY_NS_H
+
+struct mutex;
+
+struct phy_namespace {
+	struct list_head phys;
+	int last_attributed_index;
+
+	/* Protects the .phys list */
+	struct mutex ns_lock;
+};
+
+struct phy_device *phy_ns_get_by_index(struct phy_namespace *phy_ns,
+				       int phyindex);
+void phy_ns_add_phy(struct phy_namespace *phy_ns, struct phy_device *phy);
+void phy_ns_del_phy(struct phy_namespace *phy_ns, struct phy_device *phy);
+void phy_ns_init(struct phy_namespace *phy_ns);
+
+#endif /* __PHY_NS_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ccff2b6ef958..aa8b924269d7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10729,6 +10729,9 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->net_notifier_list);
 #ifdef CONFIG_NET_SCHED
 	hash_init(dev->qdisc_hash);
+#endif
+#ifdef CONFIG_PHYLIB
+	phy_ns_init(&dev->phy_ns);
 #endif
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);