diff mbox series

[net-next,v4,01/13] net: phy: Introduce ethernet link topology representation

Message ID 20231215171237.1152563-2-maxime.chevallier@bootlin.com (mailing list archive)
State New, archived
Headers show
Series Introduce PHY listing and link_topology tracking | expand

Commit Message

Maxime Chevallier Dec. 15, 2023, 5:12 p.m. UTC
Link topologies containing multiple network PHYs attached to the same
net_device can be found when using a PHY as a media converter for use
with an SFP connector, on which an SFP transceiver containing a PHY can
be used.

With the current model, the transceiver's PHY can't be used for
operations such as cable testing, timestamping, macsec offload, etc.

The reason being that most of the logic for these configuration, coming
from either ethtool netlink or ioctls tend to use netdev->phydev, which
in multi-phy systems will reference the PHY closest to the MAC.

Introduce a numbering scheme allowing to enumerate PHY devices that
belong to any netdev, which can in turn allow userspace to take more
precise decisions with regard to each PHY's configuration.

The numbering is maintained per-netdev, in a phy_device_list.
The numbering works similarly to a netdevice's ifindex, with
identifiers that are only recycled once INT_MAX has been reached.

This prevents races that could occur between PHY listing and SFP
transceiver removal/insertion.

The identifiers are assigned at phy_attach time, as the numbering
depends on the netdevice the phy is attached to.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V4: - Moved the phy_link_topo_init() code to an inline header function
    - Made the code build without phylib

V3: - Renamed to phy_link_topology
    - Added assertions for RTNL
    - Various cleanups of leftover, unused test code
    - Made the PHY index u32

 MAINTAINERS                            |  2 +
 drivers/net/phy/Makefile               |  2 +-
 drivers/net/phy/phy_device.c           | 15 +++++
 drivers/net/phy/phy_link_topology.c    | 79 ++++++++++++++++++++++++++
 include/linux/netdevice.h              |  4 +-
 include/linux/phy.h                    |  4 ++
 include/linux/phy_link_topology.h      | 64 +++++++++++++++++++++
 include/linux/phy_link_topology_core.h | 19 +++++++
 include/uapi/linux/ethtool.h           | 16 ++++++
 net/core/dev.c                         |  3 +
 10 files changed, 206 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/phy/phy_link_topology.c
 create mode 100644 include/linux/phy_link_topology.h
 create mode 100644 include/linux/phy_link_topology_core.h

Comments

Vladimir Oltean Dec. 15, 2023, 9:45 p.m. UTC | #1
On Fri, Dec 15, 2023 at 06:12:23PM +0100, Maxime Chevallier wrote:
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index d8e9335d415c..89daaccc9276 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1491,6 +1500,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  
>  		if (phydev->sfp_bus_attached)
>  			dev->sfp_bus = phydev->sfp_bus;
> +
> +		err = phy_link_topo_add_phy(&dev->link_topo, phydev,
> +					    PHY_UPSTREAM_MAC, dev);
> +		if (err)
> +			goto error;
>  	}
>  
>  	/* Some Ethernet drivers try to connect to a PHY device before
> @@ -1816,6 +1830,7 @@ void phy_detach(struct phy_device *phydev)
>  	if (dev) {
>  		phydev->attached_dev->phydev = NULL;
>  		phydev->attached_dev = NULL;
> +		phy_link_topo_del_phy(&dev->link_topo, phydev);
>  	}
>  	phydev->phylink = NULL;
>  
> diff --git a/drivers/net/phy/phy_link_topology.c b/drivers/net/phy/phy_link_topology.c
> new file mode 100644
> index 000000000000..22f6372d002c
> --- /dev/null
> +++ b/drivers/net/phy/phy_link_topology.c
> +int phy_link_topo_add_phy(struct phy_link_topology *topo,
> +			  struct phy_device *phy,
> +			  enum phy_upstream upt, void *upstream)
> +{
> +	struct phy_device_node *pdn;
> +	int ret;
> +
> +	/* Protects phy and upstream */
> +	ASSERT_RTNL();

Something to think for the PHY library maintainers. This is probably
the first time when the rtnl_lock() requirement is asserted at
phy_attach_direct() time.

I haven't done too much with the patch set yet, so I don't understand
exactly from the comment what this is protecting. But I get the
following assertion failure with DSA:

[    4.157160] ------------[ cut here ]------------
[    4.161805] RTNL: assertion failed at drivers/net/phy/phy_link_topology.c (35)
[    4.169124] WARNING: CPU: 0 PID: 26 at drivers/net/phy/phy_link_topology.c:35 phy_link_topo_add_phy+0x128/0x130
[    4.179263] Modules linked in:
[    4.209232] pc : phy_link_topo_add_phy+0x128/0x130
[    4.214040] lr : phy_link_topo_add_phy+0x128/0x130
[    4.293822] Call trace:
[    4.296271]  phy_link_topo_add_phy+0x128/0x130
[    4.300730]  phy_attach_direct+0xbc/0x3c4
[    4.304752]  phylink_fwnode_phy_connect+0xa8/0xf8
[    4.309473]  phylink_of_phy_connect+0x1c/0x28
[    4.313844]  dsa_user_create+0x318/0x5ac
[    4.317778]  dsa_port_setup+0x100/0x144
[    4.321626]  dsa_register_switch+0xe90/0x11f8
[    4.325997]  sja1105_probe+0x2bc/0x2e4
[    4.329759]  spi_probe+0xa4/0xc4
[    4.332995]  really_probe+0x16c/0x3fc
[    4.336669]  __driver_probe_device+0xa4/0x168
[    4.341041]  driver_probe_device+0x3c/0x220
[    4.345238]  __device_attach_driver+0x128/0x1cc
[    4.349784]  bus_for_each_drv+0xf4/0x14c
[    4.353719]  __device_attach+0xfc/0x1bc
[    4.357567]  device_initial_probe+0x14/0x20
[    4.361764]  bus_probe_device+0x94/0x100
[    4.385371] ---[ end trace 0000000000000000 ]---

Someone please correct me if I'm wrong, but at least up until now, calling
this unlocked has been quite harmless, because we call dsa_user_phy_setup()
before register_netdevice(), and thus, the net_device is pretty much
inaccessible to the world when we attach it to the PHY.

And, while having the phydev->attached_dev pointer populated technically
makes the net_device now accessible from the PHY, this is a moot point,
because no user space command targets the PHY directly. They all target
the netdev, and through that, netdev->phydev. The netdev is still
unregistered, so it's ok to not have rtnl_lock().

It is rather going to be something that concerns those drivers which call
phy_attach_direct() after registering, for example from ndo_open().

Interestingly, phylink_disconnect_phy() has an ASSERT_RTNL() in it
even though the phylink_attach_phy() derivatives do not. I'm unable
to ascertain whether a previous unregister_netdevice() call makes this
requirement redundant or not.

> +
> +	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
> +	if (!pdn)
> +		return -ENOMEM;
> +
> +	pdn->phy = phy;
> +	switch (upt) {
> +	case PHY_UPSTREAM_MAC:
> +		pdn->upstream.netdev = (struct net_device *)upstream;
> +		if (phy_on_sfp(phy))
> +			pdn->parent_sfp_bus = pdn->upstream.netdev->sfp_bus;
> +		break;
> +	case PHY_UPSTREAM_PHY:
> +		pdn->upstream.phydev = (struct phy_device *)upstream;
> +		if (phy_on_sfp(phy))
> +			pdn->parent_sfp_bus = pdn->upstream.phydev->sfp_bus;
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +	pdn->upstream_type = upt;
> +
> +	ret = xa_alloc_cyclic(&topo->phys, &phy->phyindex, pdn, xa_limit_32b,
> +			      &topo->next_phy_index, GFP_KERNEL);
> +	if (ret)
> +		goto err;
> +
> +	return 0;
> +
> +err:
> +	kfree(pdn);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(phy_link_topo_add_phy);
Andrew Lunn Dec. 17, 2023, 4:57 p.m. UTC | #2
On Fri, Dec 15, 2023 at 11:45:23PM +0200, Vladimir Oltean wrote:
> On Fri, Dec 15, 2023 at 06:12:23PM +0100, Maxime Chevallier wrote:
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index d8e9335d415c..89daaccc9276 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -1491,6 +1500,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> >  
> >  		if (phydev->sfp_bus_attached)
> >  			dev->sfp_bus = phydev->sfp_bus;
> > +
> > +		err = phy_link_topo_add_phy(&dev->link_topo, phydev,
> > +					    PHY_UPSTREAM_MAC, dev);
> > +		if (err)
> > +			goto error;
> >  	}
> >  
> >  	/* Some Ethernet drivers try to connect to a PHY device before
> > @@ -1816,6 +1830,7 @@ void phy_detach(struct phy_device *phydev)
> >  	if (dev) {
> >  		phydev->attached_dev->phydev = NULL;
> >  		phydev->attached_dev = NULL;
> > +		phy_link_topo_del_phy(&dev->link_topo, phydev);
> >  	}
> >  	phydev->phylink = NULL;
> >  
> > diff --git a/drivers/net/phy/phy_link_topology.c b/drivers/net/phy/phy_link_topology.c
> > new file mode 100644
> > index 000000000000..22f6372d002c
> > --- /dev/null
> > +++ b/drivers/net/phy/phy_link_topology.c
> > +int phy_link_topo_add_phy(struct phy_link_topology *topo,
> > +			  struct phy_device *phy,
> > +			  enum phy_upstream upt, void *upstream)
> > +{
> > +	struct phy_device_node *pdn;
> > +	int ret;
> > +
> > +	/* Protects phy and upstream */
> > +	ASSERT_RTNL();
> 
> Something to think for the PHY library maintainers. This is probably
> the first time when the rtnl_lock() requirement is asserted at
> phy_attach_direct() time.

There are two use cases here for plain MAC drivers.

1) phy_attach_direct() is called from probe. RTNL is normally not
held, the driver would have to take it before making the call.

2) phy_attach_direct() is called from ndo_open. In that case,
__dev_open() has a ASSERT_RTNL() so we can assume RTNL has been taken.

So i don't think we can assume RTNL is held, but it might be held.

We need a better understanding what is being protected here.

   Andrew
kernel test robot Dec. 18, 2023, 6:04 a.m. UTC | #3
Hi Maxime,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Maxime-Chevallier/net-phy-Introduce-ethernet-link-topology-representation/20231216-012641
base:   net-next/main
patch link:    https://lore.kernel.org/r/20231215171237.1152563-2-maxime.chevallier%40bootlin.com
patch subject: [PATCH net-next v4 01/13] net: phy: Introduce ethernet link topology representation
config: i386-randconfig-054-20231216 (https://download.01.org/0day-ci/archive/20231218/202312181303.RTKMlnzl-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231218/202312181303.RTKMlnzl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312181303.RTKMlnzl-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/phy/phy_device.c:269:34: warning: 'phy_get_link_topology' defined but not used [-Wunused-function]
     269 | static struct phy_link_topology *phy_get_link_topology(struct phy_device *phydev)
         |                                  ^~~~~~~~~~~~~~~~~~~~~


vim +/phy_get_link_topology +269 drivers/net/phy/phy_device.c

   268	
 > 269	static struct phy_link_topology *phy_get_link_topology(struct phy_device *phydev)
   270	{
   271		if (phydev->attached_dev)
   272			return &phydev->attached_dev->link_topo;
   273	
   274		return NULL;
   275	}
   276
kernel test robot Dec. 18, 2023, 8:10 a.m. UTC | #4
Hello,

kernel test robot noticed "assertion_failed" on:

commit: 78b410e5a385c625b85bdfec9acdee79b20f4907 ("[PATCH net-next v4 01/13] net: phy: Introduce ethernet link topology representation")
url: https://github.com/intel-lab-lkp/linux/commits/Maxime-Chevallier/net-phy-Introduce-ethernet-link-topology-representation/20231216-012641
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git e91db1614abae0cca248040c78b2c25f8dd97872
patch link: https://lore.kernel.org/all/20231215171237.1152563-2-maxime.chevallier@bootlin.com/
patch subject: [PATCH net-next v4 01/13] net: phy: Introduce ethernet link topology representation

in testcase: boot

compiler: gcc-9
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+------------------------------------------------------------------------------+------------+------------+
|                                                                              | e91db1614a | 78b410e5a3 |
+------------------------------------------------------------------------------+------------+------------+
| assertion_failed                                                             | 0          | 8          |
| WARNING:at_drivers/net/phy/phy_link_topology.c:#phy_link_topo_add_phy        | 0          | 8          |
| EIP:phy_link_topo_add_phy                                                    | 0          | 8          |
+------------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312181512.79005cd3-oliver.sang@intel.com


[   48.548859][   T36] ------------[ cut here ]------------
[   48.550106][   T36] RTNL: assertion failed at drivers/net/phy/phy_link_topology.c (35)
[ 48.552023][ T36] WARNING: CPU: 1 PID: 36 at drivers/net/phy/phy_link_topology.c:35 phy_link_topo_add_phy (drivers/net/phy/phy_link_topology.c:35 (discriminator 3)) 
[   48.554496][   T36] Modules linked in:
[   48.555406][   T36] CPU: 1 PID: 36 Comm: kworker/u4:1 Not tainted 6.7.0-rc5-01059-g78b410e5a385 #1 4e624fcd33c509a5c53c7f4349cb3dc9014b8795
[   48.558523][   T36] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   48.560890][   T36] Workqueue: events_unbound deferred_probe_work_func
[ 48.562460][ T36] EIP: phy_link_topo_add_phy (drivers/net/phy/phy_link_topology.c:35 (discriminator 3)) 
[ 48.563694][ T36] Code: 26 00 00 00 00 80 3d 39 c7 19 d6 00 0f 85 f0 fe ff ff 6a 23 68 18 3d 99 d5 68 b0 96 88 d5 c6 05 39 c7 19 d6 01 e8 13 1a 3e ff <0f> 0b 83 c4 0c e9 ce fe ff ff bb f4 ff ff ff e9 fe fe ff ff cc cc
All code
========
   0:	26 00 00             	add    %al,%es:(%rax)
   3:	00 00                	add    %al,(%rax)
   5:	80 3d 39 c7 19 d6 00 	cmpb   $0x0,-0x29e638c7(%rip)        # 0xffffffffd619c745
   c:	0f 85 f0 fe ff ff    	jne    0xffffffffffffff02
  12:	6a 23                	pushq  $0x23
  14:	68 18 3d 99 d5       	pushq  $0xffffffffd5993d18
  19:	68 b0 96 88 d5       	pushq  $0xffffffffd58896b0
  1e:	c6 05 39 c7 19 d6 01 	movb   $0x1,-0x29e638c7(%rip)        # 0xffffffffd619c75e
  25:	e8 13 1a 3e ff       	callq  0xffffffffff3e1a3d
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	83 c4 0c             	add    $0xc,%esp
  2f:	e9 ce fe ff ff       	jmpq   0xffffffffffffff02
  34:	bb f4 ff ff ff       	mov    $0xfffffff4,%ebx
  39:	e9 fe fe ff ff       	jmpq   0xffffffffffffff3c
  3e:	cc                   	int3   
  3f:	cc                   	int3   

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	83 c4 0c             	add    $0xc,%esp
   5:	e9 ce fe ff ff       	jmpq   0xfffffffffffffed8
   a:	bb f4 ff ff ff       	mov    $0xfffffff4,%ebx
   f:	e9 fe fe ff ff       	jmpq   0xffffffffffffff12
  14:	cc                   	int3   
  15:	cc                   	int3   
[   48.568132][   T36] EAX: 00000042 EBX: edf5a000 ECX: 00000000 EDX: d5eb634c
[   48.569726][   T36] ESI: c38c96b8 EDI: 00000000 EBP: c0cf7d0c ESP: c0cf7cf0
[   48.571374][   T36] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
[   48.573237][   T36] CR0: 80050033 CR2: b7d88778 CR3: 16543000 CR4: 00040690
[   48.574846][   T36] Call Trace:
[ 48.575546][ T36] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[ 48.576634][ T36] ? phy_link_topo_add_phy (drivers/net/phy/phy_link_topology.c:35 (discriminator 3)) 
[ 48.577915][ T36] ? __warn (kernel/panic.c:677) 
[ 48.578810][ T36] ? phy_link_topo_add_phy (drivers/net/phy/phy_link_topology.c:35 (discriminator 3)) 
[ 48.580194][ T36] ? report_bug (lib/bug.c:201 lib/bug.c:219) 
[ 48.581234][ T36] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 48.582213][ T36] ? handle_bug (arch/x86/kernel/traps.c:216) 
[ 48.583120][ T36] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1)) 
[ 48.584185][ T36] ? handle_exception (arch/x86/entry/entry_32.S:1049) 
[ 48.585295][ T36] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 48.586339][ T36] ? phy_link_topo_add_phy (drivers/net/phy/phy_link_topology.c:35 (discriminator 3)) 
[ 48.587595][ T36] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 48.588689][ T36] ? phy_link_topo_add_phy (drivers/net/phy/phy_link_topology.c:35 (discriminator 3)) 
[ 48.589870][ T36] phy_attach_direct (drivers/net/phy/phy_device.c:1504) 
[ 48.590857][ T36] phylink_connect_phy (drivers/net/phy/phylink.c:1958) 
[ 48.591949][ T36] dsa_user_create (net/dsa/user.c:2495 net/dsa/user.c:2530 net/dsa/user.c:2668) 
[ 48.592985][ T36] dsa_port_setup (net/dsa/dsa.c:524) 
[ 48.593974][ T36] dsa_tree_setup (net/dsa/dsa.c:764 net/dsa/dsa.c:893) 
[ 48.595049][ T36] dsa_register_switch (net/dsa/dsa.c:1530 net/dsa/dsa.c:1544) 
[ 48.596350][ T36] dsa_loop_drv_probe (drivers/net/dsa/dsa_loop.c:344) 
[ 48.597483][ T36] mdio_probe (drivers/net/phy/mdio_device.c:165) 
[ 48.598357][ T36] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658) 
[ 48.599353][ T36] __driver_probe_device (drivers/base/dd.c:800) 
[ 48.600558][ T36] driver_probe_device (drivers/base/dd.c:830) 
[ 48.601690][ T36] __device_attach_driver (drivers/base/dd.c:959) 
[ 48.602883][ T36] bus_for_each_drv (drivers/base/bus.c:457) 
[ 48.604077][ T36] __device_attach (drivers/base/dd.c:1030) 
[ 48.605140][ T36] ? driver_probe_device (drivers/base/dd.c:922) 
[ 48.606221][ T36] device_initial_probe (drivers/base/dd.c:1080) 
[ 48.607390][ T36] bus_probe_device (drivers/base/bus.c:532) 
[ 48.608551][ T36] deferred_probe_work_func (drivers/base/dd.c:124) 
[ 48.609781][ T36] process_one_work (include/linux/jump_label.h:207 include/linux/jump_label.h:207 include/trace/events/workqueue.h:108 kernel/workqueue.c:2632) 
[ 48.611053][ T36] process_scheduled_works (kernel/workqueue.c:2694) 
[ 48.612380][ T36] worker_thread (kernel/workqueue.c:2781) 
[ 48.613437][ T36] kthread (kernel/kthread.c:388) 
[ 48.614365][ T36] ? rescuer_thread (kernel/workqueue.c:2727) 
[ 48.616138][ T36] ? kthread_complete_and_exit (kernel/kthread.c:341) 
[ 48.617344][ T36] ret_from_fork (arch/x86/kernel/process.c:153) 
[ 48.618342][ T36] ? kthread_complete_and_exit (kernel/kthread.c:341) 
[ 48.619647][ T36] ret_from_fork_asm (arch/x86/entry/entry_32.S:741) 
[ 48.620892][ T36] entry_INT80_32 (arch/x86/entry/entry_32.S:947) 
[   48.622004][   T36] irq event stamp: 35465
[ 48.622981][ T36] hardirqs last enabled at (35473): console_unlock (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 kernel/printk/printk.c:341 kernel/printk/printk.c:2706 kernel/printk/printk.c:3038) 
[ 48.625036][ T36] hardirqs last disabled at (35482): console_unlock (kernel/printk/printk.c:339 kernel/printk/printk.c:2706 kernel/printk/printk.c:3038) 
[ 48.626997][ T36] softirqs last enabled at (35420): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:400 kernel/softirq.c:582) 
[ 48.628965][ T36] softirqs last disabled at (35415): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147) 
[   48.631005][   T36] ---[ end trace 0000000000000000 ]---
[   48.632508][   T36] dsa-loop fixed-0:1f lan1 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY] (irq=POLL)
[   48.637526][   T36] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
[   48.654535][   T36] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
[   48.659100][   T36] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY] (irq=POLL)
[   48.663920][   T36] DSA: tree 0 setup
[   48.664854][   T36] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231218/202312181512.79005cd3-oliver.sang@intel.com
Maxime Chevallier Dec. 18, 2023, 8:49 a.m. UTC | #5
Hello Andrew,

Thanks for the review,

On Sun, 17 Dec 2023 17:57:10 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> On Fri, Dec 15, 2023 at 11:45:23PM +0200, Vladimir Oltean wrote:
> > On Fri, Dec 15, 2023 at 06:12:23PM +0100, Maxime Chevallier wrote:  
> > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > index d8e9335d415c..89daaccc9276 100644
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -1491,6 +1500,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> > >  
> > >  		if (phydev->sfp_bus_attached)
> > >  			dev->sfp_bus = phydev->sfp_bus;
> > > +
> > > +		err = phy_link_topo_add_phy(&dev->link_topo, phydev,
> > > +					    PHY_UPSTREAM_MAC, dev);
> > > +		if (err)
> > > +			goto error;
> > >  	}
> > >  
> > >  	/* Some Ethernet drivers try to connect to a PHY device before
> > > @@ -1816,6 +1830,7 @@ void phy_detach(struct phy_device *phydev)
> > >  	if (dev) {
> > >  		phydev->attached_dev->phydev = NULL;
> > >  		phydev->attached_dev = NULL;
> > > +		phy_link_topo_del_phy(&dev->link_topo, phydev);
> > >  	}
> > >  	phydev->phylink = NULL;
> > >  
> > > diff --git a/drivers/net/phy/phy_link_topology.c b/drivers/net/phy/phy_link_topology.c
> > > new file mode 100644
> > > index 000000000000..22f6372d002c
> > > --- /dev/null
> > > +++ b/drivers/net/phy/phy_link_topology.c
> > > +int phy_link_topo_add_phy(struct phy_link_topology *topo,
> > > +			  struct phy_device *phy,
> > > +			  enum phy_upstream upt, void *upstream)
> > > +{
> > > +	struct phy_device_node *pdn;
> > > +	int ret;
> > > +
> > > +	/* Protects phy and upstream */
> > > +	ASSERT_RTNL();  
> > 
> > Something to think for the PHY library maintainers. This is probably
> > the first time when the rtnl_lock() requirement is asserted at
> > phy_attach_direct() time.  
> 
> There are two use cases here for plain MAC drivers.
> 
> 1) phy_attach_direct() is called from probe. RTNL is normally not
> held, the driver would have to take it before making the call.
> 
> 2) phy_attach_direct() is called from ndo_open. In that case,
> __dev_open() has a ASSERT_RTNL() so we can assume RTNL has been taken.
> 
> So i don't think we can assume RTNL is held, but it might be held.
> 
> We need a better understanding what is being protected here.

I'm protecting the struct phy_device and the *upstream pointer (which
can be a net_device or a struct phy_device as well). In particular, I'm
protecting the phy_device->phyindex field. While I don't see any case
where we would concurrently write to it, given the weird topologies
that might be implemented in the future, I guess better safe than sorry.

For the rest, the actual phy list and the next_index pointer from the
topology should be already protected by the xarray mechanism.

Thanks,

Maxime
Maxime Chevallier Dec. 18, 2023, 9:11 a.m. UTC | #6
Hello Vlad, PHy maintainers,

On Fri, 15 Dec 2023 23:45:23 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Fri, Dec 15, 2023 at 06:12:23PM +0100, Maxime Chevallier wrote:
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index d8e9335d415c..89daaccc9276 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -1491,6 +1500,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> >  
> >  		if (phydev->sfp_bus_attached)
> >  			dev->sfp_bus = phydev->sfp_bus;
> > +
> > +		err = phy_link_topo_add_phy(&dev->link_topo, phydev,
> > +					    PHY_UPSTREAM_MAC, dev);
> > +		if (err)
> > +			goto error;
> >  	}
> >  
> >  	/* Some Ethernet drivers try to connect to a PHY device before
> > @@ -1816,6 +1830,7 @@ void phy_detach(struct phy_device *phydev)
> >  	if (dev) {
> >  		phydev->attached_dev->phydev = NULL;
> >  		phydev->attached_dev = NULL;
> > +		phy_link_topo_del_phy(&dev->link_topo, phydev);
> >  	}
> >  	phydev->phylink = NULL;
> >  
> > diff --git a/drivers/net/phy/phy_link_topology.c b/drivers/net/phy/phy_link_topology.c
> > new file mode 100644
> > index 000000000000..22f6372d002c
> > --- /dev/null
> > +++ b/drivers/net/phy/phy_link_topology.c
> > +int phy_link_topo_add_phy(struct phy_link_topology *topo,
> > +			  struct phy_device *phy,
> > +			  enum phy_upstream upt, void *upstream)
> > +{
> > +	struct phy_device_node *pdn;
> > +	int ret;
> > +
> > +	/* Protects phy and upstream */
> > +	ASSERT_RTNL();  
> 
> Something to think for the PHY library maintainers. This is probably
> the first time when the rtnl_lock() requirement is asserted at
> phy_attach_direct() time.
> 
> I haven't done too much with the patch set yet, so I don't understand
> exactly from the comment what this is protecting. But I get the
> following assertion failure with DSA:
> 
> [    4.157160] ------------[ cut here ]------------
> [    4.161805] RTNL: assertion failed at drivers/net/phy/phy_link_topology.c (35)
> [    4.169124] WARNING: CPU: 0 PID: 26 at drivers/net/phy/phy_link_topology.c:35 phy_link_topo_add_phy+0x128/0x130
> [    4.179263] Modules linked in:
> [    4.209232] pc : phy_link_topo_add_phy+0x128/0x130
> [    4.214040] lr : phy_link_topo_add_phy+0x128/0x130
> [    4.293822] Call trace:
> [    4.296271]  phy_link_topo_add_phy+0x128/0x130
> [    4.300730]  phy_attach_direct+0xbc/0x3c4
> [    4.304752]  phylink_fwnode_phy_connect+0xa8/0xf8
> [    4.309473]  phylink_of_phy_connect+0x1c/0x28
> [    4.313844]  dsa_user_create+0x318/0x5ac
> [    4.317778]  dsa_port_setup+0x100/0x144
> [    4.321626]  dsa_register_switch+0xe90/0x11f8
> [    4.325997]  sja1105_probe+0x2bc/0x2e4
> [    4.329759]  spi_probe+0xa4/0xc4
> [    4.332995]  really_probe+0x16c/0x3fc
> [    4.336669]  __driver_probe_device+0xa4/0x168
> [    4.341041]  driver_probe_device+0x3c/0x220
> [    4.345238]  __device_attach_driver+0x128/0x1cc
> [    4.349784]  bus_for_each_drv+0xf4/0x14c
> [    4.353719]  __device_attach+0xfc/0x1bc
> [    4.357567]  device_initial_probe+0x14/0x20
> [    4.361764]  bus_probe_device+0x94/0x100
> [    4.385371] ---[ end trace 0000000000000000 ]---
> 
> Someone please correct me if I'm wrong, but at least up until now, calling
> this unlocked has been quite harmless, because we call dsa_user_phy_setup()
> before register_netdevice(), and thus, the net_device is pretty much
> inaccessible to the world when we attach it to the PHY.

Ok so I'll give it more thought on that part, and analyze better the
access paths that ca be problematic. I'll update the doc accordingly,
as this is non-trivial. I haven't been able to test it on a DSA setup,
nor on a !phylink mac, so thanks a lot for testing :)
> 
> And, while having the phydev->attached_dev pointer populated technically
> makes the net_device now accessible from the PHY, this is a moot point,
> because no user space command targets the PHY directly. They all target
> the netdev, and through that, netdev->phydev. The netdev is still
> unregistered, so it's ok to not have rtnl_lock().
> 
> It is rather going to be something that concerns those drivers which call
> phy_attach_direct() after registering, for example from ndo_open().

In that case this is fine, right ? As ndo_open runs with rtnl held,
phy-targetting netlink commands (that indeed goes through the netdev)
will be also serialized ? I might be missing the point though :(

> Interestingly, phylink_disconnect_phy() has an ASSERT_RTNL() in it
> even though the phylink_attach_phy() derivatives do not. I'm unable
> to ascertain whether a previous unregister_netdevice() call makes this
> requirement redundant or not.
> 
> > +
> > +	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
> > +	if (!pdn)
> > +		return -ENOMEM;
> > +
> > +	pdn->phy = phy;
> > +	switch (upt) {
> > +	case PHY_UPSTREAM_MAC:
> > +		pdn->upstream.netdev = (struct net_device *)upstream;
> > +		if (phy_on_sfp(phy))
> > +			pdn->parent_sfp_bus = pdn->upstream.netdev->sfp_bus;
> > +		break;
> > +	case PHY_UPSTREAM_PHY:
> > +		pdn->upstream.phydev = (struct phy_device *)upstream;
> > +		if (phy_on_sfp(phy))
> > +			pdn->parent_sfp_bus = pdn->upstream.phydev->sfp_bus;
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		goto err;
> > +	}
> > +	pdn->upstream_type = upt;
> > +
> > +	ret = xa_alloc_cyclic(&topo->phys, &phy->phyindex, pdn, xa_limit_32b,
> > +			      &topo->next_phy_index, GFP_KERNEL);
> > +	if (ret)
> > +		goto err;
> > +
> > +	return 0;
> > +
> > +err:
> > +	kfree(pdn);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(phy_link_topo_add_phy);
diff mbox series

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index dda78b4ce707..f09b1d4e5487 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7873,6 +7873,8 @@  F:	include/linux/mii.h
 F:	include/linux/of_net.h
 F:	include/linux/phy.h
 F:	include/linux/phy_fixed.h
+F:	include/linux/phy_link_topology.h
+F:	include/linux/phy_link_topology_core.h
 F:	include/linux/phylib_stubs.h
 F:	include/linux/platform_data/mdio-bcm-unimac.h
 F:	include/linux/platform_data/mdio-gpio.h
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index e35ea69d9cb4..a7a9640bfa3a 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -2,7 +2,7 @@ 
 # Makefile for Linux PHY drivers
 
 libphy-y			:= phy.o phy-c45.o phy-core.o phy_device.o \
-				   linkmode.o
+				   linkmode.o phy_link_topology.o
 mdio-bus-y			+= mdio_bus.o mdio_device.o
 
 ifdef CONFIG_MDIO_DEVICE
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d8e9335d415c..89daaccc9276 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -29,6 +29,7 @@ 
 #include <linux/phy.h>
 #include <linux/phylib_stubs.h>
 #include <linux/phy_led_triggers.h>
+#include <linux/phy_link_topology.h>
 #include <linux/pse-pd/pse.h>
 #include <linux/property.h>
 #include <linux/rtnetlink.h>
@@ -265,6 +266,14 @@  static void phy_mdio_device_remove(struct mdio_device *mdiodev)
 
 static struct phy_driver genphy_driver;
 
+static struct phy_link_topology *phy_get_link_topology(struct phy_device *phydev)
+{
+	if (phydev->attached_dev)
+		return &phydev->attached_dev->link_topo;
+
+	return NULL;
+}
+
 static LIST_HEAD(phy_fixup_list);
 static DEFINE_MUTEX(phy_fixup_lock);
 
@@ -1491,6 +1500,11 @@  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 		if (phydev->sfp_bus_attached)
 			dev->sfp_bus = phydev->sfp_bus;
+
+		err = phy_link_topo_add_phy(&dev->link_topo, phydev,
+					    PHY_UPSTREAM_MAC, dev);
+		if (err)
+			goto error;
 	}
 
 	/* Some Ethernet drivers try to connect to a PHY device before
@@ -1816,6 +1830,7 @@  void phy_detach(struct phy_device *phydev)
 	if (dev) {
 		phydev->attached_dev->phydev = NULL;
 		phydev->attached_dev = NULL;
+		phy_link_topo_del_phy(&dev->link_topo, phydev);
 	}
 	phydev->phylink = NULL;
 
diff --git a/drivers/net/phy/phy_link_topology.c b/drivers/net/phy/phy_link_topology.c
new file mode 100644
index 000000000000..22f6372d002c
--- /dev/null
+++ b/drivers/net/phy/phy_link_topology.c
@@ -0,0 +1,79 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Infrastructure to handle all PHY devices connected to a given netdev,
+ * either directly or indirectly attached.
+ *
+ * Copyright (c) 2023 Maxime Chevallier<maxime.chevallier@bootlin.com>
+ */
+
+#include <linux/phy_link_topology.h>
+#include <linux/netdevice.h>
+#include <linux/phy.h>
+#include <linux/rtnetlink.h>
+#include <linux/xarray.h>
+
+struct phy_device *phy_link_topo_get_phy(struct phy_link_topology *topo,
+					 u32 phyindex)
+{
+	struct phy_device_node *pdn = xa_load(&topo->phys, phyindex);
+
+	if (pdn)
+		return pdn->phy;
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(phy_link_topo_get_phy);
+
+int phy_link_topo_add_phy(struct phy_link_topology *topo,
+			  struct phy_device *phy,
+			  enum phy_upstream upt, void *upstream)
+{
+	struct phy_device_node *pdn;
+	int ret;
+
+	/* Protects phy and upstream */
+	ASSERT_RTNL();
+
+	pdn = kzalloc(sizeof(*pdn), GFP_KERNEL);
+	if (!pdn)
+		return -ENOMEM;
+
+	pdn->phy = phy;
+	switch (upt) {
+	case PHY_UPSTREAM_MAC:
+		pdn->upstream.netdev = (struct net_device *)upstream;
+		if (phy_on_sfp(phy))
+			pdn->parent_sfp_bus = pdn->upstream.netdev->sfp_bus;
+		break;
+	case PHY_UPSTREAM_PHY:
+		pdn->upstream.phydev = (struct phy_device *)upstream;
+		if (phy_on_sfp(phy))
+			pdn->parent_sfp_bus = pdn->upstream.phydev->sfp_bus;
+		break;
+	default:
+		ret = -EINVAL;
+		goto err;
+	}
+	pdn->upstream_type = upt;
+
+	ret = xa_alloc_cyclic(&topo->phys, &phy->phyindex, pdn, xa_limit_32b,
+			      &topo->next_phy_index, GFP_KERNEL);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	kfree(pdn);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(phy_link_topo_add_phy);
+
+void phy_link_topo_del_phy(struct phy_link_topology *topo,
+			   struct phy_device *phy)
+{
+	struct phy_device_node *pdn = xa_erase(&topo->phys, phy->phyindex);
+
+	kfree(pdn);
+}
+EXPORT_SYMBOL_GPL(phy_link_topo_del_phy);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1b935ee341b4..17ae81c1e519 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -40,7 +40,6 @@ 
 #include <net/dcbnl.h>
 #endif
 #include <net/netprio_cgroup.h>
-
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
 #include <uapi/linux/netdevice.h>
@@ -52,6 +51,7 @@ 
 #include <net/net_trackers.h>
 #include <net/net_debug.h>
 #include <net/dropreason-core.h>
+#include <linux/phy_link_topology_core.h>
 
 struct netpoll_info;
 struct device;
@@ -2041,6 +2041,7 @@  enum netdev_stat_type {
  *	@fcoe_ddp_xid:	Max exchange id for FCoE LRO by ddp
  *
  *	@priomap:	XXX: need comments on this one
+ *	@link_topo:	Physical link topology tracking attached PHYs
  *	@phydev:	Physical device may attach itself
  *			for hardware timestamping
  *	@sfp_bus:	attached &struct sfp_bus structure.
@@ -2435,6 +2436,7 @@  struct net_device {
 #if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
 	struct netprio_map __rcu *priomap;
 #endif
+	struct phy_link_topology	link_topo;
 	struct phy_device	*phydev;
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index dbb5e13e3e1b..12613ad521b4 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -543,6 +543,9 @@  struct macsec_ops;
  * @drv: Pointer to the driver for this PHY instance
  * @devlink: Create a link between phy dev and mac dev, if the external phy
  *           used by current mac interface is managed by another mac interface.
+ * @phyindex: Unique id across the phy's parent tree of phys to address the PHY
+ *	      from userspace, similar to ifindex. A zero index means the PHY
+ *	      wasn't assigned an id yet.
  * @phy_id: UID for this device found during discovery
  * @c45_ids: 802.3-c45 Device Identifiers if is_c45.
  * @is_c45:  Set to true if this PHY uses clause 45 addressing.
@@ -642,6 +645,7 @@  struct phy_device {
 
 	struct device_link *devlink;
 
+	u32 phyindex;
 	u32 phy_id;
 
 	struct phy_c45_device_ids c45_ids;
diff --git a/include/linux/phy_link_topology.h b/include/linux/phy_link_topology.h
new file mode 100644
index 000000000000..0d23919e464f
--- /dev/null
+++ b/include/linux/phy_link_topology.h
@@ -0,0 +1,64 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PHY device list allow maintaining a list of PHY devices that are
+ * part of a netdevice's link topology. PHYs can for example be chained,
+ * as is the case when using a PHY that exposes an SFP module, on which an
+ * SFP transceiver that embeds a PHY is connected.
+ *
+ * This list can then be used by userspace to leverage individual PHY
+ * capabilities.
+ */
+#ifndef __PHY_LINK_TOPOLOGY_H
+#define __PHY_LINK_TOPOLOGY_H
+
+#include <linux/ethtool.h>
+#include <linux/phy_link_topology_core.h>
+
+struct xarray;
+struct phy_device;
+struct net_device;
+struct sfp_bus;
+
+struct phy_device_node {
+	enum phy_upstream upstream_type;
+
+	union {
+		struct net_device	*netdev;
+		struct phy_device	*phydev;
+	} upstream;
+
+	struct sfp_bus *parent_sfp_bus;
+
+	struct phy_device *phy;
+};
+
+#if IS_ENABLED(CONFIG_PHYLIB)
+struct phy_device *phy_link_topo_get_phy(struct phy_link_topology *topo,
+					 u32 phyindex);
+int phy_link_topo_add_phy(struct phy_link_topology *topo,
+			  struct phy_device *phy,
+			  enum phy_upstream upt, void *upstream);
+
+void phy_link_topo_del_phy(struct phy_link_topology *lt, struct phy_device *phy);
+
+#else
+static inline struct phy_device *phy_link_topo_get_phy(struct phy_link_topology *topo,
+						       u32 phyindex)
+{
+	return NULL;
+}
+
+static inline int phy_link_topo_add_phy(struct phy_link_topology *topo,
+					struct phy_device *phy,
+					enum phy_upstream upt, void *upstream)
+{
+	return 0;
+}
+
+static inline void phy_link_topo_del_phy(struct phy_link_topology *topo,
+					 struct phy_device *phy)
+{
+}
+#endif
+
+#endif /* __PHY_LINK_TOPOLOGY_H */
diff --git a/include/linux/phy_link_topology_core.h b/include/linux/phy_link_topology_core.h
new file mode 100644
index 000000000000..78c75f909489
--- /dev/null
+++ b/include/linux/phy_link_topology_core.h
@@ -0,0 +1,19 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PHY_LINK_TOPOLOGY_CORE_H
+#define __PHY_LINK_TOPOLOGY_CORE_H
+
+struct xarray;
+
+struct phy_link_topology {
+	struct xarray phys;
+
+	u32 next_phy_index;
+};
+
+static inline void phy_link_topo_init(struct phy_link_topology *topo)
+{
+	xa_init_flags(&topo->phys, XA_FLAGS_ALLOC1);
+	topo->next_phy_index = 1;
+}
+
+#endif /* __PHY_LINK_TOPOLOGY_CORE_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 0787d561ace0..9cff798c6df9 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -2216,4 +2216,20 @@  struct ethtool_link_settings {
 	 * __u32 map_lp_advertising[link_mode_masks_nwords];
 	 */
 };
+
+/**
+ * enum phy_upstream - Represents the upstream component a given PHY device
+ * is connected to, as in what is on the other end of the MII bus. Most PHYs
+ * will be attached to an Ethernet MAC controller, but in some cases, there's
+ * an intermediate PHY used as a media-converter, which will driver another
+ * MII interface as its output.
+ * @PHY_UPSTREAM_MAC: Upstream component is a MAC (a switch port,
+ *		      or ethernet controller)
+ * @PHY_UPSTREAM_PHY: Upstream component is a PHY (likely a media converter)
+ */
+enum phy_upstream {
+	PHY_UPSTREAM_MAC,
+	PHY_UPSTREAM_PHY,
+};
+
 #endif /* _UAPI_LINUX_ETHTOOL_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0432b04cf9b0..a8776fe6be68 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -153,6 +153,7 @@ 
 #include <linux/prandom.h>
 #include <linux/once_lite.h>
 #include <net/netdev_rx_queue.h>
+#include <linux/phy_link_topology_core.h>
 
 #include "dev.h"
 #include "net-sysfs.h"
@@ -10869,6 +10870,8 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 #ifdef CONFIG_NET_SCHED
 	hash_init(dev->qdisc_hash);
 #endif
+	phy_link_topo_init(&dev->link_topo);
+
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);