mbox series

[net-next,0/2] Fix phy_link_topology initialization

Message ID 20240507102822.2023826-1-maxime.chevallier@bootlin.com (mailing list archive)
Headers show
Series Fix phy_link_topology initialization | expand

Message

Maxime Chevallier May 7, 2024, 10:28 a.m. UTC
Nathan and Heiner reported issues that occur when phylib and phy drivers
built as modules expect the phy_link_topology to be initialized, due to
wrong use of IS_REACHABLE.

This small fixup series addresses that by moving the initialization code
into net/core/dev.c, but at the same time implementing lazy
initialization to only allocate the topology upon the first PHY
insertion.

This needed some refactoring, namely pass the netdevice itself as a
parameter for phy_link_topology helpers.

Thanks Heiner for the help on untangling this, and Nathan for the
report.

Maxime Chevallier (2):
  net: phy: phy_link_topology: Pass netdevice to phy_link_topo helpers
  net: phy: phy_link_topology: Lazy-initialize the link topology

 drivers/net/phy/phy_device.c           | 25 +++++----------
 drivers/net/phy/phy_link_topology.c    | 44 +++++++++++---------------
 include/linux/netdevice.h              |  2 ++
 include/linux/phy_link_topology.h      | 40 +++++++++++++----------
 include/linux/phy_link_topology_core.h | 23 +++-----------
 net/core/dev.c                         | 38 ++++++++++++++++++----
 net/ethtool/netlink.c                  |  2 +-
 7 files changed, 88 insertions(+), 86 deletions(-)

Comments

Nathan Chancellor May 13, 2024, 6:36 a.m. UTC | #1
Hi Maxime,

On Tue, May 07, 2024 at 12:28:19PM +0200, Maxime Chevallier wrote:
> Nathan and Heiner reported issues that occur when phylib and phy drivers
> built as modules expect the phy_link_topology to be initialized, due to
> wrong use of IS_REACHABLE.
> 
> This small fixup series addresses that by moving the initialization code
> into net/core/dev.c, but at the same time implementing lazy
> initialization to only allocate the topology upon the first PHY
> insertion.
> 
> This needed some refactoring, namely pass the netdevice itself as a
> parameter for phy_link_topology helpers.
> 
> Thanks Heiner for the help on untangling this, and Nathan for the
> report.

Are you able to prioritize getting this series merged? This has been a
problem in -next for over a month now and the merge window is now open.
I would hate to see this regress in mainline, as my main system may be
affected by it (not sure, I got a new test machine that got bit by it in
addition to the other two I noticed it on).

Cheers,
Nathan
Russell King (Oracle) May 13, 2024, 9:15 a.m. UTC | #2
On Sun, May 12, 2024 at 11:36:36PM -0700, Nathan Chancellor wrote:
> Hi Maxime,
> 
> On Tue, May 07, 2024 at 12:28:19PM +0200, Maxime Chevallier wrote:
> > Nathan and Heiner reported issues that occur when phylib and phy drivers
> > built as modules expect the phy_link_topology to be initialized, due to
> > wrong use of IS_REACHABLE.
> > 
> > This small fixup series addresses that by moving the initialization code
> > into net/core/dev.c, but at the same time implementing lazy
> > initialization to only allocate the topology upon the first PHY
> > insertion.
> > 
> > This needed some refactoring, namely pass the netdevice itself as a
> > parameter for phy_link_topology helpers.
> > 
> > Thanks Heiner for the help on untangling this, and Nathan for the
> > report.
> 
> Are you able to prioritize getting this series merged? This has been a
> problem in -next for over a month now and the merge window is now open.
> I would hate to see this regress in mainline, as my main system may be
> affected by it (not sure, I got a new test machine that got bit by it in
> addition to the other two I noticed it on).

... and Maxime has been working on trying to get an acceptable fix for
it over that time, with to-and-fro discussions. Maxime still hasn't got
an ack from Heiner for the fixes, and changes are still being
requested.

I think, sadly, the only way forward at this point would be to revert
the original commit. I've just tried reverting 6916e461e793 in my
net-next tree and it's possible, although a little noisy:

$ git revert 6916e461e793
Performing inexact rename detection: 100% (8904/8904), done.
Auto-merging net/core/dev.c
Auto-merging include/uapi/linux/ethtool.h
Removing include/linux/phy_link_topology_core.h
Removing include/linux/phy_link_topology.h
Auto-merging include/linux/phy.h
Auto-merging include/linux/netdevice.h
Removing drivers/net/phy/phy_link_topology.c
Auto-merging drivers/net/phy/phy_device.c
Auto-merging MAINTAINERS
hint: Waiting for your editor to close the file...

I haven't checked whether that ends up with something that's buildable.

Any views Jakub/Dave/Paolo?
Jakub Kicinski May 13, 2024, 3:11 p.m. UTC | #3
On Mon, 13 May 2024 10:15:48 +0100 Russell King (Oracle) wrote:
> ... and Maxime has been working on trying to get an acceptable fix for
> it over that time, with to-and-fro discussions. Maxime still hasn't got
> an ack from Heiner for the fixes, and changes are still being
> requested.
> 
> I think, sadly, the only way forward at this point would be to revert
> the original commit. I've just tried reverting 6916e461e793 in my
> net-next tree and it's possible, although a little noisy:
> 
> $ git revert 6916e461e793
> Performing inexact rename detection: 100% (8904/8904), done.
> Auto-merging net/core/dev.c
> Auto-merging include/uapi/linux/ethtool.h
> Removing include/linux/phy_link_topology_core.h
> Removing include/linux/phy_link_topology.h
> Auto-merging include/linux/phy.h
> Auto-merging include/linux/netdevice.h
> Removing drivers/net/phy/phy_link_topology.c
> Auto-merging drivers/net/phy/phy_device.c
> Auto-merging MAINTAINERS
> hint: Waiting for your editor to close the file...
> 
> I haven't checked whether that ends up with something that's buildable.
> 
> Any views Jakub/Dave/Paolo?

I think you're right. The series got half-merged, we shouldn't push it
into a release in this state. We should revert all of it, I reckon?

6916e461e793 ("net: phy: Introduce ethernet link topology representation")
0ec5ed6c130e ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY")
e75e4e074c44 ("net: phy: add helpers to handle sfp phy connect/disconnect")
fdd353965b52 ("net: sfp: Add helper to return the SFP bus name")
841942bc6212 ("net: ethtool: Allow passing a phy index for some commands")

Does anyone feel strongly that we should try to patch it up instead?
Maxime Chevallier May 14, 2024, 6:46 a.m. UTC | #4
Hi Jakub,

On Mon, 13 May 2024 08:11:38 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Mon, 13 May 2024 10:15:48 +0100 Russell King (Oracle) wrote:
> > ... and Maxime has been working on trying to get an acceptable fix for
> > it over that time, with to-and-fro discussions. Maxime still hasn't got
> > an ack from Heiner for the fixes, and changes are still being
> > requested.
> > 
> > I think, sadly, the only way forward at this point would be to revert
> > the original commit. I've just tried reverting 6916e461e793 in my
> > net-next tree and it's possible, although a little noisy:
> > 
> > $ git revert 6916e461e793
> > Performing inexact rename detection: 100% (8904/8904), done.
> > Auto-merging net/core/dev.c
> > Auto-merging include/uapi/linux/ethtool.h
> > Removing include/linux/phy_link_topology_core.h
> > Removing include/linux/phy_link_topology.h
> > Auto-merging include/linux/phy.h
> > Auto-merging include/linux/netdevice.h
> > Removing drivers/net/phy/phy_link_topology.c
> > Auto-merging drivers/net/phy/phy_device.c
> > Auto-merging MAINTAINERS
> > hint: Waiting for your editor to close the file...
> > 
> > I haven't checked whether that ends up with something that's buildable.
> > 
> > Any views Jakub/Dave/Paolo?  
> 
> I think you're right. The series got half-merged, we shouldn't push it
> into a release in this state. We should revert all of it, I reckon?
> 
> 6916e461e793 ("net: phy: Introduce ethernet link topology representation")
> 0ec5ed6c130e ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY")
> e75e4e074c44 ("net: phy: add helpers to handle sfp phy connect/disconnect")
> fdd353965b52 ("net: sfp: Add helper to return the SFP bus name")
> 841942bc6212 ("net: ethtool: Allow passing a phy index for some commands")
> 
> Does anyone feel strongly that we should try to patch it up instead?

It's OK for me, at least this showed some of the shortcomings with the
current code, let's come back with a better version for the next round.

Thanks,

Maxime