Message ID | 20250101235122.704012-1-francesco@valla.it (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: phy: don't issue a module request if a driver is available | expand |
> mutex_init(&dev->lock); > INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine); > >- /* Request the appropriate module unconditionally; don't >- * bother trying to do so only if it isn't already loaded, >- * because that gets complicated. A hotplug event would have >- * done an unconditional modprobe anyway. >- * We don't do normal hotplug because it won't work for MDIO >+ /* We don't do normal hotplug because it won't work for MDIO > * -- because it relies on the device staying around for long > * enough for the driver to get loaded. With MDIO, the NIC > * driver will get bored and give up as soon as it finds that @@ - >724,7 +745,8 @@ struct phy_device *phy_device_create(struct mii_bus >*bus, int addr, u32 phy_id, > int i; > > for (i = 1; i < num_ids; i++) { >- if (c45_ids->device_ids[i] == 0xffffffff) >+ if (c45_ids->device_ids[i] == 0xffffffff || >+ phy_driver_exists(c45_ids->device_ids[i])) > continue; > > ret = phy_request_driver_module(dev, @@ -732,7 +754,7 @@ >struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 >phy_id, > if (ret) > break; > } >- } else { >+ } else if (!phy_driver_exists(phy_id)) { [Suman] Can we add this phy_driver_exists() API call before the if/else check? > ret = phy_request_driver_module(dev, phy_id); > } >
Hi Suman, On Thursday, 2 January 2025 at 10:35:40 Suman Ghosh <sumang@marvell.com> wrote: > > mutex_init(&dev->lock); > > INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine); > > > >- /* Request the appropriate module unconditionally; don't > >- * bother trying to do so only if it isn't already loaded, > >- * because that gets complicated. A hotplug event would have > >- * done an unconditional modprobe anyway. > >- * We don't do normal hotplug because it won't work for MDIO > >+ /* We don't do normal hotplug because it won't work for MDIO > > * -- because it relies on the device staying around for long > > * enough for the driver to get loaded. With MDIO, the NIC > > * driver will get bored and give up as soon as it finds that @@ - > >724,7 +745,8 @@ struct phy_device *phy_device_create(struct mii_bus > >*bus, int addr, u32 phy_id, > > int i; > > > > for (i = 1; i < num_ids; i++) { > >- if (c45_ids->device_ids[i] == 0xffffffff) > >+ if (c45_ids->device_ids[i] == 0xffffffff || > >+ phy_driver_exists(c45_ids->device_ids[i])) > > continue; > > > > ret = phy_request_driver_module(dev, @@ -732,7 +754,7 @@ > >struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 > >phy_id, > > if (ret) > > break; > > } > >- } else { > >+ } else if (!phy_driver_exists(phy_id)) { > [Suman] Can we add this phy_driver_exists() API call before the if/else check? > Not really, as in case of C45 PHYs we have to check for drivers using (multiple) IDs, which are different from phy_id. > > ret = phy_request_driver_module(dev, phy_id); > > } > > > Thank you! Regards, Francesco
On Thu, Jan 02, 2025 at 12:51:22AM +0100, Francesco Valla wrote: > Whenever a new PHY device is created, request_module() is called > unconditionally, without checking if a driver for the new PHY is already > available (either built-in or from a previous probe). This conflicts > with async probing of the underlying MDIO bus and always throws a > warning (because if a driver is loaded it _might_ cause a deadlock, if > in turn it calls async_synchronize_full()). Why aren't any of the phylib maintainers seeing this warning? Where does the warning come from? > +static bool phy_driver_exists(u32 phy_id) > +{ > + bool found = false; > + struct phy_drv_node *node; > + > + down_read(&phy_drv_list_sem); > + list_for_each_entry(node, &phy_drv_list, list) { > + if (phy_id_compare(phy_id, node->drv->phy_id, node->drv->phy_id_mask)) { > + found = true; > + break; > + } > + } > + up_read(&phy_drv_list_sem); > + > + return found; > +} > + Why do we need this, along with the associated additional memory allocations? What's wrong with bus_for_each_drv() which the core code provides?
On Thursday, 2 January 2025 at 12:06:15 Russell King (Oracle) <linux@armlinux.org.uk> wrote: > On Thu, Jan 02, 2025 at 12:51:22AM +0100, Francesco Valla wrote: > > Whenever a new PHY device is created, request_module() is called > > unconditionally, without checking if a driver for the new PHY is already > > available (either built-in or from a previous probe). This conflicts > > with async probing of the underlying MDIO bus and always throws a > > warning (because if a driver is loaded it _might_ cause a deadlock, if > > in turn it calls async_synchronize_full()). > > Why aren't any of the phylib maintainers seeing this warning? Where does > the warning come from? > I'm not sure. For me, it was pretty easy to trigger. I'm on a BeaglePlay (AM62X) and I am working on boot time optimization and with different configurations for async probing. When the async probe of the Ethernet NIC (i.e.: am65-cpsw-nuss) runs, it in turn triggers the probe of the associated davincio_mdio device, which then brings me to the following WARNING: [ 0.287271] davinci_mdio 8000f00.mdio: davinci mdio revision 9.7, bus freq 1000000 [ 0.287574] ------------[ cut here ]------------ [ 0.287581] WARNING: CPU: 2 PID: 48 at /kernel/module/kmod.c:144 __request_module+0x19c/0x204 [ 0.287605] Modules linked in: [ 0.287619] CPU: 2 UID: 0 PID: 48 Comm: kworker/u16:2 Not tainted 6.12.4-00004-g89f77a9313d4-dirty #1 [ 0.287630] Hardware name: BeagleBoard.org BeaglePlay (DT) [ 0.287637] Workqueue: async async_run_entry_fn [ 0.287653] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.287663] pc : __request_module+0x19c/0x204 [ 0.287671] lr : __request_module+0x198/0x204 [ 0.287679] sp : ffff8000817b2e70 [ 0.287684] x29: ffff8000817b2ef0 x28: ffff000001b994b0 x27: 0000000000000000 [ 0.287698] x26: 0000000000000000 x25: ffff8000817b3277 x24: 0000000000000000 [ 0.287712] x23: ffff000001b99000 x22: 0000000000000000 x21: ffff0000038a8800 [ 0.287726] x20: 0000000000000001 x19: ffff800080d4dc18 x18: 0000000000000002 [ 0.287739] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 0.287753] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000001 [ 0.287767] x11: 0000000000000001 x10: 0000000000000000 x9 : 0000000000000000 [ 0.287780] x8 : ffff8000817b2ee8 x7 : 0000000000000000 x6 : 0000000000000000 [ 0.287794] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000030 [ 0.287806] x2 : 0000000000000008 x1 : ffff8000800c66dc x0 : 0000000000000001 [ 0.287820] Call trace: [ 0.287821] omap_i2c 20000000.i2c: bus 0 rev0.12 at 400 kHz [ 0.287826] __request_module+0x19c/0x204 [ 0.287835] phy_request_driver_module+0x11c/0x17c [ 0.287849] phy_device_create+0x234/0x260 [ 0.287860] get_phy_device+0x78/0x154 [ 0.287870] fwnode_mdiobus_register_phy+0x11c/0x1d8 [ 0.287880] __of_mdiobus_parse_phys+0x174/0x2a0 [ 0.287890] __of_mdiobus_register+0x104/0x240 [ 0.287899] davinci_mdio_probe+0x284/0x44c [ 0.287909] platform_probe+0x68/0xc4 [ 0.287921] really_probe+0xbc/0x29c [ 0.287930] really_probe_debug+0x88/0x110 [ 0.287940] __driver_probe_device+0xbc/0xd4 [ 0.287948] driver_probe_device+0xd8/0x15c [ 0.287957] __device_attach_driver+0xb8/0x134 [ 0.287965] bus_for_each_drv+0x88/0xe8 [ 0.287974] __device_attach+0xa0/0x190 [ 0.287982] device_initial_probe+0x14/0x20 [ 0.287991] bus_probe_device+0xac/0xb0 [ 0.287998] device_add+0x588/0x74c [ 0.288011] of_device_add+0x44/0x60 [ 0.288027] of_platform_device_create_pdata+0x8c/0x120 [ 0.288039] of_platform_device_create+0x18/0x24 [ 0.288050] am65_cpsw_nuss_probe+0x670/0xcf4 [ 0.288062] platform_probe+0x68/0xc4 [ 0.288072] really_probe+0xbc/0x29c [ 0.288079] really_probe_debug+0x88/0x110 [ 0.288088] __driver_probe_device+0xbc/0xd4 [ 0.288096] driver_probe_device+0xd8/0x15c [ 0.288105] __driver_attach_async_helper+0x4c/0xb4 [ 0.288113] async_run_entry_fn+0x34/0xe0 [ 0.288124] process_one_work+0x148/0x288 [ 0.288137] worker_thread+0x2cc/0x3d4 [ 0.288147] kthread+0x110/0x114 [ 0.288159] ret_from_fork+0x10/0x20 [ 0.288171] ---[ end trace 0000000000000000 ]--- This is expected, as request_module() is not meant to be called from an async context: https://lore.kernel.org/lkml/20130118221227.GG24579@htj.dyndns.org/ It should be noted that: - the davincio_mdio device is a child of the am65-cpsw-nuss device - the am65-cpsw-nuss driver is NOT marked with neither PROBE_PREFER_ASYNCHRONOUS nor PROBE_FORCE_SYNCHRONOUS and the behavior is being triggered specifying driver_async_probe=am65-cpsw-nuss on the command line. > > +static bool phy_driver_exists(u32 phy_id) > > +{ > > + bool found = false; > > + struct phy_drv_node *node; > > + > > + down_read(&phy_drv_list_sem); > > + list_for_each_entry(node, &phy_drv_list, list) { > > + if (phy_id_compare(phy_id, node->drv->phy_id, node->drv->phy_id_mask)) { > > + found = true; > > + break; > > + } > > + } > > + up_read(&phy_drv_list_sem); > > + > > + return found; > > +} > > + > > Why do we need this, along with the associated additional memory > allocations? What's wrong with bus_for_each_drv() which the core > code provides? > > Because I didn't think of it, but it would be much better. I'll refactor the logic using bus_for_each_drv() + a simple match callback. Thank you! Regards, Francesco
On Thu, Jan 02, 2025 at 02:26:58PM +0100, Francesco Valla wrote: > On Thursday, 2 January 2025 at 12:06:15 Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Thu, Jan 02, 2025 at 12:51:22AM +0100, Francesco Valla wrote: > > > Whenever a new PHY device is created, request_module() is called > > > unconditionally, without checking if a driver for the new PHY is already > > > available (either built-in or from a previous probe). This conflicts > > > with async probing of the underlying MDIO bus and always throws a > > > warning (because if a driver is loaded it _might_ cause a deadlock, if > > > in turn it calls async_synchronize_full()). > > > > Why aren't any of the phylib maintainers seeing this warning? Where does > > the warning come from? > > > > I'm not sure. For me, it was pretty easy to trigger. Please include the information how you triggered it into the commit message. > This is expected, as request_module() is not meant to be called from an async > context: > > https://lore.kernel.org/lkml/20130118221227.GG24579@htj.dyndns.org/ > > It should be noted that: > - the davincio_mdio device is a child of the am65-cpsw-nuss device > - the am65-cpsw-nuss driver is NOT marked with neither PROBE_PREFER_ASYNCHRONOUS > nor PROBE_FORCE_SYNCHRONOUS and the behavior is being triggered specifying > driver_async_probe=am65-cpsw-nuss on the command line. So the phylib core is currently async probe incompatible. The whole module loading story is a bit shaky in phylib, so we need to be very careful with any changes, or you are going to break stuff, in interesting ways, with it first appearing to work, because the fallback genphy is used rather than the specific PHY driver, but then breaking when genphy is not sufficient. Please think about this as a generic problem with async probe. Is this really specific to phylib? Should some or all of the solution to the problem be moved into the driver core? Could we maybe first try an async probe using the existing drivers, and then fall back to a sync probe which can load additional drivers? One other question, how much speadup do you get with async probe of PHYs? Is it really worth the effort? Andrew
On Thursday, 2 January 2025 at 14:52:18 Andrew Lunn <andrew@lunn.ch> wrote: > On Thu, Jan 02, 2025 at 02:26:58PM +0100, Francesco Valla wrote: > > On Thursday, 2 January 2025 at 12:06:15 Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > > On Thu, Jan 02, 2025 at 12:51:22AM +0100, Francesco Valla wrote: > > > > Whenever a new PHY device is created, request_module() is called > > > > unconditionally, without checking if a driver for the new PHY is already > > > > available (either built-in or from a previous probe). This conflicts > > > > with async probing of the underlying MDIO bus and always throws a > > > > warning (because if a driver is loaded it _might_ cause a deadlock, if > > > > in turn it calls async_synchronize_full()). > > > > > > Why aren't any of the phylib maintainers seeing this warning? Where does > > > the warning come from? > > > > > > > I'm not sure. For me, it was pretty easy to trigger. > > Please include the information how you triggered it into the commit > message. > Ok, will do. > > This is expected, as request_module() is not meant to be called from an async > > context: > > > > https://lore.kernel.org/lkml/20130118221227.GG24579@htj.dyndns.org/ > > > > It should be noted that: > > - the davincio_mdio device is a child of the am65-cpsw-nuss device > > - the am65-cpsw-nuss driver is NOT marked with neither PROBE_PREFER_ASYNCHRONOUS > > nor PROBE_FORCE_SYNCHRONOUS and the behavior is being triggered specifying > > driver_async_probe=am65-cpsw-nuss on the command line. > > So the phylib core is currently async probe incompatible. The whole > module loading story is a bit shaky in phylib, so we need to be very > careful with any changes, or you are going to break stuff, in > interesting ways, with it first appearing to work, because the > fallback genphy is used rather than the specific PHY driver, but then > breaking when genphy is not sufficient. > > Please think about this as a generic problem with async probe. Is this > really specific to phylib? Should some or all of the solution to the > problem be moved into the driver core? Could we maybe first try an > async probe using the existing drivers, and then fall back to a sync > probe which can load additional drivers? It probably isn't, but considering the way phylib works currently (with the genphy filling up for missing drivers etc.) I'm not sure if/how this can work. I need to think about it. > > One other question, how much speadup do you get with async probe of > PHYs? Is it really worth the effort? For me it's a reduction of ~170ms, which currently accounts for roughly the 25% of the time spent before starting userspace (660ms vs 490ms, give or take a couple of milliseconds). That's due to the large reset time required by the PHYs to initialize, so I expect it would be much lower on most of the systems. But, I've done much more to shave much less time in the past, so I think it is at least worth investigating. Thank you! Regards, Francesco > > Andrew >
> > One other question, how much speadup do you get with async probe of > > PHYs? Is it really worth the effort? > > For me it's a reduction of ~170ms, which currently accounts for roughly the 25% > of the time spent before starting userspace (660ms vs 490ms, give or take a > couple of milliseconds). That's due to the large reset time required by the PHYs > to initialize 170ms seems like a long time for a reset. Maybe check the datasheet and see what it says. Andrew
>> > mutex_init(&dev->lock); >> > INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine); >> > >> >- /* Request the appropriate module unconditionally; don't >> >- * bother trying to do so only if it isn't already loaded, >> >- * because that gets complicated. A hotplug event would have >> >- * done an unconditional modprobe anyway. >> >- * We don't do normal hotplug because it won't work for MDIO >> >+ /* We don't do normal hotplug because it won't work for MDIO >> > * -- because it relies on the device staying around for long >> > * enough for the driver to get loaded. With MDIO, the NIC >> > * driver will get bored and give up as soon as it finds that @@ - >> >724,7 +745,8 @@ struct phy_device *phy_device_create(struct mii_bus >> >*bus, int addr, u32 phy_id, >> > int i; >> > >> > for (i = 1; i < num_ids; i++) { >> >- if (c45_ids->device_ids[i] == 0xffffffff) >> >+ if (c45_ids->device_ids[i] == 0xffffffff || >> >+ phy_driver_exists(c45_ids->device_ids[i])) >> > continue; >> > >> > ret = phy_request_driver_module(dev, @@ -732,7 +754,7 @@ >struct >> >phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 >> >phy_id, >> > if (ret) >> > break; >> > } >> >- } else { >> >+ } else if (!phy_driver_exists(phy_id)) { >> [Suman] Can we add this phy_driver_exists() API call before the >if/else check? >> > >Not really, as in case of C45 PHYs we have to check for drivers using >(multiple) IDs, which are different from phy_id. [Suman] Hmm got it. We can isolate the else part but that won't make much difference. Thank you!! >
On Thu, Jan 02, 2025 at 02:52:18PM +0100, Andrew Lunn wrote: > On Thu, Jan 02, 2025 at 02:26:58PM +0100, Francesco Valla wrote: > > On Thursday, 2 January 2025 at 12:06:15 Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > > On Thu, Jan 02, 2025 at 12:51:22AM +0100, Francesco Valla wrote: > > > > Whenever a new PHY device is created, request_module() is called > > > > unconditionally, without checking if a driver for the new PHY is already > > > > available (either built-in or from a previous probe). This conflicts > > > > with async probing of the underlying MDIO bus and always throws a > > > > warning (because if a driver is loaded it _might_ cause a deadlock, if > > > > in turn it calls async_synchronize_full()). > > > > > > Why aren't any of the phylib maintainers seeing this warning? Where does > > > the warning come from? > > > > > > > I'm not sure. For me, it was pretty easy to trigger. > > Please include the information how you triggered it into the commit > message. > > > This is expected, as request_module() is not meant to be called from an async > > context: > > > > https://lore.kernel.org/lkml/20130118221227.GG24579@htj.dyndns.org/ > > > > It should be noted that: > > - the davincio_mdio device is a child of the am65-cpsw-nuss device > > - the am65-cpsw-nuss driver is NOT marked with neither PROBE_PREFER_ASYNCHRONOUS > > nor PROBE_FORCE_SYNCHRONOUS and the behavior is being triggered specifying > > driver_async_probe=am65-cpsw-nuss on the command line. > > So the phylib core is currently async probe incompatible. The whole > module loading story is a bit shaky in phylib, so we need to be very > careful with any changes, or you are going to break stuff, in > interesting ways, with it first appearing to work, because the > fallback genphy is used rather than the specific PHY driver, but then > breaking when genphy is not sufficient. > > Please think about this as a generic problem with async probe. Is this > really specific to phylib? Should some or all of the solution to the > problem be moved into the driver core? Could we maybe first try an > async probe using the existing drivers, and then fall back to a sync > probe which can load additional drivers? > > One other question, how much speadup do you get with async probe of > PHYs? Is it really worth the effort? > I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip), which on the radxa rock5b has a RTL8125 NIC connected to it. By enabling async probe for the PCIe driver I get the same splat as Francesco. Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840 This PHY ID is defined in: drivers/net/phy/realtek.c. Looking at my .config I have: CONFIG_REALTEK_PHY=y So this is not built as a module, so I am a bit surprised to see this splat (since the driver is built as built-in). I think it would be nice if the phylib core could be fixed so that it does not try to load modules for drivers which are built as built-in. Also see this old thread that tries to enable async probe by default on DT systems: https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u AFAICT, it seems that the phylib core is one of the biggest blockers from being able to enable async probe by default on DT systems. Kind regards, Niklas
On Fri, Jan 03, 2025 at 12:25:52PM +0100, Niklas Cassel wrote: > I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip), > which on the radxa rock5b has a RTL8125 NIC connected to it. > > By enabling async probe for the PCIe driver I get the same splat as Francesco. > > Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840 > This PHY ID is defined in: drivers/net/phy/realtek.c. > > Looking at my .config I have: > CONFIG_REALTEK_PHY=y > > So this is not built as a module, so I am a bit surprised to see this > splat (since the driver is built as built-in). > > > I think it would be nice if the phylib core could be fixed so that > it does not try to load modules for drivers which are built as built-in. > > > Also see this old thread that tries to enable async probe by default on > DT systems: > https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u > > AFAICT, it seems that the phylib core is one of the biggest blockers from > being able to enable async probe by default on DT systems. Yes, we accept that phylib is incompatible with async probing. I don't think that's going to change, because it's fundamentally baked in with the way the whole fallback driver stuff works. We *certainly* don't want to move the request_module() into phy_attach*() (which is the point where we require the driver to be bound or we fallback to the generic feature-reduced driver). First, that *will* break SFP modules, no ifs or buts. Second, moving it there would mean calling request_module() in many cases with the RTNL held, which blocks things like new connections network establishing while the module is requested (I've run into this problem when the TI Wilink driver locks up holding the RTNL lock making the platform impossible to remotely resolve if there isn't an already open SSH connection.) We certainly don't want userspace to be doing stuff while holding the "big" RTNL that affects much of networking. Third, I suspect phylib already has a race between the PHY driver / driver core binding the appropriate driver and phy_attach_direct() attaching the fallback generic driver to the driverless PHY device, and making this more "async" is going to open that race possibly to the point where it becomes a problem. (At the moment, it doesn't seem to cause any issue, so is theoretical right now - but if one reads the code, it's obvious that there is no locking that prevents a race there.) What saves phylib right now is that by issueing the request_module(), that will wait for the module to be loaded and initialised. The initialisation function will register the PHY drivers in this module. As this is synchronous, it will happen before request_module() returns, and thus before phy_device_create() returns. Thus, if there is a module available for the PHY, it will be loaded and available to be bound to the PHY device by the time phy_device_register() is called. This ensures that - in the case of an auto-loaded module, the race will never happen. Yes, it's weak. A scenario that could trigger this is loading PHY driver modules in parallel with a call to the phy_attach*() functions, e.g. when bringing up a network interface where the network driver calls through to phy_attach*() from its .ndo_open() method. If we simply make phylib's request_module() async, then this race will be opened for auto-loaded modules as well. Closing this race to give consistent results is impossible, even if we add locking. If phy_attach*() were to complete first, the generic driver would be used despite the PHY specific driver module being loaded. Alternatively, if the PHY specific driver module finishes being loaded before phy_attach*() is called, then the PHY specific driver will be used for the device. So... it needs to be synchronous. I also don't think "make a list of built-in drivers and omit the request module" is an acceptable workaround - it's a sticky plaster for the problem. If the PHY driver isn't built-in, then you have the same problem with request_module() being issued. You could work around that by ensuring that the PHY driver is built-in, but then we're relying on multiple different things all being correct in diverse areas, which is fragile. All in all, I don't see at the moment any simple way to make phylib async-probe compatible.
On Fri, Jan 03, 2025 at 12:08:43PM +0000, Russell King (Oracle) wrote: > On Fri, Jan 03, 2025 at 12:25:52PM +0100, Niklas Cassel wrote: > > I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip), > > which on the radxa rock5b has a RTL8125 NIC connected to it. > > > > By enabling async probe for the PCIe driver I get the same splat as Francesco. > > > > Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840 > > This PHY ID is defined in: drivers/net/phy/realtek.c. > > > > Looking at my .config I have: > > CONFIG_REALTEK_PHY=y > > > > So this is not built as a module, so I am a bit surprised to see this > > splat (since the driver is built as built-in). > > > > > > I think it would be nice if the phylib core could be fixed so that > > it does not try to load modules for drivers which are built as built-in. > > > > > > Also see this old thread that tries to enable async probe by default on > > DT systems: > > https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u > > > > AFAICT, it seems that the phylib core is one of the biggest blockers from > > being able to enable async probe by default on DT systems. > > Yes, we accept that phylib is incompatible with async probing. I don't > think that's going to change, because it's fundamentally baked in with > the way the whole fallback driver stuff works. > > We *certainly* don't want to move the request_module() into > phy_attach*() (which is the point where we require the driver to be > bound or we fallback to the generic feature-reduced driver). First, > that *will* break SFP modules, no ifs or buts. > > Second, moving it there would mean calling request_module() in many > cases with the RTNL held, which blocks things like new connections > network establishing while the module is requested (I've run into this > problem when the TI Wilink driver locks up holding the RTNL lock making > the platform impossible to remotely resolve if there isn't an already > open SSH connection.) We certainly don't want userspace to be doing > stuff while holding the "big" RTNL that affects much of networking. > > Third, I suspect phylib already has a race between the PHY driver / > driver core binding the appropriate driver and phy_attach_direct() > attaching the fallback generic driver to the driverless PHY device, > and making this more "async" is going to open that race possibly to > the point where it becomes a problem. (At the moment, it doesn't > seem to cause any issue, so is theoretical right now - but if one > reads the code, it's obvious that there is no locking that prevents > a race there.) > > What saves phylib right now is that by issueing the request_module(), > that will wait for the module to be loaded and initialised. The > initialisation function will register the PHY drivers in this module. > As this is synchronous, it will happen before request_module() returns, > and thus before phy_device_create() returns. Thus, if there is a module > available for the PHY, it will be loaded and available to be bound > to the PHY device by the time phy_device_register() is called. This > ensures that - in the case of an auto-loaded module, the race will > never happen. > > Yes, it's weak. A scenario that could trigger this is loading PHY > driver modules in parallel with a call to the phy_attach*() functions, > e.g. when bringing up a network interface where the network driver > calls through to phy_attach*() from its .ndo_open() method. If we > simply make phylib's request_module() async, then this race will be > opened for auto-loaded modules as well. > > Closing this race to give consistent results is impossible, even if > we add locking. If phy_attach*() were to complete first, the generic > driver would be used despite the PHY specific driver module being > loaded. Alternatively, if the PHY specific driver module finishes > being loaded before phy_attach*() is called, then the PHY specific > driver will be used for the device. So... it needs to be synchronous. > > I also don't think "make a list of built-in drivers and omit the > request module" is an acceptable workaround - it's a sticky plaster > for the problem. If the PHY driver isn't built-in, then you have the > same problem with request_module() being issued. You could work around > that by ensuring that the PHY driver is built-in, but then we're > relying on multiple different things all being correct in diverse > areas, which is fragile. FWIW, the patch in $subject does make the splat go away for me. (I have the PHY driver built as built-in). The patch in $subject does "Add a list of registered drivers and check if one is already available before resorting to call request_module(); in this way, if the PHY driver is already there, the MDIO bus can perform the async probe." Personally, I think this solution is better than keeping the status-quo. If we take a buildroot kernel config for a specific board as an example, they are very careful to always mark the NIC driver for the specific board as built-in, such that the board can use nfsroot. (Same logic can be applied to phylib driver.) Having a solution similar to what is suggested in $subject would certainly improve things for DT based systems. Also note that arch/arm64/configs/defconfig marks a bunch of phylib drivers as built-in. Kind regards, Niklas
On Fri, Jan 03, 2025 at 01:49:30PM +0100, Niklas Cassel wrote: > On Fri, Jan 03, 2025 at 12:08:43PM +0000, Russell King (Oracle) wrote: > > On Fri, Jan 03, 2025 at 12:25:52PM +0100, Niklas Cassel wrote: > > > I'm trying to enable async probe for my PCIe controller (pcie-dw-rockchip), > > > which on the radxa rock5b has a RTL8125 NIC connected to it. > > > > > > By enabling async probe for the PCIe driver I get the same splat as Francesco. > > > > > > Looking at the prints, it is trying to load a module for PHY ID: 0x1cc840 > > > This PHY ID is defined in: drivers/net/phy/realtek.c. > > > > > > Looking at my .config I have: > > > CONFIG_REALTEK_PHY=y > > > > > > So this is not built as a module, so I am a bit surprised to see this > > > splat (since the driver is built as built-in). > > > > > > > > > I think it would be nice if the phylib core could be fixed so that > > > it does not try to load modules for drivers which are built as built-in. > > > > > > > > > Also see this old thread that tries to enable async probe by default on > > > DT systems: > > > https://lore.kernel.org/linux-kernel//d5796286-ec24-511a-5910-5673f8ea8b10@samsung.com/T/#u > > > > > > AFAICT, it seems that the phylib core is one of the biggest blockers from > > > being able to enable async probe by default on DT systems. > > > > Yes, we accept that phylib is incompatible with async probing. I don't > > think that's going to change, because it's fundamentally baked in with > > the way the whole fallback driver stuff works. > > > > We *certainly* don't want to move the request_module() into > > phy_attach*() (which is the point where we require the driver to be > > bound or we fallback to the generic feature-reduced driver). First, > > that *will* break SFP modules, no ifs or buts. > > > > Second, moving it there would mean calling request_module() in many > > cases with the RTNL held, which blocks things like new connections > > network establishing while the module is requested (I've run into this > > problem when the TI Wilink driver locks up holding the RTNL lock making > > the platform impossible to remotely resolve if there isn't an already > > open SSH connection.) We certainly don't want userspace to be doing > > stuff while holding the "big" RTNL that affects much of networking. > > > > Third, I suspect phylib already has a race between the PHY driver / > > driver core binding the appropriate driver and phy_attach_direct() > > attaching the fallback generic driver to the driverless PHY device, > > and making this more "async" is going to open that race possibly to > > the point where it becomes a problem. (At the moment, it doesn't > > seem to cause any issue, so is theoretical right now - but if one > > reads the code, it's obvious that there is no locking that prevents > > a race there.) > > > > What saves phylib right now is that by issueing the request_module(), > > that will wait for the module to be loaded and initialised. The > > initialisation function will register the PHY drivers in this module. > > As this is synchronous, it will happen before request_module() returns, > > and thus before phy_device_create() returns. Thus, if there is a module > > available for the PHY, it will be loaded and available to be bound > > to the PHY device by the time phy_device_register() is called. This > > ensures that - in the case of an auto-loaded module, the race will > > never happen. > > > > Yes, it's weak. A scenario that could trigger this is loading PHY > > driver modules in parallel with a call to the phy_attach*() functions, > > e.g. when bringing up a network interface where the network driver > > calls through to phy_attach*() from its .ndo_open() method. If we > > simply make phylib's request_module() async, then this race will be > > opened for auto-loaded modules as well. > > > > Closing this race to give consistent results is impossible, even if > > we add locking. If phy_attach*() were to complete first, the generic > > driver would be used despite the PHY specific driver module being > > loaded. Alternatively, if the PHY specific driver module finishes > > being loaded before phy_attach*() is called, then the PHY specific > > driver will be used for the device. So... it needs to be synchronous. > > > > I also don't think "make a list of built-in drivers and omit the > > request module" is an acceptable workaround - it's a sticky plaster > > for the problem. If the PHY driver isn't built-in, then you have the > > same problem with request_module() being issued. You could work around > > that by ensuring that the PHY driver is built-in, but then we're > > relying on multiple different things all being correct in diverse > > areas, which is fragile. > > FWIW, the patch in $subject does make the splat go away for me. > (I have the PHY driver built as built-in). This is not about "making the splat go away". Making something go away is not solving the problem if it's a fragile sticky plaster. Such things come back to bite. > The patch in $subject does "Add a list of registered drivers and check > if one is already available before resorting to call request_module(); > in this way, if the PHY driver is already there, the MDIO bus can perform > the async probe." This is the sticky plaster. I covered that in my message to which you're replying. See the last paragraph from my reply quoted above.
> FWIW, the patch in $subject does make the splat go away for me. > (I have the PHY driver built as built-in). > > The patch in $subject does "Add a list of registered drivers and check > if one is already available before resorting to call request_module(); > in this way, if the PHY driver is already there, the MDIO bus can perform > the async probe." Lets take a step backwards. How in general should module loading work with async probe? Lets understand that first. Then we can think about what is special about PHYs, and how we can fit into the existing framework. Andrew
On Fri, Jan 03, 2025 at 03:12:14PM +0100, Andrew Lunn wrote: > > FWIW, the patch in $subject does make the splat go away for me. > > (I have the PHY driver built as built-in). > > > > The patch in $subject does "Add a list of registered drivers and check > > if one is already available before resorting to call request_module(); > > in this way, if the PHY driver is already there, the MDIO bus can perform > > the async probe." > > Lets take a step backwards. > > How in general should module loading work with async probe? Lets > understand that first. > > Then we can think about what is special about PHYs, and how we can fit > into the existing framework. I agree that it might be a good idea, if possible, for request_module() itself to see if the requested module is already registered, and then do nothing. Adding Luis (modules maintainer) to CC. Luis, phylib calls request_module() unconditionally, regardless if there is a driver registered already (phy driver built as built-in). This causes the splat described here to be printed: https://lore.kernel.org/netdev/7103704.9J7NaK4W3v@fedora.fritz.box/T/#u But as far as I can tell, this is a false positive, since the call cannot possibly load any module (since the driver is built as built-in and not as a module). Kind regards, Niklas
On Fri, Jan 03, 2025 at 03:34:26PM +0100, Niklas Cassel wrote: > On Fri, Jan 03, 2025 at 03:12:14PM +0100, Andrew Lunn wrote: > > > FWIW, the patch in $subject does make the splat go away for me. > > > (I have the PHY driver built as built-in). > > > > > > The patch in $subject does "Add a list of registered drivers and check > > > if one is already available before resorting to call request_module(); > > > in this way, if the PHY driver is already there, the MDIO bus can perform > > > the async probe." > > > > Lets take a step backwards. > > > > How in general should module loading work with async probe? Lets > > understand that first. > > > > Then we can think about what is special about PHYs, and how we can fit > > into the existing framework. > > I agree that it might be a good idea, if possible, for request_module() > itself to see if the requested module is already registered, and then do > nothing. > > Adding Luis (modules maintainer) to CC. > > Luis, phylib calls request_module() unconditionally, regardless if there > is a driver registered already (phy driver built as built-in). > > This causes the splat described here to be printed: > https://lore.kernel.org/netdev/7103704.9J7NaK4W3v@fedora.fritz.box/T/#u > > But as far as I can tell, this is a false positive, since the call cannot > possibly load any module (since the driver is built as built-in and not > as a module). Please be careful here. Just because it is built in for your build does not mean it is built in for everybody. The code needs to be able to load the module if it is not built in. Also, i've seen broken builds where the driver is both built in and exists as a module. User space will try to load the module, and the linker will then complain about duplicate symbols and the load fails. So it is not a false positive. There is also the question of what exactly causes the deadlock. Is simply calling user space to load a module which does not exist sufficient to cause the deadlock? That is what is happening in your system, with built in modules. Also, the kernel probably has no idea if the module has already been loaded nor not when request_module() is called. What we pass to the request_module() is not the name of the module, but a string which represents one of the IDs the PHY has. It is then userspace that maps that string to a kernel module via modules.alias. And there is a many to one mapping, many IDs map to one module. So it could be the module has already been loaded due to some other string. So, back to my original question. How should module loading work with async probe? Andrew
Hi, On Friday, 3 January 2025 at 16:37:33 Andrew Lunn <andrew@lunn.ch> wrote: > On Fri, Jan 03, 2025 at 03:34:26PM +0100, Niklas Cassel wrote: > > On Fri, Jan 03, 2025 at 03:12:14PM +0100, Andrew Lunn wrote: > > > > FWIW, the patch in $subject does make the splat go away for me. > > > > (I have the PHY driver built as built-in). > > > > > > > > The patch in $subject does "Add a list of registered drivers and check > > > > if one is already available before resorting to call request_module(); > > > > in this way, if the PHY driver is already there, the MDIO bus can perform > > > > the async probe." > > > > > > Lets take a step backwards. > > > > > > How in general should module loading work with async probe? Lets > > > understand that first. > > > > > > Then we can think about what is special about PHYs, and how we can fit > > > into the existing framework. > > > > I agree that it might be a good idea, if possible, for request_module() > > itself to see if the requested module is already registered, and then do > > nothing. > > > > Adding Luis (modules maintainer) to CC. > > > > Luis, phylib calls request_module() unconditionally, regardless if there > > is a driver registered already (phy driver built as built-in). > > > > This causes the splat described here to be printed: > > https://lore.kernel.org/netdev/7103704.9J7NaK4W3v@fedora.fritz.box/T/#u > > > > But as far as I can tell, this is a false positive, since the call cannot > > possibly load any module (since the driver is built as built-in and not > > as a module). > > Please be careful here. Just because it is built in for your build > does not mean it is built in for everybody. The code needs to be able > to load the module if it is not built in. > I fully agree here - moreover, the same WARN would be triggered if async probe is used for a NIC driver which is built as a module, with PHY drivers themselves being modules. > Also, i've seen broken builds where the driver is both built in and > exists as a module. User space will try to load the module, and the > linker will then complain about duplicate symbols and the load fails. > So it is not a false positive. There is also the question of what > exactly causes the deadlock. Is simply calling user space to load a > module which does not exist sufficient to cause the deadlock? That is > what is happening in your system, with built in modules. > The deadlock (which I did NOT observed) would be caused by a module init function calling async_synchronize_full(); quoting from __request_module(): /* * We don't allow synchronous module loading from async. Module * init may invoke async_synchronize_full() which will end up * waiting for this task which already is waiting for the module * loading to complete, leading to a deadlock. */ WARN_ON_ONCE(wait && current_is_async()); The deadlock was observed way back and the code above was added in 2013 [1]; some more work has been done during the years [2]. > Also, the kernel probably has no idea if the module has already been > loaded nor not when request_module() is called. What we pass to the > request_module() is not the name of the module, but a string which > represents one of the IDs the PHY has. It is then userspace that maps > that string to a kernel module via modules.alias. And there is a many > to one mapping, many IDs map to one module. So it could be the module > has already been loaded due to some other string. > > So, back to my original question. How should module loading work with > async probe? > Today it is not meant to work [1]; that was mainly the reason for my original patch. Maybe the best solution would be to work on that. [1] https://lore.kernel.org/all/20130118221227.GG24579@htj.dyndns.org/ [2] https://lore.kernel.org/lkml/YfsruBT19o3j0KD2@bombadil.infradead.org/T/#m4a8579a9e9a2508bea66223906e1917e164e6c70 Thank you! Regards, Francesco
On Thursday, 2 January 2025 at 15:21:51 Andrew Lunn <andrew@lunn.ch> wrote: > > > One other question, how much speadup do you get with async probe of > > > PHYs? Is it really worth the effort? > > > > For me it's a reduction of ~170ms, which currently accounts for roughly the 25% > > of the time spent before starting userspace (660ms vs 490ms, give or take a > > couple of milliseconds). That's due to the large reset time required by the PHYs > > to initialize > > 170ms seems like a long time for a reset. Maybe check the datasheet > and see what it says. > > Andrew > Turned out that another 50-60ms could be shaven off. I trusted the original implementation for the platform on the reset timings, but the datasheet used the same name for two time quantities and they chose the wrong one. Regards, Francesco
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index b26bb33cd1d4..a9e8b834851c 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -44,6 +44,14 @@ MODULE_DESCRIPTION("PHY library"); MODULE_AUTHOR("Andy Fleming"); MODULE_LICENSE("GPL"); +struct phy_drv_node { + const struct phy_driver *drv; + struct list_head list; +}; + +static LIST_HEAD(phy_drv_list); +static DECLARE_RWSEM(phy_drv_list_sem); + __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_features) __ro_after_init; EXPORT_SYMBOL_GPL(phy_basic_features); @@ -658,6 +666,23 @@ static int phy_request_driver_module(struct phy_device *dev, u32 phy_id) return 0; } +static bool phy_driver_exists(u32 phy_id) +{ + bool found = false; + struct phy_drv_node *node; + + down_read(&phy_drv_list_sem); + list_for_each_entry(node, &phy_drv_list, list) { + if (phy_id_compare(phy_id, node->drv->phy_id, node->drv->phy_id_mask)) { + found = true; + break; + } + } + up_read(&phy_drv_list_sem); + + return found; +} + struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 phy_id, bool is_c45, struct phy_c45_device_ids *c45_ids) @@ -709,11 +734,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 phy_id, mutex_init(&dev->lock); INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine); - /* Request the appropriate module unconditionally; don't - * bother trying to do so only if it isn't already loaded, - * because that gets complicated. A hotplug event would have - * done an unconditional modprobe anyway. - * We don't do normal hotplug because it won't work for MDIO + /* We don't do normal hotplug because it won't work for MDIO * -- because it relies on the device staying around for long * enough for the driver to get loaded. With MDIO, the NIC * driver will get bored and give up as soon as it finds that @@ -724,7 +745,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 phy_id, int i; for (i = 1; i < num_ids; i++) { - if (c45_ids->device_ids[i] == 0xffffffff) + if (c45_ids->device_ids[i] == 0xffffffff || + phy_driver_exists(c45_ids->device_ids[i])) continue; ret = phy_request_driver_module(dev, @@ -732,7 +754,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 phy_id, if (ret) break; } - } else { + } else if (!phy_driver_exists(phy_id)) { ret = phy_request_driver_module(dev, phy_id); } @@ -3674,6 +3696,7 @@ static int phy_remove(struct device *dev) */ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) { + struct phy_drv_node *node; int retval; /* Either the features are hard coded, or dynamically @@ -3695,6 +3718,10 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) new_driver->name)) return -EINVAL; + node = kzalloc(sizeof(*node), GFP_KERNEL); + if (!node) + return -ENOMEM; + new_driver->mdiodrv.flags |= MDIO_DEVICE_IS_PHY; new_driver->mdiodrv.driver.name = new_driver->name; new_driver->mdiodrv.driver.bus = &mdio_bus_type; @@ -3707,10 +3734,15 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner) if (retval) { pr_err("%s: Error %d in registering driver\n", new_driver->name, retval); - + kfree(node); return retval; } + down_write(&phy_drv_list_sem); + node->drv = new_driver; + list_add(&node->list, &phy_drv_list); + up_write(&phy_drv_list_sem); + pr_debug("%s: Registered new driver\n", new_driver->name); return 0; @@ -3736,6 +3768,19 @@ EXPORT_SYMBOL(phy_drivers_register); void phy_driver_unregister(struct phy_driver *drv) { + struct phy_drv_node *node; + + down_write(&phy_drv_list_sem); + list_for_each_entry(node, &phy_drv_list, list) { + if (phy_id_compare(drv->phy_id, + node->drv->phy_id, node->drv->phy_id_mask)) { + list_del(&node->list); + kfree(node); + break; + } + } + up_write(&phy_drv_list_sem); + driver_unregister(&drv->mdiodrv.driver); } EXPORT_SYMBOL(phy_driver_unregister);
Whenever a new PHY device is created, request_module() is called unconditionally, without checking if a driver for the new PHY is already available (either built-in or from a previous probe). This conflicts with async probing of the underlying MDIO bus and always throws a warning (because if a driver is loaded it _might_ cause a deadlock, if in turn it calls async_synchronize_full()). Add a list of registered drivers and check if one is already available before resorting to call request_module(); in this way, if the PHY driver is already there, the MDIO bus can perform the async probe. Signed-off-by: Francesco Valla <francesco@valla.it> --- drivers/net/phy/phy_device.c | 61 +++++++++++++++++++++++++++++++----- 1 file changed, 53 insertions(+), 8 deletions(-)