diff mbox series

mt7530: dsa_switch_parse_of() fails, causes probe code to run twice

Message ID 896514df-af33-6408-8b33-d8fd06e671ef@arinc9.com (mailing list archive)
State Not Applicable
Headers show
Series mt7530: dsa_switch_parse_of() fails, causes probe code to run twice | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Arınç ÜNAL April 14, 2023, 10:41 p.m. UTC
Hey there,

I've been working on the MT7530 DSA subdriver. While doing some tests, I
realised mt7530_probe() runs twice. I moved enabling the regulators from
mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
with exception warnings on the first time. It works fine when
mt7530_probe() is run again.

This should not be an expected behaviour, right? Any ideas how we can make
it work the first time?


[    3.150567] mt7530 mdio-bus:1f: mt7530_probe() is running
[    3.156403] mt7530 mdio-bus:1f: np is true
[    3.160608] mt7530 mdio-bus:1f: np dts parse failed
[    3.167094] mtk_soc_eth 1b100000.ethernet: generated random MAC address 96:70:de:9e:c0:88
[    3.176535] mtk_soc_eth 1b100000.ethernet eth0: mediatek frame engine at 0xf09e0000, irq 213
[...]
[    4.121791] mt7530 mdio-bus:1f: mt7530_probe() is running
[    4.127678] mt7530 mdio-bus:1f: np is true
[    4.138242] mt7530 mdio-bus:1f: np is successful
[    4.154957] mt7530 mdio-bus:1f: no interrupt support
[    4.189915] mt7530 mdio-bus:1f: configuring for fixed/trgmii link mode
[    4.198619] mt7530 mdio-bus:1f: Link is Up - 1Gbps/Full - flow control rx/tx
[    4.206437] mt7530 mdio-bus:1f wan (uninitialized): PHY [mt7530-0:00] driver [MediaTek MT7530 PHY] (irq=POLL)
[    4.218450] mt7530 mdio-bus:1f lan0 (uninitialized): PHY [mt7530-0:01] driver [MediaTek MT7530 PHY] (irq=POLL)
[    4.230201] mt7530 mdio-bus:1f lan1 (uninitialized): PHY [mt7530-0:02] driver [MediaTek MT7530 PHY] (irq=POLL)
[    4.242000] mt7530 mdio-bus:1f lan2 (uninitialized): PHY [mt7530-0:03] driver [MediaTek MT7530 PHY] (irq=POLL)
[    4.253755] mt7530 mdio-bus:1f lan3 (uninitialized): PHY [mt7530-0:04] driver [MediaTek MT7530 PHY] (irq=POLL)
[    4.265271] mtk_soc_eth 1b100000.ethernet eth0: entered promiscuous mode
[    4.272101] DSA: tree 0 setup

The exceptions:

[    8.160099] ------------[ cut here ]------------
[    8.164753] WARNING: CPU: 3 PID: 1 at drivers/regulator/core.c:2405 _regulator_put+0x170/0x178
[    8.173450] Modules linked in:
[    8.176519] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230413+ #17
[    8.184105] Hardware name: Mediatek Cortex-A7 (Device Tree)
[    8.189693]  unwind_backtrace from show_stack+0x18/0x1c
[    8.194947]  show_stack from dump_stack_lvl+0x40/0x4c
[    8.200022]  dump_stack_lvl from __warn+0x80/0x12c
[    8.204839]  __warn from warn_slowpath_fmt+0xc0/0x184
[    8.209916]  warn_slowpath_fmt from _regulator_put+0x170/0x178
[    8.215775]  _regulator_put from regulator_put+0x24/0x34
[    8.221109]  regulator_put from release_nodes+0x50/0xc4
[    8.226356]  release_nodes from devres_release_all+0x84/0xd0
[    8.232034]  devres_release_all from device_unbind_cleanup+0x14/0x68
[    8.238411]  device_unbind_cleanup from really_probe+0x268/0x400
[    8.244441]  really_probe from __driver_probe_device+0xa4/0x208
[    8.250384]  __driver_probe_device from driver_probe_device+0x38/0xc8
[    8.256847]  driver_probe_device from __device_attach_driver+0xb0/0x128
[    8.263484]  __device_attach_driver from bus_for_each_drv+0x98/0xec
[    8.269773]  bus_for_each_drv from __device_attach+0xb0/0x1dc
[    8.275540]  __device_attach from bus_probe_device+0x90/0x94
[    8.281221]  bus_probe_device from device_add+0x4d4/0x6c4
[    8.286639]  device_add from mdio_device_register+0x44/0x88
[    8.292230]  mdio_device_register from __of_mdiobus_register+0x1d8/0x3cc
[    8.298949]  __of_mdiobus_register from mtk_mdio_init+0x1c4/0x23c
[    8.305065]  mtk_mdio_init from mtk_probe+0x7cc/0x8ac
[    8.310140]  mtk_probe from platform_probe+0x64/0xb8
[    8.315123]  platform_probe from really_probe+0xe8/0x400
[    8.320453]  really_probe from __driver_probe_device+0xa4/0x208
[    8.326396]  __driver_probe_device from driver_probe_device+0x38/0xc8
[    8.332859]  driver_probe_device from __driver_attach+0x124/0x1d4
[    8.338975]  __driver_attach from bus_for_each_dev+0x84/0xd4
[    8.344656]  bus_for_each_dev from bus_add_driver+0xe8/0x208
[    8.350335]  bus_add_driver from driver_register+0x84/0x11c
[    8.355930]  driver_register from do_one_initcall+0x60/0x210
[    8.361612]  do_one_initcall from kernel_init_freeable+0x214/0x270
[    8.367817]  kernel_init_freeable from kernel_init+0x20/0x138
[    8.373588]  kernel_init from ret_from_fork+0x14/0x2c
[    8.378658] Exception stack(0xf0825fb0 to 0xf0825ff8)
[    8.383720] 5fa0:                                     00000000 00000000 00000000 00000000
[    8.391910] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    8.400099] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    8.406761] ---[ end trace 0000000000000000 ]---
[    8.411588] ------------[ cut here ]------------
[    8.416221] WARNING: CPU: 3 PID: 1 at drivers/regulator/core.c:2405 _regulator_put+0x170/0x178
[    8.424906] Modules linked in:
[    8.428001] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G        W          6.3.0-rc6-next-20230413+ #17
[    8.437064] Hardware name: Mediatek Cortex-A7 (Device Tree)
[    8.442646]  unwind_backtrace from show_stack+0x18/0x1c
[    8.447898]  show_stack from dump_stack_lvl+0x40/0x4c
[    8.452970]  dump_stack_lvl from __warn+0x80/0x12c
[    8.457787]  __warn from warn_slowpath_fmt+0xc0/0x184
[    8.462864]  warn_slowpath_fmt from _regulator_put+0x170/0x178
[    8.468722]  _regulator_put from regulator_put+0x24/0x34
[    8.474056]  regulator_put from release_nodes+0x50/0xc4
[    8.479302]  release_nodes from devres_release_all+0x84/0xd0
[    8.484980]  devres_release_all from device_unbind_cleanup+0x14/0x68
[    8.491355]  device_unbind_cleanup from really_probe+0x268/0x400
[    8.497385]  really_probe from __driver_probe_device+0xa4/0x208
[    8.503327]  __driver_probe_device from driver_probe_device+0x38/0xc8
[    8.509790]  driver_probe_device from __device_attach_driver+0xb0/0x128
[    8.516427]  __device_attach_driver from bus_for_each_drv+0x98/0xec
[    8.522716]  bus_for_each_drv from __device_attach+0xb0/0x1dc
[    8.528484]  __device_attach from bus_probe_device+0x90/0x94
[    8.534165]  bus_probe_device from device_add+0x4d4/0x6c4
[    8.539582]  device_add from mdio_device_register+0x44/0x88
[    8.545172]  mdio_device_register from __of_mdiobus_register+0x1d8/0x3cc
[    8.551889]  __of_mdiobus_register from mtk_mdio_init+0x1c4/0x23c
[    8.558004]  mtk_mdio_init from mtk_probe+0x7cc/0x8ac
[    8.563079]  mtk_probe from platform_probe+0x64/0xb8
[    8.568062]  platform_probe from really_probe+0xe8/0x400
[    8.573392]  really_probe from __driver_probe_device+0xa4/0x208
[    8.579335]  __driver_probe_device from driver_probe_device+0x38/0xc8
[    8.585799]  driver_probe_device from __driver_attach+0x124/0x1d4
[    8.591914]  __driver_attach from bus_for_each_dev+0x84/0xd4
[    8.597594]  bus_for_each_dev from bus_add_driver+0xe8/0x208
[    8.603274]  bus_add_driver from driver_register+0x84/0x11c
[    8.608868]  driver_register from do_one_initcall+0x60/0x210
[    8.614549]  do_one_initcall from kernel_init_freeable+0x214/0x270
[    8.620753]  kernel_init_freeable from kernel_init+0x20/0x138
[    8.626523]  kernel_init from ret_from_fork+0x14/0x2c
[    8.631594] Exception stack(0xf0825fb0 to 0xf0825ff8)
[    8.636654] 5fa0:                                     00000000 00000000 00000000 00000000
[    8.644844] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    8.653033] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    8.659681] ---[ end trace 0000000000000000 ]---

Arınç

Comments

Daniel Golle April 14, 2023, 10:48 p.m. UTC | #1
On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
> Hey there,
> 
> I've been working on the MT7530 DSA subdriver. While doing some tests, I
> realised mt7530_probe() runs twice. I moved enabling the regulators from
> mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
> with exception warnings on the first time. It works fine when
> mt7530_probe() is run again.
> 
> This should not be an expected behaviour, right? Any ideas how we can make
> it work the first time?

Can you share the patch or work-in-progress tree which will allow me
to reproduce this problem?

It can of course be that regulator driver has not yet been loaded on
the first run and -EPROBE_DEFER is returned in that case. Knowing the
value of 'err' variable below would hence be valuable information.

> 
> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index 02410ac439b7..57b262099791 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c
> @@ -3248,6 +3248,8 @@ mt7530_probe(struct mdio_device *mdiodev)
>  	struct mt7530_priv *priv;
>  	struct device_node *dn;
> +	dev_info(&mdiodev->dev, "mt7530_probe() is running\n");
> +
>  	dn = mdiodev->dev.of_node;
>  	priv = devm_kzalloc(&mdiodev->dev, sizeof(*priv), GFP_KERNEL);
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index e5f156940c67..738ca4420e85 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -1488,9 +1488,12 @@ static int dsa_switch_probe(struct dsa_switch *ds)
>  		return -EINVAL;
>  	if (np) {
> +		dev_info(ds->dev, "np is true\n");
>  		err = dsa_switch_parse_of(ds, np);
> -		if (err)
> +		if (err) {
> +			dev_info(ds->dev, "np dts parse failed\n");
>  			dsa_switch_release_ports(ds);
> +		}
>  	} else if (pdata) {
>  		err = dsa_switch_parse(ds, pdata);
>  		if (err)
> @@ -1502,6 +1505,8 @@ static int dsa_switch_probe(struct dsa_switch *ds)
>  	if (err)
>  		return err;
> +	dev_info(ds->dev, "np is successful\n");
> +
>  	dst = ds->dst;
>  	dsa_tree_get(dst);
>  	err = dsa_tree_setup(dst);
> 
> [    3.150567] mt7530 mdio-bus:1f: mt7530_probe() is running
> [    3.156403] mt7530 mdio-bus:1f: np is true
> [    3.160608] mt7530 mdio-bus:1f: np dts parse failed
> [    3.167094] mtk_soc_eth 1b100000.ethernet: generated random MAC address 96:70:de:9e:c0:88
> [    3.176535] mtk_soc_eth 1b100000.ethernet eth0: mediatek frame engine at 0xf09e0000, irq 213
> [...]
> [    4.121791] mt7530 mdio-bus:1f: mt7530_probe() is running
> [    4.127678] mt7530 mdio-bus:1f: np is true
> [    4.138242] mt7530 mdio-bus:1f: np is successful
> [    4.154957] mt7530 mdio-bus:1f: no interrupt support
> [    4.189915] mt7530 mdio-bus:1f: configuring for fixed/trgmii link mode
> [    4.198619] mt7530 mdio-bus:1f: Link is Up - 1Gbps/Full - flow control rx/tx
> [    4.206437] mt7530 mdio-bus:1f wan (uninitialized): PHY [mt7530-0:00] driver [MediaTek MT7530 PHY] (irq=POLL)
> [    4.218450] mt7530 mdio-bus:1f lan0 (uninitialized): PHY [mt7530-0:01] driver [MediaTek MT7530 PHY] (irq=POLL)
> [    4.230201] mt7530 mdio-bus:1f lan1 (uninitialized): PHY [mt7530-0:02] driver [MediaTek MT7530 PHY] (irq=POLL)
> [    4.242000] mt7530 mdio-bus:1f lan2 (uninitialized): PHY [mt7530-0:03] driver [MediaTek MT7530 PHY] (irq=POLL)
> [    4.253755] mt7530 mdio-bus:1f lan3 (uninitialized): PHY [mt7530-0:04] driver [MediaTek MT7530 PHY] (irq=POLL)
> [    4.265271] mtk_soc_eth 1b100000.ethernet eth0: entered promiscuous mode
> [    4.272101] DSA: tree 0 setup
> 
> The exceptions:
> 
> [    8.160099] ------------[ cut here ]------------
> [    8.164753] WARNING: CPU: 3 PID: 1 at drivers/regulator/core.c:2405 _regulator_put+0x170/0x178
> [    8.173450] Modules linked in:
> [    8.176519] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc6-next-20230413+ #17
> [    8.184105] Hardware name: Mediatek Cortex-A7 (Device Tree)
> [    8.189693]  unwind_backtrace from show_stack+0x18/0x1c
> [    8.194947]  show_stack from dump_stack_lvl+0x40/0x4c
> [    8.200022]  dump_stack_lvl from __warn+0x80/0x12c
> [    8.204839]  __warn from warn_slowpath_fmt+0xc0/0x184
> [    8.209916]  warn_slowpath_fmt from _regulator_put+0x170/0x178
> [    8.215775]  _regulator_put from regulator_put+0x24/0x34
> [    8.221109]  regulator_put from release_nodes+0x50/0xc4
> [    8.226356]  release_nodes from devres_release_all+0x84/0xd0
> [    8.232034]  devres_release_all from device_unbind_cleanup+0x14/0x68
> [    8.238411]  device_unbind_cleanup from really_probe+0x268/0x400
> [    8.244441]  really_probe from __driver_probe_device+0xa4/0x208
> [    8.250384]  __driver_probe_device from driver_probe_device+0x38/0xc8
> [    8.256847]  driver_probe_device from __device_attach_driver+0xb0/0x128
> [    8.263484]  __device_attach_driver from bus_for_each_drv+0x98/0xec
> [    8.269773]  bus_for_each_drv from __device_attach+0xb0/0x1dc
> [    8.275540]  __device_attach from bus_probe_device+0x90/0x94
> [    8.281221]  bus_probe_device from device_add+0x4d4/0x6c4
> [    8.286639]  device_add from mdio_device_register+0x44/0x88
> [    8.292230]  mdio_device_register from __of_mdiobus_register+0x1d8/0x3cc
> [    8.298949]  __of_mdiobus_register from mtk_mdio_init+0x1c4/0x23c
> [    8.305065]  mtk_mdio_init from mtk_probe+0x7cc/0x8ac
> [    8.310140]  mtk_probe from platform_probe+0x64/0xb8
> [    8.315123]  platform_probe from really_probe+0xe8/0x400
> [    8.320453]  really_probe from __driver_probe_device+0xa4/0x208
> [    8.326396]  __driver_probe_device from driver_probe_device+0x38/0xc8
> [    8.332859]  driver_probe_device from __driver_attach+0x124/0x1d4
> [    8.338975]  __driver_attach from bus_for_each_dev+0x84/0xd4
> [    8.344656]  bus_for_each_dev from bus_add_driver+0xe8/0x208
> [    8.350335]  bus_add_driver from driver_register+0x84/0x11c
> [    8.355930]  driver_register from do_one_initcall+0x60/0x210
> [    8.361612]  do_one_initcall from kernel_init_freeable+0x214/0x270
> [    8.367817]  kernel_init_freeable from kernel_init+0x20/0x138
> [    8.373588]  kernel_init from ret_from_fork+0x14/0x2c
> [    8.378658] Exception stack(0xf0825fb0 to 0xf0825ff8)
> [    8.383720] 5fa0:                                     00000000 00000000 00000000 00000000
> [    8.391910] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    8.400099] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [    8.406761] ---[ end trace 0000000000000000 ]---
> [    8.411588] ------------[ cut here ]------------
> [    8.416221] WARNING: CPU: 3 PID: 1 at drivers/regulator/core.c:2405 _regulator_put+0x170/0x178
> [    8.424906] Modules linked in:
> [    8.428001] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G        W          6.3.0-rc6-next-20230413+ #17
> [    8.437064] Hardware name: Mediatek Cortex-A7 (Device Tree)
> [    8.442646]  unwind_backtrace from show_stack+0x18/0x1c
> [    8.447898]  show_stack from dump_stack_lvl+0x40/0x4c
> [    8.452970]  dump_stack_lvl from __warn+0x80/0x12c
> [    8.457787]  __warn from warn_slowpath_fmt+0xc0/0x184
> [    8.462864]  warn_slowpath_fmt from _regulator_put+0x170/0x178
> [    8.468722]  _regulator_put from regulator_put+0x24/0x34
> [    8.474056]  regulator_put from release_nodes+0x50/0xc4
> [    8.479302]  release_nodes from devres_release_all+0x84/0xd0
> [    8.484980]  devres_release_all from device_unbind_cleanup+0x14/0x68
> [    8.491355]  device_unbind_cleanup from really_probe+0x268/0x400
> [    8.497385]  really_probe from __driver_probe_device+0xa4/0x208
> [    8.503327]  __driver_probe_device from driver_probe_device+0x38/0xc8
> [    8.509790]  driver_probe_device from __device_attach_driver+0xb0/0x128
> [    8.516427]  __device_attach_driver from bus_for_each_drv+0x98/0xec
> [    8.522716]  bus_for_each_drv from __device_attach+0xb0/0x1dc
> [    8.528484]  __device_attach from bus_probe_device+0x90/0x94
> [    8.534165]  bus_probe_device from device_add+0x4d4/0x6c4
> [    8.539582]  device_add from mdio_device_register+0x44/0x88
> [    8.545172]  mdio_device_register from __of_mdiobus_register+0x1d8/0x3cc
> [    8.551889]  __of_mdiobus_register from mtk_mdio_init+0x1c4/0x23c
> [    8.558004]  mtk_mdio_init from mtk_probe+0x7cc/0x8ac
> [    8.563079]  mtk_probe from platform_probe+0x64/0xb8
> [    8.568062]  platform_probe from really_probe+0xe8/0x400
> [    8.573392]  really_probe from __driver_probe_device+0xa4/0x208
> [    8.579335]  __driver_probe_device from driver_probe_device+0x38/0xc8
> [    8.585799]  driver_probe_device from __driver_attach+0x124/0x1d4
> [    8.591914]  __driver_attach from bus_for_each_dev+0x84/0xd4
> [    8.597594]  bus_for_each_dev from bus_add_driver+0xe8/0x208
> [    8.603274]  bus_add_driver from driver_register+0x84/0x11c
> [    8.608868]  driver_register from do_one_initcall+0x60/0x210
> [    8.614549]  do_one_initcall from kernel_init_freeable+0x214/0x270
> [    8.620753]  kernel_init_freeable from kernel_init+0x20/0x138
> [    8.626523]  kernel_init from ret_from_fork+0x14/0x2c
> [    8.631594] Exception stack(0xf0825fb0 to 0xf0825ff8)
> [    8.636654] 5fa0:                                     00000000 00000000 00000000 00000000
> [    8.644844] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    8.653033] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [    8.659681] ---[ end trace 0000000000000000 ]---
> 
> Arınç
Arınç ÜNAL April 14, 2023, 11:23 p.m. UTC | #2
On 15.04.2023 01:48, Daniel Golle wrote:
> On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
>> Hey there,
>>
>> I've been working on the MT7530 DSA subdriver. While doing some tests, I
>> realised mt7530_probe() runs twice. I moved enabling the regulators from
>> mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
>> with exception warnings on the first time. It works fine when
>> mt7530_probe() is run again.
>>
>> This should not be an expected behaviour, right? Any ideas how we can make
>> it work the first time?
> 
> Can you share the patch or work-in-progress tree which will allow me
> to reproduce this problem?

I tested this on vanilla 6.3-rc6. There's just the diff below that is 
applied. I encountered it on the standalone MT7530 on my Bananapi 
BPI-R2. I haven't tried it on MCM MT7530 on MT7621 SoC yet.

> 
> It can of course be that regulator driver has not yet been loaded on
> the first run and -EPROBE_DEFER is returned in that case. Knowing the
> value of 'err' variable below would hence be valuable information.

Regardless of enabling the regulator on either mt7530_probe() or 
mt7530_setup(), dsa_switch_parse_of() always fails.

Arınç
Daniel Golle April 14, 2023, 11:53 p.m. UTC | #3
On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
> On 15.04.2023 01:48, Daniel Golle wrote:
> > On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
> > > Hey there,
> > > 
> > > I've been working on the MT7530 DSA subdriver. While doing some tests, I
> > > realised mt7530_probe() runs twice. I moved enabling the regulators from
> > > mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
> > > with exception warnings on the first time. It works fine when
> > > mt7530_probe() is run again.
> > > 
> > > This should not be an expected behaviour, right? Any ideas how we can make
> > > it work the first time?
> > 
> > Can you share the patch or work-in-progress tree which will allow me
> > to reproduce this problem?
> 
> I tested this on vanilla 6.3-rc6. There's just the diff below that is
> applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
> haven't tried it on MCM MT7530 on MT7621 SoC yet.
> 
> > 
> > It can of course be that regulator driver has not yet been loaded on
> > the first run and -EPROBE_DEFER is returned in that case. Knowing the
> > value of 'err' variable below would hence be valuable information.
> 
> Regardless of enabling the regulator on either mt7530_probe() or
> mt7530_setup(), dsa_switch_parse_of() always fails.

So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
driver responsible for the CPU port has not yet been loaded.

See net/dsa/dsa.c (inside function dsa_port_parse_of):
[...]
1232)                master = of_find_net_device_by_node(ethernet);
1233)                of_node_put(ethernet);
1234)                if (!master)
1235)                        return -EPROBE_DEFER;
[...]

Hence it would be important to include the value of 'err' in your
debugging printf output, as -EPROBE_DEFER can be an expected and
implicitely intended reality and nothing is wrong then.
Arınç ÜNAL April 15, 2023, 12:28 a.m. UTC | #4
On 15.04.2023 02:53, Daniel Golle wrote:
> On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
>> On 15.04.2023 01:48, Daniel Golle wrote:
>>> On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
>>>> Hey there,
>>>>
>>>> I've been working on the MT7530 DSA subdriver. While doing some tests, I
>>>> realised mt7530_probe() runs twice. I moved enabling the regulators from
>>>> mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
>>>> with exception warnings on the first time. It works fine when
>>>> mt7530_probe() is run again.
>>>>
>>>> This should not be an expected behaviour, right? Any ideas how we can make
>>>> it work the first time?
>>>
>>> Can you share the patch or work-in-progress tree which will allow me
>>> to reproduce this problem?
>>
>> I tested this on vanilla 6.3-rc6. There's just the diff below that is
>> applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
>> haven't tried it on MCM MT7530 on MT7621 SoC yet.
>>
>>>
>>> It can of course be that regulator driver has not yet been loaded on
>>> the first run and -EPROBE_DEFER is returned in that case. Knowing the
>>> value of 'err' variable below would hence be valuable information.
>>
>> Regardless of enabling the regulator on either mt7530_probe() or
>> mt7530_setup(), dsa_switch_parse_of() always fails.
> 
> So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
> driver responsible for the CPU port has not yet been loaded.
> 
> See net/dsa/dsa.c (inside function dsa_port_parse_of):
> [...]
> 1232)                master = of_find_net_device_by_node(ethernet);
> 1233)                of_node_put(ethernet);
> 1234)                if (!master)
> 1235)                        return -EPROBE_DEFER;
> [...]
> 
> Hence it would be important to include the value of 'err' in your
> debugging printf output, as -EPROBE_DEFER can be an expected and
> implicitely intended reality and nothing is wrong then.

Thanks Daniel. I can't do more tests soon but this is probably what's 
going on as the logs already indicate that the MediaTek ethernet driver 
was yet to load.

As acknowledged, since running the MT7530 DSA subdriver from scratch is 
expected if the ethernet driver is not loaded yet, there's not really a 
problem. Though the switch is reset twice in a short amount of time. I 
don't think that's very great.

The driver initialisation seems serialised (at least for the drivers 
built into the kernel) as I tried sleeping for 5 seconds on 
mt7530_probe() but no other driver was loaded in the meantime so I got 
the same behaviour.

The regulator code will cause a long and nasty exception the first time. 
Though there's nothing wrong as it does what it's supposed to do on the 
second run. I'm not sure if that's negligible.

Could we at least somehow make the MT7530 DSA subdriver wait until the 
regulator driver is loaded?

Arınç
Daniel Golle April 15, 2023, 12:52 a.m. UTC | #5
On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
> On 15.04.2023 02:53, Daniel Golle wrote:
> > On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
> > > On 15.04.2023 01:48, Daniel Golle wrote:
> > > > On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
> > > > > Hey there,
> > > > > 
> > > > > I've been working on the MT7530 DSA subdriver. While doing some tests, I
> > > > > realised mt7530_probe() runs twice. I moved enabling the regulators from
> > > > > mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
> > > > > with exception warnings on the first time. It works fine when
> > > > > mt7530_probe() is run again.
> > > > > 
> > > > > This should not be an expected behaviour, right? Any ideas how we can make
> > > > > it work the first time?
> > > > 
> > > > Can you share the patch or work-in-progress tree which will allow me
> > > > to reproduce this problem?
> > > 
> > > I tested this on vanilla 6.3-rc6. There's just the diff below that is
> > > applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
> > > haven't tried it on MCM MT7530 on MT7621 SoC yet.
> > > 
> > > > 
> > > > It can of course be that regulator driver has not yet been loaded on
> > > > the first run and -EPROBE_DEFER is returned in that case. Knowing the
> > > > value of 'err' variable below would hence be valuable information.
> > > 
> > > Regardless of enabling the regulator on either mt7530_probe() or
> > > mt7530_setup(), dsa_switch_parse_of() always fails.
> > 
> > So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
> > driver responsible for the CPU port has not yet been loaded.
> > 
> > See net/dsa/dsa.c (inside function dsa_port_parse_of):
> > [...]
> > 1232)                master = of_find_net_device_by_node(ethernet);
> > 1233)                of_node_put(ethernet);
> > 1234)                if (!master)
> > 1235)                        return -EPROBE_DEFER;
> > [...]
> > 
> > Hence it would be important to include the value of 'err' in your
> > debugging printf output, as -EPROBE_DEFER can be an expected and
> > implicitely intended reality and nothing is wrong then.
> 
> Thanks Daniel. I can't do more tests soon but this is probably what's going
> on as the logs already indicate that the MediaTek ethernet driver was yet to
> load.
> 
> As acknowledged, since running the MT7530 DSA subdriver from scratch is
> expected if the ethernet driver is not loaded yet, there's not really a
> problem. Though the switch is reset twice in a short amount of time. I don't
> think that's very great.

That's true, and we should try to avoid that.

> 
> The driver initialisation seems serialised (at least for the drivers built
> into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() but no
> other driver was loaded in the meantime so I got the same behaviour.
> 
> The regulator code will cause a long and nasty exception the first time.
> Though there's nothing wrong as it does what it's supposed to do on the
> second run. I'm not sure if that's negligible.
> 
> Could we at least somehow make the MT7530 DSA subdriver wait until the
> regulator driver is loaded?

I assume the regulator-related stackdump is unrelated, but caused by
cpufreq changes, which had now been fixed by commit 0883426fd07e
("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").

If you are using v6.3-rc6 this commit is still missing there, but
manually picking it from linux-next should fix it.

Let me know if I can help with testing on my farm of MediaTek boards.
I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
PLL activation to mt7530_probe() would be essential as it makes the
fix much easier...
Arınç ÜNAL April 15, 2023, 7:53 a.m. UTC | #6
On 15.04.2023 03:52, Daniel Golle wrote:
> On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
>> On 15.04.2023 02:53, Daniel Golle wrote:
>>> On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
>>>> On 15.04.2023 01:48, Daniel Golle wrote:
>>>>> On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
>>>>>> Hey there,
>>>>>>
>>>>>> I've been working on the MT7530 DSA subdriver. While doing some tests, I
>>>>>> realised mt7530_probe() runs twice. I moved enabling the regulators from
>>>>>> mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
>>>>>> with exception warnings on the first time. It works fine when
>>>>>> mt7530_probe() is run again.
>>>>>>
>>>>>> This should not be an expected behaviour, right? Any ideas how we can make
>>>>>> it work the first time?
>>>>>
>>>>> Can you share the patch or work-in-progress tree which will allow me
>>>>> to reproduce this problem?
>>>>
>>>> I tested this on vanilla 6.3-rc6. There's just the diff below that is
>>>> applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
>>>> haven't tried it on MCM MT7530 on MT7621 SoC yet.
>>>>
>>>>>
>>>>> It can of course be that regulator driver has not yet been loaded on
>>>>> the first run and -EPROBE_DEFER is returned in that case. Knowing the
>>>>> value of 'err' variable below would hence be valuable information.
>>>>
>>>> Regardless of enabling the regulator on either mt7530_probe() or
>>>> mt7530_setup(), dsa_switch_parse_of() always fails.
>>>
>>> So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
>>> driver responsible for the CPU port has not yet been loaded.
>>>
>>> See net/dsa/dsa.c (inside function dsa_port_parse_of):
>>> [...]
>>> 1232)                master = of_find_net_device_by_node(ethernet);
>>> 1233)                of_node_put(ethernet);
>>> 1234)                if (!master)
>>> 1235)                        return -EPROBE_DEFER;
>>> [...]
>>>
>>> Hence it would be important to include the value of 'err' in your
>>> debugging printf output, as -EPROBE_DEFER can be an expected and
>>> implicitely intended reality and nothing is wrong then.
>>
>> Thanks Daniel. I can't do more tests soon but this is probably what's going
>> on as the logs already indicate that the MediaTek ethernet driver was yet to
>> load.
>>
>> As acknowledged, since running the MT7530 DSA subdriver from scratch is
>> expected if the ethernet driver is not loaded yet, there's not really a
>> problem. Though the switch is reset twice in a short amount of time. I don't
>> think that's very great.
> 
> That's true, and we should try to avoid that.
> 
>>
>> The driver initialisation seems serialised (at least for the drivers built
>> into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() but no
>> other driver was loaded in the meantime so I got the same behaviour.
>>
>> The regulator code will cause a long and nasty exception the first time.
>> Though there's nothing wrong as it does what it's supposed to do on the
>> second run. I'm not sure if that's negligible.
>>
>> Could we at least somehow make the MT7530 DSA subdriver wait until the
>> regulator driver is loaded?
> 
> I assume the regulator-related stackdump is unrelated, but caused by
> cpufreq changes, which had now been fixed by commit 0883426fd07e
> ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").
> 
> If you are using v6.3-rc6 this commit is still missing there, but
> manually picking it from linux-next should fix it.

I did one better and just did the test on the current linux-next, I get 
exceptions that seem to be identical. I also made sure this commit was 
actually there.

> 
> Let me know if I can help with testing on my farm of MediaTek boards.
> I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
> PLL activation to mt7530_probe() would be essential as it makes the
> fix much easier...

Can you test this branch on MT7531AE, MT7531BE and the switch on MT7988 
SoC? I just need to complete the patch logs, the code won't change much.

https://github.com/arinc9/linux/commits/for-netnext

I'm thinking if we can -EPROBE_DEFER right at the start of 
mt7530_probe(), it should prevent the reset code from running twice, and 
enabling the regulator will run without any exceptions.

I think I can just keep enabling the regulator on mt7530_setup() if I 
can't figure that out. On MT7623NI, the switch stops working after 35 
seconds with these logs. As long as the regulator is enabled before 
this, everything keeps working.

[   35.037200] vusb: disabling
[   35.040089] vmc: disabling
[   35.042856] vmch: disabling
[   35.045709] vgp1: disabling
[   35.049010] vcamaf: disabling

Arınç
Arınç ÜNAL April 15, 2023, 9:43 a.m. UTC | #7
On 15.04.2023 10:53, Arınç ÜNAL wrote:
> On 15.04.2023 03:52, Daniel Golle wrote:
>> On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
>>> On 15.04.2023 02:53, Daniel Golle wrote:
>>>> On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
>>>>> On 15.04.2023 01:48, Daniel Golle wrote:
>>>>>> On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
>>>>>>> Hey there,
>>>>>>>
>>>>>>> I've been working on the MT7530 DSA subdriver. While doing some 
>>>>>>> tests, I
>>>>>>> realised mt7530_probe() runs twice. I moved enabling the 
>>>>>>> regulators from
>>>>>>> mt7530_setup() to mt7530_probe(). Enabling the regulators there 
>>>>>>> ends up
>>>>>>> with exception warnings on the first time. It works fine when
>>>>>>> mt7530_probe() is run again.
>>>>>>>
>>>>>>> This should not be an expected behaviour, right? Any ideas how we 
>>>>>>> can make
>>>>>>> it work the first time?
>>>>>>
>>>>>> Can you share the patch or work-in-progress tree which will allow me
>>>>>> to reproduce this problem?
>>>>>
>>>>> I tested this on vanilla 6.3-rc6. There's just the diff below that is
>>>>> applied. I encountered it on the standalone MT7530 on my Bananapi 
>>>>> BPI-R2. I
>>>>> haven't tried it on MCM MT7530 on MT7621 SoC yet.
>>>>>
>>>>>>
>>>>>> It can of course be that regulator driver has not yet been loaded on
>>>>>> the first run and -EPROBE_DEFER is returned in that case. Knowing the
>>>>>> value of 'err' variable below would hence be valuable information.
>>>>>
>>>>> Regardless of enabling the regulator on either mt7530_probe() or
>>>>> mt7530_setup(), dsa_switch_parse_of() always fails.
>>>>
>>>> So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
>>>> driver responsible for the CPU port has not yet been loaded.
>>>>
>>>> See net/dsa/dsa.c (inside function dsa_port_parse_of):
>>>> [...]
>>>> 1232)                master = of_find_net_device_by_node(ethernet);
>>>> 1233)                of_node_put(ethernet);
>>>> 1234)                if (!master)
>>>> 1235)                        return -EPROBE_DEFER;
>>>> [...]
>>>>
>>>> Hence it would be important to include the value of 'err' in your
>>>> debugging printf output, as -EPROBE_DEFER can be an expected and
>>>> implicitely intended reality and nothing is wrong then.
>>>
>>> Thanks Daniel. I can't do more tests soon but this is probably what's 
>>> going
>>> on as the logs already indicate that the MediaTek ethernet driver was 
>>> yet to
>>> load.
>>>
>>> As acknowledged, since running the MT7530 DSA subdriver from scratch is
>>> expected if the ethernet driver is not loaded yet, there's not really a
>>> problem. Though the switch is reset twice in a short amount of time. 
>>> I don't
>>> think that's very great.
>>
>> That's true, and we should try to avoid that.
>>
>>>
>>> The driver initialisation seems serialised (at least for the drivers 
>>> built
>>> into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() 
>>> but no
>>> other driver was loaded in the meantime so I got the same behaviour.
>>>
>>> The regulator code will cause a long and nasty exception the first time.
>>> Though there's nothing wrong as it does what it's supposed to do on the
>>> second run. I'm not sure if that's negligible.
>>>
>>> Could we at least somehow make the MT7530 DSA subdriver wait until the
>>> regulator driver is loaded?
>>
>> I assume the regulator-related stackdump is unrelated, but caused by
>> cpufreq changes, which had now been fixed by commit 0883426fd07e
>> ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").
>>
>> If you are using v6.3-rc6 this commit is still missing there, but
>> manually picking it from linux-next should fix it.
> 
> I did one better and just did the test on the current linux-next, I get 
> exceptions that seem to be identical. I also made sure this commit was 
> actually there.
> 
>>
>> Let me know if I can help with testing on my farm of MediaTek boards.
>> I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
>> PLL activation to mt7530_probe() would be essential as it makes the
>> fix much easier...
> 
> Can you test this branch on MT7531AE, MT7531BE and the switch on MT7988 
> SoC? I just need to complete the patch logs, the code won't change much.
> 
> https://github.com/arinc9/linux/commits/for-netnext
> 
> I'm thinking if we can -EPROBE_DEFER right at the start of 
> mt7530_probe(), it should prevent the reset code from running twice, and 
> enabling the regulator will run without any exceptions.
> 
> I think I can just keep enabling the regulator on mt7530_setup() if I 
> can't figure that out. On MT7623NI, the switch stops working after 35 
> seconds with these logs. As long as the regulator is enabled before 
> this, everything keeps working.
> 
> [   35.037200] vusb: disabling
> [   35.040089] vmc: disabling
> [   35.042856] vmch: disabling
> [   35.045709] vgp1: disabling
> [   35.049010] vcamaf: disabling

I was able to confirm the error code is -517, EPROBE_DEFER.

Arınç
Vladimir Oltean April 15, 2023, 10:12 a.m. UTC | #8
On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
> I moved enabling the regulators from mt7530_setup() to mt7530_probe().
> Enabling the regulators there ends up with exception warnings on the
> first time.

Have you read what the WARN_ON() in _regulator_put() has to say?

	/* Docs say you must disable before calling regulator_put() */
	WARN_ON(regulator->enable_count);

If you call regulator_enable() during probe, do you also call
regulator_disable() during the probe error path?
Daniel Golle April 15, 2023, 12:43 p.m. UTC | #9
On Sat, Apr 15, 2023 at 10:53:10AM +0300, Arınç ÜNAL wrote:
> On 15.04.2023 03:52, Daniel Golle wrote:
> > On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
> > > On 15.04.2023 02:53, Daniel Golle wrote:
> > > > On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
> > > > > On 15.04.2023 01:48, Daniel Golle wrote:
> > > > > > On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
> > > > > > > Hey there,
> > > > > > > 
> > > > > > > I've been working on the MT7530 DSA subdriver. While doing some tests, I
> > > > > > > realised mt7530_probe() runs twice. I moved enabling the regulators from
> > > > > > > mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
> > > > > > > with exception warnings on the first time. It works fine when
> > > > > > > mt7530_probe() is run again.
> > > > > > > 
> > > > > > > This should not be an expected behaviour, right? Any ideas how we can make
> > > > > > > it work the first time?
> > > > > > 
> > > > > > Can you share the patch or work-in-progress tree which will allow me
> > > > > > to reproduce this problem?
> > > > > 
> > > > > I tested this on vanilla 6.3-rc6. There's just the diff below that is
> > > > > applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
> > > > > haven't tried it on MCM MT7530 on MT7621 SoC yet.
> > > > > 
> > > > > > 
> > > > > > It can of course be that regulator driver has not yet been loaded on
> > > > > > the first run and -EPROBE_DEFER is returned in that case. Knowing the
> > > > > > value of 'err' variable below would hence be valuable information.
> > > > > 
> > > > > Regardless of enabling the regulator on either mt7530_probe() or
> > > > > mt7530_setup(), dsa_switch_parse_of() always fails.
> > > > 
> > > > So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
> > > > driver responsible for the CPU port has not yet been loaded.
> > > > 
> > > > See net/dsa/dsa.c (inside function dsa_port_parse_of):
> > > > [...]
> > > > 1232)                master = of_find_net_device_by_node(ethernet);
> > > > 1233)                of_node_put(ethernet);
> > > > 1234)                if (!master)
> > > > 1235)                        return -EPROBE_DEFER;
> > > > [...]
> > > > 
> > > > Hence it would be important to include the value of 'err' in your
> > > > debugging printf output, as -EPROBE_DEFER can be an expected and
> > > > implicitely intended reality and nothing is wrong then.
> > > 
> > > Thanks Daniel. I can't do more tests soon but this is probably what's going
> > > on as the logs already indicate that the MediaTek ethernet driver was yet to
> > > load.
> > > 
> > > As acknowledged, since running the MT7530 DSA subdriver from scratch is
> > > expected if the ethernet driver is not loaded yet, there's not really a
> > > problem. Though the switch is reset twice in a short amount of time. I don't
> > > think that's very great.
> > 
> > That's true, and we should try to avoid that.
> > 
> > > 
> > > The driver initialisation seems serialised (at least for the drivers built
> > > into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() but no
> > > other driver was loaded in the meantime so I got the same behaviour.
> > > 
> > > The regulator code will cause a long and nasty exception the first time.
> > > Though there's nothing wrong as it does what it's supposed to do on the
> > > second run. I'm not sure if that's negligible.
> > > 
> > > Could we at least somehow make the MT7530 DSA subdriver wait until the
> > > regulator driver is loaded?
> > 
> > I assume the regulator-related stackdump is unrelated, but caused by
> > cpufreq changes, which had now been fixed by commit 0883426fd07e
> > ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").
> > 
> > If you are using v6.3-rc6 this commit is still missing there, but
> > manually picking it from linux-next should fix it.
> 
> I did one better and just did the test on the current linux-next, I get
> exceptions that seem to be identical. I also made sure this commit was
> actually there.
> 
> > 
> > Let me know if I can help with testing on my farm of MediaTek boards.
> > I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
> > PLL activation to mt7530_probe() would be essential as it makes the
> > fix much easier...
> 
> Can you test this branch on MT7531AE, MT7531BE and the switch on MT7988 SoC?
> I just need to complete the patch logs, the code won't change much.
> 
> https://github.com/arinc9/linux/commits/for-netnext

Tested on BPi-R64 (MT7622A+MT7531BE), BPi-R3 (MT7986A+MT7531AE) and
MT7988A reference board. All working just fine.

For BPi-R2 (MT7623N+MT7530) I made sure to disable the already enabled
regulators in the error path and hence made the WARN_ON no longer
trigger, see:
https://github.com/dangowrt/linux/commit/55035b5ac739914166ed4f026262d0fc9b17bc76

> 
> I'm thinking if we can -EPROBE_DEFER right at the start of mt7530_probe(),
> it should prevent the reset code from running twice, and enabling the
> regulator will run without any exceptions.
> 
> I think I can just keep enabling the regulator on mt7530_setup() if I can't
> figure that out.

What's bad about having the hardware setup in mt7530_setup()? I think it
even makes more sense to have it there and *not* in mt7530_probe() for
exactly such reasons. Maybe we can even also move the reset function
there and really do *all* of the hardware setup there and let the probe
function really just parse DTS, probe the hardware, allocate memory and
initialize data-structures like it is supposed to be.

That being said, I thought that having PLL actication in mt7530_probe()
would make things easier (as in: not require a function pointer to the
sgmii_create function), but looking at it now this is not even true,
we now require
void mt7530_core_write(struct mt7530_priv *priv, u32 reg, u32 val);
void mt7530_core_set(struct mt7530_priv *priv, u32 reg, u32 val);
void mt7530_core_clear(struct mt7530_priv *priv, u32 reg, u32 val);
void mt7530_write(struct mt7530_priv *priv, u32 reg, u32 val);
u32 _mt7530_read(struct mt7530_dummy_poll *p);
u32 mt7530_read(struct mt7530_priv *priv, u32 reg);
being either exported or to be inline functions in mt7530.h which
previously wasn't needed...

I may miss something here and would like to understand your perspective:
What exactly is the argument for moving all of the setup to the probe
function?

> On MT7623NI, the switch stops working after 35 seconds with
> these logs. As long as the regulator is enabled before this, everything
> keeps working.
> 
> [   35.037200] vusb: disabling
> [   35.040089] vmc: disabling
> [   35.042856] vmch: disabling
> [   35.045709] vgp1: disabling
> [   35.049010] vcamaf: disabling
> 
> Arınç
Arınç ÜNAL April 15, 2023, 1:19 p.m. UTC | #10
On 15.04.2023 15:43, Daniel Golle wrote:
> On Sat, Apr 15, 2023 at 10:53:10AM +0300, Arınç ÜNAL wrote:
>> On 15.04.2023 03:52, Daniel Golle wrote:
>>> On Sat, Apr 15, 2023 at 03:28:55AM +0300, Arınç ÜNAL wrote:
>>>> On 15.04.2023 02:53, Daniel Golle wrote:
>>>>> On Sat, Apr 15, 2023 at 02:23:16AM +0300, Arınç ÜNAL wrote:
>>>>>> On 15.04.2023 01:48, Daniel Golle wrote:
>>>>>>> On Sat, Apr 15, 2023 at 01:41:07AM +0300, Arınç ÜNAL wrote:
>>>>>>>> Hey there,
>>>>>>>>
>>>>>>>> I've been working on the MT7530 DSA subdriver. While doing some tests, I
>>>>>>>> realised mt7530_probe() runs twice. I moved enabling the regulators from
>>>>>>>> mt7530_setup() to mt7530_probe(). Enabling the regulators there ends up
>>>>>>>> with exception warnings on the first time. It works fine when
>>>>>>>> mt7530_probe() is run again.
>>>>>>>>
>>>>>>>> This should not be an expected behaviour, right? Any ideas how we can make
>>>>>>>> it work the first time?
>>>>>>>
>>>>>>> Can you share the patch or work-in-progress tree which will allow me
>>>>>>> to reproduce this problem?
>>>>>>
>>>>>> I tested this on vanilla 6.3-rc6. There's just the diff below that is
>>>>>> applied. I encountered it on the standalone MT7530 on my Bananapi BPI-R2. I
>>>>>> haven't tried it on MCM MT7530 on MT7621 SoC yet.
>>>>>>
>>>>>>>
>>>>>>> It can of course be that regulator driver has not yet been loaded on
>>>>>>> the first run and -EPROBE_DEFER is returned in that case. Knowing the
>>>>>>> value of 'err' variable below would hence be valuable information.
>>>>>>
>>>>>> Regardless of enabling the regulator on either mt7530_probe() or
>>>>>> mt7530_setup(), dsa_switch_parse_of() always fails.
>>>>>
>>>>> So dsa_switch_parse_of() can return -EPROBE_DEFER if the ethernet
>>>>> driver responsible for the CPU port has not yet been loaded.
>>>>>
>>>>> See net/dsa/dsa.c (inside function dsa_port_parse_of):
>>>>> [...]
>>>>> 1232)                master = of_find_net_device_by_node(ethernet);
>>>>> 1233)                of_node_put(ethernet);
>>>>> 1234)                if (!master)
>>>>> 1235)                        return -EPROBE_DEFER;
>>>>> [...]
>>>>>
>>>>> Hence it would be important to include the value of 'err' in your
>>>>> debugging printf output, as -EPROBE_DEFER can be an expected and
>>>>> implicitely intended reality and nothing is wrong then.
>>>>
>>>> Thanks Daniel. I can't do more tests soon but this is probably what's going
>>>> on as the logs already indicate that the MediaTek ethernet driver was yet to
>>>> load.
>>>>
>>>> As acknowledged, since running the MT7530 DSA subdriver from scratch is
>>>> expected if the ethernet driver is not loaded yet, there's not really a
>>>> problem. Though the switch is reset twice in a short amount of time. I don't
>>>> think that's very great.
>>>
>>> That's true, and we should try to avoid that.
>>>
>>>>
>>>> The driver initialisation seems serialised (at least for the drivers built
>>>> into the kernel) as I tried sleeping for 5 seconds on mt7530_probe() but no
>>>> other driver was loaded in the meantime so I got the same behaviour.
>>>>
>>>> The regulator code will cause a long and nasty exception the first time.
>>>> Though there's nothing wrong as it does what it's supposed to do on the
>>>> second run. I'm not sure if that's negligible.
>>>>
>>>> Could we at least somehow make the MT7530 DSA subdriver wait until the
>>>> regulator driver is loaded?
>>>
>>> I assume the regulator-related stackdump is unrelated, but caused by
>>> cpufreq changes, which had now been fixed by commit 0883426fd07e
>>> ("cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623").
>>>
>>> If you are using v6.3-rc6 this commit is still missing there, but
>>> manually picking it from linux-next should fix it.
>>
>> I did one better and just did the test on the current linux-next, I get
>> exceptions that seem to be identical. I also made sure this commit was
>> actually there.
>>
>>>
>>> Let me know if I can help with testing on my farm of MediaTek boards.
>>> I'm a bit nervous about fixing MT7531BE soon, so deciding if we move
>>> PLL activation to mt7530_probe() would be essential as it makes the
>>> fix much easier...
>>
>> Can you test this branch on MT7531AE, MT7531BE and the switch on MT7988 SoC?
>> I just need to complete the patch logs, the code won't change much.
>>
>> https://github.com/arinc9/linux/commits/for-netnext
> 
> Tested on BPi-R64 (MT7622A+MT7531BE), BPi-R3 (MT7986A+MT7531AE) and
> MT7988A reference board. All working just fine.
> 
> For BPi-R2 (MT7623N+MT7530) I made sure to disable the already enabled
> regulators in the error path and hence made the WARN_ON no longer
> trigger, see:
> https://github.com/dangowrt/linux/commit/55035b5ac739914166ed4f026262d0fc9b17bc76

Very nice. This is also what I had figured eventually and was in the 
process of conveying to Vladimir. I can confirm I don't get exceptions 
anymore.

> 
>>
>> I'm thinking if we can -EPROBE_DEFER right at the start of mt7530_probe(),
>> it should prevent the reset code from running twice, and enabling the
>> regulator will run without any exceptions.
>>
>> I think I can just keep enabling the regulator on mt7530_setup() if I can't
>> figure that out.
> 
> What's bad about having the hardware setup in mt7530_setup()? I think it
> even makes more sense to have it there and *not* in mt7530_probe() for
> exactly such reasons. Maybe we can even also move the reset function
> there and really do *all* of the hardware setup there and let the probe
> function really just parse DTS, probe the hardware, allocate memory and
> initialize data-structures like it is supposed to be.
> 
> That being said, I thought that having PLL actication in mt7530_probe()
> would make things easier (as in: not require a function pointer to the
> sgmii_create function), but looking at it now this is not even true,
> we now require
> void mt7530_core_write(struct mt7530_priv *priv, u32 reg, u32 val);
> void mt7530_core_set(struct mt7530_priv *priv, u32 reg, u32 val);
> void mt7530_core_clear(struct mt7530_priv *priv, u32 reg, u32 val);
> void mt7530_write(struct mt7530_priv *priv, u32 reg, u32 val);
> u32 _mt7530_read(struct mt7530_dummy_poll *p);
> u32 mt7530_read(struct mt7530_priv *priv, u32 reg);
> being either exported or to be inline functions in mt7530.h which
> previously wasn't needed...
> 
> I may miss something here and would like to understand your perspective:
> What exactly is the argument for moving all of the setup to the probe
> function?

While speaking here, let's discuss what should be considered probing. To 
me, detecting the chip ID, checking the XTAL frequency, checking whether 
port 5 is SGMII, are probe material. We retrieve information from the 
hardware and reject registering the switch if things don't fit.

The current order of the code implies that resetting the switch is 
necessary for these checks to be done.

As an example, on realtek-mdio.c [0], I can also see that reset is done 
on probe.

Now anything after that, like setting down MACs, PHYs, doing internal 
reset, pll setup, creating sgmii, lowering driving could rather be on 
mt7530_setup().

One thing that complicates this is that the MT7530 switch has got a 
unique feature, PHY muxing. I want to be able to use this feature 
without registering the switch at all. And that requires the switch to 
be at least reset.

This is especially useful for devices that only use a single port of the 
switch. We can get rid of the DSA header overhead by doing this. I 
explained this in more detail here [1].

I'm CC'ing Thiabut here since they've been interested in this for a while.

[0] 
https://github.com/arinc9/linux/blob/for-netnext/drivers/net/dsa/realtek/realtek-mdio.c#L143
[1] 
https://lore.kernel.org/netdev/0e3ca573-2190-57b0-0e98-7f5b890d328e@arinc9.com/

Arınç
Vladimir Oltean April 15, 2023, 1:38 p.m. UTC | #11
On Sat, Apr 15, 2023 at 04:19:46PM +0300, Arınç ÜNAL wrote:
> While speaking here, let's discuss what should be considered probing.
> 
> One thing that complicates this is that the MT7530 switch has got a unique
> feature, PHY muxing. I want to be able to use this feature without
> registering the switch at all. And that requires the switch to be at least
> reset.

All DSA switch drivers end their probing with a successful call to
dsa_register_switch(). I would appreciate if you wouldn't start adding
exceptions to that.
Arınç ÜNAL April 15, 2023, 1:40 p.m. UTC | #12
On 15.04.2023 16:38, Vladimir Oltean wrote:
> On Sat, Apr 15, 2023 at 04:19:46PM +0300, Arınç ÜNAL wrote:
>> While speaking here, let's discuss what should be considered probing.
>>
>> One thing that complicates this is that the MT7530 switch has got a unique
>> feature, PHY muxing. I want to be able to use this feature without
>> registering the switch at all. And that requires the switch to be at least
>> reset.
> 
> All DSA switch drivers end their probing with a successful call to
> dsa_register_switch(). I would appreciate if you wouldn't start adding
> exceptions to that.

My wording was not great there. What I meant is that PHY muxing will be 
configured before dsa_register_switch() is run.

Arınç
Vladimir Oltean April 15, 2023, 1:46 p.m. UTC | #13
On Sat, Apr 15, 2023 at 04:40:19PM +0300, Arınç ÜNAL wrote:
> My wording was not great there. What I meant is that PHY muxing will be
> configured before dsa_register_switch() is run.

And we're back to the discussion from the thread "Move MT7530 phy muxing
from DSA to PHY driver". What if someone decides that they don't need
the switch driver - can they disable it? No.

Your thoughts are stopping mid way. If you think that PHY muxing should
work without registering the DSA switch, then it doesn't belong in the
DSA driver, plain and simple. No "yeah, but I can move it here, and it
could kinda work, as a side effect of a driver failing to probe, or
probing successfully but not registering with the subsystems for its
primary purpose, or ...".
Daniel Golle April 15, 2023, 2:02 p.m. UTC | #14
On Sat, Apr 15, 2023 at 04:46:04PM +0300, Vladimir Oltean wrote:
> On Sat, Apr 15, 2023 at 04:40:19PM +0300, Arınç ÜNAL wrote:
> > My wording was not great there. What I meant is that PHY muxing will be
> > configured before dsa_register_switch() is run.
> 
> And we're back to the discussion from the thread "Move MT7530 phy muxing
> from DSA to PHY driver". What if someone decides that they don't need
> the switch driver - can they disable it? No.
> 
> Your thoughts are stopping mid way. If you think that PHY muxing should
> work without registering the DSA switch, then it doesn't belong in the
> DSA driver, plain and simple. No "yeah, but I can move it here, and it
> could kinda work, as a side effect of a driver failing to probe, or
> probing successfully but not registering with the subsystems for its
> primary purpose, or ...".

As the PHYs are accessed over the MDIO bus which is exposed by the mt7530.c
DSA driver the only middle ground would possibly be to introduce a MFD
driver taking care of creating the bus access regmap (MDIO vs. MDIO) and
expose the mt7530-controlled MDIO bus.

Obviously that'd be a bit more work than just moving some things from the
switch setup function to the probe function...
Vladimir Oltean April 15, 2023, 2:20 p.m. UTC | #15
On Sat, Apr 15, 2023 at 03:02:10PM +0100, Daniel Golle wrote:
> As the PHYs are accessed over the MDIO bus which is exposed by the mt7530.c
> DSA driver the only middle ground would possibly be to introduce a MFD
> driver taking care of creating the bus access regmap (MDIO vs. MDIO) and
> expose the mt7530-controlled MDIO bus.

Which is something I had already mentioned as a possible way forward in
the other thread. One would need to take care of ensuring a reasonable
migration path in terms of device tree compatibility though.

> 
> Obviously that'd be a bit more work than just moving some things from the
> switch setup function to the probe function...

On the other hand, it would actually work reliably, and would not depend
on whomever wanted to reorder things just a little bit differently for
his system to probe faster.
Vladimir Oltean April 15, 2023, 2:28 p.m. UTC | #16
On Sat, Apr 15, 2023 at 05:20:14PM +0300, Vladimir Oltean wrote:
> On Sat, Apr 15, 2023 at 03:02:10PM +0100, Daniel Golle wrote:
> > As the PHYs are accessed over the MDIO bus which is exposed by the mt7530.c
> > DSA driver the only middle ground would possibly be to introduce a MFD
> > driver taking care of creating the bus access regmap (MDIO vs. MDIO) and
> > expose the mt7530-controlled MDIO bus.
> 
> Which is something I had already mentioned as a possible way forward in
> the other thread. One would need to take care of ensuring a reasonable
> migration path in terms of device tree compatibility though.

You mention the PHYs because you seem to suggest that the PHY muxing
code should go there. I think it would also be possible to just put it
in this new MFD parent driver of the DSA switch, which is in fact what
I had originally suggested.
Arınç ÜNAL April 15, 2023, 2:57 p.m. UTC | #17
On 15.04.2023 17:20, Vladimir Oltean wrote:
> On Sat, Apr 15, 2023 at 03:02:10PM +0100, Daniel Golle wrote:
>> As the PHYs are accessed over the MDIO bus which is exposed by the mt7530.c
>> DSA driver the only middle ground would possibly be to introduce a MFD
>> driver taking care of creating the bus access regmap (MDIO vs. MDIO) and
>> expose the mt7530-controlled MDIO bus.
> 
> Which is something I had already mentioned as a possible way forward in
> the other thread. One would need to take care of ensuring a reasonable
> migration path in terms of device tree compatibility though.
> 
>>
>> Obviously that'd be a bit more work than just moving some things from the
>> switch setup function to the probe function...
> 
> On the other hand, it would actually work reliably, and would not depend
> on whomever wanted to reorder things just a little bit differently for
> his system to probe faster.

Ok thanks. I will investigate how the switch would be set up with an MFD 
driver, and how it would affect dt-bindings.

Looking back at my patch series, currently with this [0], SGMII on 
MT7531BE's port 6 starts working, and with Daniel's addition [1], the 
regulator warnings disappear.

I will submit the patch series as an RFC after addressing Daniel's 
inline functions suggestion.

[0] 
https://github.com/arinc9/linux/commit/89230fc01c86ec7e6b8a43a7f54ba8db97aaa4cd
[1] 
https://github.com/dangowrt/linux/commit/55035b5ac739914166ed4f026262d0fc9b17bc76

Arınç
Arınç ÜNAL April 16, 2023, 9:38 a.m. UTC | #18
On 15.04.2023 17:57, Arınç ÜNAL wrote:
> On 15.04.2023 17:20, Vladimir Oltean wrote:
>> On Sat, Apr 15, 2023 at 03:02:10PM +0100, Daniel Golle wrote:
>>> As the PHYs are accessed over the MDIO bus which is exposed by the 
>>> mt7530.c
>>> DSA driver the only middle ground would possibly be to introduce a MFD
>>> driver taking care of creating the bus access regmap (MDIO vs. MDIO) and
>>> expose the mt7530-controlled MDIO bus.
>>
>> Which is something I had already mentioned as a possible way forward in
>> the other thread. One would need to take care of ensuring a reasonable
>> migration path in terms of device tree compatibility though.
>>
>>>
>>> Obviously that'd be a bit more work than just moving some things from 
>>> the
>>> switch setup function to the probe function...
>>
>> On the other hand, it would actually work reliably, and would not depend
>> on whomever wanted to reorder things just a little bit differently for
>> his system to probe faster.
> 
> Ok thanks. I will investigate how the switch would be set up with an MFD 
> driver, and how it would affect dt-bindings.
> 
> Looking back at my patch series, currently with this [0], SGMII on 
> MT7531BE's port 6 starts working, and with Daniel's addition [1], the 
> regulator warnings disappear.
> 
> I will submit the patch series as an RFC after addressing Daniel's 
> inline functions suggestion.

I've been giving this some thought. My understanding of probe in this 
context has changed drastically. The probe here is supposed to probe the 
driver, like setting up the pointers, reading from the devicetree, 
filling up the info table, and finally calling dsa_register_switch(). It 
would not necessarily do anything to the switch hardware like resetting 
and reading information from the registers. This is currently how 
mt7530-mdio and mt7530-mmio already operate. So I'm not going to move 
anything from setup to probe.

The duplicate code on mt7530_setup() and mt7531_setup() could rather be 
put on mt753x_setup() instead. But now there's ID_MT7988 also going 
through mt753x_setup, so it's not very feasible to do this anymore, too 
many ID checks there would be.

Moving forward, I will send a separate bugfix patch that makes port 6 on 
MT7531BE work. My patch series will solely be for improving the driver.

Daniel, can you confirm this patch is enough to make port 6 work on 
MT7531BE? I won't touch the PCS creation code here as it'd be an 
improvement rather than a fix, if this works.

https://github.com/arinc9/linux/commit/bb55b97b8f600cf28433e7ff494d296a15191cb3

Arınç
Daniel Golle April 16, 2023, 11:57 a.m. UTC | #19
On Sun, Apr 16, 2023 at 12:38:03PM +0300, Arınç ÜNAL wrote:
> On 15.04.2023 17:57, Arınç ÜNAL wrote:
> > On 15.04.2023 17:20, Vladimir Oltean wrote:
> > > On Sat, Apr 15, 2023 at 03:02:10PM +0100, Daniel Golle wrote:
> > > > As the PHYs are accessed over the MDIO bus which is exposed by
> > > > the mt7530.c
> > > > DSA driver the only middle ground would possibly be to introduce a MFD
> > > > driver taking care of creating the bus access regmap (MDIO vs. MDIO) and
> > > > expose the mt7530-controlled MDIO bus.
> > > 
> > > Which is something I had already mentioned as a possible way forward in
> > > the other thread. One would need to take care of ensuring a reasonable
> > > migration path in terms of device tree compatibility though.
> > > 
> > > > 
> > > > Obviously that'd be a bit more work than just moving some things
> > > > from the
> > > > switch setup function to the probe function...
> > > 
> > > On the other hand, it would actually work reliably, and would not depend
> > > on whomever wanted to reorder things just a little bit differently for
> > > his system to probe faster.
> > 
> > Ok thanks. I will investigate how the switch would be set up with an MFD
> > driver, and how it would affect dt-bindings.
> > 
> > Looking back at my patch series, currently with this [0], SGMII on
> > MT7531BE's port 6 starts working, and with Daniel's addition [1], the
> > regulator warnings disappear.
> > 
> > I will submit the patch series as an RFC after addressing Daniel's
> > inline functions suggestion.
> 
> I've been giving this some thought. My understanding of probe in this
> context has changed drastically. The probe here is supposed to probe the
> driver, like setting up the pointers, reading from the devicetree, filling
> up the info table, and finally calling dsa_register_switch(). It would not
> necessarily do anything to the switch hardware like resetting and reading
> information from the registers. This is currently how mt7530-mdio and
> mt7530-mmio already operate. So I'm not going to move anything from setup to
> probe.
> 
> The duplicate code on mt7530_setup() and mt7531_setup() could rather be put
> on mt753x_setup() instead. But now there's ID_MT7988 also going through
> mt753x_setup, so it's not very feasible to do this anymore, too many ID
> checks there would be.
> 
> Moving forward, I will send a separate bugfix patch that makes port 6 on
> MT7531BE work. My patch series will solely be for improving the driver.
> 
> Daniel, can you confirm this patch is enough to make port 6 work on
> MT7531BE? I won't touch the PCS creation code here as it'd be an improvement
> rather than a fix, if this works.
> 
> https://github.com/arinc9/linux/commit/bb55b97b8f600cf28433e7ff494d296a15191cb3

Why don't we use my original solution [1] which has some advantages:

 * It doesn't requrire additional export of mt7530_regmap_bus

 * It doesn't move PCS creation to mt7530.c, hence PCS_MTK_LYNXI is
   only required for MDIO-connected switches
   (with your patch we would have to move the dependency on PCS_MTK_LYNXI
   from NET_DSA_MT7530_MDIO to NET_DSA_MT7530)

 * It doesn't expose the dysfunctional SerDes PCS for port 5 on MT7531BE
   This will still fail and hence result in probing on MT7531 to exit
   prematurely, preventing the switch driver from being loaded.
   Before 9ecc00164dc23 ("net: dsa: mt7530: refactor SGMII PCS creation")
   the return value of mtk_pcs_lynxi_create was ignored, now it isn't...

 * It changes much less in terms of LoC


I've of course also already addressed the comments of Jesse Brandeburg
and will submit it in a few moments.


[1]: https://patchwork.kernel.org/project/linux-mediatek/patch/ZDSlm-0gyyDZXy_k@makrotopia.org/

Thank you anyway for eyes and brains on this, I appreciate that.
diff mbox series

Patch

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 02410ac439b7..57b262099791 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -3248,6 +3248,8 @@  mt7530_probe(struct mdio_device *mdiodev)
  	struct mt7530_priv *priv;
  	struct device_node *dn;
  
+	dev_info(&mdiodev->dev, "mt7530_probe() is running\n");
+
  	dn = mdiodev->dev.of_node;
  
  	priv = devm_kzalloc(&mdiodev->dev, sizeof(*priv), GFP_KERNEL);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index e5f156940c67..738ca4420e85 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -1488,9 +1488,12 @@  static int dsa_switch_probe(struct dsa_switch *ds)
  		return -EINVAL;
  
  	if (np) {
+		dev_info(ds->dev, "np is true\n");
  		err = dsa_switch_parse_of(ds, np);
-		if (err)
+		if (err) {
+			dev_info(ds->dev, "np dts parse failed\n");
  			dsa_switch_release_ports(ds);
+		}
  	} else if (pdata) {
  		err = dsa_switch_parse(ds, pdata);
  		if (err)
@@ -1502,6 +1505,8 @@  static int dsa_switch_probe(struct dsa_switch *ds)
  	if (err)
  		return err;
  
+	dev_info(ds->dev, "np is successful\n");
+
  	dst = ds->dst;
  	dsa_tree_get(dst);
  	err = dsa_tree_setup(dst);