r8169: do not call rtl8169_down() twice on shutdown

Message ID	20250415095335.506266-2-cassel@kernel.org (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B368288C83 for <netdev@vger.kernel.org>; Tue, 15 Apr 2025 09:53:47 +0000 (UTC) From: Niklas Cassel <cassel@kernel.org> To: Heiner Kallweit <hkallweit1@gmail.com>, nic_swsd@realtek.com, Andrew Lunn <andrew+netdev@lunn.ch>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>, Niklas Cassel <cassel@kernel.org>, netdev@vger.kernel.org Subject: [PATCH] r8169: do not call rtl8169_down() twice on shutdown Date: Tue, 15 Apr 2025 11:53:36 +0200 Message-ID: <20250415095335.506266-2-cassel@kernel.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	r8169: do not call rtl8169_down() twice on shutdown \| expand r8169: do not call rtl8169_down() twice on shutdown

Context	Check	Description
netdev/series_format	warning	Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 1 this patch: 1
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 7 of 7 maintainers
netdev/build_clang	success	Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 2 this patch: 2
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	fail	net-next-2025-04-16--00-00 (tests: 916)

Niklas Cassel April 15, 2025, 9:53 a.m. UTC

For a PCI controller driver with a .shutdown() callback, we will see the
following warning:
[   12.020111] called from state HALTED
[   12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0

This is because rtl8169_down() (which calls phy_stop()) is called twice
during shutdown.

First time:
[   23.827764] Call trace:
[   23.827765]  show_stack+0x20/0x40 (C)
[   23.827774]  dump_stack_lvl+0x60/0x80
[   23.827778]  dump_stack+0x18/0x24
[   23.827782]  rtl8169_down+0x30/0x2a0
[   23.827788]  rtl_shutdown+0xb0/0xc0
[   23.827792]  pci_device_shutdown+0x3c/0x88
[   23.827797]  device_shutdown+0x150/0x278
[   23.827802]  kernel_restart+0x4c/0xb8

Second time:
[   23.841468] Call trace:
[   23.841470]  show_stack+0x20/0x40 (C)
[   23.841478]  dump_stack_lvl+0x60/0x80
[   23.841483]  dump_stack+0x18/0x24
[   23.841486]  rtl8169_down+0x30/0x2a0
[   23.841492]  rtl8169_close+0x64/0x100
[   23.841496]  __dev_close_many+0xbc/0x1f0
[   23.841502]  dev_close_many+0x94/0x160
[   23.841505]  unregister_netdevice_many_notify+0x160/0x9d0
[   23.841510]  unregister_netdevice_queue+0xf0/0x100
[   23.841515]  unregister_netdev+0x2c/0x58
[   23.841519]  rtl_remove_one+0xa0/0xe0
[   23.841524]  pci_device_remove+0x4c/0xf8
[   23.841528]  device_remove+0x54/0x90
[   23.841534]  device_release_driver_internal+0x1d4/0x238
[   23.841539]  device_release_driver+0x20/0x38
[   23.841544]  pci_stop_bus_device+0x84/0xe0
[   23.841548]  pci_stop_bus_device+0x40/0xe0
[   23.841552]  pci_stop_root_bus+0x48/0x80
[   23.841555]  dw_pcie_host_deinit+0x34/0xe0
[   23.841559]  rockchip_pcie_shutdown+0x20/0x38
[   23.841565]  platform_shutdown+0x2c/0x48
[   23.841571]  device_shutdown+0x150/0x278
[   23.841575]  kernel_restart+0x4c/0xb8

Add a netif_device_present() guard around the rtl8169_down() call in
rtl8169_close(), to avoid rtl8169_down() from being called twice.

This matches how e.g. e1000e_close() has a netif_device_present() guard
around the e1000e_down() call.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Heiner Kallweit April 15, 2025, 6:48 p.m. UTC | #1

On 15.04.2025 11:53, Niklas Cassel wrote:
> For a PCI controller driver with a .shutdown() callback, we will see the
> following warning:

I saw the related mail thread, IIRC it was about potential new code.
Is this right? Or do we talk about existing code? Then it would have
to be treated as fix.

Existence of a shutdown callback itself is not the problem, the problem is
that all PCI bus devices are removed as part of shutdown handling for this
specific controller driver.

> [   12.020111] called from state HALTED
> [   12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> 
> This is because rtl8169_down() (which calls phy_stop()) is called twice
> during shutdown.
> 
> First time:
> [   23.827764] Call trace:
> [   23.827765]  show_stack+0x20/0x40 (C)
> [   23.827774]  dump_stack_lvl+0x60/0x80
> [   23.827778]  dump_stack+0x18/0x24
> [   23.827782]  rtl8169_down+0x30/0x2a0
> [   23.827788]  rtl_shutdown+0xb0/0xc0
> [   23.827792]  pci_device_shutdown+0x3c/0x88
> [   23.827797]  device_shutdown+0x150/0x278
> [   23.827802]  kernel_restart+0x4c/0xb8
> 
> Second time:
> [   23.841468] Call trace:
> [   23.841470]  show_stack+0x20/0x40 (C)
> [   23.841478]  dump_stack_lvl+0x60/0x80
> [   23.841483]  dump_stack+0x18/0x24
> [   23.841486]  rtl8169_down+0x30/0x2a0
> [   23.841492]  rtl8169_close+0x64/0x100
> [   23.841496]  __dev_close_many+0xbc/0x1f0
> [   23.841502]  dev_close_many+0x94/0x160
> [   23.841505]  unregister_netdevice_many_notify+0x160/0x9d0
> [   23.841510]  unregister_netdevice_queue+0xf0/0x100
> [   23.841515]  unregister_netdev+0x2c/0x58
> [   23.841519]  rtl_remove_one+0xa0/0xe0
> [   23.841524]  pci_device_remove+0x4c/0xf8
> [   23.841528]  device_remove+0x54/0x90
> [   23.841534]  device_release_driver_internal+0x1d4/0x238
> [   23.841539]  device_release_driver+0x20/0x38
> [   23.841544]  pci_stop_bus_device+0x84/0xe0
> [   23.841548]  pci_stop_bus_device+0x40/0xe0
> [   23.841552]  pci_stop_root_bus+0x48/0x80
> [   23.841555]  dw_pcie_host_deinit+0x34/0xe0
> [   23.841559]  rockchip_pcie_shutdown+0x20/0x38
> [   23.841565]  platform_shutdown+0x2c/0x48
> [   23.841571]  device_shutdown+0x150/0x278
> [   23.841575]  kernel_restart+0x4c/0xb8
> 
> Add a netif_device_present() guard around the rtl8169_down() call in
> rtl8169_close(), to avoid rtl8169_down() from being called twice.
> 
> This matches how e.g. e1000e_close() has a netif_device_present() guard
> around the e1000e_down() call.
> 
This approach has at least two issues:

1. Likely it breaks WoL, because now phy_detach() is called.
2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
   for the remove callback. Now it's left in D0.

I'll also spend a few thoughts on how to solve this best.

> Signed-off-by: Niklas Cassel <cassel@kernel.org>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 4eebd9cb40a3..0300a06ae260 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
>  	pm_runtime_get_sync(&pdev->dev);
>  
>  	netif_stop_queue(dev);
> -	rtl8169_down(tp);
> +	if (netif_device_present(tp->dev))
> +		rtl8169_down(tp);
>  	rtl8169_rx_clear(tp);
>  
>  	free_irq(tp->irq, tp);

Niklas Cassel April 16, 2025, 2:13 p.m. UTC | #2

On Tue, Apr 15, 2025 at 08:48:53PM +0200, Heiner Kallweit wrote:
> On 15.04.2025 11:53, Niklas Cassel wrote:
> > For a PCI controller driver with a .shutdown() callback, we will see the
> > following warning:
> 
> I saw the related mail thread, IIRC it was about potential new code.
> Is this right? Or do we talk about existing code? Then it would have
> to be treated as fix.

The qcom PCIe controller driver wants to add a .shutdown callback:
https://lore.kernel.org/linux-pci/tb6kkyfgpemzacfwbg45otgcrg5ltj323y6ufjy3bw3rgri7uf@vrmmtueyg7su/T/#t

In that shutdown callback, they want to call dw_pcie_host_deinit(),
which will call pci_stop_root_bus().

Looking at other PCIe controller drivers:
$ git grep -p pci_stop_root_bus

There seems to be a lot of PCIe controller drivers that call
pci_stop_root_bus(), but they all do it from the .remove() callback,
not from the .shutdown() callback.

Actually, I can't see any existing PCIe controller driver that calls
pci_stop_root_bus() from the .shutdown() callback.

So perhaps we should hold off with this patch.

Adding the qcom folks to CC.


Kind regards,
Niklas


> 
> Existence of a shutdown callback itself is not the problem, the problem is
> that all PCI bus devices are removed as part of shutdown handling for this
> specific controller driver.
> 
> > [   12.020111] called from state HALTED
> > [   12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> > 
> > This is because rtl8169_down() (which calls phy_stop()) is called twice
> > during shutdown.
> > 
> > First time:
> > [   23.827764] Call trace:
> > [   23.827765]  show_stack+0x20/0x40 (C)
> > [   23.827774]  dump_stack_lvl+0x60/0x80
> > [   23.827778]  dump_stack+0x18/0x24
> > [   23.827782]  rtl8169_down+0x30/0x2a0
> > [   23.827788]  rtl_shutdown+0xb0/0xc0
> > [   23.827792]  pci_device_shutdown+0x3c/0x88
> > [   23.827797]  device_shutdown+0x150/0x278
> > [   23.827802]  kernel_restart+0x4c/0xb8
> > 
> > Second time:
> > [   23.841468] Call trace:
> > [   23.841470]  show_stack+0x20/0x40 (C)
> > [   23.841478]  dump_stack_lvl+0x60/0x80
> > [   23.841483]  dump_stack+0x18/0x24
> > [   23.841486]  rtl8169_down+0x30/0x2a0
> > [   23.841492]  rtl8169_close+0x64/0x100
> > [   23.841496]  __dev_close_many+0xbc/0x1f0
> > [   23.841502]  dev_close_many+0x94/0x160
> > [   23.841505]  unregister_netdevice_many_notify+0x160/0x9d0
> > [   23.841510]  unregister_netdevice_queue+0xf0/0x100
> > [   23.841515]  unregister_netdev+0x2c/0x58
> > [   23.841519]  rtl_remove_one+0xa0/0xe0
> > [   23.841524]  pci_device_remove+0x4c/0xf8
> > [   23.841528]  device_remove+0x54/0x90
> > [   23.841534]  device_release_driver_internal+0x1d4/0x238
> > [   23.841539]  device_release_driver+0x20/0x38
> > [   23.841544]  pci_stop_bus_device+0x84/0xe0
> > [   23.841548]  pci_stop_bus_device+0x40/0xe0
> > [   23.841552]  pci_stop_root_bus+0x48/0x80
> > [   23.841555]  dw_pcie_host_deinit+0x34/0xe0
> > [   23.841559]  rockchip_pcie_shutdown+0x20/0x38
> > [   23.841565]  platform_shutdown+0x2c/0x48
> > [   23.841571]  device_shutdown+0x150/0x278
> > [   23.841575]  kernel_restart+0x4c/0xb8
> > 
> > Add a netif_device_present() guard around the rtl8169_down() call in
> > rtl8169_close(), to avoid rtl8169_down() from being called twice.
> > 
> > This matches how e.g. e1000e_close() has a netif_device_present() guard
> > around the e1000e_down() call.
> > 
> This approach has at least two issues:
> 
> 1. Likely it breaks WoL, because now phy_detach() is called.
> 2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
>    for the remove callback. Now it's left in D0.
> 
> I'll also spend a few thoughts on how to solve this best.
> 
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> > ---
> >  drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 4eebd9cb40a3..0300a06ae260 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
> >  	pm_runtime_get_sync(&pdev->dev);
> >  
> >  	netif_stop_queue(dev);
> > -	rtl8169_down(tp);
> > +	if (netif_device_present(tp->dev))
> > +		rtl8169_down(tp);
> >  	rtl8169_rx_clear(tp);
> >  
> >  	free_irq(tp->irq, tp);
>

Krishna Chaitanya Chundru April 16, 2025, 3:45 p.m. UTC | #3

On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> On Tue, Apr 15, 2025 at 08:48:53PM +0200, Heiner Kallweit wrote:
>> On 15.04.2025 11:53, Niklas Cassel wrote:
>>> For a PCI controller driver with a .shutdown() callback, we will see the
>>> following warning:
>>
>> I saw the related mail thread, IIRC it was about potential new code.
>> Is this right? Or do we talk about existing code? Then it would have
>> to be treated as fix.
> 
> The qcom PCIe controller driver wants to add a .shutdown callback:
> https://lore.kernel.org/linux-pci/tb6kkyfgpemzacfwbg45otgcrg5ltj323y6ufjy3bw3rgri7uf@vrmmtueyg7su/T/#t
> 
> In that shutdown callback, they want to call dw_pcie_host_deinit(),
> which will call pci_stop_root_bus().
> 
> Looking at other PCIe controller drivers:
> $ git grep -p pci_stop_root_bus
> 
> There seems to be a lot of PCIe controller drivers that call
> pci_stop_root_bus(), but they all do it from the .remove() callback,
> not from the .shutdown() callback.
> 
> Actually, I can't see any existing PCIe controller driver that calls
> pci_stop_root_bus() from the .shutdown() callback.
> 
Not all endpoint have .shutdown callback(), the endpoint drivers still
will try to do data transfers if we don't call pci_stop_stop_bus(), and
if there are ongoing transfers over link and if the controller driver
turns off the PCIe link without calling pci_stop_root_bus we might get
bus errors, NOC errors etc which can hang the system.

This should be handled gracefully in the rtl8169 driver.
> So perhaps we should hold off with this patch.
> 
I disagree on this, it might be causing issue with net driver, but we
might face some other issues as explained above if we don't call
pci_stop_root_bus().

- Krishna Chaitanya.
> Adding the qcom folks to CC.
> 
> 
> Kind regards,
> Niklas
> 
> 
>>
>> Existence of a shutdown callback itself is not the problem, the problem is
>> that all PCI bus devices are removed as part of shutdown handling for this
>> specific controller driver.
>>
>>> [   12.020111] called from state HALTED
>>> [   12.020459] WARNING: CPU: 7 PID: 229 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
>>>
>>> This is because rtl8169_down() (which calls phy_stop()) is called twice
>>> during shutdown.
>>>
>>> First time:
>>> [   23.827764] Call trace:
>>> [   23.827765]  show_stack+0x20/0x40 (C)
>>> [   23.827774]  dump_stack_lvl+0x60/0x80
>>> [   23.827778]  dump_stack+0x18/0x24
>>> [   23.827782]  rtl8169_down+0x30/0x2a0
>>> [   23.827788]  rtl_shutdown+0xb0/0xc0
>>> [   23.827792]  pci_device_shutdown+0x3c/0x88
>>> [   23.827797]  device_shutdown+0x150/0x278
>>> [   23.827802]  kernel_restart+0x4c/0xb8
>>>
>>> Second time:
>>> [   23.841468] Call trace:
>>> [   23.841470]  show_stack+0x20/0x40 (C)
>>> [   23.841478]  dump_stack_lvl+0x60/0x80
>>> [   23.841483]  dump_stack+0x18/0x24
>>> [   23.841486]  rtl8169_down+0x30/0x2a0
>>> [   23.841492]  rtl8169_close+0x64/0x100
>>> [   23.841496]  __dev_close_many+0xbc/0x1f0
>>> [   23.841502]  dev_close_many+0x94/0x160
>>> [   23.841505]  unregister_netdevice_many_notify+0x160/0x9d0
>>> [   23.841510]  unregister_netdevice_queue+0xf0/0x100
>>> [   23.841515]  unregister_netdev+0x2c/0x58
>>> [   23.841519]  rtl_remove_one+0xa0/0xe0
>>> [   23.841524]  pci_device_remove+0x4c/0xf8
>>> [   23.841528]  device_remove+0x54/0x90
>>> [   23.841534]  device_release_driver_internal+0x1d4/0x238
>>> [   23.841539]  device_release_driver+0x20/0x38
>>> [   23.841544]  pci_stop_bus_device+0x84/0xe0
>>> [   23.841548]  pci_stop_bus_device+0x40/0xe0
>>> [   23.841552]  pci_stop_root_bus+0x48/0x80
>>> [   23.841555]  dw_pcie_host_deinit+0x34/0xe0
>>> [   23.841559]  rockchip_pcie_shutdown+0x20/0x38
>>> [   23.841565]  platform_shutdown+0x2c/0x48
>>> [   23.841571]  device_shutdown+0x150/0x278
>>> [   23.841575]  kernel_restart+0x4c/0xb8
>>>
>>> Add a netif_device_present() guard around the rtl8169_down() call in
>>> rtl8169_close(), to avoid rtl8169_down() from being called twice.
>>>
>>> This matches how e.g. e1000e_close() has a netif_device_present() guard
>>> around the e1000e_down() call.
>>>
>> This approach has at least two issues:
>>
>> 1. Likely it breaks WoL, because now phy_detach() is called.
>> 2. r8169 shutdown callback sets device to D3hot, PCI core wakes it up again
>>     for the remove callback. Now it's left in D0.
>>
>> I'll also spend a few thoughts on how to solve this best.
>>
>>> Signed-off-by: Niklas Cassel <cassel@kernel.org>
>>> ---
>>>   drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>> index 4eebd9cb40a3..0300a06ae260 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -4879,7 +4879,8 @@ static int rtl8169_close(struct net_device *dev)
>>>   	pm_runtime_get_sync(&pdev->dev);
>>>   
>>>   	netif_stop_queue(dev);
>>> -	rtl8169_down(tp);
>>> +	if (netif_device_present(tp->dev))
>>> +		rtl8169_down(tp);
>>>   	rtl8169_rx_clear(tp);
>>>   
>>>   	free_irq(tp->irq, tp);
>>

Niklas Cassel April 16, 2025, 4:03 p.m. UTC | #4

Hello Krishna Chaitanya,

On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> > 
> > So perhaps we should hold off with this patch.
> > 
> I disagree on this, it might be causing issue with net driver, but we
> might face some other issues as explained above if we don't call
> pci_stop_root_bus().

When I wrote hold off with this patch, I meant the patch in $subject,
not your patch.

When it comes to your patch, I think that the commit log needs to explain
why it is so special.

Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
in the .remove() callback, not in the .shutdown() callback.

Doing things differently from all other PCIe controller drivers is usually
a red flag.

Kind regards,
Niklas

Krishna Chaitanya Chundru April 16, 2025, 5:15 p.m. UTC | #5

On 4/16/2025 9:33 PM, Niklas Cassel wrote:
> Hello Krishna Chaitanya,
> 
> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>
>>> So perhaps we should hold off with this patch.
>>>
>> I disagree on this, it might be causing issue with net driver, but we
>> might face some other issues as explained above if we don't call
>> pci_stop_root_bus().
> 
> When I wrote hold off with this patch, I meant the patch in $subject,
> not your patch.
> 
oh sorry, I misunderstood it.
> 
> When it comes to your patch, I think that the commit log needs to explain
> why it is so special.
> 
> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
> in the .remove() callback, not in the .shutdown() callback.
> 
> Doing things differently from all other PCIe controller drivers is usually
> a red flag.
> 
I will update all the required details in the next version of that
patch.

- Krishna Chaitanya.
> 
> Kind regards,
> Niklas

Manivannan Sadhasivam April 18, 2025, 5:19 p.m. UTC | #6

On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
> Hello Krishna Chaitanya,
> 
> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
> > On 4/16/2025 7:43 PM, Niklas Cassel wrote:
> > > 
> > > So perhaps we should hold off with this patch.
> > > 
> > I disagree on this, it might be causing issue with net driver, but we
> > might face some other issues as explained above if we don't call
> > pci_stop_root_bus().
> 
> When I wrote hold off with this patch, I meant the patch in $subject,
> not your patch.
> 
> 
> When it comes to your patch, I think that the commit log needs to explain
> why it is so special.
> 
> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
> in the .remove() callback, not in the .shutdown() callback.
> 

And this driver is special because, it has no 'remove()' callback as it
implements an irqchip controller. So we have to shutdown the devices cleanly in
the 'shutdown' callback.

Also do note that the driver core will not call the 'remove()' callback unless
the driver as a module is unloaded during poweroff/reboot scenarios. So the
controller drivers need to properly power down the devices in their 'shutdown()'
callback IMO.

> Doing things differently from all other PCIe controller drivers is usually
> a red flag.
> 

Yes, even if it is the right thing to do ;) But I agree that the commit message
needs some improvement.

- Mani

Heiner Kallweit April 18, 2025, 9:52 p.m. UTC | #7

On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
>> Hello Krishna Chaitanya,
>>
>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>>
>>>> So perhaps we should hold off with this patch.
>>>>
>>> I disagree on this, it might be causing issue with net driver, but we
>>> might face some other issues as explained above if we don't call
>>> pci_stop_root_bus().
>>
>> When I wrote hold off with this patch, I meant the patch in $subject,
>> not your patch.
>>
>>
>> When it comes to your patch, I think that the commit log needs to explain
>> why it is so special.
>>
>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
>> in the .remove() callback, not in the .shutdown() callback.
>>
> 
> And this driver is special because, it has no 'remove()' callback as it
> implements an irqchip controller. So we have to shutdown the devices cleanly in
> the 'shutdown' callback.
> 
Doing proper cleanup in a first place is responsibility of the respective bus
devices (in their shutdown() callback).

Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
devices to be removed, hence their remove() is called. Typically devices
don't expect that remove() is called after shutdown(). This may cause issues
if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
is inaccessible.

Another issue is with devices being wake-up sources. If wake-up is enabled,
then devices configure the wake-up in their shutdown() callback. Calling
remove() afterwards may reverse parts of the wake-up configuration.
And I'd expect that most wakeup-capable device disable wakeup in the
remove() callback. So there's a good chance that the proposed change breaks
wake-up.

There maybe other side effects I'm not aware of.

IMO a controller drivers shutdown() shall focus on cleanup up its own resources.

> Also do note that the driver core will not call the 'remove()' callback unless
> the driver as a module is unloaded during poweroff/reboot scenarios. So the
> controller drivers need to properly power down the devices in their 'shutdown()'
> callback IMO.
> 
>> Doing things differently from all other PCIe controller drivers is usually
>> a red flag.
>>
> 
> Yes, even if it is the right thing to do ;) But I agree that the commit message
> needs some improvement.
> 
> - Mani
>

Heiner Kallweit April 19, 2025, 10:18 a.m. UTC | #8

On 18.04.2025 23:52, Heiner Kallweit wrote:
> On 18.04.2025 19:19, Manivannan Sadhasivam wrote:
>> On Wed, Apr 16, 2025 at 06:03:36PM +0200, Niklas Cassel wrote:
>>> Hello Krishna Chaitanya,
>>>
>>> On Wed, Apr 16, 2025 at 09:15:19PM +0530, Krishna Chaitanya Chundru wrote:
>>>> On 4/16/2025 7:43 PM, Niklas Cassel wrote:
>>>>>
>>>>> So perhaps we should hold off with this patch.
>>>>>
>>>> I disagree on this, it might be causing issue with net driver, but we
>>>> might face some other issues as explained above if we don't call
>>>> pci_stop_root_bus().
>>>
>>> When I wrote hold off with this patch, I meant the patch in $subject,
>>> not your patch.
>>>
>>>
>>> When it comes to your patch, I think that the commit log needs to explain
>>> why it is so special.
>>>
>>> Because AFAICT, all other PCIe controller drivers call pci_stop_root_bus()
>>> in the .remove() callback, not in the .shutdown() callback.
>>>
>>
>> And this driver is special because, it has no 'remove()' callback as it
>> implements an irqchip controller. So we have to shutdown the devices cleanly in
>> the 'shutdown' callback.
>>
> Doing proper cleanup in a first place is responsibility of the respective bus
> devices (in their shutdown() callback).
> 
> Calling pci_stop_root_bus() in the pci controllers shutdown() causes the bus
> devices to be removed, hence their remove() is called. Typically devices
> don't expect that remove() is called after shutdown(). This may cause issues
> if e.g. shutdown() sets a device to D3cold. remove() won't expect that device
> is inaccessible.
> 
> Another issue is with devices being wake-up sources. If wake-up is enabled,
> then devices configure the wake-up in their shutdown() callback. Calling
> remove() afterwards may reverse parts of the wake-up configuration.
> And I'd expect that most wakeup-capable device disable wakeup in the
> remove() callback. So there's a good chance that the proposed change breaks
> wake-up.
> 
> There maybe other side effects I'm not aware of.
> 
> IMO a controller drivers shutdown() shall focus on cleanup up its own resources.
> 
>> Also do note that the driver core will not call the 'remove()' callback unless
>> the driver as a module is unloaded during poweroff/reboot scenarios. So the
>> controller drivers need to properly power down the devices in their 'shutdown()'
>> callback IMO.
>>
>>> Doing things differently from all other PCIe controller drivers is usually
>>> a red flag.
>>>
>>
>> Yes, even if it is the right thing to do ;) But I agree that the commit message
>> needs some improvement.
>>
>> - Mani
>>
> 
+Bjorn, as this is primarily a PCI topic

r8169: do not call rtl8169_down() twice on shutdown

Checks

Commit Message

Comments

Patch