diff mbox series

[v5,4/4] PCI: Allow PCI bridges to go to D3Hot on all Devicetree based platforms

Message ID 20240802-pci-bridge-d3-v5-4-2426dd9e8e27@linaro.org (mailing list archive)
State Changes Requested
Headers show
Series PCI: Allow D3Hot for PCI bridges in Devicetree based platforms | expand

Commit Message

Manivannan Sadhasivam via B4 Relay Aug. 2, 2024, 5:55 a.m. UTC
From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Unlike ACPI based platforms, there are no known issues with D3Hot for the
PCI bridges in the Devicetree based platforms. So let's allow the PCI
bridges to go to D3Hot during runtime. It should be noted that the bridges
need to be defined in Devicetree for this to work.

Currently, D3Cold is not allowed since Vcc supply which is required for
transitioning the device to D3Cold is not exposed on all Devicetree based
platforms.

Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/pci.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Lukas Wunner Aug. 2, 2024, 10:13 a.m. UTC | #1
On Fri, Aug 02, 2024 at 11:25:03AM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Unlike ACPI based platforms, there are no known issues with D3Hot for the
> PCI bridges in the Devicetree based platforms. So let's allow the PCI
> bridges to go to D3Hot during runtime. It should be noted that the bridges
> need to be defined in Devicetree for this to work.
[...]
> +		if (state == PCI_D3hot && dev_of_node(&bridge->dev))
> +			return true;

For such a simple change which several parties are interested in,
I think it would be better to move it to the front of the series.

Patch [1/4] looks like an optimization (using a cached value)
which this patch doesn't depend on.  Patch [2/4] looks like a
change of bikeshed color which isn't strictly necessary for
this fourth patch either.  If you want to propose those changes,
fine, but by making this fourth patch depend on them, you risk
delaying its acceptance.  As an upstreaming strategy it's usually
smarter to move potentially controversial or unnecessary material
to the back of the series, or submit it separately if it can be
applied standalone.


> Currently, D3Cold is not allowed since Vcc supply which is required for
> transitioning the device to D3Cold is not exposed on all Devicetree based
> platforms.

The PCI core cannot put devices into D3cold without help from the
platform.  Checking whether D3cold is possible (or allowed or
whatever) thus requires asking platform support code via
platform_pci_power_manageable(), platform_pci_choose_state() etc.

I think patch [3/4] is a little confusing because it creates
infrastructure to decide whether D3cold is supported (allowed?)
but we already have that in the platform_pci_*() functions.
So I'm not sure if patch [3/4] adds value.  I think generally
speaking if D3hot isn't possible (allowed?), D3cold is assumed
to not be possible either.

Thanks,

Lukas
Manivannan Sadhasivam Aug. 5, 2024, 1:35 p.m. UTC | #2
On Fri, Aug 02, 2024 at 12:13:31PM +0200, Lukas Wunner wrote:
> On Fri, Aug 02, 2024 at 11:25:03AM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > Unlike ACPI based platforms, there are no known issues with D3Hot for the
> > PCI bridges in the Devicetree based platforms. So let's allow the PCI
> > bridges to go to D3Hot during runtime. It should be noted that the bridges
> > need to be defined in Devicetree for this to work.
> [...]
> > +		if (state == PCI_D3hot && dev_of_node(&bridge->dev))
> > +			return true;
> 
> For such a simple change which several parties are interested in,
> I think it would be better to move it to the front of the series.
> 
> Patch [1/4] looks like an optimization (using a cached value)
> which this patch doesn't depend on.  Patch [2/4] looks like a
> change of bikeshed color which isn't strictly necessary for
> this fourth patch either.  If you want to propose those changes,
> fine, but by making this fourth patch depend on them, you risk
> delaying its acceptance.  As an upstreaming strategy it's usually
> smarter to move potentially controversial or unnecessary material
> to the back of the series, or submit it separately if it can be
> applied standalone.
> 

Agree with you! Even after doing upstreaming for this much time, I tend to
ignore this...

> 
> > Currently, D3Cold is not allowed since Vcc supply which is required for
> > transitioning the device to D3Cold is not exposed on all Devicetree based
> > platforms.
> 
> The PCI core cannot put devices into D3cold without help from the
> platform.  Checking whether D3cold is possible (or allowed or
> whatever) thus requires asking platform support code via
> platform_pci_power_manageable(), platform_pci_choose_state() etc.
> 
> I think patch [3/4] is a little confusing because it creates
> infrastructure to decide whether D3cold is supported (allowed?)
> but we already have that in the platform_pci_*() functions.
> So I'm not sure if patch [3/4] adds value.  I think generally
> speaking if D3hot isn't possible (allowed?), D3cold is assumed
> to not be possible either.
> 

Why? D3Hot is useful for runtime PM and if the platform doesn't want to do
runtime PM, it can always skip D3Hot (not ideal though). But D3Cold is a power
off state, and the platform may choose to use it for the case of system suspend.

So I still feel that decoupling D3Hot and D3Cold is necessary.

- Mani
Lukas Wunner Aug. 6, 2024, 6:53 a.m. UTC | #3
On Mon, Aug 05, 2024 at 07:05:55PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Aug 02, 2024 at 12:13:31PM +0200, Lukas Wunner wrote:
> > The PCI core cannot put devices into D3cold without help from the
> > platform.  Checking whether D3cold is possible (or allowed or
> > whatever) thus requires asking platform support code via
> > platform_pci_power_manageable(), platform_pci_choose_state() etc.
> > 
> > I think patch [3/4] is a little confusing because it creates
> > infrastructure to decide whether D3cold is supported (allowed?)
> > but we already have that in the platform_pci_*() functions.
> > So I'm not sure if patch [3/4] adds value.  I think generally
> > speaking if D3hot isn't possible (allowed?), D3cold is assumed
> > to not be possible either.
> 
> Why? D3Hot is useful for runtime PM and if the platform doesn't want to do
> runtime PM, it can always skip D3Hot (not ideal though).

AFAICS we always program the device to go to D3hot and the platform
then cuts power, thereby putting it into D3cold.  So D3hot is never
skipped.  See __pci_set_power_state():

	if (state == PCI_D3cold) {
		/*
		 * To put the device in D3cold, put it into D3hot in the native
		 * way, then put it into D3cold using platform ops.
		 */
		error = pci_set_low_power_state(dev, PCI_D3hot, locked);

		if (pci_platform_power_transition(dev, PCI_D3cold))
			return error;

Thanks,

Lukas
Manivannan Sadhasivam Aug. 6, 2024, 12:41 p.m. UTC | #4
On Tue, Aug 06, 2024 at 08:53:01AM +0200, Lukas Wunner wrote:
> On Mon, Aug 05, 2024 at 07:05:55PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Aug 02, 2024 at 12:13:31PM +0200, Lukas Wunner wrote:
> > > The PCI core cannot put devices into D3cold without help from the
> > > platform.  Checking whether D3cold is possible (or allowed or
> > > whatever) thus requires asking platform support code via
> > > platform_pci_power_manageable(), platform_pci_choose_state() etc.
> > > 
> > > I think patch [3/4] is a little confusing because it creates
> > > infrastructure to decide whether D3cold is supported (allowed?)
> > > but we already have that in the platform_pci_*() functions.
> > > So I'm not sure if patch [3/4] adds value.  I think generally
> > > speaking if D3hot isn't possible (allowed?), D3cold is assumed
> > > to not be possible either.
> > 
> > Why? D3Hot is useful for runtime PM and if the platform doesn't want to do
> > runtime PM, it can always skip D3Hot (not ideal though).
> 
> AFAICS we always program the device to go to D3hot and the platform
> then cuts power, thereby putting it into D3cold.  So D3hot is never
> skipped.  See __pci_set_power_state():
> 
> 	if (state == PCI_D3cold) {
> 		/*
> 		 * To put the device in D3cold, put it into D3hot in the native
> 		 * way, then put it into D3cold using platform ops.
> 		 */
> 		error = pci_set_low_power_state(dev, PCI_D3hot, locked);
> 
> 		if (pci_platform_power_transition(dev, PCI_D3cold))
> 			return error;
> 

This is applicable only to pci_set_power_state(), but AFAIK PCIe spec doesn't
mandate switching to D3Hot for entering D3Cold. So the PCIe host controller
drivers (especically non-ACPI platforms) may just send PME_Turn_Off followed by
removing the slot power (which again is not controlled by pci_set_power_state())
as there are no non-ACPI related hooks as of now.

- Mani
Lukas Wunner Aug. 6, 2024, 1:02 p.m. UTC | #5
On Tue, Aug 06, 2024 at 06:11:07PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Aug 06, 2024 at 08:53:01AM +0200, Lukas Wunner wrote:
> > AFAICS we always program the device to go to D3hot and the platform
> > then cuts power, thereby putting it into D3cold.  So D3hot is never
> > skipped.  See __pci_set_power_state():
> > 
> > 	if (state == PCI_D3cold) {
> > 		/*
> > 		 * To put the device in D3cold, put it into D3hot in the native
> > 		 * way, then put it into D3cold using platform ops.
> > 		 */
> > 		error = pci_set_low_power_state(dev, PCI_D3hot, locked);
> > 
> > 		if (pci_platform_power_transition(dev, PCI_D3cold))
> > 			return error;
> > 
> 
> This is applicable only to pci_set_power_state(), but AFAIK PCIe spec
> doesn't mandate switching to D3Hot for entering D3Cold.

Per PCI Bus Power Management Interface Specification r1.2 sec 5.5 fig 5-1,
the only supported state transition to D3cold is from D3hot.

Per PCIe r6.2 sec 5.2, "PM is compatible with the PCI Bus Power Management
Interface Specification".

Granted, PCI-PM is an ancient spec, so I think anyone can be forgiven
for not knowing its intricacies off-the-cuff. :)


> So the PCIe host controller drivers (especically non-ACPI platforms)
> may just send PME_Turn_Off followed by removing the slot power
> (which again is not controlled by pci_set_power_state())
> as there are no non-ACPI related hooks as of now.

Ideally, devicetree-based platforms should be brought into the
platform_pci_*() fold to align them with ACPI and get common
behavior across all platforms.

Thanks,

Lukas
Manivannan Sadhasivam Aug. 6, 2024, 2:39 p.m. UTC | #6
On Tue, Aug 06, 2024 at 03:02:39PM +0200, Lukas Wunner wrote:
> On Tue, Aug 06, 2024 at 06:11:07PM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Aug 06, 2024 at 08:53:01AM +0200, Lukas Wunner wrote:
> > > AFAICS we always program the device to go to D3hot and the platform
> > > then cuts power, thereby putting it into D3cold.  So D3hot is never
> > > skipped.  See __pci_set_power_state():
> > > 
> > > 	if (state == PCI_D3cold) {
> > > 		/*
> > > 		 * To put the device in D3cold, put it into D3hot in the native
> > > 		 * way, then put it into D3cold using platform ops.
> > > 		 */
> > > 		error = pci_set_low_power_state(dev, PCI_D3hot, locked);
> > > 
> > > 		if (pci_platform_power_transition(dev, PCI_D3cold))
> > > 			return error;
> > > 
> > 
> > This is applicable only to pci_set_power_state(), but AFAIK PCIe spec
> > doesn't mandate switching to D3Hot for entering D3Cold.
> 
> Per PCI Bus Power Management Interface Specification r1.2 sec 5.5 fig 5-1,
> the only supported state transition to D3cold is from D3hot.
> 
> Per PCIe r6.2 sec 5.2, "PM is compatible with the PCI Bus Power Management
> Interface Specification".
> 
> Granted, PCI-PM is an ancient spec, so I think anyone can be forgiven
> for not knowing its intricacies off-the-cuff. :)
> 

Ah, the grand old PCI-PM... I don't remember the last time I looked into it :)

> 
> > So the PCIe host controller drivers (especically non-ACPI platforms)
> > may just send PME_Turn_Off followed by removing the slot power
> > (which again is not controlled by pci_set_power_state())
> > as there are no non-ACPI related hooks as of now.
> 
> Ideally, devicetree-based platforms should be brought into the
> platform_pci_*() fold to align them with ACPI and get common
> behavior across all platforms.
> 

Yeah, that would be the ideal case. Unfortunately, there is no ideal ground for
DT :/ We do not even have the supplies populated properly. But with the advent
of power sequencing framework, I think this can be fixed.

Regarding your comment on patch 3/4, we already have the sysfs attribute to
control whether the device can be put into D3Cold or not and that is directly
coming from userspace. So there is no guarantee to assume that D3Hot support is
considered.

- Mani
Lukas Wunner Aug. 6, 2024, 8:20 p.m. UTC | #7
On Tue, Aug 06, 2024 at 08:09:18PM +0530, Manivannan Sadhasivam wrote:
> Regarding your comment on patch 3/4, we already have the sysfs attribute
> to control whether the device can be put into D3Cold or not and that is
> directly coming from userspace. So there is no guarantee to assume that
> D3Hot support is considered.

If a device is not allowed to be suspended to D3cold, it will only be
suspended to D3hot.

If a port is put into anything deeper than D0, its secondary bus is
no longer in B0 (PCI-PM r1.2 table 6-1) and children are inaccessible,
so they're "effectively" in D3cold.  Hence if a device cannot be
suspended to D3cold, it will block all parent bridges from being
suspended.  That's what the logic in pci_bridge_d3_update() and
pci_dev_check_d3cold() is doing.

The d3cold_allowed attribute in sysfs is just one of several reasons
why a device may not go to D3cold (see pci_dev_check_d3cold() for
details).

The d3cold_allowed attribute was originally intended to disable D3cold
on devices where it was known to not work.  Nowadays this should all
be handled automatically, which is why we've discussed moving the
attribute to debugfs:

https://lore.kernel.org/all/20230918132424.GA11357@wunner.de/
https://lore.kernel.org/all/20231002181025.82746-1-mario.limonciello@amd.com/

Thanks,

Lukas
Hsin-Yi Wang Aug. 6, 2024, 8:58 p.m. UTC | #8
On Tue, Aug 6, 2024 at 7:39 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Tue, Aug 06, 2024 at 03:02:39PM +0200, Lukas Wunner wrote:
> > On Tue, Aug 06, 2024 at 06:11:07PM +0530, Manivannan Sadhasivam wrote:
> > > On Tue, Aug 06, 2024 at 08:53:01AM +0200, Lukas Wunner wrote:
> > > > AFAICS we always program the device to go to D3hot and the platform
> > > > then cuts power, thereby putting it into D3cold.  So D3hot is never
> > > > skipped.  See __pci_set_power_state():
> > > >
> > > >   if (state == PCI_D3cold) {
> > > >           /*
> > > >            * To put the device in D3cold, put it into D3hot in the native
> > > >            * way, then put it into D3cold using platform ops.
> > > >            */
> > > >           error = pci_set_low_power_state(dev, PCI_D3hot, locked);
> > > >
> > > >           if (pci_platform_power_transition(dev, PCI_D3cold))
> > > >                   return error;
> > > >
> > >
> > > This is applicable only to pci_set_power_state(), but AFAIK PCIe spec
> > > doesn't mandate switching to D3Hot for entering D3Cold.
> >
> > Per PCI Bus Power Management Interface Specification r1.2 sec 5.5 fig 5-1,
> > the only supported state transition to D3cold is from D3hot.
> >
> > Per PCIe r6.2 sec 5.2, "PM is compatible with the PCI Bus Power Management
> > Interface Specification".
> >
> > Granted, PCI-PM is an ancient spec, so I think anyone can be forgiven
> > for not knowing its intricacies off-the-cuff. :)
> >
>
> Ah, the grand old PCI-PM... I don't remember the last time I looked into it :)
>
> >
> > > So the PCIe host controller drivers (especically non-ACPI platforms)
> > > may just send PME_Turn_Off followed by removing the slot power
> > > (which again is not controlled by pci_set_power_state())
> > > as there are no non-ACPI related hooks as of now.
> >
> > Ideally, devicetree-based platforms should be brought into the
> > platform_pci_*() fold to align them with ACPI and get common
> > behavior across all platforms.
> >
>
> Yeah, that would be the ideal case. Unfortunately, there is no ideal ground for
> DT :/ We do not even have the supplies populated properly. But with the advent
> of power sequencing framework, I think this can be fixed.
>

Looking in acpi_pci_bridge_d3(), it has several checkings about
whether d3 is supported, including reading power_manageable flag
(acpi_device_power_manageable) and reading the root port property.
For DT, does it make sense to have a chosen property about this?

> Regarding your comment on patch 3/4, we already have the sysfs attribute to
> control whether the device can be put into D3Cold or not and that is directly
> coming from userspace. So there is no guarantee to assume that D3Hot support is
> considered.
>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
Manivannan Sadhasivam Aug. 19, 2024, 3:34 p.m. UTC | #9
On Tue, Aug 06, 2024 at 10:20:39PM +0200, Lukas Wunner wrote:
> On Tue, Aug 06, 2024 at 08:09:18PM +0530, Manivannan Sadhasivam wrote:
> > Regarding your comment on patch 3/4, we already have the sysfs attribute
> > to control whether the device can be put into D3Cold or not and that is
> > directly coming from userspace. So there is no guarantee to assume that
> > D3Hot support is considered.
> 
> If a device is not allowed to be suspended to D3cold, it will only be
> suspended to D3hot.
> 
> If a port is put into anything deeper than D0, its secondary bus is
> no longer in B0 (PCI-PM r1.2 table 6-1) and children are inaccessible,
> so they're "effectively" in D3cold.  Hence if a device cannot be
> suspended to D3cold, it will block all parent bridges from being
> suspended.  That's what the logic in pci_bridge_d3_update() and
> pci_dev_check_d3cold() is doing.
> 

Agree.

But patch 3/4 is mostly based on the suggestion from Bjorn [1] for earlier
revision. He specifically mentioned that the platform_pci_bridge_d3() function
doesn't differentiate between D3Hot and D3Cold and that's why I splitted them:

"These are two vastly different scenarios, and I would really like to
untangle them so they aren't conflated.  I see that you're extending
platform_pci_bridge_d3(), which apparently has that conflation baked
into it already, but my personal experience is that this is really
hard to maintain."

I agree with your point that if D3Hot is not possible, then D3Cold is also not
possible as per the PCI PM reference you quoted. But here, D3Hot is possible,
but D3Cold is not. And platform_pci_power_manageable(),
platform_pci_choose_state() are already returning false for DT platforms.

So if 'true' is returned from platform_pci_bridge_d3(), then it implies that
D3Cold is also supported, but it doesn't on DT platforms. So a split seems to be
necessary IMO.

- Mani

[1] https://lore.kernel.org/linux-pci/20240221182000.GA1533634@bhelgaas/

> The d3cold_allowed attribute in sysfs is just one of several reasons
> why a device may not go to D3cold (see pci_dev_check_d3cold() for
> details).
> 
> The d3cold_allowed attribute was originally intended to disable D3cold
> on devices where it was known to not work.  Nowadays this should all
> be handled automatically, which is why we've discussed moving the
> attribute to debugfs:
> 
> https://lore.kernel.org/all/20230918132424.GA11357@wunner.de/
> https://lore.kernel.org/all/20231002181025.82746-1-mario.limonciello@amd.com/
> 
> Thanks,
> 
> Lukas
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c7a4f961ec28..bc1e1ca673f1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2992,6 +2992,18 @@  static bool pci_bridge_d3_allowed(struct pci_dev *bridge, pci_power_t state)
 		if (pci_bridge_d3_force)
 			return true;
 
+		/*
+		 * Allow D3Hot for all Devicetree based platforms having a
+		 * separate node for the bridge. We don't allow D3Cold for now
+		 * since not all platforms are exposing the Vcc supply in
+		 * Devicetree which is required for transitioning the bridge to
+		 * D3Cold.
+		 *
+		 * NOTE: The bridge is expected to be defined in Devicetree.
+		 */
+		if (state == PCI_D3hot && dev_of_node(&bridge->dev))
+			return true;
+
 		/* Even the oldest 2010 Thunderbolt controller supports D3. */
 		if (bridge->is_thunderbolt)
 			return true;
@@ -3042,7 +3054,7 @@  bool pci_bridge_d3cold_allowed(struct pci_dev *bridge)
  *
  * This function checks if the bridge is allowed to move to D3Hot.
  * Currently we only allow D3Hot for recent enough PCIe ports on ACPI based
- * platforms and Thunderbolt.
+ * platforms, Thunderbolt and Devicetree based platforms.
  */
 bool pci_bridge_d3hot_allowed(struct pci_dev *bridge)
 {