diff mbox series

[v7,2/2] PCI/PM: Disable PTM on all devices

Message ID 20220609001007.533242-2-rajvi.jingar@linux.intel.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series [v7,1/2] PCI/PM: refactor pci_pm_suspend_noirq() | expand

Commit Message

rjingar June 9, 2022, 12:10 a.m. UTC
On receiving a PTM Request from a downstream device, if PTM is disabled
on the root port, as per PCIe specification, such request would cause
an Unsupported Request error. So disable PTM for any downstream devices.
PTM state needs to be saved before disabling it to be restored later.

Set ptm_enabled from 'struct pci_dev' to 0 in pci_ptm_disable() and
it is used in pci_save_state() before saving PTM state to avoid
double save.

Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Suggested-by: David E. Box <david.e.box@linux.intel.com>
---
 v1 -> v2: add Fixes tag in commit message
 v2 -> v3: move changelog after "---" marker
 v3 -> v4: add "---" marker after changelog
 v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
	   disable PTM for all devices, not just root ports.
 v5 -> v6: move pci_disable_ptm() to pci_pm_suspend()
	   set pci_dev->ptm_enabled to 0 in pci_ptm_disable() and it is
	   used in pci_save_state() before saving PTM state to avoid
	   double save.
 v6 -> v7: add #ifdef CONFIG_PCIE_PTM in pci_save_state() before saving
	   PTM state
---
 drivers/pci/pci-driver.c | 21 ++++++++++++++++++++-
 drivers/pci/pci.c        | 28 +++++++++++++---------------
 drivers/pci/pcie/ptm.c   |  1 +
 3 files changed, 34 insertions(+), 16 deletions(-)

Comments

Bjorn Helgaas June 11, 2022, 12:12 a.m. UTC | #1
On Wed, Jun 08, 2022 at 05:10:07PM -0700, Rajvi Jingar wrote:
> On receiving a PTM Request from a downstream device, if PTM is disabled
> on the root port, as per PCIe specification, such request would cause
> an Unsupported Request error. So disable PTM for any downstream devices.
> PTM state needs to be saved before disabling it to be restored later.
> 
> Set ptm_enabled from 'struct pci_dev' to 0 in pci_ptm_disable() and
> it is used in pci_save_state() before saving PTM state to avoid
> double save.
> 
> Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
> Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
> Suggested-by: David E. Box <david.e.box@linux.intel.com>
> ---
>  v1 -> v2: add Fixes tag in commit message
>  v2 -> v3: move changelog after "---" marker
>  v3 -> v4: add "---" marker after changelog
>  v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
> 	   disable PTM for all devices, not just root ports.
>  v5 -> v6: move pci_disable_ptm() to pci_pm_suspend()
> 	   set pci_dev->ptm_enabled to 0 in pci_ptm_disable() and it is
> 	   used in pci_save_state() before saving PTM state to avoid
> 	   double save.
>  v6 -> v7: add #ifdef CONFIG_PCIE_PTM in pci_save_state() before saving
> 	   PTM state
> ---
>  drivers/pci/pci-driver.c | 21 ++++++++++++++++++++-
>  drivers/pci/pci.c        | 28 +++++++++++++---------------
>  drivers/pci/pcie/ptm.c   |  1 +
>  3 files changed, 34 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 1f64de3e5280..db4d7835d7ae 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -803,14 +803,33 @@ static int pci_pm_suspend(struct device *dev)
>  		pci_dev_adjust_pme(pci_dev);
>  	}
>  
> +	/*
> +	 * If a PTM Requester is put in a low-power state, a PTM Responder
> +	 * upstream from it may also be put in a low-power state. Putting a
> +	 * Port in D1, D2, or D3hot does not prohibit it from sending or
> +	 * responding to PTM Requests. We want to disable PTM on Responders
> +	 * when they are in a low-power state. Per 6.21.3, a PTM Requester
> +	 * must not be enabled when the upstream PTM Responder is disabled.
> +	 * Therefore, we must disable all PTM on all downstream PTM
> +	 * Requesters before disabling it on the PTM Responder, e.g., a Root
> +	 * Port.
> +	 *
> +	 * Also, to restore the PTM state, it needs to be saved before
> +	 * disabling it for all devices.
> +	 */
> +	pci_save_ptm_state(pci_dev);
> +	pci_disable_ptm(pci_dev);

I think this is a little bit too magical.  The PTM disable doesn't
really fit here in pci_pm_suspend().  It's more like the wakeup
configuration done by pci_pm_suspend_noirq() in
pci_prepare_to_sleep().

IIUC, the reason it's here in pci_pm_suspend() is because of the weird
nvme thing where nvme_suspend() puts the device in a device-specific
low-power flavor of D0 and subsequent config accesses take it out of
that low-power situation [1].

I don't think this is a maintainable situation because there's nothing
about this pci_disable_ptm() that says "this cannot be done after
pm->suspend()".  That's a completely nvme-specific thing that we can't
deduce from the code and are likely to break in the future.

We *do* have the rule that if the driver sets pdev->state_saved
(normally by calling pci_save_state()), it means the driver is
responsible for *all* the device state, even the standard config space
that the PCI core would normally handle.

When the driver does set pdev->state_saved, I don't think
pci_pm_suspend_noirq() actually touches the device itself, and I'm a
little more comfortable relying on that assumption.

If this nvme weirdness plays a part here, I think the commit log and
probably a comment really should mention what's going on because it's
just feels fragile.

[1] https://lore.kernel.org/r/CAJZ5v0iNaAd=yP3DgDVVpffKU6kt+nSpPeqxWJyRddaX5K4FRA@mail.gmail.com

>  	if (pm->suspend) {
>  		pci_power_t prev = pci_dev->current_state;
>  		int error;
>  
>  		error = pm->suspend(dev);
>  		suspend_report_result(dev, pm->suspend, error);
> -		if (error)
> +		if (error) {
> +			pci_restore_ptm_state(pci_dev);
>  			return error;
> +		}
>  
>  		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
>  		    && pci_dev->current_state != PCI_UNKNOWN) {
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index cfaf40a540a8..3e9dcb1bbffa 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1669,7 +1669,15 @@ int pci_save_state(struct pci_dev *dev)
>  	pci_save_ltr_state(dev);
>  	pci_save_dpc_state(dev);
>  	pci_save_aer_state(dev);
> -	pci_save_ptm_state(dev);
> +#ifdef CONFIG_PCIE_PTM
> +	/*
> +	 * PCI PM core disables PTM during suspend and saves PTM state before
> +	 * that to be able to restore the ptm state restored later. So PCI core
> +	 * needs this check to avoid double save.
> +	 */
> +	if (dev->ptm_enabled)
> +		pci_save_ptm_state(dev);
> +#endif

This ptm_enabled check doesn't fit with the rest of the function and
the semantics are fairly complicated.

>  	return pci_save_vc_state(dev);
>  }
>  EXPORT_SYMBOL(pci_save_state);
> @@ -2710,24 +2718,12 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>  	if (target_state == PCI_POWER_ERROR)
>  		return -EIO;
>  
> -	/*
> -	 * There are systems (for example, Intel mobile chips since Coffee
> -	 * Lake) where the power drawn while suspended can be significantly
> -	 * reduced by disabling PTM on PCIe root ports as this allows the
> -	 * port to enter a lower-power PM state and the SoC to reach a
> -	 * lower-power idle state as a whole.
> -	 */
> -	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> -		pci_disable_ptm(dev);
> -
>  	pci_enable_wake(dev, target_state, wakeup);
>  
>  	error = pci_set_power_state(dev, target_state);
>  
> -	if (error) {
> +	if (error)
>  		pci_enable_wake(dev, target_state, false);
> -		pci_restore_ptm_state(dev);
> -	}
>  
>  	return error;
>  }
> @@ -2775,8 +2771,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
>  	 * port to enter a lower-power PM state and the SoC to reach a
>  	 * lower-power idle state as a whole.
>  	 */
> -	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
> +		pci_save_ptm_state(dev);
>  		pci_disable_ptm(dev);
> +	}
>  
>  	__pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
>  
> diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
> index 368a254e3124..746e29779c27 100644
> --- a/drivers/pci/pcie/ptm.c
> +++ b/drivers/pci/pcie/ptm.c
> @@ -44,6 +44,7 @@ void pci_disable_ptm(struct pci_dev *dev)
>  	pci_read_config_word(dev, ptm + PCI_PTM_CTRL, &ctrl);
>  	ctrl &= ~(PCI_PTM_CTRL_ENABLE | PCI_PTM_CTRL_ROOT);
>  	pci_write_config_word(dev, ptm + PCI_PTM_CTRL, ctrl);
> +	dev->ptm_enabled = 0;

This looks like a bug fix that could be in a separate patch.

>  }
>  
>  void pci_save_ptm_state(struct pci_dev *dev)


I think something like the sketch below would fit better in the power
management framework.  PTM disable is closely related to device power
states, so I tried to put it as close as possible to the power state
transitions.  I'm sure there are things missing and things I'm
overlooking:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cfaf40a540a8..4dcd0c7381b9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2705,28 +2705,21 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
 {
 	bool wakeup = device_may_wakeup(&dev->dev);
 	pci_power_t target_state = pci_target_state(dev, wakeup);
+	bool ptm = pcie_ptm_enabled(dev);
 	int error;
 
 	if (target_state == PCI_POWER_ERROR)
 		return -EIO;
 
-	/*
-	 * There are systems (for example, Intel mobile chips since Coffee
-	 * Lake) where the power drawn while suspended can be significantly
-	 * reduced by disabling PTM on PCIe root ports as this allows the
-	 * port to enter a lower-power PM state and the SoC to reach a
-	 * lower-power idle state as a whole.
-	 */
-	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
-		pci_disable_ptm(dev);
-
+	pci_disable_ptm(dev);
 	pci_enable_wake(dev, target_state, wakeup);
 
 	error = pci_set_power_state(dev, target_state);
 
 	if (error) {
 		pci_enable_wake(dev, target_state, false);
-		pci_restore_ptm_state(dev);
+		if (ptm)
+			pci_enable_ptm(dev);
 	}
 
 	return error;
@@ -2762,6 +2755,7 @@ EXPORT_SYMBOL(pci_back_from_sleep);
 int pci_finish_runtime_suspend(struct pci_dev *dev)
 {
 	pci_power_t target_state;
+	bool ptm = pcie_ptm_enabled(dev);
 	int error;
 
 	target_state = pci_target_state(dev, device_can_wakeup(&dev->dev));
@@ -2778,13 +2772,15 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
 	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
 		pci_disable_ptm(dev);
 
+	pci_disable_ptm(dev);
 	__pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
 
 	error = pci_set_power_state(dev, target_state);
 
 	if (error) {
 		pci_enable_wake(dev, target_state, false);
-		pci_restore_ptm_state(dev);
+		if (ptm)
+			pci_enable_ptm(dev);
 	}
 
 	return error;
Rafael J. Wysocki July 7, 2022, 4:17 p.m. UTC | #2
Sorry for the delay here.

On Sat, Jun 11, 2022 at 2:12 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jun 08, 2022 at 05:10:07PM -0700, Rajvi Jingar wrote:
> > On receiving a PTM Request from a downstream device, if PTM is disabled
> > on the root port, as per PCIe specification, such request would cause
> > an Unsupported Request error. So disable PTM for any downstream devices.
> > PTM state needs to be saved before disabling it to be restored later.
> >
> > Set ptm_enabled from 'struct pci_dev' to 0 in pci_ptm_disable() and
> > it is used in pci_save_state() before saving PTM state to avoid
> > double save.
> >
> > Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
> > Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
> > Suggested-by: David E. Box <david.e.box@linux.intel.com>
> > ---
> >  v1 -> v2: add Fixes tag in commit message
> >  v2 -> v3: move changelog after "---" marker
> >  v3 -> v4: add "---" marker after changelog
> >  v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
> >          disable PTM for all devices, not just root ports.
> >  v5 -> v6: move pci_disable_ptm() to pci_pm_suspend()
> >          set pci_dev->ptm_enabled to 0 in pci_ptm_disable() and it is
> >          used in pci_save_state() before saving PTM state to avoid
> >          double save.
> >  v6 -> v7: add #ifdef CONFIG_PCIE_PTM in pci_save_state() before saving
> >          PTM state
> > ---
> >  drivers/pci/pci-driver.c | 21 ++++++++++++++++++++-
> >  drivers/pci/pci.c        | 28 +++++++++++++---------------
> >  drivers/pci/pcie/ptm.c   |  1 +
> >  3 files changed, 34 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 1f64de3e5280..db4d7835d7ae 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -803,14 +803,33 @@ static int pci_pm_suspend(struct device *dev)
> >               pci_dev_adjust_pme(pci_dev);
> >       }
> >
> > +     /*
> > +      * If a PTM Requester is put in a low-power state, a PTM Responder
> > +      * upstream from it may also be put in a low-power state. Putting a
> > +      * Port in D1, D2, or D3hot does not prohibit it from sending or
> > +      * responding to PTM Requests. We want to disable PTM on Responders
> > +      * when they are in a low-power state. Per 6.21.3, a PTM Requester
> > +      * must not be enabled when the upstream PTM Responder is disabled.
> > +      * Therefore, we must disable all PTM on all downstream PTM
> > +      * Requesters before disabling it on the PTM Responder, e.g., a Root
> > +      * Port.
> > +      *
> > +      * Also, to restore the PTM state, it needs to be saved before
> > +      * disabling it for all devices.
> > +      */
> > +     pci_save_ptm_state(pci_dev);
> > +     pci_disable_ptm(pci_dev);
>
> I think this is a little bit too magical.  The PTM disable doesn't
> really fit here in pci_pm_suspend().  It's more like the wakeup
> configuration done by pci_pm_suspend_noirq() in
> pci_prepare_to_sleep().
>
> IIUC, the reason it's here in pci_pm_suspend() is because of the weird
> nvme thing where nvme_suspend() puts the device in a device-specific
> low-power flavor of D0 and subsequent config accesses take it out of
> that low-power situation [1].
>
> I don't think this is a maintainable situation because there's nothing
> about this pci_disable_ptm() that says "this cannot be done after
> pm->suspend()".  That's a completely nvme-specific thing that we can't
> deduce from the code and are likely to break in the future.

Well, I'm not sure it is nvme-specific really.

Pretty much the same goes for any driver that wants to do their own
power management (whatever it is) in the ->suspend() callback and
indicate that by calling pci_save_state() by itself.

> We *do* have the rule that if the driver sets pdev->state_saved
> (normally by calling pci_save_state()), it means the driver is
> responsible for *all* the device state, even the standard config space
> that the PCI core would normally handle.
>
> When the driver does set pdev->state_saved, I don't think
> pci_pm_suspend_noirq() actually touches the device itself, and I'm a
> little more comfortable relying on that assumption.

It can be relied on right now which is also why the $subject patch
cannot put the PTM disabling in there and do it regardless of the
state_saved value.

> If this nvme weirdness plays a part here, I think the commit log and
> probably a comment really should mention what's going on because it's
> just feels fragile.

Totally agree on that one.

> [1] https://lore.kernel.org/r/CAJZ5v0iNaAd=yP3DgDVVpffKU6kt+nSpPeqxWJyRddaX5K4FRA@mail.gmail.com
>
> >       if (pm->suspend) {
> >               pci_power_t prev = pci_dev->current_state;
> >               int error;
> >
> >               error = pm->suspend(dev);
> >               suspend_report_result(dev, pm->suspend, error);
> > -             if (error)
> > +             if (error) {
> > +                     pci_restore_ptm_state(pci_dev);
> >                       return error;
> > +             }
> >
> >               if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
> >                   && pci_dev->current_state != PCI_UNKNOWN) {
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index cfaf40a540a8..3e9dcb1bbffa 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1669,7 +1669,15 @@ int pci_save_state(struct pci_dev *dev)
> >       pci_save_ltr_state(dev);
> >       pci_save_dpc_state(dev);
> >       pci_save_aer_state(dev);
> > -     pci_save_ptm_state(dev);
> > +#ifdef CONFIG_PCIE_PTM
> > +     /*
> > +      * PCI PM core disables PTM during suspend and saves PTM state before
> > +      * that to be able to restore the ptm state restored later. So PCI core
> > +      * needs this check to avoid double save.
> > +      */
> > +     if (dev->ptm_enabled)
> > +             pci_save_ptm_state(dev);
> > +#endif
>
> This ptm_enabled check doesn't fit with the rest of the function and
> the semantics are fairly complicated.
>
> >       return pci_save_vc_state(dev);
> >  }
> >  EXPORT_SYMBOL(pci_save_state);
> > @@ -2710,24 +2718,12 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
> >       if (target_state == PCI_POWER_ERROR)
> >               return -EIO;
> >
> > -     /*
> > -      * There are systems (for example, Intel mobile chips since Coffee
> > -      * Lake) where the power drawn while suspended can be significantly
> > -      * reduced by disabling PTM on PCIe root ports as this allows the
> > -      * port to enter a lower-power PM state and the SoC to reach a
> > -      * lower-power idle state as a whole.
> > -      */
> > -     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> > -             pci_disable_ptm(dev);
> > -
> >       pci_enable_wake(dev, target_state, wakeup);
> >
> >       error = pci_set_power_state(dev, target_state);
> >
> > -     if (error) {
> > +     if (error)
> >               pci_enable_wake(dev, target_state, false);
> > -             pci_restore_ptm_state(dev);
> > -     }
> >
> >       return error;
> >  }
> > @@ -2775,8 +2771,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
> >        * port to enter a lower-power PM state and the SoC to reach a
> >        * lower-power idle state as a whole.
> >        */
> > -     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> > +     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
> > +             pci_save_ptm_state(dev);
> >               pci_disable_ptm(dev);
> > +     }
> >
> >       __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
> >
> > diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
> > index 368a254e3124..746e29779c27 100644
> > --- a/drivers/pci/pcie/ptm.c
> > +++ b/drivers/pci/pcie/ptm.c
> > @@ -44,6 +44,7 @@ void pci_disable_ptm(struct pci_dev *dev)
> >       pci_read_config_word(dev, ptm + PCI_PTM_CTRL, &ctrl);
> >       ctrl &= ~(PCI_PTM_CTRL_ENABLE | PCI_PTM_CTRL_ROOT);
> >       pci_write_config_word(dev, ptm + PCI_PTM_CTRL, ctrl);
> > +     dev->ptm_enabled = 0;
>
> This looks like a bug fix that could be in a separate patch.
>
> >  }
> >
> >  void pci_save_ptm_state(struct pci_dev *dev)
>
>
> I think something like the sketch below would fit better in the power
> management framework.  PTM disable is closely related to device power
> states, so I tried to put it as close as possible to the power state
> transitions.  I'm sure there are things missing and things I'm
> overlooking:

There are PCI devices that pci_prepare_to_sleep() is not called for,
so disabling PTM in there may not work in general.

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index cfaf40a540a8..4dcd0c7381b9 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2705,28 +2705,21 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>  {
>         bool wakeup = device_may_wakeup(&dev->dev);
>         pci_power_t target_state = pci_target_state(dev, wakeup);
> +       bool ptm = pcie_ptm_enabled(dev);
>         int error;
>
>         if (target_state == PCI_POWER_ERROR)
>                 return -EIO;
>
> -       /*
> -        * There are systems (for example, Intel mobile chips since Coffee
> -        * Lake) where the power drawn while suspended can be significantly
> -        * reduced by disabling PTM on PCIe root ports as this allows the
> -        * port to enter a lower-power PM state and the SoC to reach a
> -        * lower-power idle state as a whole.
> -        */
> -       if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> -               pci_disable_ptm(dev);
> -
> +       pci_disable_ptm(dev);
>         pci_enable_wake(dev, target_state, wakeup);
>
>         error = pci_set_power_state(dev, target_state);
>
>         if (error) {
>                 pci_enable_wake(dev, target_state, false);
> -               pci_restore_ptm_state(dev);
> +               if (ptm)
> +                       pci_enable_ptm(dev);
>         }
>
>         return error;
> @@ -2762,6 +2755,7 @@ EXPORT_SYMBOL(pci_back_from_sleep);
>  int pci_finish_runtime_suspend(struct pci_dev *dev)
>  {
>         pci_power_t target_state;
> +       bool ptm = pcie_ptm_enabled(dev);
>         int error;
>
>         target_state = pci_target_state(dev, device_can_wakeup(&dev->dev));
> @@ -2778,13 +2772,15 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
>         if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
>                 pci_disable_ptm(dev);
>
> +       pci_disable_ptm(dev);
>         __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
>
>         error = pci_set_power_state(dev, target_state);
>
>         if (error) {
>                 pci_enable_wake(dev, target_state, false);
> -               pci_restore_ptm_state(dev);
> +               if (ptm)
> +                       pci_enable_ptm(dev);
>         }
>
>         return error;
diff mbox series

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 1f64de3e5280..db4d7835d7ae 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -803,14 +803,33 @@  static int pci_pm_suspend(struct device *dev)
 		pci_dev_adjust_pme(pci_dev);
 	}
 
+	/*
+	 * If a PTM Requester is put in a low-power state, a PTM Responder
+	 * upstream from it may also be put in a low-power state. Putting a
+	 * Port in D1, D2, or D3hot does not prohibit it from sending or
+	 * responding to PTM Requests. We want to disable PTM on Responders
+	 * when they are in a low-power state. Per 6.21.3, a PTM Requester
+	 * must not be enabled when the upstream PTM Responder is disabled.
+	 * Therefore, we must disable all PTM on all downstream PTM
+	 * Requesters before disabling it on the PTM Responder, e.g., a Root
+	 * Port.
+	 *
+	 * Also, to restore the PTM state, it needs to be saved before
+	 * disabling it for all devices.
+	 */
+	pci_save_ptm_state(pci_dev);
+	pci_disable_ptm(pci_dev);
+
 	if (pm->suspend) {
 		pci_power_t prev = pci_dev->current_state;
 		int error;
 
 		error = pm->suspend(dev);
 		suspend_report_result(dev, pm->suspend, error);
-		if (error)
+		if (error) {
+			pci_restore_ptm_state(pci_dev);
 			return error;
+		}
 
 		if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
 		    && pci_dev->current_state != PCI_UNKNOWN) {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cfaf40a540a8..3e9dcb1bbffa 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1669,7 +1669,15 @@  int pci_save_state(struct pci_dev *dev)
 	pci_save_ltr_state(dev);
 	pci_save_dpc_state(dev);
 	pci_save_aer_state(dev);
-	pci_save_ptm_state(dev);
+#ifdef CONFIG_PCIE_PTM
+	/*
+	 * PCI PM core disables PTM during suspend and saves PTM state before
+	 * that to be able to restore the ptm state restored later. So PCI core
+	 * needs this check to avoid double save.
+	 */
+	if (dev->ptm_enabled)
+		pci_save_ptm_state(dev);
+#endif
 	return pci_save_vc_state(dev);
 }
 EXPORT_SYMBOL(pci_save_state);
@@ -2710,24 +2718,12 @@  int pci_prepare_to_sleep(struct pci_dev *dev)
 	if (target_state == PCI_POWER_ERROR)
 		return -EIO;
 
-	/*
-	 * There are systems (for example, Intel mobile chips since Coffee
-	 * Lake) where the power drawn while suspended can be significantly
-	 * reduced by disabling PTM on PCIe root ports as this allows the
-	 * port to enter a lower-power PM state and the SoC to reach a
-	 * lower-power idle state as a whole.
-	 */
-	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
-		pci_disable_ptm(dev);
-
 	pci_enable_wake(dev, target_state, wakeup);
 
 	error = pci_set_power_state(dev, target_state);
 
-	if (error) {
+	if (error)
 		pci_enable_wake(dev, target_state, false);
-		pci_restore_ptm_state(dev);
-	}
 
 	return error;
 }
@@ -2775,8 +2771,10 @@  int pci_finish_runtime_suspend(struct pci_dev *dev)
 	 * port to enter a lower-power PM state and the SoC to reach a
 	 * lower-power idle state as a whole.
 	 */
-	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
+		pci_save_ptm_state(dev);
 		pci_disable_ptm(dev);
+	}
 
 	__pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
 
diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c
index 368a254e3124..746e29779c27 100644
--- a/drivers/pci/pcie/ptm.c
+++ b/drivers/pci/pcie/ptm.c
@@ -44,6 +44,7 @@  void pci_disable_ptm(struct pci_dev *dev)
 	pci_read_config_word(dev, ptm + PCI_PTM_CTRL, &ctrl);
 	ctrl &= ~(PCI_PTM_CTRL_ENABLE | PCI_PTM_CTRL_ROOT);
 	pci_write_config_word(dev, ptm + PCI_PTM_CTRL, ctrl);
+	dev->ptm_enabled = 0;
 }
 
 void pci_save_ptm_state(struct pci_dev *dev)