diff mbox

[1/7] PCI: Restore BARs on runtime resume despite being unbound

Message ID f7593132608eb9a83d7268cc5b3ef12b3dd9c52d.1518941073.git.lukas@wunner.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Lukas Wunner Feb. 18, 2018, 8:38 a.m. UTC
PCI devices not bound to a driver are supposed to stay in D0 during
runtime suspend.  But they may have a parent which is bound and can be
transitioned to D3cold at runtime.  Once the parent goes to D3cold, the
unbound child may go to D3cold as well.  When the child comes out of
D3cold, its BARs are uninitialized and thus inaccessible when a driver
tries to probe.

One example are recent hybrid graphics laptops which cut power to the
discrete GPU when the root port above it goes to ACPI power state D3.
Users may provoke this by unbinding the GPU driver and allowing runtime
PM on the GPU via sysfs:  The PM core will then treat the GPU as
"suspended", which in turn allows the root port to runtime suspend,
causing the power resources listed in its _PR3 object to be powered off.
The GPU's BARs will be uninitialized when a driver later probes it.

Another example are hybrid graphics laptops where the GPU itself (rather
than the root port) is capable of runtime suspending to D3cold.  If the
GPU's integrated HDA controller is not bound and the GPU's driver
decides to runtime suspend to D3cold, the HDA controller's BARs will be
uninitialized when a driver later probes it.

Fix by restoring the BARs on runtime resume if the device is not bound.
This is sufficient to fix the above-mentioned use cases.  Other use
cases might require a full-blown pci_save_state() / pci_restore_state()
or execution of fixups.  We can add that once use cases materialize,
let's not inflate the code unnecessarily.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci-driver.c | 8 ++++++--
 drivers/pci/pci.c        | 2 +-
 drivers/pci/pci.h        | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

Comments

Rafael J. Wysocki Feb. 19, 2018, 9:49 a.m. UTC | #1
On Sun, Feb 18, 2018 at 9:38 AM, Lukas Wunner <lukas@wunner.de> wrote:
> PCI devices not bound to a driver are supposed to stay in D0 during
> runtime suspend.  But they may have a parent which is bound and can be
> transitioned to D3cold at runtime.  Once the parent goes to D3cold, the
> unbound child may go to D3cold as well.  When the child comes out of
> D3cold, its BARs are uninitialized and thus inaccessible when a driver
> tries to probe.
>
> One example are recent hybrid graphics laptops which cut power to the
> discrete GPU when the root port above it goes to ACPI power state D3.
> Users may provoke this by unbinding the GPU driver and allowing runtime
> PM on the GPU via sysfs:  The PM core will then treat the GPU as
> "suspended", which in turn allows the root port to runtime suspend,
> causing the power resources listed in its _PR3 object to be powered off.
> The GPU's BARs will be uninitialized when a driver later probes it.
>
> Another example are hybrid graphics laptops where the GPU itself (rather
> than the root port) is capable of runtime suspending to D3cold.  If the
> GPU's integrated HDA controller is not bound and the GPU's driver
> decides to runtime suspend to D3cold, the HDA controller's BARs will be
> uninitialized when a driver later probes it.
>
> Fix by restoring the BARs on runtime resume if the device is not bound.
> This is sufficient to fix the above-mentioned use cases.  Other use
> cases might require a full-blown pci_save_state() / pci_restore_state()
> or execution of fixups.  We can add that once use cases materialize,
> let's not inflate the code unnecessarily.
>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/pci-driver.c | 8 ++++++--
>  drivers/pci/pci.c        | 2 +-
>  drivers/pci/pci.h        | 1 +
>  3 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3bed6beda051..51b11cbd48f6 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1277,10 +1277,14 @@ static int pci_pm_runtime_resume(struct device *dev)
>
>         /*
>          * If pci_dev->driver is not set (unbound), the device should
> -        * always remain in D0 regardless of the runtime PM status
> +        * always remain in D0 regardless of the runtime PM status.
> +        * But if its parent can go to D3cold, this device may have
> +        * been in D3cold as well and require restoration of its BARs.
>          */
> -       if (!pci_dev->driver)
> +       if (!pci_dev->driver) {
> +               pci_restore_bars(pci_dev);
>                 return 0;
> +       }
>
>         if (!pm || !pm->runtime_resume)
>                 return -ENOSYS;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f6a4dd10d9b0..f694650235f2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -563,7 +563,7 @@ int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask)
>   * Restore the BAR values for a given device, so as to make it
>   * accessible by its driver.
>   */
> -static void pci_restore_bars(struct pci_dev *dev)
> +void pci_restore_bars(struct pci_dev *dev)
>  {
>         int i;
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fcd81911b127..29dc15bbe3bf 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -83,6 +83,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev);
>  void pci_free_cap_save_buffers(struct pci_dev *dev);
>  bool pci_bridge_d3_possible(struct pci_dev *dev);
>  void pci_bridge_d3_update(struct pci_dev *dev);
> +void pci_restore_bars(struct pci_dev *dev);
>
>  static inline void pci_wakeup_event(struct pci_dev *dev)
>  {
> --
> 2.15.1
>
Bjorn Helgaas Feb. 20, 2018, 9:29 p.m. UTC | #2
On Sun, Feb 18, 2018 at 09:38:32AM +0100, Lukas Wunner wrote:
> PCI devices not bound to a driver are supposed to stay in D0 during
> runtime suspend.

Doesn't "runtime suspend" mean an individual device is suspended while
the rest of the system remains active?

If so, maybe it would be more direct to say "PCI devices not bound to
a driver cannot be runtime suspended"?

(It's a separate question not relevant to this patch, but I'm not
convinced that "PCI devices without a driver cannot be suspended"
should be accepted as a rule.  If it is a rule, we should be able to
deduce it from the specs.)

> But they may have a parent which is bound and can be
> transitioned to D3cold at runtime.  Once the parent goes to D3cold, the
> unbound child may go to D3cold as well.  When the child comes out of
> D3cold, its BARs are uninitialized and thus inaccessible when a driver
> tries to probe.
> 
> One example are recent hybrid graphics laptops which cut power to the
> discrete GPU when the root port above it goes to ACPI power state D3.
> Users may provoke this by unbinding the GPU driver and allowing runtime
> PM on the GPU via sysfs:  The PM core will then treat the GPU as
> "suspended", which in turn allows the root port to runtime suspend,
> causing the power resources listed in its _PR3 object to be powered off.
> The GPU's BARs will be uninitialized when a driver later probes it.
> 
> Another example are hybrid graphics laptops where the GPU itself (rather
> than the root port) is capable of runtime suspending to D3cold.  If the
> GPU's integrated HDA controller is not bound and the GPU's driver
> decides to runtime suspend to D3cold, the HDA controller's BARs will be
> uninitialized when a driver later probes it.
> 
> Fix by restoring the BARs on runtime resume if the device is not bound.
> This is sufficient to fix the above-mentioned use cases.  Other use
> cases might require a full-blown pci_save_state() / pci_restore_state()
> or execution of fixups.  We can add that once use cases materialize,
> let's not inflate the code unnecessarily.

Why would we not do a full-blown restore?  With this patch, I think
some configuration done during enumeration, e.g., ASPM and MPS, will
be lost.  "lspci -vv" of the HDA before and after the suspend may show
different things, which seems counter-intuitive.

I wouldn't think of a full-blown restore as "inflating the code"; I
would think of it as "having fewer special cases", i.e., we always use
the same restore path instead of having the main one plus a special
one for unbound devices.

> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci-driver.c | 8 ++++++--
>  drivers/pci/pci.c        | 2 +-
>  drivers/pci/pci.h        | 1 +
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 3bed6beda051..51b11cbd48f6 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1277,10 +1277,14 @@ static int pci_pm_runtime_resume(struct device *dev)
>  
>  	/*
>  	 * If pci_dev->driver is not set (unbound), the device should
> -	 * always remain in D0 regardless of the runtime PM status
> +	 * always remain in D0 regardless of the runtime PM status.
> +	 * But if its parent can go to D3cold, this device may have
> +	 * been in D3cold as well and require restoration of its BARs.
>  	 */
> -	if (!pci_dev->driver)
> +	if (!pci_dev->driver) {
> +		pci_restore_bars(pci_dev);

If we do decide not to do a full-blown restore, how do we decide
whether to use pci_restore_bars() or pci_restore_config_space()?

I'm not sure why we have both.  The pci_restore_bars() path looks a
little smarter in that it is more careful when updating 64-bit BARs
that can't be updated atomically.

>  		return 0;
> +	}
>  
>  	if (!pm || !pm->runtime_resume)
>  		return -ENOSYS;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f6a4dd10d9b0..f694650235f2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -563,7 +563,7 @@ int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask)
>   * Restore the BAR values for a given device, so as to make it
>   * accessible by its driver.
>   */
> -static void pci_restore_bars(struct pci_dev *dev)
> +void pci_restore_bars(struct pci_dev *dev)
>  {
>  	int i;
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fcd81911b127..29dc15bbe3bf 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -83,6 +83,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev);
>  void pci_free_cap_save_buffers(struct pci_dev *dev);
>  bool pci_bridge_d3_possible(struct pci_dev *dev);
>  void pci_bridge_d3_update(struct pci_dev *dev);
> +void pci_restore_bars(struct pci_dev *dev);
>  
>  static inline void pci_wakeup_event(struct pci_dev *dev)
>  {
> -- 
> 2.15.1
>
Rafael J. Wysocki Feb. 21, 2018, 9:57 a.m. UTC | #3
On Tue, Feb 20, 2018 at 10:29 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Sun, Feb 18, 2018 at 09:38:32AM +0100, Lukas Wunner wrote:
>> PCI devices not bound to a driver are supposed to stay in D0 during
>> runtime suspend.
>
> Doesn't "runtime suspend" mean an individual device is suspended while
> the rest of the system remains active?
>
> If so, maybe it would be more direct to say "PCI devices not bound to
> a driver cannot be runtime suspended"?
>
> (It's a separate question not relevant to this patch, but I'm not
> convinced that "PCI devices without a driver cannot be suspended"
> should be accepted as a rule.  If it is a rule, we should be able to
> deduce it from the specs.)
>
>> But they may have a parent which is bound and can be
>> transitioned to D3cold at runtime.  Once the parent goes to D3cold, the
>> unbound child may go to D3cold as well.  When the child comes out of
>> D3cold, its BARs are uninitialized and thus inaccessible when a driver
>> tries to probe.
>>
>> One example are recent hybrid graphics laptops which cut power to the
>> discrete GPU when the root port above it goes to ACPI power state D3.
>> Users may provoke this by unbinding the GPU driver and allowing runtime
>> PM on the GPU via sysfs:  The PM core will then treat the GPU as
>> "suspended", which in turn allows the root port to runtime suspend,
>> causing the power resources listed in its _PR3 object to be powered off.
>> The GPU's BARs will be uninitialized when a driver later probes it.
>>
>> Another example are hybrid graphics laptops where the GPU itself (rather
>> than the root port) is capable of runtime suspending to D3cold.  If the
>> GPU's integrated HDA controller is not bound and the GPU's driver
>> decides to runtime suspend to D3cold, the HDA controller's BARs will be
>> uninitialized when a driver later probes it.
>>
>> Fix by restoring the BARs on runtime resume if the device is not bound.
>> This is sufficient to fix the above-mentioned use cases.  Other use
>> cases might require a full-blown pci_save_state() / pci_restore_state()
>> or execution of fixups.  We can add that once use cases materialize,
>> let's not inflate the code unnecessarily.
>
> Why would we not do a full-blown restore?  With this patch, I think
> some configuration done during enumeration, e.g., ASPM and MPS, will
> be lost.  "lspci -vv" of the HDA before and after the suspend may show
> different things, which seems counter-intuitive.

Right.

> I wouldn't think of a full-blown restore as "inflating the code"; I
> would think of it as "having fewer special cases", i.e., we always use
> the same restore path instead of having the main one plus a special
> one for unbound devices.

That is a fair point, but if you look at pci_pm_runtime_suspend(), it
doesn't actually save anything for devices without drivers, so we'd
just restore the original initial state for them every time.

If we are to restore the entire state on runtime resume,
pci_pm_runtime_suspend() should be changed to call pci_save_state()
before returning 0 in the !dev->driver case AFAICS.

>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Signed-off-by: Lukas Wunner <lukas@wunner.de>
>> ---
>>  drivers/pci/pci-driver.c | 8 ++++++--
>>  drivers/pci/pci.c        | 2 +-
>>  drivers/pci/pci.h        | 1 +
>>  3 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 3bed6beda051..51b11cbd48f6 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -1277,10 +1277,14 @@ static int pci_pm_runtime_resume(struct device *dev)
>>
>>       /*
>>        * If pci_dev->driver is not set (unbound), the device should
>> -      * always remain in D0 regardless of the runtime PM status
>> +      * always remain in D0 regardless of the runtime PM status.
>> +      * But if its parent can go to D3cold, this device may have
>> +      * been in D3cold as well and require restoration of its BARs.
>>        */
>> -     if (!pci_dev->driver)
>> +     if (!pci_dev->driver) {
>> +             pci_restore_bars(pci_dev);
>
> If we do decide not to do a full-blown restore, how do we decide
> whether to use pci_restore_bars() or pci_restore_config_space()?
>
> I'm not sure why we have both.

Me neither.

> The pci_restore_bars() path looks a
> little smarter in that it is more careful when updating 64-bit BARs
> that can't be updated atomically.
>
>>               return 0;
>> +     }
>>
>>       if (!pm || !pm->runtime_resume)
>>               return -ENOSYS;

So if pci_pm_runtime_suspend() is modified to call pci_save_state()
before returning 0 in the !dev->driver case, we can just move the
pci_restore_standard_config() invocation in pci_pm_runtime_resume() up
to the very top and check dev->driver later.
Lukas Wunner Feb. 24, 2018, 4:26 p.m. UTC | #4
[trimming cc a bit to avoid spamming folks likely uninterested in PCI PM
intricacies]

On Wed, Feb 21, 2018 at 10:57:14AM +0100, Rafael J. Wysocki wrote:
> On Tue, Feb 20, 2018 at 10:29 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Sun, Feb 18, 2018 at 09:38:32AM +0100, Lukas Wunner wrote:
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -1277,10 +1277,14 @@ static int pci_pm_runtime_resume(struct device *dev)
> > >
> > >       /*
> > >        * If pci_dev->driver is not set (unbound), the device should
> > > -      * always remain in D0 regardless of the runtime PM status
> > > +      * always remain in D0 regardless of the runtime PM status.
> > > +      * But if its parent can go to D3cold, this device may have
> > > +      * been in D3cold as well and require restoration of its BARs.
> > >        */
> > > -     if (!pci_dev->driver)
> > > +     if (!pci_dev->driver) {
> > > +             pci_restore_bars(pci_dev);
> >
> > If we do decide not to do a full-blown restore, how do we decide
> > whether to use pci_restore_bars() or pci_restore_config_space()?
> >
> > I'm not sure why we have both.
> 
> Me neither.

pci_restore_config_space() configures the BARs from saved_config_space[]
in struct pci_dev.  This array needs to have been populated beforehand.
If it's never been populated, the function can't be used.  The sole
caller of that function, pci_restore_state(), therefore checks whether
the state_saved bit in stuct pci_dev is true.

According to the code comment in the sole caller of pci_restore_bars(),
pci_raw_set_power_state(), there are systems where device are in D3hot
on boot.  Moreover devices where the No_Soft_Reset bit is 0 are in
D0uninitialized when coming out of D3hot.

For those devices, pci_restore_bars() is called to configure the BARs
from resource[] in struct pci_dev, which was populated on bus scan.
pci_restore_config_space() can't be used here because the first time
the devices are brought into D0, saved_config_space[] hasn't been
populated yet.  It's normally only populated when the device is suspended.

It might be possible to replace the invocation of pci_restore_bars()
with pci_restore_state() if pci_save_state() is called on bus scan for
devices in D3hot whose No_Soft_Reset bit is 0.

An alternative approach would be to avoid storing BARs in
saved_config_space[], and modify pci_restore_config_space() to restore
those from resource[].  That would save 24 bytes in struct pci_dev.
But it would only work if the BARs are always in sync with resource[],
I'm not sure if there are situations when this isn't the case.

Thanks,

Lukas
diff mbox

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3bed6beda051..51b11cbd48f6 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1277,10 +1277,14 @@  static int pci_pm_runtime_resume(struct device *dev)
 
 	/*
 	 * If pci_dev->driver is not set (unbound), the device should
-	 * always remain in D0 regardless of the runtime PM status
+	 * always remain in D0 regardless of the runtime PM status.
+	 * But if its parent can go to D3cold, this device may have
+	 * been in D3cold as well and require restoration of its BARs.
 	 */
-	if (!pci_dev->driver)
+	if (!pci_dev->driver) {
+		pci_restore_bars(pci_dev);
 		return 0;
+	}
 
 	if (!pm || !pm->runtime_resume)
 		return -ENOSYS;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..f694650235f2 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -563,7 +563,7 @@  int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask)
  * Restore the BAR values for a given device, so as to make it
  * accessible by its driver.
  */
-static void pci_restore_bars(struct pci_dev *dev)
+void pci_restore_bars(struct pci_dev *dev)
 {
 	int i;
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fcd81911b127..29dc15bbe3bf 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -83,6 +83,7 @@  void pci_allocate_cap_save_buffers(struct pci_dev *dev);
 void pci_free_cap_save_buffers(struct pci_dev *dev);
 bool pci_bridge_d3_possible(struct pci_dev *dev);
 void pci_bridge_d3_update(struct pci_dev *dev);
+void pci_restore_bars(struct pci_dev *dev);
 
 static inline void pci_wakeup_event(struct pci_dev *dev)
 {