
PCI: Forbid RPM on ACPI systems before 5.0 only

Message ID cadbd4eb-40bb-1fa8-1e00-dc95dab62295@gmail.com (mailing list archive)
State Changes Requested, archived
Series PCI: Forbid RPM on ACPI systems before 5.0 only

Commit Message

Heiner Kallweit Jan. 17, 2022, 10:51 a.m. UTC
Currently the PCI core forbids RPM and requires an opt-in from userspace,
apart from a few drivers calling pm_runtime_allow(). The reason is that
some early ACPI PM implementations conflict with RPM, see [0].
Note that as of today pm_runtime_forbid() is also called on non-ACPI
systems. Maybe it's time to allow RPM by default on non-ACPI systems
and recent enough ACPI versions. Let's allow RPM starting with ACPI 5.0,
which was published in 2011.

[0] https://lkml.org/lkml/2020/11/17/1548

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/pci/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
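For reference, the gate in the patch reduces to a pure predicate on the FADT revision. A standalone model (not kernel code; `acpi_gbl_FADT` exists only inside the kernel, so the revision is passed in here):

```c
#include <stdbool.h>

/*
 * Model of the patch's gating logic: runtime PM stays forbidden only
 * when the firmware reports a FADT revision that predates ACPI 5.0
 * (revision 5). Revision 0 means the field is unset/unknown, and the
 * patch's condition leaves those systems with RPM allowed.
 */
static bool should_forbid_rpm(unsigned int fadt_revision)
{
	return fadt_revision > 0 && fadt_revision < 5;
}
```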

Comments

Bjorn Helgaas Jan. 17, 2022, 11:35 p.m. UTC | #1
[+cc Kai-Heng, Lukas, Mika, since they were cc'd or commented on [0] below]

On Mon, Jan 17, 2022 at 11:51:54AM +0100, Heiner Kallweit wrote:
> Currently PCI core forbids RPM and requires opt-in from userspace,
> apart from few drivers calling pm_runtime_allow(). Reason is that some
> early ACPI PM implementations conflict with RPM, see [0].
> Note that as of today pm_runtime_forbid() is also called for non-ACPI
> systems. Maybe it's time to allow RPM per default for non-ACPI systems
> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
> was published in 2011.

Let's reword this to use the positive sense, e.g., something like
"enable runtime power management for non-ACPI and ACPI 5.0 and newer."

This feels like a potentially significant change that could cause
breakage.  

  - How would a user recognize that we're doing something different?
    Maybe we need a note in dmesg?

  - If a system broke because of this, what would it look like?  How
    would a user notice a problem, and how would he or she connect the
    problem to this change?

  - Is there a kernel parameter that will get the previous behavior of
    disabling runtime PM as a workaround until a quirk can be added?
    If so, we should probably mention it here.  If not, should there
    be?

> [0] https://lkml.org/lkml/2020/11/17/1548

Please use an https://lore.kernel.org/r/... link instead.

Let's mention bb910a7040e9 ("PCI/PM Runtime: Make runtime PM of PCI
devices inactive by default") as well to help connect the dots here.

> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/pci/pci.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 428afd459..26e3a500c 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
>  	u16 status;
>  	u16 pmc;
>  
> -	pm_runtime_forbid(&dev->dev);
> +#ifdef CONFIG_ACPI
> +	/* Some early ACPI PM implementations conflict with RPM. */
> +	if (acpi_gbl_FADT.header.revision > 0 &&
> +	    acpi_gbl_FADT.header.revision < 5)
> +		pm_runtime_forbid(&dev->dev);
> +#endif
>  	pm_runtime_set_active(&dev->dev);
>  	pm_runtime_enable(&dev->dev);
>  	device_enable_async_suspend(&dev->dev);
> -- 
> 2.34.1
>
Heiner Kallweit Jan. 18, 2022, 8:06 a.m. UTC | #2
On 18.01.2022 00:35, Bjorn Helgaas wrote:
> [+cc Kai-Heng, Lukas, Mika, since they were cc'd or commented on [0] below]
> 
> On Mon, Jan 17, 2022 at 11:51:54AM +0100, Heiner Kallweit wrote:
>> Currently PCI core forbids RPM and requires opt-in from userspace,
>> apart from few drivers calling pm_runtime_allow(). Reason is that some
>> early ACPI PM implementations conflict with RPM, see [0].
>> Note that as of today pm_runtime_forbid() is also called for non-ACPI
>> systems. Maybe it's time to allow RPM per default for non-ACPI systems
>> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
>> was published in 2011.
> 
> Let's reword this to use the positive sense, e.g., something like
> "enable runtime power management for non-ACPI and ACPI 5.0 and newer."
> 
> This feels like a potentially significant change that could cause
> breakage.  
> 
>   - How would a user recognize that we're doing something different?
>     Maybe we need a note in dmesg?
> 
>   - If a system broke because of this, what would it look like?  How
>     would a user notice a problem, and how would he or she connect the
>     problem to this change?
> 

I don't know what the exact symptoms of the original problem were.
I see more of a risk that this change reveals bugs in the RPM usage
of PCI device drivers; there's no fixed list of potential symptoms.

One example: the igb driver caused a hang on system shutdown when RPM
was enabled, due to an RTNL deadlock in the RPM resume path.

>   - Is there a kernel parameter that will get the previous behavior of
>     disabling runtime PM as a workaround until a quirk can be added?
>     If so, we should probably mention it here.  If not, should there
>     be?

For each device in sysfs: power/control: "auto" -> "on"
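To spell out that workaround: restoring the previous behavior from userspace means writing "on" to each device's power/control attribute. A sketch in C (the sysfs root is a parameter purely so the walk can be exercised against a scratch directory; on a real system it would be /sys/bus/pci/devices, and a shell loop would do just as well):

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Write "on" to <pci_root>/<device>/power/control for every device
 * directory, which forbids runtime PM for that device ("auto" allows
 * it). Returns how many attributes were changed, or -1 on error.
 */
static int set_power_control_on(const char *pci_root)
{
	DIR *dir = opendir(pci_root);
	struct dirent *de;
	char path[512];
	int changed = 0;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;	/* skip "." and ".." */
		snprintf(path, sizeof(path), "%s/%s/power/control",
			 pci_root, de->d_name);
		f = fopen(path, "w");
		if (!f)
			continue;	/* device without the attribute */
		fputs("on", f);
		fclose(f);
		changed++;
	}
	closedir(dir);
	return changed;
}
```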

> 
>> [0] https://lkml.org/lkml/2020/11/17/1548
> 
> Please use an https://lore.kernel.org/r/... link instead.
> 
> Let's mention bb910a7040e9 ("PCI/PM Runtime: Make runtime PM of PCI
> devices inactive by default") as well to help connect the dots here.
> 
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/pci.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 428afd459..26e3a500c 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
>>  	u16 status;
>>  	u16 pmc;
>>  
>> -	pm_runtime_forbid(&dev->dev);
>> +#ifdef CONFIG_ACPI
>> +	/* Some early ACPI PM implementations conflict with RPM. */
>> +	if (acpi_gbl_FADT.header.revision > 0 &&
>> +	    acpi_gbl_FADT.header.revision < 5)
>> +		pm_runtime_forbid(&dev->dev);
>> +#endif
>>  	pm_runtime_set_active(&dev->dev);
>>  	pm_runtime_enable(&dev->dev);
>>  	device_enable_async_suspend(&dev->dev);
>> -- 
>> 2.34.1
>>
Bjorn Helgaas Jan. 18, 2022, 4:09 p.m. UTC | #3
On Tue, Jan 18, 2022 at 09:06:12AM +0100, Heiner Kallweit wrote:
> On 18.01.2022 00:35, Bjorn Helgaas wrote:
> > [+cc Kai-Heng, Lukas, Mika, since they were cc'd or commented on [0] below]
> > 
> > On Mon, Jan 17, 2022 at 11:51:54AM +0100, Heiner Kallweit wrote:
> >> Currently PCI core forbids RPM and requires opt-in from userspace,
> >> apart from few drivers calling pm_runtime_allow(). Reason is that some
> >> early ACPI PM implementations conflict with RPM, see [0].
> >> Note that as of today pm_runtime_forbid() is also called for non-ACPI
> >> systems. Maybe it's time to allow RPM per default for non-ACPI systems
> >> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
> >> was published in 2011.
> > 
> > Let's reword this to use the positive sense, e.g., something like
> > "enable runtime power management for non-ACPI and ACPI 5.0 and newer."
> > 
> > This feels like a potentially significant change that could cause
> > breakage.  
> > 
> >   - How would a user recognize that we're doing something different?
> >     Maybe we need a note in dmesg?
> > 
> >   - If a system broke because of this, what would it look like?  How
> >     would a user notice a problem, and how would he or she connect the
> >     problem to this change?
> 
> Don't know what the exact symptoms of the original problem are.
> I'd more see a certain risk that this change reveals bugs in RPM usage
> of PCI device drivers. There's not a fixed list of potential symptoms.
> 
> One example: igb driver caused a hang on system shutdown when RPM was
> enabled due to a RTNL deadlock in RPM resume path.
> 
> >   - Is there a kernel parameter that will get the previous behavior of
> >     disabling runtime PM as a workaround until a quirk can be added?
> >     If so, we should probably mention it here.  If not, should there
> >     be?
> 
> For each device in sysfs: power/control: "auto" -> "on"

Thanks.  In case it wasn't clear, this is information that I would
like to have in the commit log so that if anybody *does* see a
problem, there's a hint about how to debug it and work around it.

> >> [0] https://lkml.org/lkml/2020/11/17/1548
> > 
> > Please use an https://lore.kernel.org/r/... link instead.
> > 
> > Let's mention bb910a7040e9 ("PCI/PM Runtime: Make runtime PM of PCI
> > devices inactive by default") as well to help connect the dots here.
> > 
> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >> ---
> >>  drivers/pci/pci.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index 428afd459..26e3a500c 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
> >>  	u16 status;
> >>  	u16 pmc;
> >>  
> >> -	pm_runtime_forbid(&dev->dev);
> >> +#ifdef CONFIG_ACPI
> >> +	/* Some early ACPI PM implementations conflict with RPM. */
> >> +	if (acpi_gbl_FADT.header.revision > 0 &&
> >> +	    acpi_gbl_FADT.header.revision < 5)
> >> +		pm_runtime_forbid(&dev->dev);
> >> +#endif
> >>  	pm_runtime_set_active(&dev->dev);
> >>  	pm_runtime_enable(&dev->dev);
> >>  	device_enable_async_suspend(&dev->dev);
> >> -- 
> >> 2.34.1
> >>
>
Rafael J. Wysocki Jan. 18, 2022, 4:28 p.m. UTC | #4
On Mon, Jan 17, 2022 at 11:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Currently PCI core forbids RPM and requires opt-in from userspace,
> apart from few drivers calling pm_runtime_allow(). Reason is that some
> early ACPI PM implementations conflict with RPM, see [0].
> Note that as of today pm_runtime_forbid() is also called for non-ACPI
> systems. Maybe it's time to allow RPM per default for non-ACPI systems
> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
> was published in 2011.
>
> [0] https://lkml.org/lkml/2020/11/17/1548
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/pci/pci.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 428afd459..26e3a500c 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
>         u16 status;
>         u16 pmc;
>
> -       pm_runtime_forbid(&dev->dev);
> +#ifdef CONFIG_ACPI
> +       /* Some early ACPI PM implementations conflict with RPM. */
> +       if (acpi_gbl_FADT.header.revision > 0 &&
> +           acpi_gbl_FADT.header.revision < 5)
> +               pm_runtime_forbid(&dev->dev);
> +#endif

Well, there are two things here.

First, there were systems in which ACPI PM was not ready for changing
power states in the S0 system state (ie. run-time) and it was assuming
that power states would only be changed during transitions to sleep
states (S1 - S4) and to S5.  This can be covered by the ACPI revision
check, but I'm not sure if ACPI 5.0 is the right one.  Why ACPI 5 and
not ACPI 6, for instance?

Second, there were PCI devices without ACPI PM where the PCI standard
PM didn't work correctly.  This is not related to ACPI at all and I'm
not sure why the ACPI revision check would be sufficient to cover
these cases.

>         pm_runtime_set_active(&dev->dev);
>         pm_runtime_enable(&dev->dev);
>         device_enable_async_suspend(&dev->dev);
> --
> 2.34.1
>
Heiner Kallweit Jan. 18, 2022, 4:56 p.m. UTC | #5
On 18.01.2022 17:28, Rafael J. Wysocki wrote:
> On Mon, Jan 17, 2022 at 11:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> Currently PCI core forbids RPM and requires opt-in from userspace,
>> apart from few drivers calling pm_runtime_allow(). Reason is that some
>> early ACPI PM implementations conflict with RPM, see [0].
>> Note that as of today pm_runtime_forbid() is also called for non-ACPI
>> systems. Maybe it's time to allow RPM per default for non-ACPI systems
>> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
>> was published in 2011.
>>
>> [0] https://lkml.org/lkml/2020/11/17/1548
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/pci.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 428afd459..26e3a500c 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
>>         u16 status;
>>         u16 pmc;
>>
>> -       pm_runtime_forbid(&dev->dev);
>> +#ifdef CONFIG_ACPI
>> +       /* Some early ACPI PM implementations conflict with RPM. */
>> +       if (acpi_gbl_FADT.header.revision > 0 &&
>> +           acpi_gbl_FADT.header.revision < 5)
>> +               pm_runtime_forbid(&dev->dev);
>> +#endif
> 
> Well, there are two things here.
> 
> First, there were systems in which ACPI PM was not ready for changing
> power states in the S0 system state (ie. run-time) and it was assuming
> that power states would only be changed during transitions to sleep
> states (S1 - S4) and to S5.  This can be covered by the ACPI revision
> check, but I'm not sure if ACPI 5.0 is the right one.  Why ACPI 5 and
> not ACPI 6, for instance?
> 
Just based on the assumption that ACPI 5.0 should be recent enough.
We can also go with ACPI 6.

> Second, there were PCI devices without ACPI PM where the PCI standard
> PM didn't work correctly.  This is not related to ACPI at all and I'm
> not sure why the ACPI revision check would be sufficient to cover
> these cases.
> 
Didn't know that there were such cases. Can you provide any examples or
links to reports about such misbehaving devices?

>>         pm_runtime_set_active(&dev->dev);
>>         pm_runtime_enable(&dev->dev);
>>         device_enable_async_suspend(&dev->dev);
>> --
>> 2.34.1
>>
Rafael J. Wysocki Jan. 18, 2022, 5:11 p.m. UTC | #6
On Tue, Jan 18, 2022 at 5:57 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 18.01.2022 17:28, Rafael J. Wysocki wrote:
> > On Mon, Jan 17, 2022 at 11:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> Currently PCI core forbids RPM and requires opt-in from userspace,
> >> apart from few drivers calling pm_runtime_allow(). Reason is that some
> >> early ACPI PM implementations conflict with RPM, see [0].
> >> Note that as of today pm_runtime_forbid() is also called for non-ACPI
> >> systems. Maybe it's time to allow RPM per default for non-ACPI systems
> >> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
> >> was published in 2011.
> >>
> >> [0] https://lkml.org/lkml/2020/11/17/1548
> >>
> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >> ---
> >>  drivers/pci/pci.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index 428afd459..26e3a500c 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
> >>         u16 status;
> >>         u16 pmc;
> >>
> >> -       pm_runtime_forbid(&dev->dev);
> >> +#ifdef CONFIG_ACPI
> >> +       /* Some early ACPI PM implementations conflict with RPM. */
> >> +       if (acpi_gbl_FADT.header.revision > 0 &&
> >> +           acpi_gbl_FADT.header.revision < 5)
> >> +               pm_runtime_forbid(&dev->dev);
> >> +#endif
> >
> > Well, there are two things here.
> >
> > First, there were systems in which ACPI PM was not ready for changing
> > power states in the S0 system state (ie. run-time) and it was assuming
> > that power states would only be changed during transitions to sleep
> > states (S1 - S4) and to S5.  This can be covered by the ACPI revision
> > check, but I'm not sure if ACPI 5.0 is the right one.  Why ACPI 5 and
> > not ACPI 6, for instance?
> >
> Just based on the assumption that ACPI 5.0 should be recent enough.
> We can also go with ACPI 6.

I know that we can, the question is whether or not we should.

IOW, there needs to be at least some technical grounds on which to
assume that a given ACPI release is safe enough.

> > Second, there were PCI devices without ACPI PM where the PCI standard
> > PM didn't work correctly.  This is not related to ACPI at all and I'm
> > not sure why the ACPI revision check would be sufficient to cover
> > these cases.
> >
> Didn't know that there were such cases. Can you provide any examples or
> links to reports about such misbehaving devices?

Admittedly, I don't have a list of them, so I would need to look them
up and not just in the mailing lists.

> >>         pm_runtime_set_active(&dev->dev);
> >>         pm_runtime_enable(&dev->dev);
> >>         device_enable_async_suspend(&dev->dev);
> >> --

Also note that this change will allow PM-runtime to be used on PCI
devices without drivers by default and that may not be entirely safe
either.  It didn't work really well in the past IIRC, so I'm wondering
what's the reason to believe that it will work just fine this time.
Heiner Kallweit Jan. 18, 2022, 5:42 p.m. UTC | #7
On 18.01.2022 18:11, Rafael J. Wysocki wrote:
> On Tue, Jan 18, 2022 at 5:57 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 18.01.2022 17:28, Rafael J. Wysocki wrote:
>>> On Mon, Jan 17, 2022 at 11:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Currently PCI core forbids RPM and requires opt-in from userspace,
>>>> apart from few drivers calling pm_runtime_allow(). Reason is that some
>>>> early ACPI PM implementations conflict with RPM, see [0].
>>>> Note that as of today pm_runtime_forbid() is also called for non-ACPI
>>>> systems. Maybe it's time to allow RPM per default for non-ACPI systems
>>>> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
>>>> was published in 2011.
>>>>
>>>> [0] https://lkml.org/lkml/2020/11/17/1548
>>>>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>> ---
>>>>  drivers/pci/pci.c | 7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 428afd459..26e3a500c 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
>>>>         u16 status;
>>>>         u16 pmc;
>>>>
>>>> -       pm_runtime_forbid(&dev->dev);
>>>> +#ifdef CONFIG_ACPI
>>>> +       /* Some early ACPI PM implementations conflict with RPM. */
>>>> +       if (acpi_gbl_FADT.header.revision > 0 &&
>>>> +           acpi_gbl_FADT.header.revision < 5)
>>>> +               pm_runtime_forbid(&dev->dev);
>>>> +#endif
>>>
>>> Well, there are two things here.
>>>
>>> First, there were systems in which ACPI PM was not ready for changing
>>> power states in the S0 system state (ie. run-time) and it was assuming
>>> that power states would only be changed during transitions to sleep
>>> states (S1 - S4) and to S5.  This can be covered by the ACPI revision
>>> check, but I'm not sure if ACPI 5.0 is the right one.  Why ACPI 5 and
>>> not ACPI 6, for instance?
>>>
>> Just based on the assumption that ACPI 5.0 should be recent enough.
>> We can also go with ACPI 6.
> 
> I know that we can, the question is whether or not we should.
> 
> IOW, there needs to be at least some technical grounds on which to
> assume that a given ACPI release is safe enough.
> 
When ACPI 5 was published, the workaround to disable RPM in general
was already in place. I'd assume that the majority of users do not
opt in for RPM, so it may be hard to find out whether any
system with ACPI 5 or ACPI 6 suffers from the same problem as the
affected old systems.

>>> Second, there were PCI devices without ACPI PM where the PCI standard
>>> PM didn't work correctly.  This is not related to ACPI at all and I'm
>>> not sure why the ACPI revision check would be sufficient to cover
>>> these cases.
>>>
>> Didn't know that there were such cases. Can you provide any examples or
>> links to reports about such misbehaving devices?
> 
> Admittedly, I don't have a list of them, so I would need to look them
> up and not just in the mailing lists.
> 
>>>>         pm_runtime_set_active(&dev->dev);
>>>>         pm_runtime_enable(&dev->dev);
>>>>         device_enable_async_suspend(&dev->dev);
>>>> --
> 
> Also note that this change will allow PM-runtime to be used on PCI
> devices without drivers by default and that may not be entirely safe
> either.  It didn't work really well in the past IIRC, so I'm wondering
> what's the reason to believe that it will work just fine this time.

From "Documentation/power/pci.rst":
If a PCI driver implements the runtime PM callbacks and intends to use the
runtime PM framework provided by the PM core and the PCI subsystem, it needs
to decrement the device's runtime PM usage counter in its probe callback
function.  If it doesn't do that, the counter will always be different from
zero for the device and it will never be runtime-suspended.

Having said that, I don't see how there can be an RPM-related problem
without the driver calling one of the RPM put functions. Maybe some of
the problems in the past were caused by PCI core bugs that have long
been fixed.
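The counter mechanics the documentation passage describes can be modeled outside the kernel. In this toy sketch (the names are invented, not the real pm_runtime API), a device only becomes eligible for runtime suspend once the driver has dropped its probe-time reference and the policy has been set to "auto":

```c
#include <stdbool.h>

/* Toy model of the runtime PM usage counter, not kernel code. */
struct toy_dev {
	int usage_count;	/* references holding the device active */
	bool runtime_auto;	/* userspace policy: "auto" vs "on" */
};

static void toy_probe_begin(struct toy_dev *d)
{
	d->usage_count++;	/* core keeps the device active for probe */
}

static void toy_driver_put(struct toy_dev *d)
{
	if (d->usage_count > 0)
		d->usage_count--;	/* driver opts in to runtime PM */
}

static bool toy_may_suspend(const struct toy_dev *d)
{
	/* Suspend requires both: no held references and "auto" policy. */
	return d->runtime_auto && d->usage_count == 0;
}
```

A driver that never decrements the counter leaves it nonzero forever, which is exactly the "never runtime-suspended" case the documentation warns about.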

To reduce the risk we could enable RPM for a certain subset of PCI devices
only in a first step, e.g. for PCIe devices.
Rafael J. Wysocki Jan. 19, 2022, 7:38 p.m. UTC | #8
On Tue, Jan 18, 2022 at 6:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 18.01.2022 18:11, Rafael J. Wysocki wrote:
> > On Tue, Jan 18, 2022 at 5:57 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> On 18.01.2022 17:28, Rafael J. Wysocki wrote:
> >>> On Mon, Jan 17, 2022 at 11:52 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Currently PCI core forbids RPM and requires opt-in from userspace,
> >>>> apart from few drivers calling pm_runtime_allow(). Reason is that some
> >>>> early ACPI PM implementations conflict with RPM, see [0].
> >>>> Note that as of today pm_runtime_forbid() is also called for non-ACPI
> >>>> systems. Maybe it's time to allow RPM per default for non-ACPI systems
> >>>> and recent enough ACPI versions. Let's allow RPM from ACPI 5.0 which
> >>>> was published in 2011.
> >>>>
> >>>> [0] https://lkml.org/lkml/2020/11/17/1548
> >>>>
> >>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >>>> ---
> >>>>  drivers/pci/pci.c | 7 ++++++-
> >>>>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>> index 428afd459..26e3a500c 100644
> >>>> --- a/drivers/pci/pci.c
> >>>> +++ b/drivers/pci/pci.c
> >>>> @@ -3101,7 +3101,12 @@ void pci_pm_init(struct pci_dev *dev)
> >>>>         u16 status;
> >>>>         u16 pmc;
> >>>>
> >>>> -       pm_runtime_forbid(&dev->dev);
> >>>> +#ifdef CONFIG_ACPI
> >>>> +       /* Some early ACPI PM implementations conflict with RPM. */
> >>>> +       if (acpi_gbl_FADT.header.revision > 0 &&
> >>>> +           acpi_gbl_FADT.header.revision < 5)
> >>>> +               pm_runtime_forbid(&dev->dev);
> >>>> +#endif
> >>>
> >>> Well, there are two things here.
> >>>
> >>> First, there were systems in which ACPI PM was not ready for changing
> >>> power states in the S0 system state (ie. run-time) and it was assuming
> >>> that power states would only be changed during transitions to sleep
> >>> states (S1 - S4) and to S5.  This can be covered by the ACPI revision
> >>> check, but I'm not sure if ACPI 5.0 is the right one.  Why ACPI 5 and
> >>> not ACPI 6, for instance?
> >>>
> >> Just based on the assumption that ACPI 5.0 should be recent enough.
> >> We can also go with ACPI 6.
> >
> > I know that we can, the question is whether or not we should.
> >
> > IOW, there needs to be at least some technical grounds on which to
> > assume that a given ACPI release is safe enough.
> >
> When ACPI 5 was published the workaround to disable RPM in general
> was in place already. I'd assume that the majority of users does not
> opt in for RPM, therefore it may be hard to find out whether any
> system with ACPI 5 or ACPI 6 suffers from the same problem as the
> affected old systems.

Which kind of demonstrates the problem with the proposed approach:
it is based on speculation.

> >>> Second, there were PCI devices without ACPI PM where the PCI standard
> >>> PM didn't work correctly.  This is not related to ACPI at all and I'm
> >>> not sure why the ACPI revision check would be sufficient to cover
> >>> these cases.
> >>>
> >> Didn't know that there were such cases. Can you provide any examples or
> >> links to reports about such misbehaving devices?
> >
> > Admittedly, I don't have a list of them, so I would need to look them
> > up and not just in the mailing lists.
> >
> >>>>         pm_runtime_set_active(&dev->dev);
> >>>>         pm_runtime_enable(&dev->dev);
> >>>>         device_enable_async_suspend(&dev->dev);
> >>>> --
> >
> > Also note that this change will allow PM-runtime to be used on PCI
> > devices without drivers by default and that may not be entirely safe
> > either.  It didn't work really well in the past IIRC, so I'm wondering
> > what's the reason to believe that it will work just fine this time.
>
> From "Documentation/power/pci.rst":
> If a PCI driver implements the runtime PM callbacks and intends to use the
> runtime PM framework provided by the PM core and the PCI subsystem, it needs
> to decrement the device's runtime PM usage counter in its probe callback
> function.  If it doesn't do that, the counter will always be different from
> zero for the device and it will never be runtime-suspended.

I'm not sure how this is related to what I said above.

> Having said that I don't see how there can be a RPM-related problem w/o
> the driver calling one of the RPM put functions. Maybe some of the problems
> in the past were caused by PCI core bugs that have long been fixed.
>
> To reduce the risk we could enable RPM for a certain subset of PCI devices
> only in a first step, e.g. for PCIe devices.

I'm still not sure if runtime-suspending them when they have no
drivers is a good idea.

It might be better to somehow do the pm_runtime_allow() automatically
in local_pci_probe() if the usage counter is 1 and power.runtime_auto
is false after running the driver's ->probe() callback.

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 428afd459..26e3a500c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3101,7 +3101,12 @@  void pci_pm_init(struct pci_dev *dev)
 	u16 status;
 	u16 pmc;
 
-	pm_runtime_forbid(&dev->dev);
+#ifdef CONFIG_ACPI
+	/* Some early ACPI PM implementations conflict with RPM. */
+	if (acpi_gbl_FADT.header.revision > 0 &&
+	    acpi_gbl_FADT.header.revision < 5)
+		pm_runtime_forbid(&dev->dev);
+#endif
 	pm_runtime_set_active(&dev->dev);
 	pm_runtime_enable(&dev->dev);
 	device_enable_async_suspend(&dev->dev);