diff mbox series

[3/5] PCI / PM: Check for error when reading PME status

Message ID 20190805205214.194981-4-helgaas@kernel.org (mailing list archive)
State Superseded, archived
Headers show
Series PCI: Add PCI_ERROR_RESPONSE, check for errors | expand

Commit Message

Bjorn Helgaas Aug. 5, 2019, 8:52 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

pci_check_pme_status() reads the Power Management capability to determine
whether a device has generated a PME.  The capability is in config space,
which is accessible in D0, D1, D2, and D3hot, but not in D3cold.

If we call pci_check_pme_status() on a device that's in D3cold, config
reads fail and return ~0 data, which we erroneously interpreted as "the
device has generated a PME".

000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
avoided many of these problems by avoiding pci_check_pme_status() if we
think the device is in D3cold.  However, it is not a complete fix because
the device may go to D3cold after we check its power state but before
pci_check_pme_status() reads the Power Management Status Register.

Return false ("device has not generated a PME") if we get an error response
reading the Power Management Status Register.

Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Rafael J. Wysocki Aug. 5, 2019, 9:02 p.m. UTC | #1
On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> pci_check_pme_status() reads the Power Management capability to determine
> whether a device has generated a PME.  The capability is in config space,
> which is accessible in D0, D1, D2, and D3hot, but not in D3cold.
>
> If we call pci_check_pme_status() on a device that's in D3cold, config
> reads fail and return ~0 data, which we erroneously interpreted as "the
> device has generated a PME".
>
> 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> avoided many of these problems by avoiding pci_check_pme_status() if we
> think the device is in D3cold.  However, it is not a complete fix because
> the device may go to D3cold after we check its power state but before
> pci_check_pme_status() reads the Power Management Status Register.
>
> Return false ("device has not generated a PME") if we get an error response
> reading the Power Management Status Register.
>
> Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port")
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 984171d40858..af6a97d7012b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev)
>
>         pmcsr_pos = dev->pm_cap + PCI_PM_CTRL;
>         pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> +       if (pmcsr == (u16) PCI_ERROR_RESPONSE)
> +               return false;
> +

No, sorry.

We tried that and it didn't work.

pcie_pme_handle_request() depends on this returning "true" for all
bits set, as from its perspective "device is not accessible" may very
well mean "device may have signaled PME".

If you want to make this change, you need to rework
pcie_pme_handle_request() along with it.

>         if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
>                 return false;
Bjorn Helgaas Aug. 6, 2019, 1:36 p.m. UTC | #2
On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote:
> On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > pci_check_pme_status() reads the Power Management capability to determine
> > whether a device has generated a PME.  The capability is in config space,
> > which is accessible in D0, D1, D2, and D3hot, but not in D3cold.
> >
> > If we call pci_check_pme_status() on a device that's in D3cold, config
> > reads fail and return ~0 data, which we erroneously interpreted as "the
> > device has generated a PME".
> >
> > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> > avoided many of these problems by avoiding pci_check_pme_status() if we
> > think the device is in D3cold.  However, it is not a complete fix because
> > the device may go to D3cold after we check its power state but before
> > pci_check_pme_status() reads the Power Management Status Register.
> >
> > Return false ("device has not generated a PME") if we get an error response
> > reading the Power Management Status Register.
> >
> > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port")
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> >  drivers/pci/pci.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 984171d40858..af6a97d7012b 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev)
> >
> >         pmcsr_pos = dev->pm_cap + PCI_PM_CTRL;
> >         pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> > +       if (pmcsr == (u16) PCI_ERROR_RESPONSE)
> > +               return false;
> 
> No, sorry.
> 
> We tried that and it didn't work.
> 
> pcie_pme_handle_request() depends on this returning "true" for all
> bits set, as from its perspective "device is not accessible" may very
> well mean "device may have signaled PME".

Right, it's obviously wrong in the case of devices that advertise
D3cold in PME_Support, i.e., devices that can generate PME even with
main power off.  Also, we may want to try to wake up devices if the
config read fails for a reason other than the device being in D3cold.

What I don't like about the current code is that it checks
PCI_PM_CTRL_PME_STATUS in data that may be completely bogus.  Do you
think it would be better to do something like this:

  pci_read_config_word(dev, pmcsr_pos, &pmcsr);
  if (pmcsr == (u16) PCI_ERROR_RESPONSE) {
    if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold))
      return true;
    return false;
  }

or maybe this:

  pci_read_config_word(dev, pmcsr_pos, &pmcsr);
  if (pmcsr == (u16) PCI_ERROR_RESPONSE)
    return true;

We should get PCI_ERROR_RESPONSE pretty reliably from devices in
D3cold, so the first possibility would cover that case.

But since pci_check_pme_status() basically returns a hint ("true"
means a device *may* have generated a PME), and even if the hint is
wrong, the worst that happens is an unnecessary wakeup, maybe the
second possibility is safer.

What do you think?

> >         if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
> >                 return false;
Rafael J. Wysocki Aug. 13, 2019, 11:26 p.m. UTC | #3
On Tuesday, August 6, 2019 3:36:38 PM CEST Bjorn Helgaas wrote:
> On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote:
> > On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > pci_check_pme_status() reads the Power Management capability to determine
> > > whether a device has generated a PME.  The capability is in config space,
> > > which is accessible in D0, D1, D2, and D3hot, but not in D3cold.
> > >
> > > If we call pci_check_pme_status() on a device that's in D3cold, config
> > > reads fail and return ~0 data, which we erroneously interpreted as "the
> > > device has generated a PME".
> > >
> > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> > > avoided many of these problems by avoiding pci_check_pme_status() if we
> > > think the device is in D3cold.  However, it is not a complete fix because
> > > the device may go to D3cold after we check its power state but before
> > > pci_check_pme_status() reads the Power Management Status Register.
> > >
> > > Return false ("device has not generated a PME") if we get an error response
> > > reading the Power Management Status Register.
> > >
> > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> > > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port")
> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > ---
> > >  drivers/pci/pci.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 984171d40858..af6a97d7012b 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev)
> > >
> > >         pmcsr_pos = dev->pm_cap + PCI_PM_CTRL;
> > >         pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> > > +       if (pmcsr == (u16) PCI_ERROR_RESPONSE)
> > > +               return false;
> > 
> > No, sorry.
> > 
> > We tried that and it didn't work.
> > 
> > pcie_pme_handle_request() depends on this returning "true" for all
> > bits set, as from its perspective "device is not accessible" may very
> > well mean "device may have signaled PME".
> 
> Right, it's obviously wrong in the case of devices that advertise
> D3cold in PME_Support, i.e., devices that can generate PME even with
> main power off.  Also, we may want to try to wake up devices if the
> config read fails for a reason other than the device being in D3cold.
> 
> What I don't like about the current code is that it checks
> PCI_PM_CTRL_PME_STATUS in data that may be completely bogus.

Whether or not the other bits in the register make sense doesn't
matter here.  Only the PME_STATUS bit matters.

> Do you think it would be better to do something like this:
> 
>   pci_read_config_word(dev, pmcsr_pos, &pmcsr);
>   if (pmcsr == (u16) PCI_ERROR_RESPONSE) {
>     if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold))
>       return true;
>     return false;
>   }
> 
> or maybe this:
> 
>   pci_read_config_word(dev, pmcsr_pos, &pmcsr);
>   if (pmcsr == (u16) PCI_ERROR_RESPONSE)
>     return true;

In this case it still would be prudent to check PME_ENABLE before
returning true and so there is no practical difference between
ERROR_RESPONSE and the valid data with PME_STATUS set.

Except that in the ERROR_RESPONSE case we may as well avoid the
PMCSR write which is not going to make a difference.

> We should get PCI_ERROR_RESPONSE pretty reliably from devices in
> D3cold, so the first possibility would cover that case.
>
> But since pci_check_pme_status() basically returns a hint ("true"
> means a device *may* have generated a PME), and even if the hint is
> wrong, the worst that happens is an unnecessary wakeup, maybe the
> second possibility is safer.
> 
> What do you think?

So if you really want to avoid the PMCSR write in the ERROR_RESPONSE case,
something like this can be done IMO:

 			return false;
 
 	/* Clear PME status. */
-	pmcsr |= PCI_PM_CTRL_PME_STATUS;
 	if (pmcsr & PCI_PM_CTRL_PME_ENABLE) {
+		if (pmcsr == (u16) PCI_ERROR_RESPONSE)
+			return true;
+
 		/* Disable PME to avoid interrupt flood. */
 		pmcsr &= ~PCI_PM_CTRL_PME_ENABLE;
 		ret = true;
Bjorn Helgaas Aug. 14, 2019, 1:15 a.m. UTC | #4
On Wed, Aug 14, 2019 at 01:26:56AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, August 6, 2019 3:36:38 PM CEST Bjorn Helgaas wrote:
> > On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > pci_check_pme_status() reads the Power Management capability to determine
> > > > whether a device has generated a PME.  The capability is in config space,
> > > > which is accessible in D0, D1, D2, and D3hot, but not in D3cold.
> > > >
> > > > If we call pci_check_pme_status() on a device that's in D3cold, config
> > > > reads fail and return ~0 data, which we erroneously interpreted as "the
> > > > device has generated a PME".
> > > >
> > > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> > > > avoided many of these problems by avoiding pci_check_pme_status() if we
> > > > think the device is in D3cold.  However, it is not a complete fix because
> > > > the device may go to D3cold after we check its power state but before
> > > > pci_check_pme_status() reads the Power Management Status Register.
> > > >
> > > > Return false ("device has not generated a PME") if we get an error response
> > > > reading the Power Management Status Register.
> > > >
> > > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")
> > > > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port")
> > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > ---
> > > >  drivers/pci/pci.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 984171d40858..af6a97d7012b 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev)
> > > >
> > > >         pmcsr_pos = dev->pm_cap + PCI_PM_CTRL;
> > > >         pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> > > > +       if (pmcsr == (u16) PCI_ERROR_RESPONSE)
> > > > +               return false;
> > > 
> > > No, sorry.
> > > 
> > > We tried that and it didn't work.
> > > 
> > > pcie_pme_handle_request() depends on this returning "true" for all
> > > bits set, as from its perspective "device is not accessible" may very
> > > well mean "device may have signaled PME".
> > 
> > Right, it's obviously wrong in the case of devices that advertise
> > D3cold in PME_Support, i.e., devices that can generate PME even with
> > main power off.  Also, we may want to try to wake up devices if the
> > config read fails for a reason other than the device being in D3cold.
> > 
> > What I don't like about the current code is that it checks
> > PCI_PM_CTRL_PME_STATUS in data that may be completely bogus.
> 
> Whether or not the other bits in the register make sense doesn't
> matter here.  Only the PME_STATUS bit matters.

Of course.  It just relies on the implicit assumption that the bit in
the error response matches the PME_STATUS state that we want, which is
a little bit ugly.

> > Do you think it would be better to do something like this:
> > 
> >   pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> >   if (pmcsr == (u16) PCI_ERROR_RESPONSE) {
> >     if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold))
> >       return true;
> >     return false;
> >   }
> > 
> > or maybe this:
> > 
> >   pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> >   if (pmcsr == (u16) PCI_ERROR_RESPONSE)
> >     return true;
> 
> In this case it still would be prudent to check PME_ENABLE before
> returning true and so there is no practical difference between
> ERROR_RESPONSE and the valid data with PME_STATUS set.
> 
> Except that in the ERROR_RESPONSE case we may as well avoid the
> PMCSR write which is not going to make a difference.
> 
> > We should get PCI_ERROR_RESPONSE pretty reliably from devices in
> > D3cold, so the first possibility would cover that case.
> >
> > But since pci_check_pme_status() basically returns a hint ("true"
> > means a device *may* have generated a PME), and even if the hint is
> > wrong, the worst that happens is an unnecessary wakeup, maybe the
> > second possibility is safer.
> > 
> > What do you think?
> 
> So if you really want to avoid the PMCSR write in the ERROR_RESPONSE case,
> something like this can be done IMO:
> 
>  			return false;
>  
>  	/* Clear PME status. */
> -	pmcsr |= PCI_PM_CTRL_PME_STATUS;
>  	if (pmcsr & PCI_PM_CTRL_PME_ENABLE) {
> +		if (pmcsr == (u16) PCI_ERROR_RESPONSE)
> +			return true;
> +
>  		/* Disable PME to avoid interrupt flood. */
>  		pmcsr &= ~PCI_PM_CTRL_PME_ENABLE;
>  		ret = true;

Agreed, that's the conclusion I came to as well.  I wouldn't do this
just to avoid the config write, since as you mentioned that will get
dropped anyway.  The reason I would consider this is as an example of
how drivers might think about validating data they read from devices.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 984171d40858..af6a97d7012b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2008,6 +2008,9 @@  bool pci_check_pme_status(struct pci_dev *dev)
 
 	pmcsr_pos = dev->pm_cap + PCI_PM_CTRL;
 	pci_read_config_word(dev, pmcsr_pos, &pmcsr);
+	if (pmcsr == (u16) PCI_ERROR_RESPONSE)
+		return false;
+
 	if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
 		return false;