Message ID | 20230424055249.460381-2-kai.heng.feng@canonical.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v4,1/3] PCI/AER: Factor out interrupt toggling into helpers | expand |
On 4/23/23 10:52 PM, Kai-Heng Feng wrote: > PCIe service that shares IRQ with PME may cause spurious wakeup on > system suspend. > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > much here to disable AER during system suspend. > > This is very similar to previous attempts to suspend AER and DPC [1], > but with a different reason. > > [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.feng@canonical.com/ > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > --- IIUC, you encounter AER errors during the suspend/resume process, which results in AER IRQ. Because AER and PME share an IRQ, it is regarded as a spurious wake-up IRQ. So to fix it, you want to disable AER reporting, right? It looks like it is harmless to disable the AER during the suspend/resume path. But, I am wondering why we get these errors? Did you check what errors you get during the suspend/resume path? Are these errors valid? > drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 1420e1f27105..9c07fdbeb52d 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + aer_disable_irq(pdev); > + > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + aer_enable_irq(pdev); > + > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > .service = PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend = aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; >
On Tue, Apr 25, 2023 at 7:47 AM Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > > > > On 4/23/23 10:52 PM, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.feng@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > IIUC, you encounter AER errors during the suspend/resume process, which > results in AER IRQ. Because AER and PME share an IRQ, it is regarded as a > spurious wake-up IRQ. So to fix it, you want to disable AER reporting, > right? Yes. That's exactly what happened. > > It looks like it is harmless to disable the AER during the suspend/resume > path. But, I am wondering why we get these errors? Did you check what errors > you get during the suspend/resume path? Are these errors valid? I really don't know. I think it's similar to the reasoning in commit b07461a8e45b ("PCI/AER: Clear error status registers during enumeration and restore"): "AER errors might be recorded when powering-on devices. These errors can be ignored, ...". For this case, it happens when powering-off the device (D3cold) via turning off power resources. Kai-Heng > > > > drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service = PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend = aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
Hi, On 4/24/23 10:55 PM, Kai-Heng Feng wrote: > On Tue, Apr 25, 2023 at 7:47 AM Sathyanarayanan Kuppuswamy > <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: >> >> >> >> On 4/23/23 10:52 PM, Kai-Heng Feng wrote: >>> PCIe service that shares IRQ with PME may cause spurious wakeup on >>> system suspend. >>> >>> PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states >>> that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready >>> (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose >>> much here to disable AER during system suspend. >>> >>> This is very similar to previous attempts to suspend AER and DPC [1], >>> but with a different reason. >>> >>> [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.feng@canonical.com/ >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 >>> >>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> >>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> >>> --- >> >> IIUC, you encounter AER errors during the suspend/resume process, which >> results in AER IRQ. Because AER and PME share an IRQ, it is regarded as a >> spurious wake-up IRQ. So to fix it, you want to disable AER reporting, >> right? > > Yes. That's exactly what happened. > >> >> It looks like it is harmless to disable the AER during the suspend/resume >> path. But, I am wondering why we get these errors? Did you check what errors >> you get during the suspend/resume path? Are these errors valid? > > I really don't know. I think it's similar to the reasoning in commit > b07461a8e45b ("PCI/AER: Clear error status registers during > enumeration and restore"): "AER errors might be recorded when > powering-on devices. These errors can be ignored, ...". > For this case, it happens when powering-off the device (D3cold) via > turning off power resources. Got it. Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > > Kai-Heng > >> >> >>> drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++ >>> 1 file changed, 22 insertions(+) >>> >>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >>> index 1420e1f27105..9c07fdbeb52d 100644 >>> --- a/drivers/pci/pcie/aer.c >>> +++ b/drivers/pci/pcie/aer.c >>> @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) >>> return 0; >>> } >>> >>> +static int aer_suspend(struct pcie_device *dev) >>> +{ >>> + struct aer_rpc *rpc = get_service_data(dev); >>> + struct pci_dev *pdev = rpc->rpd; >>> + >>> + aer_disable_irq(pdev); >>> + >>> + return 0; >>> +} >>> + >>> +static int aer_resume(struct pcie_device *dev) >>> +{ >>> + struct aer_rpc *rpc = get_service_data(dev); >>> + struct pci_dev *pdev = rpc->rpd; >>> + >>> + aer_enable_irq(pdev); >>> + >>> + return 0; >>> +} >>> + >>> /** >>> * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP >>> * @dev: pointer to Root Port, RCEC, or RCiEP >>> @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { >>> .service = PCIE_PORT_SERVICE_AER, >>> >>> .probe = aer_probe, >>> + .suspend = aer_suspend, >>> + .resume = aer_resume, >>> .remove = aer_remove, >>> }; >>> >> >> -- >> Sathyanarayanan Kuppuswamy >> Linux Kernel Developer
On Mon, Apr 24, 2023 at 01:52:48PM +0800, Kai-Heng Feng wrote: > PCIe service that shares IRQ with PME may cause spurious wakeup on > system suspend. > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > much here to disable AER during system suspend. > > This is very similar to previous attempts to suspend AER and DPC [1], > but with a different reason. What is the reason? I assume it's something to do with the bugzilla below, but the commit log should outline the user-visible problem this fixes. The commit log basically makes the case for "why should we merge this patch." I assume it's along the lines of "I tried to suspend this system, but it immediately woke up again because of an AER interrupt, and disabling AER during suspend avoids this problem. And disabling the AER interrupt is not a problem because X" > [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.feng@canonical.com/ > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > --- > drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 1420e1f27105..9c07fdbeb52d 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + aer_disable_irq(pdev); > + > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + aer_enable_irq(pdev); > + > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > .service = PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend = aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; > > -- > 2.34.1 >
On Sat, May 6, 2023 at 3:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Apr 24, 2023 at 01:52:48PM +0800, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > What is the reason? I assume it's something to do with the bugzilla > below, but the commit log should outline the user-visible problem this > fixes. The commit log basically makes the case for "why should we > merge this patch." > > I assume it's along the lines of "I tried to suspend this system, but > it immediately woke up again because of an AER interrupt, and > disabling AER during suspend avoids this problem. And disabling > the AER interrupt is not a problem because X" Yes that's the reason :) Will update the message to better reflect what's going on. Kai-Heng > > > [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.feng@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service = PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend = aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.34.1 > >
On Tue, Apr 25, 2023 at 7:47 AM Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > > > > On 4/23/23 10:52 PM, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.feng@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > IIUC, you encounter AER errors during the suspend/resume process, which > results in AER IRQ. Because AER and PME share an IRQ, it is regarded as a > spurious wake-up IRQ. So to fix it, you want to disable AER reporting, > right? > > It looks like it is harmless to disable the AER during the suspend/resume > path. But, I am wondering why we get these errors? Did you check what errors > you get during the suspend/resume path? Are these errors valid? AFAIK those errors comes from firmware/hardware side, especially when the device gets put to D3hot/D3cold. Kai-Heng > > > > drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service = PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend = aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1420e1f27105..9c07fdbeb52d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_disable_irq(pdev); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_enable_irq(pdev); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { .service = PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend = aer_suspend, + .resume = aer_resume, .remove = aer_remove, };