diff mbox

[v2] PCI: pciehp: Don't enable PME on runtime suspend

Message ID d1ae979d256631e7c189694a6d3e1a518abf3fc8.1486359844.git.lukas@wunner.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Lukas Wunner Feb. 6, 2017, 5:54 a.m. UTC
Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
hotplug ports") we runtime suspend a hotplug port to D3hot when all its
children are runtime suspended or none are present.

When runtime suspending the port the PCI core automatically enables PME:
    pci_pm_runtime_suspend()
        pci_finish_runtime_suspend()
            __pci_enable_wake()

According to the PCI Express Base Specification, section 6.7.3.4:
   "Note that PME and Hot-Plug Event interrupts (when both are
    implemented) always share the same MSI or MSI-X vector [...]
    If wake generation is required by the associated form factor
    specification, a hot-plug capable Downstream Port must support
    generation of a wakeup event (using the PME mechanism) on hotplug
    events that occur when the system is in a sleep state or the Port
    is in device state D1, D2, or D3Hot."

Thus, if the port is runtime suspended even though it is still occupied,
it may immediately be woken by a PME interrupt.  One scenario where this
happens is if all children of the hotplug port have runtime suspended.
Another scenario is power control via sysfs:  If a user manually turns
the hotplug port off (e.g. to safely remove the card), PME will signal
an interrupt for the still-occupied slot, which is interpreted by pciehp
as re-insertion of a card.  As a result, power control via sysfs is no
longer possible.  This was observed and reported by Yinghai Lu.

PME is in fact unnecessary on hotplug ports:  Hotplug can be signaled
even in D3hot, and commit 68db9bc81436 ensures that all parents of the
hotplug port are kept awake so that interrupts can be delivered.
PME would allow us to runtime suspend the parent ports as well, but we
do not make use of it because we cannot be sure if PME actually works.
Thunderbolt controllers for instance advertise PME capability, but at
least on Macs the PME pin is not connected.

Since we do not rely on PME for hotplug ports, we may as well not enable
it, thereby avoiding its negative side effects.  However the present
commit deliberately only avoids enabling PME on runtime suspend, the
ability to enable it for system sleep is retained.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=193951
Fixes: 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
    hotplug ports")
Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---

v1-> v2:
 Move check for is_hotplug_bridge from pci_finish_runtime_suspend()
 down into pci_dev_run_wake(), this seems cleaner, less clumsy.

 drivers/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Rafael J. Wysocki Feb. 6, 2017, 11:56 a.m. UTC | #1
On Monday, February 06, 2017 06:54:37 AM Lukas Wunner wrote:
> Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> children are runtime suspended or none are present.
> 
> When runtime suspending the port the PCI core automatically enables PME:
>     pci_pm_runtime_suspend()
>         pci_finish_runtime_suspend()
>             __pci_enable_wake()
> 
> According to the PCI Express Base Specification, section 6.7.3.4:
>    "Note that PME and Hot-Plug Event interrupts (when both are
>     implemented) always share the same MSI or MSI-X vector [...]
>     If wake generation is required by the associated form factor
>     specification, a hot-plug capable Downstream Port must support
>     generation of a wakeup event (using the PME mechanism) on hotplug
>     events that occur when the system is in a sleep state or the Port
>     is in device state D1, D2, or D3Hot."
> 
> Thus, if the port is runtime suspended even though it is still occupied,
> it may immediately be woken by a PME interrupt.  One scenario where this
> happens is if all children of the hotplug port have runtime suspended.
> Another scenario is power control via sysfs:  If a user manually turns
> the hotplug port off (e.g. to safely remove the card), PME will signal
> an interrupt for the still-occupied slot, which is interpreted by pciehp
> as re-insertion of a card.  As a result, power control via sysfs is no
> longer possible.  This was observed and reported by Yinghai Lu.
> 
> PME is in fact unnecessary on hotplug ports:  Hotplug can be signaled
> even in D3hot, and commit 68db9bc81436 ensures that all parents of the
> hotplug port are kept awake so that interrupts can be delivered.
> PME would allow us to runtime suspend the parent ports as well, but we
> do not make use of it because we cannot be sure if PME actually works.
> Thunderbolt controllers for instance advertise PME capability, but at
> least on Macs the PME pin is not connected.
> 
> Since we do not rely on PME for hotplug ports, we may as well not enable
> it, thereby avoiding its negative side effects.  However the present
> commit deliberately only avoids enabling PME on runtime suspend, the
> ability to enable it for system sleep is retained.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=193951
> Fixes: 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
>     hotplug ports")
> Reported-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
> 
> v1-> v2:
>  Move check for is_hotplug_bridge from pci_finish_runtime_suspend()
>  down into pci_dev_run_wake(), this seems cleaner, less clumsy.

ACK

> 
>  drivers/pci/pci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a881c0d..9c22e62 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2096,6 +2096,14 @@ bool pci_dev_run_wake(struct pci_dev *dev)
>  {
>  	struct pci_bus *bus = dev->bus;
>  
> +	/*
> +	 * Don't enable PME at runtime on hotplug ports (even if supported)
> +	 * since PME sends unwanted interrupts if the slot is occupied while
> +	 * suspended to D3hot (PCIe Base Specification, section 6.7.3.4).
> +	 */
> +	if (dev->is_hotplug_bridge)
> +		return false;
> +
>  	if (device_run_wake(&dev->dev))
>  		return true;

Thanks,
Rafael
Bjorn Helgaas Feb. 6, 2017, 5:54 p.m. UTC | #2
On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> children are runtime suspended or none are present.
> 
> When runtime suspending the port the PCI core automatically enables PME:
>     pci_pm_runtime_suspend()
>         pci_finish_runtime_suspend()
>             __pci_enable_wake()
> 
> According to the PCI Express Base Specification, section 6.7.3.4:
>    "Note that PME and Hot-Plug Event interrupts (when both are
>     implemented) always share the same MSI or MSI-X vector [...]
>     If wake generation is required by the associated form factor
>     specification, a hot-plug capable Downstream Port must support
>     generation of a wakeup event (using the PME mechanism) on hotplug
>     events that occur when the system is in a sleep state or the Port
>     is in device state D1, D2, or D3Hot."
> 
> Thus, if the port is runtime suspended even though it is still occupied,
> it may immediately be woken by a PME interrupt.  

The spec goes on to say that a wakeup event should be generated when
all three of these conditions occur:

  - status register for an enabled [hotplug] event transitions from
    not set to set

  - Port is in D1, D2, or D3hot,

  - PME_En is set

I think you're saying that if we put a hotplug-capable port that
controls an occupied slot into D3hot, the port may immediately
generate a wakeup PME.

What is the hotplug event that causes generation of this wakeup event?

> One scenario where this
> happens is if all children of the hotplug port have runtime suspended.
> Another scenario is power control via sysfs:  If a user manually turns
> the hotplug port off (e.g. to safely remove the card), PME will signal
> an interrupt for the still-occupied slot, which is interpreted by pciehp
> as re-insertion of a card.  As a result, power control via sysfs is no
> longer possible.  This was observed and reported by Yinghai Lu.
> 
> PME is in fact unnecessary on hotplug ports:  Hotplug can be signaled
> even in D3hot,

How are hotplug events signaled in D3hot without using PME?  I'm
looking at the PCI PM spec, r1.2, table 5-4 (p 49 in my copy), which
says a function in D3hot can only generate PME.

> and commit 68db9bc81436 ensures that all parents of the
> hotplug port are kept awake so that interrupts can be delivered.
> PME would allow us to runtime suspend the parent ports as well, but we
> do not make use of it because we cannot be sure if PME actually works.
> Thunderbolt controllers for instance advertise PME capability, but at
> least on Macs the PME pin is not connected.
> 
> Since we do not rely on PME for hotplug ports, we may as well not enable
> it, thereby avoiding its negative side effects.  However the present
> commit deliberately only avoids enabling PME on runtime suspend, the
> ability to enable it for system sleep is retained.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=193951
> Fixes: 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
>     hotplug ports")
> Reported-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
> 
> v1-> v2:
>  Move check for is_hotplug_bridge from pci_finish_runtime_suspend()
>  down into pci_dev_run_wake(), this seems cleaner, less clumsy.
> 
>  drivers/pci/pci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a881c0d..9c22e62 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2096,6 +2096,14 @@ bool pci_dev_run_wake(struct pci_dev *dev)
>  {
>  	struct pci_bus *bus = dev->bus;
>  
> +	/*
> +	 * Don't enable PME at runtime on hotplug ports (even if supported)
> +	 * since PME sends unwanted interrupts if the slot is occupied while
> +	 * suspended to D3hot (PCIe Base Specification, section 6.7.3.4).
> +	 */
> +	if (dev->is_hotplug_bridge)
> +		return false;
> +
>  	if (device_run_wake(&dev->dev))
>  		return true;
>  
> -- 
> 2.11.0
>
Lukas Wunner Feb. 6, 2017, 9:20 p.m. UTC | #3
On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> > Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> > children are runtime suspended or none are present.
> > 
> > When runtime suspending the port the PCI core automatically enables PME:
> >     pci_pm_runtime_suspend()
> >         pci_finish_runtime_suspend()
> >             __pci_enable_wake()
> > 
> > According to the PCI Express Base Specification, section 6.7.3.4:
> >    "Note that PME and Hot-Plug Event interrupts (when both are
> >     implemented) always share the same MSI or MSI-X vector [...]
> >     If wake generation is required by the associated form factor
> >     specification, a hot-plug capable Downstream Port must support
> >     generation of a wakeup event (using the PME mechanism) on hotplug
> >     events that occur when the system is in a sleep state or the Port
> >     is in device state D1, D2, or D3Hot."
> > 
> > Thus, if the port is runtime suspended even though it is still occupied,
> > it may immediately be woken by a PME interrupt.
> 
> The spec goes on to say that a wakeup event should be generated when
> all three of these conditions occur:
> 
>   - status register for an enabled [hotplug] event transitions from
>     not set to set
> 
>   - Port is in D1, D2, or D3hot,
> 
>   - PME_En is set
> 
> I think you're saying that if we put a hotplug-capable port that
> controls an occupied slot into D3hot, the port may immediately
> generate a wakeup PME.
> 
> What is the hotplug event that causes generation of this wakeup event?

If you had read all e-mails in this thread or looked at the bugzilla
entry I've created, you wouldn't have to ask this question.

I think it's disappointing that you're asking me to jump through
various hoops like creating a bugzilla entry, as well as threatening
to revert my patch, but are unwilling to even look at the bugzilla
entry or read the entire thread.  It is equally disappointing that
the reporter of the regression was unwilling or unable to provide
dmesg output for both machines so that we've got no real idea what
we're dealing with.

It's a Link Up event.

Which doesn't make sense of course because per the spec PME is only
supposed to be signaled when the Link Status *changes* from 0 to 1,
which is not supposed to happen for a slot that has been manually
disabled.  It's a quirk of this (unknown) hardware.


> > PME is in fact unnecessary on hotplug ports:  Hotplug can be signaled
> > even in D3hot,
> 
> How are hotplug events signaled in D3hot without using PME?  I'm
> looking at the PCI PM spec, r1.2, table 5-4 (p 49 in my copy), which
> says a function in D3hot can only generate PME.

The table doesn't seem to be in congruence with reality.  The hotplug
ports on Thunderbolt chips emit hotplug interrupts in D3hot regardless
of PME.  The Broadcom wireless cards I submitted an early quirk for
last year likewise emitted interrupts in D3hot.

Lukas
Rafael J. Wysocki Feb. 6, 2017, 9:27 p.m. UTC | #4
On Monday, February 06, 2017 10:20:41 PM Lukas Wunner wrote:
> On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> > > Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> > > children are runtime suspended or none are present.
> > > 
> > > When runtime suspending the port the PCI core automatically enables PME:
> > >     pci_pm_runtime_suspend()
> > >         pci_finish_runtime_suspend()
> > >             __pci_enable_wake()
> > > 
> > > According to the PCI Express Base Specification, section 6.7.3.4:
> > >    "Note that PME and Hot-Plug Event interrupts (when both are
> > >     implemented) always share the same MSI or MSI-X vector [...]
> > >     If wake generation is required by the associated form factor
> > >     specification, a hot-plug capable Downstream Port must support
> > >     generation of a wakeup event (using the PME mechanism) on hotplug
> > >     events that occur when the system is in a sleep state or the Port
> > >     is in device state D1, D2, or D3Hot."
> > > 
> > > Thus, if the port is runtime suspended even though it is still occupied,
> > > it may immediately be woken by a PME interrupt.
> > 
> > The spec goes on to say that a wakeup event should be generated when
> > all three of these conditions occur:
> > 
> >   - status register for an enabled [hotplug] event transitions from
> >     not set to set
> > 
> >   - Port is in D1, D2, or D3hot,
> > 
> >   - PME_En is set
> > 
> > I think you're saying that if we put a hotplug-capable port that
> > controls an occupied slot into D3hot, the port may immediately
> > generate a wakeup PME.
> > 
> > What is the hotplug event that causes generation of this wakeup event?
> 
> If you had read all e-mails in this thread or looked at the bugzilla
> entry I've created, you wouldn't have to ask this question.
> 
> I think it's disappointing that you're asking me to jump through
> various hoops like creating a bugzilla entry, as well as threatening
> to revert my patch, but are unwilling to even look at the bugzilla
> entry or read the entire thread.  It is equally disappointing that
> the reporter of the regression was unwilling or unable to provide
> dmesg output for both machines so that we've got no real idea what
> we're dealing with.
> 
> It's a Link Up event.
> 
> Which doesn't make sense of course because per the spec PME is only
> supposed to be signaled when the Link Status *changes* from 0 to 1,
> which is not supposed to happen for a slot that has been manually
> disabled.  It's a quirk of this (unknown) hardware.
> 
> 
> > > PME is in fact unnecessary on hotplug ports:  Hotplug can be signaled
> > > even in D3hot,
> > 
> > How are hotplug events signaled in D3hot without using PME?  I'm
> > looking at the PCI PM spec, r1.2, table 5-4 (p 49 in my copy), which
> > says a function in D3hot can only generate PME.
> 
> The table doesn't seem to be in congruence with reality.  The hotplug
> ports on Thunderbolt chips emit hotplug interrupts in D3hot regardless
> of PME.  The Broadcom wireless cards I submitted an early quirk for
> last year likewise emitted interrupts in D3hot.

Which is not spec-compliant, so quirks are the way to go here IMO, for endponts.

Ports are a bit different, because the PCIe spec isn't particularly clear about
what events can be generated by ports in D3hot.  In fact, it is not particuarly
clear what D3hot really means for root ports for that matter.

Thanks,
Rafael
Rafael J. Wysocki Feb. 6, 2017, 9:52 p.m. UTC | #5
On Monday, February 06, 2017 10:20:41 PM Lukas Wunner wrote:
> On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> > > Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> > > children are runtime suspended or none are present.
> > > 
> > > When runtime suspending the port the PCI core automatically enables PME:
> > >     pci_pm_runtime_suspend()
> > >         pci_finish_runtime_suspend()
> > >             __pci_enable_wake()
> > > 
> > > According to the PCI Express Base Specification, section 6.7.3.4:
> > >    "Note that PME and Hot-Plug Event interrupts (when both are
> > >     implemented) always share the same MSI or MSI-X vector [...]
> > >     If wake generation is required by the associated form factor
> > >     specification, a hot-plug capable Downstream Port must support
> > >     generation of a wakeup event (using the PME mechanism) on hotplug
> > >     events that occur when the system is in a sleep state or the Port
> > >     is in device state D1, D2, or D3Hot."
> > > 
> > > Thus, if the port is runtime suspended even though it is still occupied,
> > > it may immediately be woken by a PME interrupt.
> > 
> > The spec goes on to say that a wakeup event should be generated when
> > all three of these conditions occur:
> > 
> >   - status register for an enabled [hotplug] event transitions from
> >     not set to set
> > 
> >   - Port is in D1, D2, or D3hot,
> > 
> >   - PME_En is set
> > 
> > I think you're saying that if we put a hotplug-capable port that
> > controls an occupied slot into D3hot, the port may immediately
> > generate a wakeup PME.
> > 
> > What is the hotplug event that causes generation of this wakeup event?
> 
> If you had read all e-mails in this thread or looked at the bugzilla
> entry I've created, you wouldn't have to ask this question.
> 
> I think it's disappointing that you're asking me to jump through
> various hoops like creating a bugzilla entry, as well as threatening
> to revert my patch, but are unwilling to even look at the bugzilla
> entry or read the entire thread.  It is equally disappointing that
> the reporter of the regression was unwilling or unable to provide
> dmesg output for both machines so that we've got no real idea what
> we're dealing with.
> 
> It's a Link Up event.
> 
> Which doesn't make sense of course because per the spec PME is only
> supposed to be signaled when the Link Status *changes* from 0 to 1,

BTW, where exactly did you find this bit in the spec?

Thanks,
Rafael
Bjorn Helgaas Feb. 6, 2017, 10:15 p.m. UTC | #6
On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> > > Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> > > children are runtime suspended or none are present.
> > > 
> > > When runtime suspending the port the PCI core automatically enables PME:
> > >     pci_pm_runtime_suspend()
> > >         pci_finish_runtime_suspend()
> > >             __pci_enable_wake()
> > > 
> > > According to the PCI Express Base Specification, section 6.7.3.4:
> > >    "Note that PME and Hot-Plug Event interrupts (when both are
> > >     implemented) always share the same MSI or MSI-X vector [...]
> > >     If wake generation is required by the associated form factor
> > >     specification, a hot-plug capable Downstream Port must support
> > >     generation of a wakeup event (using the PME mechanism) on hotplug
> > >     events that occur when the system is in a sleep state or the Port
> > >     is in device state D1, D2, or D3Hot."
> > > 
> > > Thus, if the port is runtime suspended even though it is still occupied,
> > > it may immediately be woken by a PME interrupt.
> > 
> > The spec goes on to say that a wakeup event should be generated when
> > all three of these conditions occur:
> > 
> >   - status register for an enabled [hotplug] event transitions from
> >     not set to set
> > 
> >   - Port is in D1, D2, or D3hot,
> > 
> >   - PME_En is set
> > 
> > I think you're saying that if we put a hotplug-capable port that
> > controls an occupied slot into D3hot, the port may immediately
> > generate a wakeup PME.
> > 
> > What is the hotplug event that causes generation of this wakeup event?
> 
> If you had read all e-mails in this thread or looked at the bugzilla
> entry I've created, you wouldn't have to ask this question.

I'm sorry, I don't necessarily have time to sort through all the
emails.  My idea is that the changelog should be a self-contained
justification for the patch.  The bugzilla is for supporting details
and future archaeologists.

> I think it's disappointing that you're asking me to jump through
> various hoops like creating a bugzilla entry, as well as threatening
> to revert my patch, but are unwilling to even look at the bugzilla
> entry or read the entire thread.  It is equally disappointing that
> the reporter of the regression was unwilling or unable to provide
> dmesg output for both machines so that we've got no real idea what
> we're dealing with.

I beg your pardon?  I don't think it's fair to malign Yinghai.  He's
tested at least two machines and at least two patches, and it's only
been two working days since he reported the problem.  He deserves
great thanks for finding this issue early.

If you think a bugzilla is onerous or a revert of a patch that breaks
something is inappropriate, we might have to just disagree.

I'll come back to this later.  I'm still hoping that somebody will do
some experiments with pciehp out of the picture, using setpci to walk
through this manually.  At this point I'm more inclined to suspect a
pciehp issue than a hardware erratum.  If we could reproduce a problem
without pciehp in the picture, that would be much more convincing.

Bjorn
Lukas Wunner Feb. 7, 2017, 6:21 a.m. UTC | #7
On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > What is the hotplug event that causes generation of this wakeup event?
> > 
> > If you had read all e-mails in this thread or looked at the bugzilla
> > entry I've created, you wouldn't have to ask this question.
> 
> I'm sorry, I don't necessarily have time to sort through all the
> emails.  My idea is that the changelog should be a self-contained
> justification for the patch.  The bugzilla is for supporting details
> and future archaeologists.
> 
> > I think it's disappointing that you're asking me to jump through
> > various hoops like creating a bugzilla entry, as well as threatening
> > to revert my patch, but are unwilling to even look at the bugzilla
> > entry or read the entire thread.  It is equally disappointing that
> > the reporter of the regression was unwilling or unable to provide
> > dmesg output for both machines so that we've got no real idea what
> > we're dealing with.
> 
> I beg your pardon?  I don't think it's fair to malign Yinghai.  He's
> tested at least two machines and at least two patches, and it's only
> been two working days since he reported the problem.

I think the commercialization of Linux kernel development has put this
open source project in a sorry state if an unpaid volunteer is told off
because he expresses disappointment that a paid contributor is asking
him to debug an issue on secret hardware using secret patches and not
providing secret dmesg output.


> If you think a bugzilla is onerous

Hold on.  I didn't say a bugzilla is onerous, I said I'm disappointed
that you're asking me to create one and then don't look at it.

Lukas
Lukas Wunner Feb. 7, 2017, 6:26 a.m. UTC | #8
On Mon, Feb 06, 2017 at 10:52:12PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 06, 2017 10:20:41 PM Lukas Wunner wrote:
> > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> > > > Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > > hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> > > > children are runtime suspended or none are present.
> > > > 
> > > > When runtime suspending the port the PCI core automatically enables PME:
> > > >     pci_pm_runtime_suspend()
> > > >         pci_finish_runtime_suspend()
> > > >             __pci_enable_wake()
> > > > 
> > > > According to the PCI Express Base Specification, section 6.7.3.4:
> > > >    "Note that PME and Hot-Plug Event interrupts (when both are
> > > >     implemented) always share the same MSI or MSI-X vector [...]
> > > >     If wake generation is required by the associated form factor
> > > >     specification, a hot-plug capable Downstream Port must support
> > > >     generation of a wakeup event (using the PME mechanism) on hotplug
> > > >     events that occur when the system is in a sleep state or the Port
> > > >     is in device state D1, D2, or D3Hot."
> > > > 
> > > > Thus, if the port is runtime suspended even though it is still occupied,
> > > > it may immediately be woken by a PME interrupt.
> > > 
> > > The spec goes on to say that a wakeup event should be generated when
> > > all three of these conditions occur:
> > > 
> > >   - status register for an enabled [hotplug] event transitions from
> > >     not set to set
> > > 
> > >   - Port is in D1, D2, or D3hot,
> > > 
> > >   - PME_En is set
> > > 
> > > I think you're saying that if we put a hotplug-capable port that
> > > controls an occupied slot into D3hot, the port may immediately
> > > generate a wakeup PME.
> > > 
> > > What is the hotplug event that causes generation of this wakeup event?
> > 
> > If you had read all e-mails in this thread or looked at the bugzilla
> > entry I've created, you wouldn't have to ask this question.
> > 
> > I think it's disappointing that you're asking me to jump through
> > various hoops like creating a bugzilla entry, as well as threatening
> > to revert my patch, but are unwilling to even look at the bugzilla
> > entry or read the entire thread.  It is equally disappointing that
> > the reporter of the regression was unwilling or unable to provide
> > dmesg output for both machines so that we've got no real idea what
> > we're dealing with.
> > 
> > It's a Link Up event.
> > 
> > Which doesn't make sense of course because per the spec PME is only
> > supposed to be signaled when the Link Status *changes* from 0 to 1,
> 
> BTW, where exactly did you find this bit in the spec?

PCIe Base Spec, section 6.7.3.4:

"For form factors that support wake generation, a wakeup event must be
 generated if all three of the following conditions occur:

 * The status register for an enabled event transitions from not set to set
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 * The Port is in device state D1, D2, or D3Hot, and
 * The PME_En bit in the Port's Power Management Control/Status register is
   set"


My point was that the Link Status shouldn't arbitrarily change from 0 to 1
after powering off the slot.

Best regards,

Lukas
Rafael J. Wysocki Feb. 7, 2017, 4:04 p.m. UTC | #9
On Tuesday, February 07, 2017 07:21:01 AM Lukas Wunner wrote:
> On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> > On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > What is the hotplug event that causes generation of this wakeup event?
> > > 
> > > If you had read all e-mails in this thread or looked at the bugzilla
> > > entry I've created, you wouldn't have to ask this question.
> > 
> > I'm sorry, I don't necessarily have time to sort through all the
> > emails.  My idea is that the changelog should be a self-contained
> > justification for the patch.  The bugzilla is for supporting details
> > and future archaeologists.
> > 
> > > I think it's disappointing that you're asking me to jump through
> > > various hoops like creating a bugzilla entry, as well as threatening
> > > to revert my patch, but are unwilling to even look at the bugzilla
> > > entry or read the entire thread.  It is equally disappointing that
> > > the reporter of the regression was unwilling or unable to provide
> > > dmesg output for both machines so that we've got no real idea what
> > > we're dealing with.
> > 
> > I beg your pardon?  I don't think it's fair to malign Yinghai.  He's
> > tested at least two machines and at least two patches, and it's only
> > been two working days since he reported the problem.
> 
> I think the commercialization of Linux kernel development has put this
> open source project in a sorry state if an unpaid volunteer is told off
> because he expresses disappointment that a paid contributor is asking
> him to debug an issue on secret hardware using secret patches and not
> providing secret dmesg output.

That's not like a lot has changed in that respect for the last 10 years and
I was in your spot at that time.

Such systems have always been there and we've had to tackle problems
with them regardless.

You seem to be disappointed that Yinghai has reported the problem at all,
given that the hardware is unreleased and so on, but problem reports,
even for systems like that, are what allows us to create code that works
for everybody, so we (the maintainers) appreciate them very much.

The bottom line, in any case, is that the current code causes problems to
happen somewhere and as a rule we don't release code that is known to
cause problems to happen to anyone.  This means something needs to be done
about that and the choice at this point is pretty much between reverting and
quirking the affected system(s).

I'm not a big fan of reverts, because they tend to reward bug reporters only
and disincentivize developers, which may be fine if those developers are paid
for their work, but really is not awesome for volunteers.  So I would add a quirk
even though more systems may be affected, but there's no evidence either
way right now.

That's a Bjorn's call, however.

Thanks,
Rafael
Rafael J. Wysocki Feb. 7, 2017, 4:15 p.m. UTC | #10
On Tuesday, February 07, 2017 07:26:59 AM Lukas Wunner wrote:
> On Mon, Feb 06, 2017 at 10:52:12PM +0100, Rafael J. Wysocki wrote:
> > On Monday, February 06, 2017 10:20:41 PM Lukas Wunner wrote:
> > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Feb 06, 2017 at 06:54:37AM +0100, Lukas Wunner wrote:
> > > > > Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > > > hotplug ports") we runtime suspend a hotplug port to D3hot when all its
> > > > > children are runtime suspended or none are present.
> > > > > 
> > > > > When runtime suspending the port the PCI core automatically enables PME:
> > > > >     pci_pm_runtime_suspend()
> > > > >         pci_finish_runtime_suspend()
> > > > >             __pci_enable_wake()
> > > > > 
> > > > > According to the PCI Express Base Specification, section 6.7.3.4:
> > > > >    "Note that PME and Hot-Plug Event interrupts (when both are
> > > > >     implemented) always share the same MSI or MSI-X vector [...]
> > > > >     If wake generation is required by the associated form factor
> > > > >     specification, a hot-plug capable Downstream Port must support
> > > > >     generation of a wakeup event (using the PME mechanism) on hotplug
> > > > >     events that occur when the system is in a sleep state or the Port
> > > > >     is in device state D1, D2, or D3Hot."
> > > > > 
> > > > > Thus, if the port is runtime suspended even though it is still occupied,
> > > > > it may immediately be woken by a PME interrupt.
> > > > 
> > > > The spec goes on to say that a wakeup event should be generated when
> > > > all three of these conditions occur:
> > > > 
> > > >   - status register for an enabled [hotplug] event transitions from
> > > >     not set to set
> > > > 
> > > >   - Port is in D1, D2, or D3hot,
> > > > 
> > > >   - PME_En is set
> > > > 
> > > > I think you're saying that if we put a hotplug-capable port that
> > > > controls an occupied slot into D3hot, the port may immediately
> > > > generate a wakeup PME.
> > > > 
> > > > What is the hotplug event that causes generation of this wakeup event?
> > > 
> > > If you had read all e-mails in this thread or looked at the bugzilla
> > > entry I've created, you wouldn't have to ask this question.
> > > 
> > > I think it's disappointing that you're asking me to jump through
> > > various hoops like creating a bugzilla entry, as well as threatening
> > > to revert my patch, but are unwilling to even look at the bugzilla
> > > entry or read the entire thread.  It is equally disappointing that
> > > the reporter of the regression was unwilling or unable to provide
> > > dmesg output for both machines so that we've got no real idea what
> > > we're dealing with.
> > > 
> > > It's a Link Up event.
> > > 
> > > Which doesn't make sense of course because per the spec PME is only
> > > supposed to be signaled when the Link Status *changes* from 0 to 1,
> > 
> > BTW, where exactly did you find this bit in the spec?
> 
> PCIe Base Spec, section 6.7.3.4:

That whole section isn't particularly clear to me to be honest.  At least it
may be subject to interpretation leading to different types of behavior IMO.

> "For form factors that support wake generation, a wakeup event must be
>  generated if all three of the following conditions occur:
> 
>  * The status register for an enabled event transitions from not set to set
>                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Are you sure that the "status register" here refers to Link Status?

>  * The Port is in device state D1, D2, or D3Hot, and
>  * The PME_En bit in the Port's Power Management Control/Status register is
>    set"

Also it doesn't say when PMEs can be generated.  It only says when it is
required to generate them.
 
> 
> My point was that the Link Status shouldn't arbitrarily change from 0 to 1
> after powering off the slot.

That's a fair point, but I'm not sure this is what actually happens.

Thanks,
Rafael
Lukas Wunner Feb. 8, 2017, 4:23 a.m. UTC | #11
On Tue, Feb 07, 2017 at 05:04:45PM +0100, Rafael J. Wysocki wrote:
> On Tuesday, February 07, 2017 07:21:01 AM Lukas Wunner wrote:
> > On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > > What is the hotplug event that causes generation of this wakeup event?
> > > > 
> > > > If you had read all e-mails in this thread or looked at the bugzilla
> > > > entry I've created, you wouldn't have to ask this question.
> > > 
> > > I'm sorry, I don't necessarily have time to sort through all the
> > > emails.  My idea is that the changelog should be a self-contained
> > > justification for the patch.  The bugzilla is for supporting details
> > > and future archaeologists.
> > > 
> > > > I think it's disappointing that you're asking me to jump through
> > > > various hoops like creating a bugzilla entry, as well as threatening
> > > > to revert my patch, but are unwilling to even look at the bugzilla
> > > > entry or read the entire thread.  It is equally disappointing that
> > > > the reporter of the regression was unwilling or unable to provide
> > > > dmesg output for both machines so that we've got no real idea what
> > > > we're dealing with.
> > > 
> > > I beg your pardon?  I don't think it's fair to malign Yinghai.  He's
> > > tested at least two machines and at least two patches, and it's only
> > > been two working days since he reported the problem.
> > 
> > I think the commercialization of Linux kernel development has put this
> > open source project in a sorry state if an unpaid volunteer is told off
> > because he expresses disappointment that a paid contributor is asking
> > him to debug an issue on secret hardware using secret patches and not
> > providing secret dmesg output.
> 
> That's not like a lot has changed in that respect for the last 10 years and
> I was in your spot at that time.

Thank you Rafael, means a lot.


> The bottom line, in any case, is that the current code causes problems to
> happen somewhere and as a rule we don't release code that is known to
> cause problems to happen to anyone.  This means something needs to be done
> about that and the choice at this point is pretty much between reverting and
> quirking the affected system(s).

Quirking is not an option in this case because the PCI device IDs of the
affected hotplug ports as well as DMI data are unknown.  Yinghai Lu is
refusing to publish that, for both affected systems.

To be honest I only care about runtime suspending *Thunderbolt* hotplug
ports.  I enabled it for *all* hotplug ports in 68db9bc81436 because it
seemed like the right thing to do.  However given the murkiness of the
spec and the odd quirks Yinghai Lu reported it's probably not worth the
effort.  One must bear in mind that we've only heard of systems with
2015+ BIOSes so far.  More problem reports may pile up once we push the
BIOS limit further back.

I have submitted a patch to recognize Thunderbolt ports, so what I have
in mind is to resubmit 68db9bc81436 with an additional is_thunderbolt
condition in pci_bridge_d3_possible().  The number of Thunderbolt chips
is small and I know them fairly well, so it's easy for me to judge the
potential for regressions and deal with them.

Alternatively Bjorn could apply the patch to recognize Thunderbolt
devices now and then I can submit a one-line fix which adds the
is_thunderbolt condition to pci_bridge_d3_possible().  This would
obviate the need to revert 68db9bc81436.


> You seem to be disappointed that Yinghai has reported the problem at all,
> given that the hardware is unreleased and so on, but problem reports,
> even for systems like that, are what allows us to create code that works
> for everybody, so we (the maintainers) appreciate them very much.

No problem with reporting regressions, but with denying information
required to provide a fix.  I had asked for full dmesg output for
both machines so that the PCI device IDs and DMI data are available,
but Yinghai Lu continues to withhold that.  I am thus denied the means
to provide a fix and am forced to watch reversion of 68db9bc81436
without being able to respond.

Thanks,

Lukas
Rafael J. Wysocki Feb. 8, 2017, 12:03 p.m. UTC | #12
On Wednesday, February 08, 2017 05:23:54 AM Lukas Wunner wrote:
> On Tue, Feb 07, 2017 at 05:04:45PM +0100, Rafael J. Wysocki wrote:
> > On Tuesday, February 07, 2017 07:21:01 AM Lukas Wunner wrote:
> > > On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > > > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > > > What is the hotplug event that causes generation of this wakeup event?
> > > > > 
> > > > > If you had read all e-mails in this thread or looked at the bugzilla
> > > > > entry I've created, you wouldn't have to ask this question.
> > > > 
> > > > I'm sorry, I don't necessarily have time to sort through all the
> > > > emails.  My idea is that the changelog should be a self-contained
> > > > justification for the patch.  The bugzilla is for supporting details
> > > > and future archaeologists.
> > > > 
> > > > > I think it's disappointing that you're asking me to jump through
> > > > > various hoops like creating a bugzilla entry, as well as threatening
> > > > > to revert my patch, but are unwilling to even look at the bugzilla
> > > > > entry or read the entire thread.  It is equally disappointing that
> > > > > the reporter of the regression was unwilling or unable to provide
> > > > > dmesg output for both machines so that we've got no real idea what
> > > > > we're dealing with.
> > > > 
> > > > I beg your pardon?  I don't think it's fair to malign Yinghai.  He's
> > > > tested at least two machines and at least two patches, and it's only
> > > > been two working days since he reported the problem.
> > > 
> > > I think the commercialization of Linux kernel development has put this
> > > open source project in a sorry state if an unpaid volunteer is told off
> > > because he expresses disappointment that a paid contributor is asking
> > > him to debug an issue on secret hardware using secret patches and not
> > > providing secret dmesg output.
> > 
> > That's not like a lot has changed in that respect for the last 10 years and
> > I was in your spot at that time.
> 
> Thank you Rafael, means a lot.
> 
> 
> > The bottom line, in any case, is that the current code causes problems to
> > happen somewhere and as a rule we don't release code that is known to
> > cause problems to happen to anyone.  This means something needs to be done
> > about that and the choice at this point is pretty much between reverting and
> > quirking the affected system(s).
> 
> Quirking is not an option in this case because the PCI device IDs of the
> affected hotplug ports as well as DMI data are unknown.  Yinghai Lu is
> refusing to publish that, for both affected systems.

Well, he may be bound by an NDA.

But still you may ask him to send that information to me (that should work for
Intel hardware at least) and we'll see what can be done about disclosing it.

> To be honest I only care about runtime suspending *Thunderbolt* hotplug
> ports.  I enabled it for *all* hotplug ports in 68db9bc81436 because it
> seemed like the right thing to do.  However given the murkiness of the
> spec and the odd quirks Yinghai Lu reported it's probably not worth the
> effort.  One must bear in mind that we've only heard of systems with
> 2015+ BIOSes so far.  More problem reports may pile up once we push the
> BIOS limit further back.

Right.

> I have submitted a patch to recognize Thunderbolt ports, so what I have
> in mind is to resubmit 68db9bc81436 with an additional is_thunderbolt
> condition in pci_bridge_d3_possible().  The number of Thunderbolt chips
> is small and I know them fairly well, so it's easy for me to judge the
> potential for regressions and deal with them.

That sounds reasonable.

> Alternatively Bjorn could apply the patch to recognize Thunderbolt
> devices now and then I can submit a one-line fix which adds the
> is_thunderbolt condition to pci_bridge_d3_possible().  This would
> obviate the need to revert 68db9bc81436.

OK

I guess the cleanest way would be to submit a patch (or series of patches)
on top of the current mainline to make changes to fix the problem on the
Yinghai's machines.

> > You seem to be disappointed that Yinghai has reported the problem at all,
> > given that the hardware is unreleased and so on, but problem reports,
> > even for systems like that, are what allows us to create code that works
> > for everybody, so we (the maintainers) appreciate them very much.
> 
> No problem with reporting regressions, but with denying information
> required to provide a fix.  I had asked for full dmesg output for
> both machines so that the PCI device IDs and DMI data are available,
> but Yinghai Lu continues to withhold that.  I am thus denied the means
> to provide a fix and am forced to watch reversion of 68db9bc81436
> without being able to respond.

As I said, there may be an NDA involved and there is a way around it
as mentioned above.

That said I think that limitting runtime PM to Thunderbolt ports for the time
being is the way to go at this point in the cycle.

Thanks,
Rafael
Bjorn Helgaas Feb. 8, 2017, 5:57 p.m. UTC | #13
On Tue, Feb 07, 2017 at 07:21:01AM +0100, Lukas Wunner wrote:
> On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> > On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > What is the hotplug event that causes generation of this wakeup event?
> > > 
> > > If you had read all e-mails in this thread or looked at the bugzilla
> > > entry I've created, you wouldn't have to ask this question.
> > 
> > I'm sorry, I don't necessarily have time to sort through all the
> > emails.  My idea is that the changelog should be a self-contained
> > justification for the patch.  The bugzilla is for supporting details
> > and future archaeologists.
> > 
> > > I think it's disappointing that you're asking me to jump through
> > > various hoops like creating a bugzilla entry, as well as threatening
> > > to revert my patch, but are unwilling to even look at the bugzilla
> > > entry or read the entire thread.  It is equally disappointing that
> > > the reporter of the regression was unwilling or unable to provide
> > > dmesg output for both machines so that we've got no real idea what
> > > we're dealing with.
> > 
> > I beg your pardon?  I don't think it's fair to malign Yinghai.  He's
> > tested at least two machines and at least two patches, and it's only
> > been two working days since he reported the problem.
> 
> I think the commercialization of Linux kernel development has put this
> open source project in a sorry state if an unpaid volunteer is told off
> because he expresses disappointment that a paid contributor is asking
> him to debug an issue on secret hardware using secret patches and not
> providing secret dmesg output.

Just let me be clear that I greatly appreciate all problem reports.
There's very little that I can actually test myself, so problem
reports from others are absolutely essential.

Often we don't get all the information we'd like, but I treat all
reports seriously.  For every report we get, there may be dozens or
hundreds of people who hit the same problem but do not have the time
or expertise to report it.

> > If you think a bugzilla is onerous
> 
> Hold on.  I didn't say a bugzilla is onerous, I said I'm disappointed
> that you're asking me to create one and then don't look at it.

I looked at it, and it had a few details, but no analysis of the
situation.  Ideally, I'm looking for a précis of the situation in
the changelog, with complete analysis and supporting details in the
bugzilla.

Bjorn
Bjorn Helgaas Feb. 8, 2017, 6:04 p.m. UTC | #14
On Wed, Feb 08, 2017 at 05:23:54AM +0100, Lukas Wunner wrote:
> On Tue, Feb 07, 2017 at 05:04:45PM +0100, Rafael J. Wysocki wrote:

> > The bottom line, in any case, is that the current code causes problems to
> > happen somewhere and as a rule we don't release code that is known to
> > cause problems to happen to anyone.  This means something needs to be done
> > about that and the choice at this point is pretty much between reverting and
> > quirking the affected system(s).
> 
> Quirking is not an option in this case because the PCI device IDs of the
> affected hotplug ports as well as DMI data are unknown.  Yinghai Lu is
> refusing to publish that, for both affected systems.

A quirk is appropriate if the hardware isn't working per spec.  I
don't think we have convincing evidence of that yet, so I'm more
interested in investigating that than I am in getting the device IDs
to add a quirk.

I suspect there's useful information to be had by disabling CONFIG_PM
and CONFIG_HOTPLUG_PCI_PCIE and poking things with setpci, which could
be done regardless of any NDAs.

Bjorn
Lukas Wunner Feb. 9, 2017, 4:32 a.m. UTC | #15
On Wed, Feb 08, 2017 at 11:57:36AM -0600, Bjorn Helgaas wrote:
> On Tue, Feb 07, 2017 at 07:21:01AM +0100, Lukas Wunner wrote:
> > On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > > What is the hotplug event that causes generation of this wakeup event?
> > > > 
> > > > If you had read all e-mails in this thread or looked at the bugzilla
> > > > entry I've created, you wouldn't have to ask this question.
> > > 
> > > I'm sorry, I don't necessarily have time to sort through all the
> > > emails.  My idea is that the changelog should be a self-contained
> > > justification for the patch.  The bugzilla is for supporting details
> > > and future archaeologists.
> > > 
[...]
> > > If you think a bugzilla is onerous
> > 
> > Hold on.  I didn't say a bugzilla is onerous, I said I'm disappointed
> > that you're asking me to create one and then don't look at it.
> 
> I looked at it, and it had a few details, but no analysis of the
> situation.  Ideally, I'm looking for a précis of the situation in
> the changelog, with complete analysis and supporting details in the
> bugzilla.

It's irrelevant whether the bugzilla contains an analysis or not.

You were asking what type of event caused the wakeup.  The information
was in the bugzilla.  So you either didn't look at the bugzilla or
didn't look hard enough.

Please tell me why in the future I should comply with your requests to
create a bugzilla if you don't look at it and the bug reporter doesn't
bother attaching any further information.  It would also have been
possible for you to glean the information you were asking for from the
reporter's e-mails but you didn't read those either.  And you're paid
to do this, I'm not.  If you look at the timestamps of my e-mails you
may notice that they're either early in the morning, late at night or
on weekends.  Guess what, I'm doing this in my spare time, in addition
to my day job.  Try to put yourself in my shoes for a moment, this
feels like a waste of my time and like subsystem maintainership is
being turned into a Kafkaesque bureaucracy.

Lukas
Lukas Wunner Feb. 12, 2017, 2:57 p.m. UTC | #16
On Thu, Feb 09, 2017 at 05:32:54AM +0100, Lukas Wunner wrote:
> On Wed, Feb 08, 2017 at 11:57:36AM -0600, Bjorn Helgaas wrote:
> > On Tue, Feb 07, 2017 at 07:21:01AM +0100, Lukas Wunner wrote:
> > > On Mon, Feb 06, 2017 at 04:15:02PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Feb 06, 2017 at 10:20:41PM +0100, Lukas Wunner wrote:
> > > > > On Mon, Feb 06, 2017 at 11:54:05AM -0600, Bjorn Helgaas wrote:
> > > > > > What is the hotplug event that causes generation of this wakeup event?
> > > > > 
> > > > > If you had read all e-mails in this thread or looked at the bugzilla
> > > > > entry I've created, you wouldn't have to ask this question.
> > > > 
> > > > I'm sorry, I don't necessarily have time to sort through all the
> > > > emails.  My idea is that the changelog should be a self-contained
> > > > justification for the patch.  The bugzilla is for supporting details
> > > > and future archaeologists.
> > > > 
> [...]
> > > > If you think a bugzilla is onerous
> > > 
> > > Hold on.  I didn't say a bugzilla is onerous, I said I'm disappointed
> > > that you're asking me to create one and then don't look at it.
> > 
> > I looked at it, and it had a few details, but no analysis of the
> > situation.  Ideally, I'm looking for a précis of the situation in
> > the changelog, with complete analysis and supporting details in the
> > bugzilla.
> 
> It's irrelevant whether the bugzilla contains an analysis or not.
> 
> You were asking what type of event caused the wakeup.  The information
> was in the bugzilla.  So you either didn't look at the bugzilla or
> didn't look hard enough.
> 
> Please tell me why in the future I should comply with your requests to
> create a bugzilla if you don't look at it and the bug reporter doesn't
> bother attaching any further information.  It would also have been
> possible for you to glean the information you were asking for from the
> reporter's e-mails but you didn't read those either.  And you're paid
> to do this, I'm not.  If you look at the timestamps of my e-mails you
> may notice that they're either early in the morning, late at night or
> on weekends.  Guess what, I'm doing this in my spare time, in addition
> to my day job.

My apologies Bjorn, it was clearly inappropriate for me to assume that
maintenance of the PCI subsystem is at all part of your job contract,
or talk about it on a public mailing list.

Kind regards,

Lukas
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a881c0d..9c22e62 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2096,6 +2096,14 @@  bool pci_dev_run_wake(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->bus;
 
+	/*
+	 * Don't enable PME at runtime on hotplug ports (even if supported)
+	 * since PME sends unwanted interrupts if the slot is occupied while
+	 * suspended to D3hot (PCIe Base Specification, section 6.7.3.4).
+	 */
+	if (dev->is_hotplug_bridge)
+		return false;
+
 	if (device_run_wake(&dev->dev))
 		return true;