diff mbox

[v6,3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports

Message ID 14ffd1ceeb76200b34e0abfebab134545e504bdc.1486913733.git.lukas@wunner.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Lukas Wunner Feb. 12, 2017, 4:07 p.m. UTC
Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
However Yinghai Lu reported the following breakage caused by the commit:

* After disabling a slot on one of his servers, the slot signals PME
  which causes pciehp to immediately re-enable the card, thus defeating
  manual slot control via sysfs.

* On another server which has a Power Controller for PCIe hotplug ports
  (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
  disabling it via sysfs since commit 68db9bc81436 acquired a runtime
  ref only *after* enabling power on the slot, yet this particular
  SkyLake server requires that the port is in D0 when enabling power.

Hence the commit was reverted with d98e0929071e.  It is herewith
reinstated in modified form:

* The Power Controller issue is addressed by no longer acquiring the
  runtime ref in board_added() and remove_board(), but in their callers.
  (There's just a single one each.)

* An additional condition is added to pci_bridge_d3_possible() such that
  runtime PM is only enabled on *Thunderbolt* hotplug ports to constrain
  it to a few well understood chips.  In all other cases the feature can
  be enabled by booting with pcie_port_pm=force.  If confidence is high
  that it works well for everyone, it can be made the default by
  removing this condition:

  /*
   * Non-Thunderbolt hotplug ports need further testing before
   * enabling D3 on them.
   */
  if (bridge->is_hotplug_bridge)
	  return false;

Following is an updated recapitulation of the extra considerations
required for hotplug ports as laid out in 68db9bc81436:

* The configuration space of the port remains accessible in D3hot, so
  all the functions to read or modify the Slot Status and Slot Control
  registers need not be modified.

* However D0 is required to access devices on the secondary bus.  This
  happens in pciehp_check_link_status() and pciehp_configure_device()
  (both called from board_added()) and in pciehp_unconfigure_device()
  (called from remove_board()), so acquire a runtime PM ref for their
  invocation.

* The hotplug port stays active as long as it has active children.
  If all hotplugged devices below the port runtime suspend, the port
  is allowed to runtime suspend as well.  Plug and unplug detection
  continues to work in D3hot.

* Hotplug interrupts are delivered in-band, which requires parents of
  the hotplug port to stay in D0.  For hotplug-capable root ports this
  is a non-issue.  For cascaded hotplug ports, side-band signaling is
  required.  E.g. with Thunderbolt, a plug event at the end of the chain
  is signaled through the CIO switching fabric to the NHI regardless of
  PCIe ports on the chain being in D3.  It is then the NHI's job to
  runtime resume the PCIe port on which the plug event occurred.

* Runtime PM may only be allowed if the hotplug port is handled natively
  by the OS.  On ACPI systems, the port may alternatively be handled by
  the firmware and things break if the OS puts the port into D3 behind the
  firmware's back:  E.g. Thunderbolt hotplug ports on non-Macs are handled
  by Intel's firmware in System Management Mode and the firmware is known
  to access devices on the port's secondary bus without checking first if
  the port is in D0:  https://bugzilla.kernel.org/show_bug.cgi?id=53811

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=193951
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Erik Veijola <erik.veijola@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 13 +++++++++++--
 drivers/pci/pci.c                 | 14 +++++++++-----
 2 files changed, 20 insertions(+), 7 deletions(-)

Comments

Bjorn Helgaas Feb. 14, 2017, 10:59 p.m. UTC | #1
On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> However Yinghai Lu reported the following breakage caused by the commit:
> 
> * After disabling a slot on one of his servers, the slot signals PME
>   which causes pciehp to immediately re-enable the card, thus defeating
>   manual slot control via sysfs.

I don't think we've clearly established this.  It surprises me that a
Downstream Port would signal PME after turning off power to the slot.
Wouldn't that make PME useless?

I'd like to look at this in a little more detail, either by turning
off PM and pciehp and poking things manually with setpci, or possibly
by adding some debug output in the PME ISR and the pciehp ISR and
"write command" paths.

> * On another server which has a Power Controller for PCIe hotplug ports
>   (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
>   disabling it via sysfs since commit 68db9bc81436 acquired a runtime
>   ref only *after* enabling power on the slot, yet this particular
>   SkyLake server requires that the port is in D0 when enabling power.

So this server requires that the Downstream Port must be in D0 before
we turn on power to the slot below the Downstream Port?  That seems
like it would be a requirement for *all* systems.  If the bridge is
not in D0, we can't generate any PCI transactions on the secondary bus
(PCI PM r1.1., Table 19), so we can't enumerate anything in the slot.

Bjorn
Lukas Wunner Feb. 18, 2017, 9:25 a.m. UTC | #2
On Tue, Feb 14, 2017 at 04:59:26PM -0600, Bjorn Helgaas wrote:
> On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> > However Yinghai Lu reported the following breakage caused by the commit:
> > 
> > * After disabling a slot on one of his servers, the slot signals PME
> >   which causes pciehp to immediately re-enable the card, thus defeating
> >   manual slot control via sysfs.
> 
> I don't think we've clearly established this.  It surprises me that a
> Downstream Port would signal PME after turning off power to the slot.
> Wouldn't that make PME useless?
> 
> I'd like to look at this in a little more detail, either by turning
> off PM and pciehp and poking things manually with setpci, or possibly
> by adding some debug output in the PME ISR and the pciehp ISR and
> "write command" paths.

I'm in the To: header of your e-mail but I am not in a position to
debug this as I don't have the hardware available.  It would have
to be done by Yinghai Lu or Intel.  This may already be clear to
everyone involved, so apologies for stating the obvious.

Also, I hope it's clear that these Intel servers are not affected
by v6 of this patch as it only enables runtime PM on *Thunderbolt*
hotplug ports.


> > * On another server which has a Power Controller for PCIe hotplug ports
> >   (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
> >   disabling it via sysfs since commit 68db9bc81436 acquired a runtime
> >   ref only *after* enabling power on the slot, yet this particular
> >   SkyLake server requires that the port is in D0 when enabling power.
> 
> So this server requires that the Downstream Port must be in D0 before
> we turn on power to the slot below the Downstream Port?

Apparently.


> That seems like it would be a requirement for *all* systems.

Well, it's not mandated by the spec but indeed seems like a good idea.
Please note that the present patch does exactly that.


> If the bridge is
> not in D0, we can't generate any PCI transactions on the secondary bus
> (PCI PM r1.1., Table 19), so we can't enumerate anything in the slot.

Turning on slot power has nothing to do with accessing the secondary bus.
v5 of this patch resumed the hotplug port to D0 when accessing the
secondary bus. On top of that, v6 resumes it to D0 when turning on
slot power.

Thanks,

Lukas
Bjorn Helgaas Feb. 19, 2017, 12:16 a.m. UTC | #3
On Sat, Feb 18, 2017 at 10:25:16AM +0100, Lukas Wunner wrote:
> On Tue, Feb 14, 2017 at 04:59:26PM -0600, Bjorn Helgaas wrote:
> > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> > > However Yinghai Lu reported the following breakage caused by the commit:
> > > 
> > > * After disabling a slot on one of his servers, the slot signals PME
> > >   which causes pciehp to immediately re-enable the card, thus defeating
> > >   manual slot control via sysfs.
> > 
> > I don't think we've clearly established this.  It surprises me that a
> > Downstream Port would signal PME after turning off power to the slot.
> > Wouldn't that make PME useless?
> > 
> > I'd like to look at this in a little more detail, either by turning
> > off PM and pciehp and poking things manually with setpci, or possibly
> > by adding some debug output in the PME ISR and the pciehp ISR and
> > "write command" paths.
> 
> I'm in the To: header of your e-mail but I am not in a position to
> debug this as I don't have the hardware available.  It would have
> to be done by Yinghai Lu or Intel.  This may already be clear to
> everyone involved, so apologies for stating the obvious.

It is clear; I was just hoping you would make it as easy as possible
for the folks with the hardware by proposing a debugging patch or a
series of specific setpci commands to run.

> Also, I hope it's clear that these Intel servers are not affected
> by v6 of this patch as it only enables runtime PM on *Thunderbolt*
> hotplug ports.

Right.  I just don't understand what's happening, and I don't want
this to become folk wisdom before we have more details.

> > > * On another server which has a Power Controller for PCIe hotplug ports
> > >   (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
> > >   disabling it via sysfs since commit 68db9bc81436 acquired a runtime
> > >   ref only *after* enabling power on the slot, yet this particular
> > >   SkyLake server requires that the port is in D0 when enabling power.
> > 
> > So this server requires that the Downstream Port must be in D0 before
> > we turn on power to the slot below the Downstream Port?
> 
> Apparently.
> 
> > That seems like it would be a requirement for *all* systems.
> 
> Well, it's not mandated by the spec but indeed seems like a good idea.
> Please note that the present patch does exactly that.

A bridge in D3hot still accepts config cycles, so we should be able to
use Slot Control to turn slot power for its secondary bus on or off.

PCI PM r1.1, Table 16, says the only action we can expect from the
bridge is PME, so we can't rely on interrupts for Command Completion,
Link State Changed, etc.  Unless we wanted to manage the hotplug
controller by polling it, I think we have to put the bridge in D0
before sending any commands to it.

This seems like something that could be split into a separate patch
that wouldn't have anything to do with Thunderbolt.  Would that make
sense to you?

Bjorn
Lukas Wunner Feb. 20, 2017, 12:04 p.m. UTC | #4
On Sat, Feb 18, 2017 at 06:16:06PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 18, 2017 at 10:25:16AM +0100, Lukas Wunner wrote:
> > On Tue, Feb 14, 2017 at 04:59:26PM -0600, Bjorn Helgaas wrote:
> > > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > > Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > > hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> > > > However Yinghai Lu reported the following breakage caused by the commit:
> > > > 
> > > > * After disabling a slot on one of his servers, the slot signals PME
> > > >   which causes pciehp to immediately re-enable the card, thus defeating
> > > >   manual slot control via sysfs.
> > > 
> > > I don't think we've clearly established this.  It surprises me that a
> > > Downstream Port would signal PME after turning off power to the slot.
> > > Wouldn't that make PME useless?
> > > 
> > > I'd like to look at this in a little more detail, either by turning
> > > off PM and pciehp and poking things manually with setpci, or possibly
> > > by adding some debug output in the PME ISR and the pciehp ISR and
> > > "write command" paths.
> > 
> > I'm in the To: header of your e-mail but I am not in a position to
> > debug this as I don't have the hardware available.  It would have
> > to be done by Yinghai Lu or Intel.  This may already be clear to
> > everyone involved, so apologies for stating the obvious.
> 
> It is clear; I was just hoping you would make it as easy as possible
> for the folks with the hardware by proposing a debugging patch or a
> series of specific setpci commands to run.
> 
> > Also, I hope it's clear that these Intel servers are not affected
> > by v6 of this patch as it only enables runtime PM on *Thunderbolt*
> > hotplug ports.
> 
> Right.  I just don't understand what's happening, and I don't want
> this to become folk wisdom before we have more details.

If I understand Yinghai Lu's analysis correctly, the port is in D3hot,
we write to config space to enable power on the slot and request a
Command Complete interrupt, the port signals PME to request being
transferred to D0 so that it can deliver the Command Complete interrupt.

So this patch just papers over the issue by disabling PME. :-)

The issue doesn't manifest itself on Thunderbolt controllers because
(a) they don't support power control and (b) they never send Command
Complete interrupts (even though they claim to support it).

Since patch [3/8] of v6 of my series enables runtime PM only on
*Thunderbolt* hotplug ports, the series should be fine as is, however
this isn't really beautiful so if you decide to apply patch [3/8] I'll
come up with a fix, or if you'd rather postpone it I'll rework the patch
to keep the port runtime resumed whenever and for as long as we're waiting
for command completion.  I'll not be able to start working on this until
next week though.  If you do decide to postpone patch [3/8], maybe we
can at least come up with an agreeable version of patch [1/8] to go
into your 4.11 pull.

Thanks,

Lukas
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index ec0b4c11ccd9..d071aa63dac9 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -31,6 +31,7 @@ 
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/slab.h>
+#include <linux/pm_runtime.h>
 #include <linux/pci.h>
 #include "../pci.h"
 #include "pciehp.h"
@@ -390,6 +391,7 @@  int pciehp_enable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
+	int retval;
 
 	pciehp_get_adapter_status(p_slot, &getstatus);
 	if (!getstatus) {
@@ -414,7 +416,10 @@  int pciehp_enable_slot(struct slot *p_slot)
 		}
 	}
 
-	return board_added(p_slot);
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
+	retval = board_added(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
+	return retval;
 }
 
 /*
@@ -424,6 +429,7 @@  int pciehp_disable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
+	int retval;
 
 	if (!p_slot->ctrl)
 		return 1;
@@ -437,7 +443,10 @@  int pciehp_disable_slot(struct slot *p_slot)
 		}
 	}
 
-	return remove_board(p_slot);
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
+	retval = remove_board(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
+	return retval;
 }
 
 int pciehp_sysfs_enable_slot(struct slot *p_slot)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 441083a0d5b0..a1896df274d4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2241,13 +2241,10 @@  bool pci_bridge_d3_possible(struct pci_dev *bridge)
 			return false;
 
 		/*
-		 * Hotplug interrupts cannot be delivered if the link is down,
-		 * so parents of a hotplug port must stay awake. In addition,
-		 * hotplug ports handled by firmware in System Management Mode
+		 * Hotplug ports handled by firmware in System Management Mode
 		 * may not be put into D3 by the OS (Thunderbolt on non-Macs).
-		 * For simplicity, disallow in general for now.
 		 */
-		if (bridge->is_hotplug_bridge)
+		if (bridge->is_hotplug_bridge && !pciehp_is_native(bridge))
 			return false;
 
 		if (pci_bridge_d3_force)
@@ -2258,6 +2255,13 @@  bool pci_bridge_d3_possible(struct pci_dev *bridge)
 			return true;
 
 		/*
+		 * Non-Thunderbolt hotplug ports need further testing before
+		 * enabling D3 on them.
+		 */
+		if (bridge->is_hotplug_bridge)
+			return false;
+
+		/*
 		 * It should be safe to put PCIe ports from 2015 or newer
 		 * to D3.
 		 */