[GIT,PULL] PCI fixes for v4.10

Message ID	CAE9FiQUpzE-5XKS8hKy3e25CyhknDjwXkNk=ADxi-r_euJTd5A@mail.gmail.com (mailing list archive)
State	New, archived
Delegated to:	Bjorn Helgaas
Headers
Return-Path: <linux-pci-owner@kernel.org>
MIME-Version: 1.0
In-Reply-To: <20170209201154.GA22458@bhelgaas-glaptop.roam.corp.google.com>
References: <20170208192054.GA31395@bhelgaas-glaptop.roam.corp.google.com>
	<20170208192256.GB31395@bhelgaas-glaptop.roam.corp.google.com>
	<20170209040648.GA1304@wunner.de>
	<20170209150950.GA11905@bhelgaas-glaptop.roam.corp.google.com>
	<20170209201154.GA22458@bhelgaas-glaptop.roam.corp.google.com>
From: Yinghai Lu <yinghai@kernel.org>
Date: Fri, 10 Feb 2017 18:39:16 -0800
Message-ID: <CAE9FiQUpzE-5XKS8hKy3e25CyhknDjwXkNk=ADxi-r_euJTd5A@mail.gmail.com>
Subject: Re: [GIT PULL] PCI fixes for v4.10
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Bart Van Assche <bart.vanassche@sandisk.com>,
	Christoph Hellwig <hch@lst.de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Ashok Raj <ashok.raj@intel.com>, Keith Busch <keith.busch@intel.com>
Content-Type: multipart/mixed; boundary=94eb2c03d95eddfa700548381e1b
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk

Commit Message

Yinghai Lu Feb. 11, 2017, 2:39 a.m. UTC

On Thu, Feb 9, 2017 at 12:11 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Thu, Feb 09, 2017 at 09:09:50AM -0600, Bjorn Helgaas wrote:
>> [+cc Ashok, Keith]
>>
>> On Thu, Feb 09, 2017 at 05:06:48AM +0100, Lukas Wunner wrote:
>> > On Wed, Feb 08, 2017 at 01:22:56PM -0600, Bjorn Helgaas wrote:
>> > > Bjorn Helgaas (1):
>> > >       Revert "PCI: pciehp: Add runtime PM support for PCIe hotplug ports"
>> >
>> > What's the rationale for reverting this?
>> >
>> > You've received patches to fix the issue on both affected machines,
>> > so a revert seems unnecessary:
>> >
>> > https://patchwork.kernel.org/patch/9557113/
>> > https://patchwork.kernel.org/patch/9562007/
>>
>> I don't think we've gotten to the root cause of the problem yet,
>> and I don't want to throw in fixes at the last minute without a better
>> understanding of it.
>>
>> PCIe hotplug hardware is not very complicated, it hasn't changed in
>> many years, and at least for the Intel hardware in question, is
>> generally pretty well-tested with Windows.  So I want to be careful
>> about asserting that this new piece of hardware is broken.
>
> I apologize: I had quirks on the brain, but neither of the patches
> above is device-specific.  So neither is claiming broken hardware.
>
> However, 9557113 claims we get unwanted PME interrupts if the slot is
> occupied when we suspend to D3hot.  This is what I want to explore
> further, because that hardware behavior doesn't really make sense to
> me.
>
> 9562007 apparently fixes something, but at this point it's a debugging
> patch (no changelog or signed-off-by) so not a candidate for tossing
> into v4.10 at this late date.

Agreed. It needs more test coverage.

I found more problems.

Actually we don't need 9557113, as even with that we still saw a link up
when powering off slots with some cards.

Please check the updated version of 9562007, which fixes the power on/off
link up problem.

Ashok,

Can you ask your QA guys to check with only the attached patch and commit 68db9bc?

Thanks

Yinghai

Comments

Lukas Wunner Feb. 12, 2017, 7:05 p.m. UTC | #1

On Fri, Feb 10, 2017 at 06:39:16PM -0800, Yinghai Lu wrote:
> On Thu, Feb 9, 2017 at 12:11 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Feb 09, 2017 at 09:09:50AM -0600, Bjorn Helgaas wrote:
> > > On Thu, Feb 09, 2017 at 05:06:48AM +0100, Lukas Wunner wrote:
> > > > https://patchwork.kernel.org/patch/9557113/
> > > > https://patchwork.kernel.org/patch/9562007/
> >
> > I apologize: I had quirks on the brain, but neither of the patches
> > above is device-specific.  So neither is claiming broken hardware.
> >
> > However, 9557113 claims we get unwanted PME interrupts if the slot is
> > occupied when we suspend to D3hot.  This is what I want to explore
> > further, because that hardware behavior doesn't really make sense to
> > me.
> >
> > 9562007 apparently fixes something, but at this point it's a debugging
> > patch (no changelog or signed-off-by) so not a candidate for tossing
> > into v4.10 at this late date.
> 
> Agreed. It needs more test coverage.  I found more problems.
> 
> Actually we don't need 9557113, as even with that we still saw a link up
> when powering off slots with some cards.
> 
> Please check the updated version of 9562007, which fixes the power on/off
> link up problem.

Thank you for debugging this further.  The patch I've submitted today
reinstates runtime PM for hotplug ports but constrains it to those on
a Thunderbolt daisy chain.  The patch allows enabling the feature on
other hardware by booting with pcie_port_pm=force.

A few things to keep in mind:

* On Thunderbolt hotplug ports, interrupts are sent even if the port
  is in D3hot, which as Bjorn has pointed out contradicts the PCI PM
  spec r1.2, table 5-4.  This may be caused by liberal interpretation
  of the spec by Intel when designing the Thunderbolt controllers,
  or perhaps Thunderbolt controllers simply do not possess a "real",
  fully-fledged PCIe switch.  I let the hotplug ports go to D3hot,
  expecting them to continue delivering interrupts but YMMV.

* You've reported that the hotplug port must be in D0 to enable and
  disable power on the slot.  I think this is not required by the spec.
  Thunderbolt hotplug ports do not support power control.  My suspicion
  is that the ports on your machine must remain in D0 as long as the
  slot is occupied, i.e. they must not runtime suspend to D3hot.  Can
  this happen?  Yes.  I release the runtime PM ref once a slot has been
  enabled or disabled.  The device remains runtime active as long as it
  has active children.  If all children runtime suspend, the port will
  go to D3hot, which might cause trouble if this implies that slot power
  is turned off.  To test this you need a card whose Linux driver supports
  runtime PM (e.g. Nvidia GPU, boot with nouveau.runpm=1).

* If the hotplug slot has runtime suspended to D3hot and there are ports
  above it that also runtime suspend to D3hot, its config space is no
  longer accessible and in-band interrupts won't come through.  A side-band
  signaling method such as PME WAKE# is required to deliver interrupts from
  this state.  Also, the hotplug_slot_ops defined for pciehp will have to
  be augmented with calls to pm_runtime_get_sync() and pm_runtime_put()
  to wake the parent of the hotplug port so that config space is accessible
  when interacting with the slot via sysfs.

* If pciehp_poll_mode is used, it may be necessary to call
  pm_runtime_forbid(). (Or alternatively runtime resume it whenever config
  space is polled, but that seems silly.)
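
The second point above (the port stays runtime active while a ref is held
or a child is active) can be modeled roughly as follows.  This is a
userspace toy, not kernel code: struct mock_dev and the mock_* helpers are
made up for illustration, and the real PM core checks more conditions than
these two counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hotplug port; not the kernel's struct device. */
struct mock_dev {
	int usage_count;	/* references taken with get, dropped with put */
	int active_children;	/* children that are currently runtime active */
};

void mock_get(struct mock_dev *d)
{
	d->usage_count++;
}

void mock_put(struct mock_dev *d)
{
	assert(d->usage_count > 0);	/* an unbalanced put would be a bug */
	d->usage_count--;
}

void mock_child_resumed(struct mock_dev *d)
{
	d->active_children++;
}

void mock_child_suspended(struct mock_dev *d)
{
	assert(d->active_children > 0);
	d->active_children--;
}

/* In this model the port may go to D3hot only when nothing holds it up. */
bool mock_may_suspend(const struct mock_dev *d)
{
	return d->usage_count == 0 && d->active_children == 0;
}
```

In this model, releasing the ref after enabling the slot does not yet allow
the port to suspend; that only becomes possible once the last child has
runtime suspended, which is the case worth testing on the affected machines.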

> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_ctrl.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -89,17 +89,17 @@ static int board_added(struct slot *p_sl
>  	struct controller *ctrl = p_slot->ctrl;
>  	struct pci_bus *parent = ctrl->pcie->port->subordinate;
>  
> +	pm_runtime_get_sync(&ctrl->pcie->port->dev);
>  	if (POWER_CTRL(ctrl)) {
>  		/* Power on slot */
>  		retval = pciehp_power_on_slot(p_slot);
>  		if (retval)
> -			return retval;
> +			goto err_exit;
>  	}
>  
>  	pciehp_green_led_blink(p_slot);
>  
>  	/* Check link training status */
> -	pm_runtime_get_sync(&ctrl->pcie->port->dev);
>  	retval = pciehp_check_link_status(ctrl);
>  	if (retval) {
>  		ctrl_err(ctrl, "Failed to check link status\n");

Well, it may be simpler to just move the pm_runtime_get_sync() / _put()
to the caller of board_added() and remove_board().  That way it's not
necessary to insert a pm_runtime_put() into every error path.  The
patch I've submitted today does exactly that.

In fact, v2 of my Thunderbolt runtime PM series, posted in May 2016,
already did that:
http://www.spinics.net/lists/linux-pci/msg51153.html

But for v3 I decided to move the pm_runtime_get_sync() / _put() down
the call stack into board_added() and remove_board() to make more
precise exactly which operations require the hotplug port to be in D0.
Guess that wasn't a good idea. :-(
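
To illustrate why taking the ref in the caller is simpler, here is a
minimal userspace sketch.  Everything in it is a stand-in: struct mock_port,
mock_pm_get_sync(), mock_pm_put(), mock_board_added() and
mock_enable_slot() are hypothetical names, not the pciehp functions, and
the error codes are arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-in for the port's runtime PM usage counter. */
struct mock_port {
	int usage;
};

static void mock_pm_get_sync(struct mock_port *p)
{
	p->usage++;
}

static void mock_pm_put(struct mock_port *p)
{
	assert(p->usage > 0);	/* an unbalanced put would be a bug */
	p->usage--;
}

/*
 * Mock of a board_added()-like function with several early returns.
 * None of the error paths has to know about the runtime PM ref.
 */
static int mock_board_added(int power_on_fails, int link_fails)
{
	if (power_on_fails)
		return -1;
	if (link_fails)
		return -2;
	return 0;
}

/* The caller brackets the whole operation with one get and one put. */
int mock_enable_slot(struct mock_port *p, int power_on_fails, int link_fails)
{
	int retval;

	mock_pm_get_sync(p);
	retval = mock_board_added(power_on_fails, link_fails);
	mock_pm_put(p);

	return retval;
}
```

Whichever path mock_board_added() takes, the counter is back at its
starting value when mock_enable_slot() returns, so no goto or extra put is
needed inside the callee.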

Thanks,

Lukas

Rafael J. Wysocki Feb. 13, 2017, 12:10 p.m. UTC | #2

On Sunday, February 12, 2017 08:05:02 PM Lukas Wunner wrote:
> On Fri, Feb 10, 2017 at 06:39:16PM -0800, Yinghai Lu wrote:
> > On Thu, Feb 9, 2017 at 12:11 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Feb 09, 2017 at 09:09:50AM -0600, Bjorn Helgaas wrote:
> > > > On Thu, Feb 09, 2017 at 05:06:48AM +0100, Lukas Wunner wrote:
> > > > > https://patchwork.kernel.org/patch/9557113/
> > > > > https://patchwork.kernel.org/patch/9562007/
> > >
> > > I apologize: I had quirks on the brain, but neither of the patches
> > > above is device-specific.  So neither is claiming broken hardware.
> > >
> > > However, 9557113 claims we get unwanted PME interrupts if the slot is
> > > occupied when we suspend to D3hot.  This is what I want to explore
> > > further, because that hardware behavior doesn't really make sense to
> > > me.
> > >
> > > 9562007 apparently fixes something, but at this point it's a debugging
> > > patch (no changelog or signed-off-by) so not a candidate for tossing
> > > into v4.10 at this late date.
> > 
> > Agreed. It needs more test coverage.  I found more problems.
> > 
> > Actually we don't need 9557113, as even with that we still saw a link up
> > when powering off slots with some cards.
> > 
> > Please check the updated version of 9562007, which fixes the power on/off
> > link up problem.
> 
> Thank you for debugging this further.  The patch I've submitted today
> reinstates runtime PM for hotplug ports but constrains it to those on
> a Thunderbolt daisy chain.  The patch allows enabling the feature on
> other hardware by booting with pcie_port_pm=force.
> 
> A few things to keep in mind:
> 
> * On Thunderbolt hotplug ports, interrupts are sent even if the port
>   is in D3hot, which as Bjorn has pointed out contradicts the PCI PM
>   spec r1.2, table 5-4.  This may be caused by liberal interpretation
>   of the spec by Intel when designing the Thunderbolt controllers,
>   or perhaps Thunderbolt controllers simply do not possess a "real",
>   fully-fledged PCIe switch.  I let the hotplug ports go to D3hot,
>   expecting them to continue delivering interrupts but YMMV.
> 
> * You've reported that the hotplug port must be in D0 to enable and
>   disable power on the slot.  I think this is not required by the spec.
>   Thunderbolt hotplug ports do not support power control.  My suspicion
>   is that the ports on your machine must remain in D0 as long as the
>   slot is occupied, i.e. they must not runtime suspend to D3hot.  Can
>   this happen?  Yes.  I release the runtime PM ref once a slot has been
>   enabled or disabled.  The device remains runtime active as long as it
>   has active children.  If all children runtime suspend, the port will
>   go to D3hot, which might cause trouble if this implies that slot power
>   is turned off.  To test this you need a card whose Linux driver supports
>   runtime PM (e.g. Nvidia GPU, boot with nouveau.runpm=1).
> 
> * If the hotplug slot has runtime suspended to D3hot and there are ports
>   above it that also runtime suspend to D3hot, its config space is no
>   longer accessible and in-band interrupts won't come through.  A side-band
>   signaling method such as PME WAKE# is required to deliver interrupts from
>   this state.

It actually can use in-band PME messages too, at least if my interpretation of
this part of the spec is correct (or reflects the interpretation of the people
who design the chips in question to be more precise).

Thanks,
Rafael

Subject: [PATCH v2] PCI, pciehp: Only power on/off slots when port is in D0
Powering on a slot via /sys has a problem:
sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power
[  300.949937] pci_hotplug: power_write_file: power = 1
[  300.955502] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
[  300.982557] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  300.991171] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
[  301.000033] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[  301.009274] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[  301.662172] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083
[  301.670827] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0108 from Slot Status
[  301.679376] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up
[  301.685463] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up event ignored; already powering on
[  301.685508] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083
[  302.005967] pciehp 0000:73:00.0:pcie004: pciehp_check_link_status: lnk_status = f083
[  302.014859] pci 0000:74:00.0: [15b3:1003] type 00 class 0x0c0600

Another slot with a different card still shows an extra link up on power off,
even with the can_wake patch applied.

sca05-0a81fd7f:~ # echo 0 > /sys/bus/pci/slots/1/power 
[ 6116.873632] pci_hotplug: power_write_file: power = 0
[ 6116.879198] pciehp 0000:16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1
[ 6116.888730] pciehp 0000:16:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:17:00
[ 6116.898464] pci 0000:17:00.0: PME# disabled
[ 6116.903541] pci 0000:17:00.0: freeing pci_dev info
[ 6116.909662] pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 6116.918277] pciehp 0000:16:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 6116.982048] pciehp 0000:16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
[ 6116.990608] pciehp 0000:16:00.0:pcie004: Slot(1): Link Down
[ 6116.996876] pciehp 0000:16:00.0:pcie004: Slot(1): Link Down event ignored; already powering off
[ 6117.961521] pciehp 0000:16:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 6117.970575] pciehp 0000:16:00.0:pcie004: pending interrupts 0x0018 from Slot Status
[ 6117.970581] pciehp 0000:16:00.0:pcie004: Slot(1): Card present
[ 6117.985660] pciehp 0000:16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
[ 6117.995825] pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 6118.005489] pciehp 0000:16:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
[ 6118.014628] pciehp 0000:16:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 6118.023880] pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 6118.602855] pciehp 0000:16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
[ 6118.611507] pciehp 0000:16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
[ 6118.620057] pciehp 0000:16:00.0:pcie004: Slot(1): Link Up
[ 6118.626151] pciehp 0000:16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
[ 6118.634828] pciehp 0000:16:00.0:pcie004: Slot(1): Link Up event ignored; already powering on
[ 6118.741520] pciehp 0000:16:00.0:pcie004: pciehp_check_link_status: lnk_status = f103
[ 6118.750201] pci 0000:17:00.0: [108e:2088] type 00 class 0x020700
...
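
The hex values in the logs above can be decoded by hand.  A small sketch,
assuming the bit definitions from include/uapi/linux/pci_regs.h (copied
here so the snippet builds in userspace).  The helper names are made up,
and only the three Slot Status bits that show up in these logs are decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_PDC	0x0008	/* Presence Detect Changed */
#define PCI_EXP_SLTSTA_CC	0x0010	/* Command Completed */
#define PCI_EXP_SLTSTA_DLLSC	0x0100	/* Data Link Layer State Changed */

#define PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
#define PCI_EXP_LNKSTA_NLW_SHIFT 4
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */

/* Write the names of the set bits (of the three above) into buf. */
void sltsta_describe(uint16_t sta, char *buf, size_t len)
{
	static const struct {
		uint16_t bit;
		const char *name;
	} bits[] = {
		{ PCI_EXP_SLTSTA_PDC, "PDC" },
		{ PCI_EXP_SLTSTA_CC, "CC" },
		{ PCI_EXP_SLTSTA_DLLSC, "DLLSC" },
	};
	size_t i;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(sta & bits[i].bit))
			continue;
		if (buf[0])
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
}

unsigned int lnksta_speed(uint16_t lnk)
{
	return lnk & PCI_EXP_LNKSTA_CLS;
}

unsigned int lnksta_width(uint16_t lnk)
{
	return (lnk & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT;
}

int lnksta_link_active(uint16_t lnk)
{
	return !!(lnk & PCI_EXP_LNKSTA_DLLLA);
}
```

With these, "pending interrupts 0x0108" decodes as PDC plus DLLSC and
0x0010 as CC alone, while lnk_status f083 and f103 both report an active
link with speed code 3, at width x8 and x16 respectively.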

That means the assumption in commit 68db9bc about powering slots on/off in D3
is not right:
- The configuration space of the port remains accessible in D3hot, so all
  the functions to read or modify the Slot Status and Slot Control
  registers need not be modified.  Even turning on slot power doesn't seem
  to require the port to be in D0, at least the PCIe spec doesn't say so
  and I confirmed that by testing with a Thunderbolt controller.

This patch puts the port back in D0 when powering the slots on/off.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_ctrl.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_ctrl.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c
@@ -89,17 +89,17 @@  static int board_added(struct slot *p_sl
 	struct controller *ctrl = p_slot->ctrl;
 	struct pci_bus *parent = ctrl->pcie->port->subordinate;
 
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	if (POWER_CTRL(ctrl)) {
 		/* Power on slot */
 		retval = pciehp_power_on_slot(p_slot);
 		if (retval)
-			return retval;
+			goto err_exit;
 	}
 
 	pciehp_green_led_blink(p_slot);
 
 	/* Check link training status */
-	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	retval = pciehp_check_link_status(ctrl);
 	if (retval) {
 		ctrl_err(ctrl, "Failed to check link status\n");
@@ -143,9 +143,10 @@  static int remove_board(struct slot *p_s
 
 	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	retval = pciehp_unconfigure_device(p_slot);
-	pm_runtime_put(&ctrl->pcie->port->dev);
-	if (retval)
+	if (retval) {
+		pm_runtime_put(&ctrl->pcie->port->dev);
 		return retval;
+	}
 
 	if (POWER_CTRL(ctrl)) {
 		pciehp_power_off_slot(p_slot);
@@ -157,6 +158,7 @@  static int remove_board(struct slot *p_s
 		 */
 		msleep(1000);
 	}
+	pm_runtime_put(&ctrl->pcie->port->dev);
 
 	/* turn off Green LED */
 	pciehp_green_led_off(p_slot);
