diff mbox

[v3,4/8] pciehp: Don't disable the link permanently, during removal

Message ID 52B0AEAD.6050604@gmail.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Rajat Jain Dec. 17, 2013, 8:06 p.m. UTC
We need future link up events for hot-add, thus don't disable
the link permanently during device removal. Also, remove the static
functions that are now left unused.

Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
---
v3: no change, created by splitting the patch v2 [2/4]
v2: (non existent)
v1: (non existent)

 drivers/pci/hotplug/pciehp_hpc.c |   18 ------------------
 1 file changed, 18 deletions(-)

Comments

Bjorn Helgaas Dec. 18, 2013, 1:02 a.m. UTC | #1
[+cc yinghai@kernel.org (seems to be Yinghai's preferred email]

On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
> We need future link up events for hot-add, thus don't disable
> the link permanently during device removal. Also, remove the static
> functions that are now left unused.

The changelog should mention that this reverts part of 2debd9289997 ("PCI:
pciehp: Disable/enable link during slot power off/on").

Yinghai, can you tell us whether this is an issue on your systems?

> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Guenter Roeck <groeck@juniper.net>
> ---
> v3: no change, created by splitting the patch v2 [2/4]
> v2: (non existent)
> v1: (non existent)
> 
>  drivers/pci/hotplug/pciehp_hpc.c |   18 ------------------
>  1 file changed, 18 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index b45b568..ab12555 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -278,11 +278,6 @@ static void pcie_wait_link_active(struct controller *ctrl)
>  	__pcie_wait_link_active(ctrl, true);
>  }
>  
> -static void pcie_wait_link_not_active(struct controller *ctrl)
> -{
> -	__pcie_wait_link_active(ctrl, false);
> -}
> -
>  static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
>  {
>  	u32 l;
> @@ -383,11 +378,6 @@ static int pciehp_link_enable(struct controller *ctrl)
>  	return __pciehp_link_set(ctrl, true);
>  }
>  
> -static int pciehp_link_disable(struct controller *ctrl)
> -{
> -	return __pciehp_link_set(ctrl, false);
> -}
> -
>  int pciehp_get_attention_status(struct slot *slot, u8 *status)
>  {
>  	struct controller *ctrl = slot->ctrl;
> @@ -620,14 +610,6 @@ int pciehp_power_off_slot(struct slot * slot)
>  	u16 cmd_mask;
>  	int retval;
>  
> -	/* Disable the link at first */
> -	pciehp_link_disable(ctrl);
> -	/* wait the link is down */
> -	if (ctrl->link_active_reporting)
> -		pcie_wait_link_not_active(ctrl);
> -	else
> -		msleep(1000);
> -
>  	slot_cmd = POWER_OFF;
>  	cmd_mask = PCI_EXP_SLTCTL_PCC;
>  	retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
> -- 
> 1.7.9.5
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Dec. 18, 2013, 3:20 a.m. UTC | #2
> -----Original Message-----

> From: Bjorn Helgaas [mailto:bhelgaas@google.com]

> 

> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]

> 

> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:

> > We need future link up events for hot-add, thus don't disable the link

> > permanently during device removal. Also, remove the static functions

> > that are now left unused.

> 

> The changelog should mention that this reverts part of 2debd9289997

> ("PCI:

> pciehp: Disable/enable link during slot power off/on").


Sure, I'll add that. 

> 

> Yinghai, can you tell us whether this is an issue on your systems?

>


Thanks in advance Yinghai. 

Actually I did not understand the original problem and the solution in the first
place (so I also do not understand how might disabling of presence detect notification
help). If you can give more details on the original problem that shall be great. Here
is what I understood from the commit log:

The believe the HW looks like this:

PCIe port <----> Repeater <----> Device.

An in addition there is the presence detect pin that is connected directly from
The port to the device. Now, when the device is plugged out, the pin indicates
No presence. But are you saying the PCIe link from port to repeater is still up?

Part of log:
====================================
    ...
    It turns out the root complex is continually trying to train the link to
    the repeater because the repeater has not been reset.

    This patch will disable the link at removal time to allow the repeater
    to be reset properly.
    ...
====================================

1) I did not understand why would port try to retrain to the link in either cases -
Whether the link to the repeater is up or not?

2) When no link is seen, is THAT what causes repeater to go down and hence solve the
Problem?

3) I'd expect you'd continuously get "adapter not present" messages if some how
The driver was trying to add the device. But I did not understand where did
"adapter present" messages came from?

Sorry, but I guess I am totally confused now. I'll probably go to sleep :-) 

Just trying to understand.

Thanks,

Rajat

> > Signed-off-by: Rajat Jain <rajatjain@juniper.net>

> > Signed-off-by: Guenter Roeck <groeck@juniper.net>

> > ---

> > v3: no change, created by splitting the patch v2 [2/4]

> > v2: (non existent)

> > v1: (non existent)

> >

> >  drivers/pci/hotplug/pciehp_hpc.c |   18 ------------------

> >  1 file changed, 18 deletions(-)

> >

> > diff --git a/drivers/pci/hotplug/pciehp_hpc.c

> > b/drivers/pci/hotplug/pciehp_hpc.c

> > index b45b568..ab12555 100644

> > --- a/drivers/pci/hotplug/pciehp_hpc.c

> > +++ b/drivers/pci/hotplug/pciehp_hpc.c

> > @@ -278,11 +278,6 @@ static void pcie_wait_link_active(struct

> controller *ctrl)

> >  	__pcie_wait_link_active(ctrl, true);  }

> >

> > -static void pcie_wait_link_not_active(struct controller *ctrl) -{

> > -	__pcie_wait_link_active(ctrl, false);

> > -}

> > -

> >  static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)  {

> >  	u32 l;

> > @@ -383,11 +378,6 @@ static int pciehp_link_enable(struct controller

> *ctrl)

> >  	return __pciehp_link_set(ctrl, true);  }

> >

> > -static int pciehp_link_disable(struct controller *ctrl) -{

> > -	return __pciehp_link_set(ctrl, false);

> > -}

> > -

> >  int pciehp_get_attention_status(struct slot *slot, u8 *status)  {

> >  	struct controller *ctrl = slot->ctrl; @@ -620,14 +610,6 @@ int

> > pciehp_power_off_slot(struct slot * slot)

> >  	u16 cmd_mask;

> >  	int retval;

> >

> > -	/* Disable the link at first */

> > -	pciehp_link_disable(ctrl);

> > -	/* wait the link is down */

> > -	if (ctrl->link_active_reporting)

> > -		pcie_wait_link_not_active(ctrl);

> > -	else

> > -		msleep(1000);

> > -

> >  	slot_cmd = POWER_OFF;

> >  	cmd_mask = PCI_EXP_SLTCTL_PCC;

> >  	retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);

> > --

> > 1.7.9.5

> >

>
Guenter Roeck Dec. 18, 2013, 3:40 a.m. UTC | #3
On Wed, Dec 18, 2013 at 03:20:58AM +0000, Rajat Jain wrote:
> 
> 
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> > 
> > [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]
> > 
> > On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
> > > We need future link up events for hot-add, thus don't disable the link
> > > permanently during device removal. Also, remove the static functions
> > > that are now left unused.
> > 
> > The changelog should mention that this reverts part of 2debd9289997
> > ("PCI:
> > pciehp: Disable/enable link during slot power off/on").
> 
> Sure, I'll add that. 
> 
> > 
> > Yinghai, can you tell us whether this is an issue on your systems?
> >
> 
> Thanks in advance Yinghai. 
> 
> Actually I did not understand the original problem and the solution in the first
> place (so I also do not understand how might disabling of presence detect notification
> help). If you can give more details on the original problem that shall be great. Here
> is what I understood from the commit log:
> 
> The believe the HW looks like this:
> 
> PCIe port <----> Repeater <----> Device.
> 
> An in addition there is the presence detect pin that is connected directly from
> The port to the device. Now, when the device is plugged out, the pin indicates
> No presence. But are you saying the PCIe link from port to repeater is still up?
> 
> Part of log:
> ====================================
>     ...
>     It turns out the root complex is continually trying to train the link to
>     the repeater because the repeater has not been reset.
> 
>     This patch will disable the link at removal time to allow the repeater
>     to be reset properly.
>     ...
> ====================================
> 
> 1) I did not understand why would port try to retrain to the link in either cases -
> Whether the link to the repeater is up or not?
> 
> 2) When no link is seen, is THAT what causes repeater to go down and hence solve the
> Problem?
> 
> 3) I'd expect you'd continuously get "adapter not present" messages if some how
> The driver was trying to add the device. But I did not understand where did
> "adapter present" messages came from?
> 
> Sorry, but I guess I am totally confused now. I'll probably go to sleep :-) 
> 

What, that early ? :-)

I think we may have a similar setup on some of our boards. Maybe we can
reproduce the situation (after we understand what exactly it is and how to
trigger it).

Related to this patch set, I learned today that our Space product uses mpt2sas.
Just in case we need one for testing reset functionality.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Dec. 18, 2013, 4:06 a.m. UTC | #4
DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogR3VlbnRlciBSb2VjayBb
bWFpbHRvOmdyb2VjazdAZ21haWwuY29tXSBPbiBCZWhhbGYgT2YgR3VlbnRlcg0KPiBSb2Vjaw0K
PiBTZW50OiBUdWVzZGF5LCBEZWNlbWJlciAxNywgMjAxMyA3OjQwIFBNDQo+IFRvOiBSYWphdCBK
YWluDQo+IENjOiBCam9ybiBIZWxnYWFzOyBSYWphdCBKYWluOyBLZW5qaSBLYW5lc2hpZ2U7IEFs
ZXggV2lsbGlhbXNvbjsgWWlqaW5nDQo+IFdhbmc7IGxpbnV4LXBjaUB2Z2VyLmtlcm5lbC5vcmc7
IGxpbnV4LWtlcm5lbEB2Z2VyLmtlcm5lbC5vcmc7DQo+IFlAamFzcGVyLmVzDQo+IFN1YmplY3Q6
IFJlOiBbUEFUQ0ggdjMgNC84XSBwY2llaHA6IERvbid0IGRpc2FibGUgdGhlIGxpbmsgcGVybWFu
ZW50bHksDQo+IGR1cmluZyByZW1vdmFsDQo+IA0KPiBPbiBXZWQsIERlYyAxOCwgMjAxMyBhdCAw
MzoyMDo1OEFNICswMDAwLCBSYWphdCBKYWluIHdyb3RlOg0KPiA+DQo+ID4NCj4gPiA+IC0tLS0t
T3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+ID4gPiBGcm9tOiBCam9ybiBIZWxnYWFzIFttYWlsdG86
YmhlbGdhYXNAZ29vZ2xlLmNvbV0NCj4gPiA+DQo+ID4gPiBbK2NjIHlpbmdoYWlAa2VybmVsLm9y
ZyAoc2VlbXMgdG8gYmUgWWluZ2hhaSdzIHByZWZlcnJlZCBlbWFpbF0NCj4gPiA+DQo+ID4gPiBP
biBUdWUsIERlYyAxNywgMjAxMyBhdCAxMjowNjowNVBNIC0wODAwLCBSYWphdCBKYWluIHdyb3Rl
Og0KPiA+ID4gPiBXZSBuZWVkIGZ1dHVyZSBsaW5rIHVwIGV2ZW50cyBmb3IgaG90LWFkZCwgdGh1
cyBkb24ndCBkaXNhYmxlIHRoZQ0KPiA+ID4gPiBsaW5rIHBlcm1hbmVudGx5IGR1cmluZyBkZXZp
Y2UgcmVtb3ZhbC4gQWxzbywgcmVtb3ZlIHRoZSBzdGF0aWMNCj4gPiA+ID4gZnVuY3Rpb25zIHRo
YXQgYXJlIG5vdyBsZWZ0IHVudXNlZC4NCj4gPiA+DQo+ID4gPiBUaGUgY2hhbmdlbG9nIHNob3Vs
ZCBtZW50aW9uIHRoYXQgdGhpcyByZXZlcnRzIHBhcnQgb2YgMmRlYmQ5Mjg5OTk3DQo+ID4gPiAo
IlBDSToNCj4gPiA+IHBjaWVocDogRGlzYWJsZS9lbmFibGUgbGluayBkdXJpbmcgc2xvdCBwb3dl
ciBvZmYvb24iKS4NCj4gPg0KPiA+IFN1cmUsIEknbGwgYWRkIHRoYXQuDQo+ID4NCj4gPiA+DQo+
ID4gPiBZaW5naGFpLCBjYW4geW91IHRlbGwgdXMgd2hldGhlciB0aGlzIGlzIGFuIGlzc3VlIG9u
IHlvdXIgc3lzdGVtcz8NCj4gPiA+DQo+ID4NCj4gPiBUaGFua3MgaW4gYWR2YW5jZSBZaW5naGFp
Lg0KPiA+DQo+ID4gQWN0dWFsbHkgSSBkaWQgbm90IHVuZGVyc3RhbmQgdGhlIG9yaWdpbmFsIHBy
b2JsZW0gYW5kIHRoZSBzb2x1dGlvbiBpbg0KPiA+IHRoZSBmaXJzdCBwbGFjZSAoc28gSSBhbHNv
IGRvIG5vdCB1bmRlcnN0YW5kIGhvdyBtaWdodCBkaXNhYmxpbmcgb2YNCj4gPiBwcmVzZW5jZSBk
ZXRlY3Qgbm90aWZpY2F0aW9uIGhlbHApLiBJZiB5b3UgY2FuIGdpdmUgbW9yZSBkZXRhaWxzIG9u
DQo+ID4gdGhlIG9yaWdpbmFsIHByb2JsZW0gdGhhdCBzaGFsbCBiZSBncmVhdC4gSGVyZSBpcyB3
aGF0IEkgdW5kZXJzdG9vZA0KPiBmcm9tIHRoZSBjb21taXQgbG9nOg0KPiA+DQo+ID4gVGhlIGJl
bGlldmUgdGhlIEhXIGxvb2tzIGxpa2UgdGhpczoNCj4gPg0KPiA+IFBDSWUgcG9ydCA8LS0tLT4g
UmVwZWF0ZXIgPC0tLS0+IERldmljZS4NCj4gPg0KPiA+IEFuIGluIGFkZGl0aW9uIHRoZXJlIGlz
IHRoZSBwcmVzZW5jZSBkZXRlY3QgcGluIHRoYXQgaXMgY29ubmVjdGVkDQo+ID4gZGlyZWN0bHkg
ZnJvbSBUaGUgcG9ydCB0byB0aGUgZGV2aWNlLiBOb3csIHdoZW4gdGhlIGRldmljZSBpcyBwbHVn
Z2VkDQo+ID4gb3V0LCB0aGUgcGluIGluZGljYXRlcyBObyBwcmVzZW5jZS4gQnV0IGFyZSB5b3Ug
c2F5aW5nIHRoZSBQQ0llIGxpbmsNCj4gZnJvbSBwb3J0IHRvIHJlcGVhdGVyIGlzIHN0aWxsIHVw
Pw0KPiA+DQo+ID4gUGFydCBvZiBsb2c6DQo+ID4gPT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09DQo+ID4gICAgIC4uLg0KPiA+ICAgICBJdCB0dXJucyBvdXQgdGhlIHJvb3QgY29t
cGxleCBpcyBjb250aW51YWxseSB0cnlpbmcgdG8gdHJhaW4gdGhlDQo+IGxpbmsgdG8NCj4gPiAg
ICAgdGhlIHJlcGVhdGVyIGJlY2F1c2UgdGhlIHJlcGVhdGVyIGhhcyBub3QgYmVlbiByZXNldC4N
Cj4gPg0KPiA+ICAgICBUaGlzIHBhdGNoIHdpbGwgZGlzYWJsZSB0aGUgbGluayBhdCByZW1vdmFs
IHRpbWUgdG8gYWxsb3cgdGhlDQo+IHJlcGVhdGVyDQo+ID4gICAgIHRvIGJlIHJlc2V0IHByb3Bl
cmx5Lg0KPiA+ICAgICAuLi4NCj4gPiA9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT0NCj4gPg0KPiA+IDEpIEkgZGlkIG5vdCB1bmRlcnN0YW5kIHdoeSB3b3VsZCBwb3J0IHRyeSB0
byByZXRyYWluIHRvIHRoZSBsaW5rIGluDQo+ID4gZWl0aGVyIGNhc2VzIC0gV2hldGhlciB0aGUg
bGluayB0byB0aGUgcmVwZWF0ZXIgaXMgdXAgb3Igbm90Pw0KPiA+DQo+ID4gMikgV2hlbiBubyBs
aW5rIGlzIHNlZW4sIGlzIFRIQVQgd2hhdCBjYXVzZXMgcmVwZWF0ZXIgdG8gZ28gZG93biBhbmQN
Cj4gPiBoZW5jZSBzb2x2ZSB0aGUgUHJvYmxlbT8NCj4gPg0KPiA+IDMpIEknZCBleHBlY3QgeW91
J2QgY29udGludW91c2x5IGdldCAiYWRhcHRlciBub3QgcHJlc2VudCIgbWVzc2FnZXMgaWYNCj4g
PiBzb21lIGhvdyBUaGUgZHJpdmVyIHdhcyB0cnlpbmcgdG8gYWRkIHRoZSBkZXZpY2UuIEJ1dCBJ
IGRpZCBub3QNCj4gPiB1bmRlcnN0YW5kIHdoZXJlIGRpZCAiYWRhcHRlciBwcmVzZW50IiBtZXNz
YWdlcyBjYW1lIGZyb20/DQo+ID4NCj4gPiBTb3JyeSwgYnV0IEkgZ3Vlc3MgSSBhbSB0b3RhbGx5
IGNvbmZ1c2VkIG5vdy4gSSdsbCBwcm9iYWJseSBnbyB0bw0KPiA+IHNsZWVwIDotKQ0KPiA+DQo+
IA0KPiBXaGF0LCB0aGF0IGVhcmx5ID8gOi0pDQoNCkkgd2FzIHVwIHVudGlsIDMgQU0gbGFzdCBu
aWdodC4gImdpdCIgKG9yIHRoZSBpZ25vcmFuY2Ugb2YgaXQpIGtlZXBzIG1lIGJ1c3kgSSBndWVz
cyA6LSkNCg0KPiANCj4gSSB0aGluayB3ZSBtYXkgaGF2ZSBhIHNpbWlsYXIgc2V0dXAgb24gc29t
ZSBvZiBvdXIgYm9hcmRzLiBNYXliZSB3ZSBjYW4NCj4gcmVwcm9kdWNlIHRoZSBzaXR1YXRpb24g
KGFmdGVyIHdlIHVuZGVyc3RhbmQgd2hhdCBleGFjdGx5IGl0IGlzIGFuZCBob3cNCj4gdG8gdHJp
Z2dlciBpdCkuDQo+IA0KPiBSZWxhdGVkIHRvIHRoaXMgcGF0Y2ggc2V0LCBJIGxlYXJuZWQgdG9k
YXkgdGhhdCBvdXIgU3BhY2UgcHJvZHVjdCB1c2VzDQo+IG1wdDJzYXMuDQo+IEp1c3QgaW4gY2Fz
ZSB3ZSBuZWVkIG9uZSBmb3IgdGVzdGluZyByZXNldCBmdW5jdGlvbmFsaXR5Lg0KPiANCj4gR3Vl
bnRlcg0KPiANCg0K

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Dec. 18, 2013, 5:14 a.m. UTC | #5
On Tue, Dec 17, 2013 at 7:20 PM, Rajat Jain <rajatjain@juniper.net> wrote:
>
> Actually I did not understand the original problem and the solution in the first
> place (so I also do not understand how might disabling of presence detect notification
> help). If you can give more details on the original problem that shall be great. Here
> is what I understood from the commit log:
>
> The believe the HW looks like this:
>
> PCIe port <----> Repeater <----> Device.
>
> An in addition there is the presence detect pin that is connected directly from
> The port to the device. Now, when the device is plugged out, the pin indicates
> No presence. But are you saying the PCIe link from port to repeater is still up?

After the card is removed from the slot.

PCIe port try to retrain the link to repeater, like the link will keep
up and down.

so the presence bit will keep report one card present and not present.
that present bit should be OR of inband input and outband input.
We check the outband input and it always report correctly.

According to HW guys and Intel, that should be bug of repeater.

Disable the link from pcie to repeater, likely to reset the repeater....

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Dec. 18, 2013, 6:17 a.m. UTC | #6
> -----Original Message-----

> From: yhlu.kernel@gmail.com [mailto:yhlu.kernel@gmail.com] On Behalf Of

> 

> On Tue, Dec 17, 2013 at 7:20 PM, Rajat Jain <rajatjain@juniper.net>

> wrote:

> >

> > Actually I did not understand the original problem and the solution in

> > the first place (so I also do not understand how might disabling of

> > presence detect notification help). If you can give more details on

> > the original problem that shall be great. Here is what I understood

> from the commit log:

> >

> > The believe the HW looks like this:

> >

> > PCIe port <----> Repeater <----> Device.

> >

> > An in addition there is the presence detect pin that is connected

> > directly from The port to the device. Now, when the device is plugged

> > out, the pin indicates No presence. But are you saying the PCIe link

> from port to repeater is still up?

> 

> After the card is removed from the slot.

> 

> PCIe port try to retrain the link to repeater, like the link will keep

> up and down.


I still did not understand the why would the PCIe port try to do so. Are you
Saying the repeater keeps on flapping the link? Nevertheless, I assume it is
due to some repeater bug (that you mention below) that I do not need to
understand.

> 

> so the presence bit will keep report one card present and not present.

> that present bit should be OR of inband input and outband input.

> We check the outband input and it always report correctly.


Just a suggestion for a work around. I checked the manual of the PCIe switch
we use in our system (IDT 89HPES48H12G2), and it is possible at least in this
switch to control whether or not in-band is considered. An internal configuration
register says:

Presence Detect Control. This field controls the manner in which
presence of an adapter in a slot is reported to the hot-plug controller
associated with a downstream switch port.
0x0 - (both) Presence of an adapter in the slot is reported as the
logical “OR” of the receiver detect mechanism and the hotplug
presence detect input (PxPDN).
0x1 - (signal) Presence of an adapter in the slot is reported as the
state of the hot-plug presence detect input (PxPDN).
0x2 - (always) When selected, this mode always informs the hotplug
controller that an adapter is present.
0x3 - (never) When selected, this mode always informs the hotplug
controller that an adapter is not present.

May be some similar config may be present in the switch or the CPU that you use.

Just a random thought.

> 

> According to HW guys and Intel, that should be bug of repeater.


Well, in that case I doubt if the patch will solve the problem. I think
Most likely the "card present / not present" messages may be replaced
By "link-up / link-down" messages. But I'd appreciate your testing.

Thanks,

Rajat

> 

> Disable the link from pcie to repeater, likely to reset the repeater....

> 

> Thanks

> 

> Yinghai

>
Rajat Jain Jan. 5, 2014, 5:53 p.m. UTC | #7
Hello Bjorn,

Just checking on the fate of this patch set...

On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]
>
> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>> We need future link up events for hot-add, thus don't disable
>> the link permanently during device removal. Also, remove the static
>> functions that are now left unused.
>
> The changelog should mention that this reverts part of 2debd9289997 ("PCI:
> pciehp: Disable/enable link during slot power off/on").

Sure. Do you want me to submit another patch set (bumping up the
version) with this change log, or you'd want to add this change log
while merging?

>
> Yinghai, can you tell us whether this is an issue on your systems?

As Yinghai confirms further down this thread, his issue was confirmed
by Intel to be a bug in the repeater chip.
----------------------------------
Yinghai writes:
> According to HW guys and Intel, that should be bug of repeater.
>
---------------------------------
 I don't know about the details of his scenario, except that when the
adapter was disabled the repeater kept on flapping the link up & down
(and hence disabling the link solved the problem then). Yinghai
couldn't test, but I believe with this patch even if we disable
presence detect interrupt, the "adapter present / no present" messages
would (rightly) convert to "Link Up / Link Down" messages (since the
repeater keeps on flapping the link).

Since it is a platform specific bug, I'm not sure what can be done to
remove those messages except may be reduce the verbosity? If you'd
like I could change all the INFO messages to DBG messages.

Please let me know how to proceed further on this. Also, did you get a
chance to look at the subsequent patches of this patch set, I was
wondering if you had any comments there?

Thanks,

Rajat

>
>> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
>> Signed-off-by: Guenter Roeck <groeck@juniper.net>
>> ---
>> v3: no change, created by splitting the patch v2 [2/4]
>> v2: (non existent)
>> v1: (non existent)
>>
>>  drivers/pci/hotplug/pciehp_hpc.c |   18 ------------------
>>  1 file changed, 18 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index b45b568..ab12555 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -278,11 +278,6 @@ static void pcie_wait_link_active(struct controller *ctrl)
>>       __pcie_wait_link_active(ctrl, true);
>>  }
>>
>> -static void pcie_wait_link_not_active(struct controller *ctrl)
>> -{
>> -     __pcie_wait_link_active(ctrl, false);
>> -}
>> -
>>  static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
>>  {
>>       u32 l;
>> @@ -383,11 +378,6 @@ static int pciehp_link_enable(struct controller *ctrl)
>>       return __pciehp_link_set(ctrl, true);
>>  }
>>
>> -static int pciehp_link_disable(struct controller *ctrl)
>> -{
>> -     return __pciehp_link_set(ctrl, false);
>> -}
>> -
>>  int pciehp_get_attention_status(struct slot *slot, u8 *status)
>>  {
>>       struct controller *ctrl = slot->ctrl;
>> @@ -620,14 +610,6 @@ int pciehp_power_off_slot(struct slot * slot)
>>       u16 cmd_mask;
>>       int retval;
>>
>> -     /* Disable the link at first */
>> -     pciehp_link_disable(ctrl);
>> -     /* wait the link is down */
>> -     if (ctrl->link_active_reporting)
>> -             pcie_wait_link_not_active(ctrl);
>> -     else
>> -             msleep(1000);
>> ->>       slot_cmd = POWER_OFF;
>>       cmd_mask = PCI_EXP_SLTCTL_PCC;
>>       retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);
>> --
>> 1.7.9.5
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 7, 2014, 12:03 a.m. UTC | #8
On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
> Hello Bjorn,
>
> Just checking on the fate of this patch set...
>
> On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]
>>
>> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>>> We need future link up events for hot-add, thus don't disable
>>> the link permanently during device removal. Also, remove the static
>>> functions that are now left unused.
>>
>> The changelog should mention that this reverts part of 2debd9289997 ("PCI:
>> pciehp: Disable/enable link during slot power off/on").
>
> Sure. Do you want me to submit another patch set (bumping up the
> version) with this change log, or you'd want to add this change log
> while merging?
>
>>
>> Yinghai, can you tell us whether this is an issue on your systems?
>
> As Yinghai confirms further down this thread, his issue was confirmed
> by Intel to be a bug in the repeater chip.
> ----------------------------------
> Yinghai writes:
>> According to HW guys and Intel, that should be bug of repeater.
>>
> ---------------------------------
>  I don't know about the details of his scenario, except that when the
> adapter was disabled the repeater kept on flapping the link up & down
> (and hence disabling the link solved the problem then). Yinghai
> couldn't test, but I believe with this patch even if we disable
> presence detect interrupt, the "adapter present / no present" messages
> would (rightly) convert to "Link Up / Link Down" messages (since the
> repeater keeps on flapping the link).
>
> Since it is a platform specific bug, I'm not sure what can be done to
> remove those messages except may be reduce the verbosity? If you'd
> like I could change all the INFO messages to DBG messages.

Even if it's a defect in a particular piece of hardware, I don't want
to regress on that hardware, even if the regression is just extra
messages that we didn't see before.

I think ideally we would add some sort of quirk for that hardware so
it works just as well as it does today.  I think extra messages will
lead to a bug reports from users.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Jan. 7, 2014, 6:20 p.m. UTC | #9
Hello Bjorn / Yinghai,

> -----Original Message-----

> From: Bjorn Helgaas [mailto:bhelgaas@google.com]

> Sent: Monday, January 06, 2014 4:04 PM

> To: Rajat Jain

> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-

> pci@vger.kernel.org; linux-kernel@vger.kernel.org; Yinghai Lu; Guenter

> Roeck; Rajat Jain; Yinghai Lu

> Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently,

> during removal

> 

> On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <rajatxjain@gmail.com>

> wrote:

> > Hello Bjorn,

> >

> > Just checking on the fate of this patch set...

> >

> > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@google.com>

> wrote:

> >> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]

> >>

> >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:

> >>> We need future link up events for hot-add, thus don't disable the

> >>> link permanently during device removal. Also, remove the static

> >>> functions that are now left unused.

> >>

> >> The changelog should mention that this reverts part of 2debd9289997

> ("PCI:

> >> pciehp: Disable/enable link during slot power off/on").

> >

> > Sure. Do you want me to submit another patch set (bumping up the

> > version) with this change log, or you'd want to add this change log

> > while merging?

> >

> >>

> >> Yinghai, can you tell us whether this is an issue on your systems?

> >

> > As Yinghai confirms further down this thread, his issue was confirmed

> > by Intel to be a bug in the repeater chip.

> > ----------------------------------

> > Yinghai writes:

> >> According to HW guys and Intel, that should be bug of repeater.

> >>

> > ---------------------------------

> >  I don't know about the details of his scenario, except that when the

> > adapter was disabled the repeater kept on flapping the link up & down

> > (and hence disabling the link solved the problem then). Yinghai

> > couldn't test, but I believe with this patch even if we disable

> > presence detect interrupt, the "adapter present / no present" messages

> > would (rightly) convert to "Link Up / Link Down" messages (since the

> > repeater keeps on flapping the link).

> >

> > Since it is a platform specific bug, I'm not sure what can be done to

> > remove those messages except may be reduce the verbosity? If you'd

> > like I could change all the INFO messages to DBG messages.

> 

> Even if it's a defect in a particular piece of hardware, I don't want to

> regress on that hardware, even if the regression is just extra messages

> that we didn't see before.

> 

> I think ideally we would add some sort of quirk for that hardware so it

> works just as well as it does today.  I think extra messages will lead

> to a bug reports from users.


Sure, I can do that. I think what the quirk would have to do is that for that particular platform, don't enable the link-state based hotplug. (Since link-state hotplug will not work if we disable the link permanently as we do today on card removal).

But the question is how to determine that the quirk has to be applied? I think the objective is to apply the quirk to the platforms that have a "PCIe repeater". Since this does not depend on a PCI device / vendor ID, and I think the PCIe repeater is probably not even visible to the pciehp or the PCI subsystem, how do I determine that the quirk has to be applied?

If (hw_has_pcie_repeater)
	Don't use link-state hotplug (and disable link permanently during card removal)
Else
	Use link-state hotplug (and don't disable the link permanently)
	

Yinghai: Since I do not have that hardware, I will need some help in testing the patch with the quirk. I was wondering if you'd still have that hardware around and would be able to help me with testing?

Thanks,

Rajat
Rajat Jain Jan. 9, 2014, 8:45 p.m. UTC | #10
Hi Bjorn,

> -----Original Message-----

> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-

> owner@vger.kernel.org] On Behalf Of Rajat Jain

> Sent: Tuesday, January 07, 2014 10:21 AM

> To: Bjorn Helgaas; Rajat Jain

> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-

> pci@vger.kernel.org; linux-kernel@vger.kernel.org; Yinghai Lu; Guenter

> Roeck; Yinghai Lu

> Subject: RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently,

> during removal

> 

> Hello Bjorn / Yinghai,

> 

> > -----Original Message-----

> > From: Bjorn Helgaas [mailto:bhelgaas@google.com]

> > Sent: Monday, January 06, 2014 4:04 PM

> > To: Rajat Jain

> > Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-

> > pci@vger.kernel.org; linux-kernel@vger.kernel.org; Yinghai Lu; Guenter

> > Roeck; Rajat Jain; Yinghai Lu

> > Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link

> > permanently, during removal

> >

> > On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <rajatxjain@gmail.com>

> > wrote:

> > > Hello Bjorn,

> > >

> > > Just checking on the fate of this patch set...

> > >

> > > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@google.com>

> > wrote:

> > >> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]

> > >>

> > >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:

> > >>> We need future link up events for hot-add, thus don't disable the

> > >>> link permanently during device removal. Also, remove the static

> > >>> functions that are now left unused.

> > >>

> > >> The changelog should mention that this reverts part of 2debd9289997

> > ("PCI:

> > >> pciehp: Disable/enable link during slot power off/on").

> > >

> > > Sure. Do you want me to submit another patch set (bumping up the

> > > version) with this change log, or you'd want to add this change log

> > > while merging?

> > >

> > >>

> > >> Yinghai, can you tell us whether this is an issue on your systems?

> > >

> > > As Yinghai confirms further down this thread, his issue was

> > > confirmed by Intel to be a bug in the repeater chip.

> > > ----------------------------------

> > > Yinghai writes:

> > >> According to HW guys and Intel, that should be bug of repeater.

> > >>

> > > ---------------------------------

> > >  I don't know about the details of his scenario, except that when

> > > the adapter was disabled the repeater kept on flapping the link up &

> > > down (and hence disabling the link solved the problem then). Yinghai

> > > couldn't test, but I believe with this patch even if we disable

> > > presence detect interrupt, the "adapter present / no present"

> > > messages would (rightly) convert to "Link Up / Link Down" messages

> > > (since the repeater keeps on flapping the link).

> > >

> > > Since it is a platform specific bug, I'm not sure what can be done

> > > to remove those messages except may be reduce the verbosity? If

> > > you'd like I could change all the INFO messages to DBG messages.

> >

> > Even if it's a defect in a particular piece of hardware, I don't want

> > to regress on that hardware, even if the regression is just extra

> > messages that we didn't see before.

> >

> > I think ideally we would add some sort of quirk for that hardware so

> > it works just as well as it does today.  I think extra messages will

> > lead to a bug reports from users.

> 

> Sure, I can do that. I think what the quirk would have to do is that for

> that particular platform, don't enable the link-state based hotplug.

> (Since link-state hotplug will not work if we disable the link

> permanently as we do today on card removal).

> 

> But the question is how to determine that the quirk has to be applied? I

> think the objective is to apply the quirk to the platforms that have a

> "PCIe repeater". Since this does not depend on a PCI device / vendor ID,

> and I think the PCIe repeater is probably not even visible to the pciehp

> or the PCI subsystem, how do I determine that the quirk has to be

> applied?


Any ideas on how do I identify the platforms that may have this problem? 

Thanks,

Rajat


> 

> If (hw_has_pcie_repeater)

> 	Don't use link-state hotplug (and disable link permanently during

> card removal) Else

> 	Use link-state hotplug (and don't disable the link permanently)

> 

> 

> Yinghai: Since I do not have that hardware, I will need some help in

> testing the patch with the quirk. I was wondering if you'd still have

> that hardware around and would be able to help me with testing?

> 

> Thanks,

> 

> Rajat

>  {.n +       +%  lzwm  b ?  r  zX  \ )   w*jg        ?j/   z ?  2 ?

> & )? a     G   h  j:+v   w ?
Bjorn Helgaas Jan. 9, 2014, 8:58 p.m. UTC | #11
On Thu, Jan 9, 2014 at 1:45 PM, Rajat Jain <rajatjain@juniper.net> wrote:
> Hi Bjorn,
>
>> -----Original Message-----
>> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-
>> owner@vger.kernel.org] On Behalf Of Rajat Jain
>> Sent: Tuesday, January 07, 2014 10:21 AM
>> To: Bjorn Helgaas; Rajat Jain
>> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
>> pci@vger.kernel.org; linux-kernel@vger.kernel.org; Yinghai Lu; Guenter
>> Roeck; Yinghai Lu
>> Subject: RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently,
>> during removal
>>
>> Hello Bjorn / Yinghai,
>>
>> > -----Original Message-----
>> > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> > Sent: Monday, January 06, 2014 4:04 PM
>> > To: Rajat Jain
>> > Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
>> > pci@vger.kernel.org; linux-kernel@vger.kernel.org; Yinghai Lu; Guenter
>> > Roeck; Rajat Jain; Yinghai Lu
>> > Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link
>> > permanently, during removal
>> >
>> > On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <rajatxjain@gmail.com>
>> > wrote:
>> > > Hello Bjorn,
>> > >
>> > > Just checking on the fate of this patch set...
>> > >
>> > > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@google.com>
>> > wrote:
>> > >> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]
>> > >>
>> > >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>> > >>> We need future link up events for hot-add, thus don't disable the
>> > >>> link permanently during device removal. Also, remove the static
>> > >>> functions that are now left unused.
>> > >>
>> > >> The changelog should mention that this reverts part of 2debd9289997
>> > ("PCI:
>> > >> pciehp: Disable/enable link during slot power off/on").
>> > >
>> > > Sure. Do you want me to submit another patch set (bumping up the
>> > > version) with this change log, or you'd want to add this change log
>> > > while merging?
>> > >
>> > >>
>> > >> Yinghai, can you tell us whether this is an issue on your systems?
>> > >
>> > > As Yinghai confirms further down this thread, his issue was
>> > > confirmed by Intel to be a bug in the repeater chip.
>> > > ----------------------------------
>> > > Yinghai writes:
>> > >> According to HW guys and Intel, that should be bug of repeater.
>> > >>
>> > > ---------------------------------
>> > >  I don't know about the details of his scenario, except that when
>> > > the adapter was disabled the repeater kept on flapping the link up &
>> > > down (and hence disabling the link solved the problem then). Yinghai
>> > > couldn't test, but I believe with this patch even if we disable
>> > > presence detect interrupt, the "adapter present / no present"
>> > > messages would (rightly) convert to "Link Up / Link Down" messages
>> > > (since the repeater keeps on flapping the link).
>> > >
>> > > Since it is a platform specific bug, I'm not sure what can be done
>> > > to remove those messages except may be reduce the verbosity? If
>> > > you'd like I could change all the INFO messages to DBG messages.
>> >
>> > Even if it's a defect in a particular piece of hardware, I don't want
>> > to regress on that hardware, even if the regression is just extra
>> > messages that we didn't see before.
>> >
>> > I think ideally we would add some sort of quirk for that hardware so
>> > it works just as well as it does today.  I think extra messages will
>> > lead to a bug reports from users.
>>
>> Sure, I can do that. I think what the quirk would have to do is that for
>> that particular platform, don't enable the link-state based hotplug.
>> (Since link-state hotplug will not work if we disable the link
>> permanently as we do today on card removal).
>>
>> But the question is how to determine that the quirk has to be applied? I
>> think the objective is to apply the quirk to the platforms that have a
>> "PCIe repeater". Since this does not depend on a PCI device / vendor ID,
>> and I think the PCIe repeater is probably not even visible to the pciehp
>> or the PCI subsystem, how do I determine that the quirk has to be
>> applied?
>
> Any ideas on how do I identify the platforms that may have this problem?

I sure don't know.  I suspect you're right that the PCIe repeater is
invisible to software, at least in terms of PCI config space.  Maybe
we could use DMI to identify platforms.  That's not a very good
solution because we have to come up with a list, but I can't think of
a better way.  Yinghai knows more about the platform and might have
better ideas.

Bjorn

>> If (hw_has_pcie_repeater)
>>       Don't use link-state hotplug (and disable link permanently during
>> card removal) Else
>>       Use link-state hotplug (and don't disable the link permanently)
>>
>>
>> Yinghai: Since I do not have that hardware, I will need some help in
>> testing the patch with the quirk. I was wondering if you'd still have
>> that hardware around and would be able to help me with testing?
>>
>> Thanks,
>>
>> Rajat
>>   {.n +       +%  lzwm  b ?  r  zX   \ )   w*jg         ?j/   z ?  2 ?
>> & )? a       G   h   j:+v   w ?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Jan. 13, 2014, 8:30 a.m. UTC | #12
Hi Yinghai / Bjorn,


On Thu, Jan 9, 2014 at 12:58 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> >
>>> > On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <rajatxjain@gmail.com>
>>> > wrote:
>>> > > Hello Bjorn,
>>> > >
>>> > > Just checking on the fate of this patch set...
>>> > >
>>> > > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@google.com>
>>> > wrote:
>>> > >> [+cc yinghai@kernel.org (seems to be Yinghai's preferred email]
>>> > >>
>>> > >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
>>> > >>> We need future link up events for hot-add, thus don't disable the
>>> > >>> link permanently during device removal. Also, remove the static
>>> > >>> functions that are now left unused.
>>> > >>
>>> > >> The changelog should mention that this reverts part of 2debd9289997
>>> > ("PCI:
>>> > >> pciehp: Disable/enable link during slot power off/on").
>>> > >
>>> > > Sure. Do you want me to submit another patch set (bumping up the
>>> > > version) with this change log, or you'd want to add this change log
>>> > > while merging?
>>> > >
>>> > >>
>>> > >> Yinghai, can you tell us whether this is an issue on your systems?
>>> > >
>>> > > As Yinghai confirms further down this thread, his issue was
>>> > > confirmed by Intel to be a bug in the repeater chip.
>>> > > ----------------------------------
>>> > > Yinghai writes:
>>> > >> According to HW guys and Intel, that should be bug of repeater.
>>> > >>
>>> > > ---------------------------------
>>> > >  I don't know about the details of his scenario, except that when
>>> > > the adapter was disabled the repeater kept on flapping the link up &
>>> > > down (and hence disabling the link solved the problem then). Yinghai
>>> > > couldn't test, but I believe with this patch even if we disable
>>> > > presence detect interrupt, the "adapter present / no present"
>>> > > messages would (rightly) convert to "Link Up / Link Down" messages
>>> > > (since the repeater keeps on flapping the link).
>>> > >
>>> > > Since it is a platform specific bug, I'm not sure what can be done
>>> > > to remove those messages except may be reduce the verbosity? If
>>> > > you'd like I could change all the INFO messages to DBG messages.
>>> >
>>> > Even if it's a defect in a particular piece of hardware, I don't want
>>> > to regress on that hardware, even if the regression is just extra
>>> > messages that we didn't see before.
>>> >
>>> > I think ideally we would add some sort of quirk for that hardware so
>>> > it works just as well as it does today.  I think extra messages will
>>> > lead to a bug reports from users.
>>>
>>> Sure, I can do that. I think what the quirk would have to do is that for
>>> that particular platform, don't enable the link-state based hotplug.
>>> (Since link-state hotplug will not work if we disable the link
>>> permanently as we do today on card removal).
>>>
>>> But the question is how to determine that the quirk has to be applied? I
>>> think the objective is to apply the quirk to the platforms that have a
>>> "PCIe repeater". Since this does not depend on a PCI device / vendor ID,
>>> and I think the PCIe repeater is probably not even visible to the pciehp
>>> or the PCI subsystem, how do I determine that the quirk has to be
>>> applied?
>>
>> Any ideas on how do I identify the platforms that may have this problem?
>
> I sure don't know.  I suspect you're right that the PCIe repeater is
> invisible to software, at least in terms of PCI config space.  Maybe
> we could use DMI to identify platforms.  That's not a very good
> solution because we have to come up with a list, but I can't think of
> a better way.  Yinghai knows more about the platform and might have
> better ideas.

Yinghai: I am trying to understand what exactly is this platform bug
and how to add a quirk such that this platform remains unaffected. Can
you please help me by suggesting how to decide if this is _the_
platform that has the bug (the pcie repeater).

Bjorn: It seems to be that identification of this platform will be out
PCI code (since the bug seems to be in a pcie repeater chip which is
not a PCI device visible to SW). So even if we find a way to identify
this platform (e.g DMI) , I doubt if you'd want me to add that in the
pciehp code (which is platform independent so far). At best, the only
way out I can see is to provide a knob from the pciehp, that can be
use by the platform code to either enable or disable the link state
hotplug. It could go back towards using a module parameter like
pciehp_use_link_events. Please suggest.

The only other way I can think of, is that I can remove the debug
message altogether (Link up / Link down). (Or the user can change the
verbosity).

Humm, when I think of it, we're trying to address a bug of a chip
which is not a PCI device, into pciehp. I'm praying it doesn't bring
this patch set to a dead end :-)

Thanks,

Rajat

>
> Bjorn
>
>>> If (hw_has_pcie_repeater)
>>>       Don't use link-state hotplug (and disable link permanently during
>>> card removal) Else
>>>       Use link-state hotplug (and don't disable the link permanently)
>>>
>>>
>>> Yinghai: Since I do not have that hardware, I will need some help in
>>> testing the patch with the quirk. I was wondering if you'd still have
>>> that hardware around and would be able to help me with testing?
>>>
>>> Thanks,
>>>
>>> Rajat
>>>   {.n +       +%  lzwm  b ?  r  zX   \ )   w*jg         ?j/   z ?  2 ?
>>> & )? a       G   h   j:+v   w ?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 13, 2014, 5:38 p.m. UTC | #13
On Mon, Jan 13, 2014 at 1:30 AM, Rajat Jain <rajatxjain@gmail.com> wrote:

> Yinghai: I am trying to understand what exactly is this platform bug
> and how to add a quirk such that this platform remains unaffected. Can
> you please help me by suggesting how to decide if this is _the_
> platform that has the bug (the pcie repeater).
>
> Bjorn: It seems to be that identification of this platform will be out
> PCI code (since the bug seems to be in a pcie repeater chip which is
> not a PCI device visible to SW). So even if we find a way to identify
> this platform (e.g DMI) , I doubt if you'd want me to add that in the
> pciehp code (which is platform independent so far). At best, the only
> way out I can see is to provide a knob from the pciehp, that can be
> use by the platform code to either enable or disable the link state
> hotplug. It could go back towards using a module parameter like
> pciehp_use_link_events. Please suggest.
>
> The only other way I can think of, is that I can remove the debug
> message altogether (Link up / Link down). (Or the user can change the
> verbosity).

I think it's perfectly fine to add a DMI-based quirk in pciehp.  Yes,
it's a bit ugly, but that's just the nature of working around hardware
defects.  Identify the platform, emit a diagnostic ("disabling link
state because platform may be buggy"), enable the workaround.  That
seems better than requiring the user to figure out what hardware he
has and whether it has a defect.

I would also be OK with adding a pciehp module parameter to explicitly
enable or disable the workaround if that seems necessary.  I just want
the common case of correctly working hardware to work without any
switches.

Removing or rate-limiting the link up/down debug message is also fine with me.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index b45b568..ab12555 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -278,11 +278,6 @@  static void pcie_wait_link_active(struct controller *ctrl)
 	__pcie_wait_link_active(ctrl, true);
 }
 
-static void pcie_wait_link_not_active(struct controller *ctrl)
-{
-	__pcie_wait_link_active(ctrl, false);
-}
-
 static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
 {
 	u32 l;
@@ -383,11 +378,6 @@  static int pciehp_link_enable(struct controller *ctrl)
 	return __pciehp_link_set(ctrl, true);
 }
 
-static int pciehp_link_disable(struct controller *ctrl)
-{
-	return __pciehp_link_set(ctrl, false);
-}
-
 int pciehp_get_attention_status(struct slot *slot, u8 *status)
 {
 	struct controller *ctrl = slot->ctrl;
@@ -620,14 +610,6 @@  int pciehp_power_off_slot(struct slot * slot)
 	u16 cmd_mask;
 	int retval;
 
-	/* Disable the link at first */
-	pciehp_link_disable(ctrl);
-	/* wait the link is down */
-	if (ctrl->link_active_reporting)
-		pcie_wait_link_not_active(ctrl);
-	else
-		msleep(1000);
-
 	slot_cmd = POWER_OFF;
 	cmd_mask = PCI_EXP_SLTCTL_PCC;
 	retval = pcie_write_cmd(ctrl, slot_cmd, cmd_mask);