diff mbox

[v4] PCI: pciehp: Drop checking of PCI_BRIDGE_CONTROL in pciehp_unconfigure_device()

Message ID 20171025112940.77890-1-mika.westerberg@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mika Westerberg Oct. 25, 2017, 11:29 a.m. UTC
During surprise hot-unplug the device is not accessible anymore and
register reads return 0xffffffff. When that happens pciehp_unconfigure_device()
may inadvertently think the device below the bridge may be a display
device of somesort as reading PCI_BRIDGE_CONTROL register also returns
0xff. This results failure of the remove operation:

  pciehp 0000:00:1c.0:pcie004: Slot(0): Link Down
  pciehp 0000:00:1c.0:pcie004: Slot(0): Card present
  pciehp 0000:00:1c.0:pcie004: Cannot remove display device 0000:01:00.0

Because of this the hierarchy is left untouched preventing further
hotplug operations.

Now, it is not clear why the check is there in the first place and why
we would like to prevent removing a bridge if it has PCI_BRIDGE_CTL_VGA
set. In case of surprise hot-unplug, it would not even be possible to
prevent the removal. It may be due to the fact that pciehp_pci.c pretty
much copies similar implementation from shpchp_pci.c and this check was
just left there in the code without further thinking if it is actually
needed at all.

Given this and the issue described above, I think it makes sense to drop
the whole PCI_BRIDGE_CONTROL check from pciehp_unconfigure_device()
which is what this patch does.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
The previous version of the patch can be found here:

  https://patchwork.kernel.org/patch/10024145/

Changes from the previous version:

  * Drop the whole PCI_BRIDGE_CONTROL check
  * Update patch subject to reflect that

 drivers/pci/hotplug/pciehp_pci.c | 12 ------------
 1 file changed, 12 deletions(-)

Comments

Bjorn Helgaas Nov. 6, 2017, 11:14 p.m. UTC | #1
On Wed, Oct 25, 2017 at 02:29:40PM +0300, Mika Westerberg wrote:
> During surprise hot-unplug the device is not accessible anymore and
> register reads return 0xffffffff. When that happens pciehp_unconfigure_device()
> may inadvertently think the device below the bridge may be a display
> device of somesort as reading PCI_BRIDGE_CONTROL register also returns
> 0xff. This results failure of the remove operation:
> 
>   pciehp 0000:00:1c.0:pcie004: Slot(0): Link Down
>   pciehp 0000:00:1c.0:pcie004: Slot(0): Card present
>   pciehp 0000:00:1c.0:pcie004: Cannot remove display device 0000:01:00.0
> 
> Because of this the hierarchy is left untouched preventing further
> hotplug operations.
> 
> Now, it is not clear why the check is there in the first place and why
> we would like to prevent removing a bridge if it has PCI_BRIDGE_CTL_VGA
> set. In case of surprise hot-unplug, it would not even be possible to
> prevent the removal. It may be due to the fact that pciehp_pci.c pretty
> much copies similar implementation from shpchp_pci.c and this check was
> just left there in the code without further thinking if it is actually
> needed at all.
> 
> Given this and the issue described above, I think it makes sense to drop
> the whole PCI_BRIDGE_CONTROL check from pciehp_unconfigure_device()
> which is what this patch does.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
> The previous version of the patch can be found here:
> 
>   https://patchwork.kernel.org/patch/10024145/
> 
> Changes from the previous version:
> 
>   * Drop the whole PCI_BRIDGE_CONTROL check
>   * Update patch subject to reflect that
> 
>  drivers/pci/hotplug/pciehp_pci.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
> index 2a1ca020cf5a..acc360d1a1fb 100644
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -79,7 +79,6 @@ int pciehp_configure_device(struct slot *p_slot)
>  int pciehp_unconfigure_device(struct slot *p_slot)
>  {
>  	int rc = 0;
> -	u8 bctl = 0;
>  	u8 presence = 0;
>  	struct pci_dev *dev, *temp;
>  	struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate;
> @@ -101,17 +100,6 @@ int pciehp_unconfigure_device(struct slot *p_slot)
>  	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
>  					 bus_list) {
>  		pci_dev_get(dev);
> -		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
> -			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
> -			if (bctl & PCI_BRIDGE_CTL_VGA) {
> -				ctrl_err(ctrl,
> -					 "Cannot remove display device %s\n",
> -					 pci_name(dev));
> -				pci_dev_put(dev);
> -				rc = -EINVAL;
> -				break;
> -			}
> -		}
>  		if (!presence) {
>  			pci_dev_set_disconnected(dev, NULL);
>  			if (pci_has_subordinate(dev))

As you mention, shpchp_unconfigure_device() contains the same code; is
there any reason not to remove it from there as well?

I know you probably can't test shpchp, and neither can I, but it looks
like it has the same issue, and I don't want to avoid fixing it there
just for want of testing.

I wish we knew more about why that PCI_BRIDGE_CTL_VGA check was there
in the first place.  I think it was added by Dely Sy in 2004 [1].
LinkedIn thinks he's still at Intel; any chance you could ping him and
see if he has any insight?  My guess is that it was originally in a
path that didn't have to worry about surprise unplug.  But I still
don't know why it would be a problem.

Bjorn

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=c16b4b14d980
Mika Westerberg Nov. 7, 2017, 10:33 a.m. UTC | #2
On Mon, Nov 06, 2017 at 05:14:09PM -0600, Bjorn Helgaas wrote:
> As you mention, shpchp_unconfigure_device() contains the same code; is
> there any reason not to remove it from there as well?

I don't see why we cannot remove that as well. But I have to admit that
I don't know much about conventional PCI hotplug.

> I know you probably can't test shpchp, and neither can I, but it looks
> like it has the same issue, and I don't want to avoid fixing it there
> just for want of testing.

Yeah, I can't test it but no problem removing that code from shpchp as
well. I think it should be a separate patch, though.

> I wish we knew more about why that PCI_BRIDGE_CTL_VGA check was there
> in the first place.  I think it was added by Dely Sy in 2004 [1].
> LinkedIn thinks he's still at Intel; any chance you could ping him and
> see if he has any insight?

I checked from our internal "phone book" and could not find him there,
unfortunately.

> My guess is that it was originally in a
> path that didn't have to worry about surprise unplug.  But I still
> don't know why it would be a problem.

One reason might be that if you do unplug through sysfs and you are
running some graphics console (X) or so, you will suddenly lose access
to the system. However, even in that case you can just ssh to the box
and re-enable the card...
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index 2a1ca020cf5a..acc360d1a1fb 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -79,7 +79,6 @@  int pciehp_configure_device(struct slot *p_slot)
 int pciehp_unconfigure_device(struct slot *p_slot)
 {
 	int rc = 0;
-	u8 bctl = 0;
 	u8 presence = 0;
 	struct pci_dev *dev, *temp;
 	struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate;
@@ -101,17 +100,6 @@  int pciehp_unconfigure_device(struct slot *p_slot)
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
 		pci_dev_get(dev);
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
-			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
-			if (bctl & PCI_BRIDGE_CTL_VGA) {
-				ctrl_err(ctrl,
-					 "Cannot remove display device %s\n",
-					 pci_name(dev));
-				pci_dev_put(dev);
-				rc = -EINVAL;
-				break;
-			}
-		}
 		if (!presence) {
 			pci_dev_set_disconnected(dev, NULL);
 			if (pci_has_subordinate(dev))