[PATCHv3,2/5] pci: Add is_removed state

Message ID 20161021153714.GA4221@wunner.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Lukas Wunner Oct. 21, 2016, 3:37 p.m. UTC
On Tue, Sep 27, 2016 at 04:23:32PM -0400, Keith Busch wrote:
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -109,6 +109,8 @@ int pciehp_unconfigure_device(struct slot *p_slot)
>  				break;
>  			}
>  		}
> +		if (!presence)
> +			dev->is_removed = 1;
>  		pci_stop_and_remove_bus_device(dev);
>  		/*
>  		 * Ensure that no new Requests will be generated from

Sorry for the delay Keith, I finally got around to testing v3 of your
series with hot-removed Thunderbolt devices on Apple Macs.

I've found that the above isn't sufficient: it's also necessary to
set the is_removed bit on any child devices.  E.g. on my system
when an Apple Gigabit Ethernet adapter is plugged in, the topology
looks like this:

0000:06:04.0 --- 0000:09:00.0 --- 0000:0a:00.0 --- 0000:0b:00.0

Hotplug port     Upstream bridge  Downstream br.   Broadcom 57762
of TB host       of TB switch in
                 Ethernet adapter

With your patch above, the is_removed bit is only set on 0000:09:00.0
but not on its children.  Consequently the "tg3" driver tries to
access the hot-removed Broadcom 57762 Ethernet chip as before,
causing a soft lockup.

The diff below fixes this for me.  Could you fold it into your patch?
The same change might also be necessary in pcie-dpc.c. Feel free to
rename "set_is_removed_cb()" if you don't like it.

Thanks,

Lukas

-- >8 --
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Keith Busch Oct. 21, 2016, 4:15 p.m. UTC | #1
On Fri, Oct 21, 2016 at 05:37:14PM +0200, Lukas Wunner wrote:
> On Tue, Sep 27, 2016 at 04:23:32PM -0400, Keith Busch wrote:
> > --- a/drivers/pci/hotplug/pciehp_pci.c
> > +++ b/drivers/pci/hotplug/pciehp_pci.c
> > @@ -109,6 +109,8 @@ int pciehp_unconfigure_device(struct slot *p_slot)
> >  				break;
> >  			}
> >  		}
> > +		if (!presence)
> > +			dev->is_removed = 1;
> >  		pci_stop_and_remove_bus_device(dev);
> >  		/*
> >  		 * Ensure that no new Requests will be generated from
> 
> Sorry for the delay Keith, I finally got around to testing v3 of your
> series with hot-removed Thunderbolt devices on Apple Macs.
> 
> I've found that the above isn't sufficient: it's also necessary to
> set the is_removed bit on any child devices.  E.g. on my system
> when an Apple Gigabit Ethernet adapter is plugged in, the topology
> looks like this:

Thanks for the catch. Your proposal looks good to me. I'll send a new
revision incorporating something like this that the dpc driver can
also use.
 
> With your patch above, the is_removed bit is only set on 0000:09:00.0
> but not on its children.  Consequently the "tg3" driver tries to
> access the hot-removed Broadcom 57762 Ethernet chip as before,
> causing a soft lockup.

Is that something that can be fixed in the tg3 driver? I don't think
drivers can rely on this patch to fence off their unintended accesses,
since we can't stop tg3 from accessing a removed device before
'is_removed' is set.
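
To illustrate what a driver-side guard could look like, here is a minimal
userspace sketch.  It is not tg3 code and not a kernel API: struct toy_nic,
toy_poll_bit_clear() and the fake read helpers are hypothetical names made
up for this example.  It assumes that reads from a removed PCIe device come
back as all ones, and it treats such a value as "device gone" in addition
to checking the flag.  A register that can legitimately read 0xffffffff
would need a different test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a NIC; "is_removed" stands in for the bit added by the patch. */
struct toy_nic {
	bool is_removed;
	uint32_t (*read32)(struct toy_nic *nic, uint32_t off);
};

enum toy_poll_result {
	TOY_POLL_OK      =  0,	/* bit cleared, hardware responded */
	TOY_POLL_TIMEOUT = -1,	/* gave up after max_tries reads */
	TOY_POLL_GONE    = -2,	/* flag set or all-ones read: device is gone */
};

/*
 * Poll until "bit" clears in the register at "off", but give up early if
 * the device is flagged as removed or the read returns all ones.
 */
static int toy_poll_bit_clear(struct toy_nic *nic, uint32_t off,
			      uint32_t bit, unsigned int max_tries)
{
	for (unsigned int i = 0; i < max_tries; i++) {
		uint32_t val;

		if (nic->is_removed)
			return TOY_POLL_GONE;

		val = nic->read32(nic, off);
		if (val == UINT32_MAX)
			return TOY_POLL_GONE;
		if (!(val & bit))
			return TOY_POLL_OK;
	}
	return TOY_POLL_TIMEOUT;
}

/* Fake register backends for experimenting with the helper. */
static uint32_t toy_read_all_ones(struct toy_nic *nic, uint32_t off)
{
	(void)nic;
	(void)off;
	return UINT32_MAX;	/* what a read from an unplugged device looks like */
}

static uint32_t toy_read_zero(struct toy_nic *nic, uint32_t off)
{
	(void)nic;
	(void)off;
	return 0;		/* healthy device, bit already clear */
}
```

The all-ones check covers the window before the flag is set, which the
flag alone cannot close.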
Lukas Wunner Oct. 21, 2016, 4:36 p.m. UTC | #2
On Fri, Oct 21, 2016 at 12:15:16PM -0400, Keith Busch wrote:
> On Fri, Oct 21, 2016 at 05:37:14PM +0200, Lukas Wunner wrote:
> > With your patch above, the is_removed bit is only set on 0000:09:00.0
> > but not on its children.  Consequently the "tg3" driver tries to
> > access the hot-removed Broadcom 57762 Ethernet chip as before,
> > causing a soft lockup.
> 
> Is that something that can be fixed in the tg3 driver? I don't think
> drivers can rely on this patch to fence off their unintended accesses,
> since we can't stop tg3 from accessing a removed device before
> 'is_removed' is set.

I haven't tested yet what happens when the adapter is unplugged while
packets are in-flight, but at least unplugging works fine when the
adapter is idle (with your series plus the small changes I outlined).

*Without* your series, I have to set the interface to down with ifconfig
before unplugging.  If I ever forget that, the machine locks up:


NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:299]
...
Workqueue: pciehp-4 pciehp_power_thread
RIP: 0010:[<ffffffffa105b01d>]  [<ffffffffa105b01d>] tg3_read32+0xd/0x10 [tg3]
...
Call Trace:
 [<ffffffffa1063610>] ? tg3_stop_block.constprop.126+0x80/0x110 [tg3]
 [<ffffffffa1066298>] ? tg3_abort_hw+0x68/0x2f0 [tg3]
 [<ffffffffa106654d>] ? tg3_halt+0x2d/0x180 [tg3]
 [<ffffffffa1072a07>] ? tg3_stop+0x157/0x210 [tg3]
 [<ffffffffa1072aeb>] ? tg3_close+0x2b/0xe0 [tg3]
 [<ffffffff81465ff4>] ? __dev_close_many+0x84/0xd0
 [<ffffffff814660b4>] ? dev_close_many+0x74/0x100
 [<ffffffff8146790b>] ? rollback_registered_many+0xfb/0x2e0
 [<ffffffff81467b19>] ? rollback_registered+0x29/0x40
 [<ffffffff81468950>] ? unregister_netdevice_queue+0x40/0x90
 [<ffffffff814689b8>] ? unregister_netdev+0x18/0x20
 [<ffffffffa106604b>] ? tg3_remove_one+0x8b/0x130 [tg3]
 [<ffffffff8130b556>] ? pci_device_remove+0x36/0xb0
 [<ffffffff813df92a>] ? __device_release_driver+0x9a/0x140
 [<ffffffff813df9ee>] ? device_release_driver+0x1e/0x30
 [<ffffffff81304bf4>] ? pci_stop_bus_device+0x84/0xa0
 [<ffffffff81304b9b>] ? pci_stop_bus_device+0x2b/0xa0
 [<ffffffff81304b9b>] ? pci_stop_bus_device+0x2b/0xa0
 [<ffffffff81304cee>] ? pci_stop_and_remove_bus_device+0xe/0x20
 [<ffffffff8131ecea>] ? pciehp_unconfigure_device+0x9a/0x180
 [<ffffffff8131e7ef>] ? pciehp_disable_slot+0x3f/0xb0
 [<ffffffff8131e8e5>] ? pciehp_power_thread+0x85/0xa0
 [<ffffffff810855df>] ? process_one_work+0x19f/0x3d0
 [<ffffffff8108585d>] ? worker_thread+0x4d/0x450
 [<ffffffff81085810>] ? process_one_work+0x3d0/0x3d0
 [<ffffffff8108b32d>] ? kthread+0xbd/0xe0
 [<ffffffff8108b270>] ? kthread_create_on_node+0x170/0x170
 [<ffffffff8155ee1f>] ? ret_from_fork+0x3f/0x70
 [<ffffffff8108b270>] ? kthread_create_on_node+0x170/0x170


Being able to just unplug without having to think of ifconfig is already
a massive improvement.

Thanks,

Lukas

Patch

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index 299ea5e..ec26eb7 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -73,6 +73,12 @@  int pciehp_configure_device(struct slot *p_slot)
 	return ret;
 }
 
+static int set_is_removed_cb(struct pci_dev *pdev, void *unused)
+{
+	pdev->is_removed = 1;
+	return 0;
+}
+
 int pciehp_unconfigure_device(struct slot *p_slot)
 {
 	int rc = 0;
@@ -109,8 +115,11 @@  int pciehp_unconfigure_device(struct slot *p_slot)
 				break;
 			}
 		}
-		if (!presence)
+		if (!presence) {
 			dev->is_removed = 1;
+			if (pci_has_subordinate(dev))
+				pci_walk_bus(dev->subordinate, set_is_removed_cb, NULL);
+		}
 		pci_stop_and_remove_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from
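
For readers who want to try the subtree marking outside the kernel, the
following is a rough userspace model of what the hunk above does.  The
types and functions (struct toy_dev, toy_walk_bus(), toy_mark_removed())
are hypothetical stand-ins, not struct pci_dev or pci_walk_bus(); the model
only assumes that the walk visits every device on the subordinate bus and
below it, and stops if the callback returns nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct pci_dev; not the kernel type. */
struct toy_dev {
	const char *name;
	unsigned int is_removed:1;
	struct toy_dev *child;		/* first device on the subordinate bus, or NULL */
	struct toy_dev *sibling;	/* next device on the same bus, or NULL */
};

typedef int (*toy_walk_cb)(struct toy_dev *dev, void *userdata);

/*
 * Depth-first walk over every device on a bus and on all buses below it.
 * Stops and returns the callback's value as soon as it is nonzero.
 */
static int toy_walk_bus(struct toy_dev *first, toy_walk_cb cb, void *userdata)
{
	for (struct toy_dev *dev = first; dev; dev = dev->sibling) {
		int ret = cb(dev, userdata);

		if (ret)
			return ret;
		ret = toy_walk_bus(dev->child, cb, userdata);
		if (ret)
			return ret;
	}
	return 0;
}

static int toy_set_is_removed_cb(struct toy_dev *dev, void *unused)
{
	(void)unused;
	dev->is_removed = 1;
	return 0;
}

/* Mirrors the hunk: flag the device in the slot, then everything below it. */
static void toy_mark_removed(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->is_removed = 1;
	if (dev->child)
		toy_walk_bus(dev->child, toy_set_is_removed_cb, NULL);
}
```

Building the four-device chain from the topology above and calling
toy_mark_removed() on the 09:00.0 node flags 09:00.0, 0a:00.0 and 0b:00.0,
while the hotplug port 06:04.0 stays untouched.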