Message ID | 20150803041451.GA11144@google.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Re-sending, hopefully without multi-part now, so it gets to the list. Temporarily forced into using a different mail client... On Tue, Aug 4, 2015 at 10:05 AM, Jarod Wilson <jarod@redhat.com> wrote: > On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> >> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote: >> > https://bugzilla.kernel.org/show_bug.cgi?id=99841 >> > >> > Seems like a read of all 1's from a register of a device that has gone >> > away should be taken as a sign that the device has gone away. >> > Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests >> > as >> > much with this snippet: >> > >> > |IMPLEMENTATION NOTE >> > |Data Value of All 1’s >> > |Many platforms, including those supporting RP Extensions for DPC, can >> > |return a data value of all 1’s to software when an error is associated >> > |with a PCI Express Configuration, I/O, or Memory Read Request. During >> > DPC, >> > |the Downstream Port discards Requests destined for the Link and >> > completes >> > |them with an error (i.e., either with an Unsupported Request (UR) or >> > |Completer Abort (CA) Completion Status). By ending a series of MMIO or >> > |configuration space operations with a read to an address with a known >> > |data value not equal to all 1’s, software may determine if a Completer >> > |has been removed or DPC has been triggered. >> > >> > I'm not sure the above is directly relevant to this case, but the same >> > principle (reading all 1's means the device is probably gone) seems to >> > hold. >> > >> > This is based on part of a debugging patch Bjorn posted in the >> > referenced >> > bugzilla, and its required to make the HP ZBook G2 I've got here not >> > barf >> > when disconnecting a thunderbolt ethernet adapter and corrupt memory. >> > >> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com> >> > CC: Bjorn Helgaas <bhelgaas@google.com> >> > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> > CC: linux-pci@vger.kernel.org >> > Signed-off-by: Jarod Wilson <jarod@redhat.com> >> >> Hi Jarod, >> >> I think there are two issues here: >> >> 1) pciehp doesn't handle all 1's correctly >> 2) use-after-free of hotplug_slot >> >> This patch is for the first issue. I think it's correct, but I still >> have a question or two. I attached an updated version of the patch >> and changelog. >> >> Here's the path I think we're taking: 03:03.0 receives pciehp >> interrupt for removal (link down and card not present), and we call >> pciehp_unconfigure_device() for 05:00.0 and everything downstream from >> it. Part of this is removing 06:00.0. I expected this would use this >> path: >> >> pcie_port_remove_service # .remove method for 06:00.0 >> dev_printk("unloading service driver ...") >> pciehp_remove # hpdriver_portdrv.remove >> pciehp_release_ctrl >> pcie_shutdown_notification >> pcie_disable_notification >> pcie_write_cmd >> pcie_do_write_cmd(..., true) # wait >> pcie_wait_cmd >> pcie_poll_cmd >> read PCI_EXP_SLTSTA # would get 0xffff >> read PCI_EXPT_SLTCTL # would get 0xffff >> >> so I added checks for ~0 data in pcie_poll_cmd() and >> pcie_do_write_cmd(). >> >> But the dmesg log shows that we were in pcie_isr(), and I don't >> understand yet how we got there. Can you help figure that out? Maybe >> put a dump_stack() in pcie_isr() or something? > > > pciehp_power_thread looks to be the entry point. > > [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108 > [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change > [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3) > [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change > [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event > [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal > [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on > slot(3): already powering off > [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling > domain:bus:device=0000:05:00 > [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device: > domain:bus:dev = 0000:05:00 > [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will > not clear MAC_TX_MODE=ffffffff > [ 1950.525986] [sched_delayed] sched: RT throttling activated > [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp > [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9 > [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command 0x1038 > (issued 19092 msec ago) > [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response > from device > [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification: > SLOTCTRL d8 write cmd 0 > [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away > [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted > 3.10.0-302.el7.hp.x86_64 #1 > [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 > Ver. 01.07 02/26/2015 > [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread > [ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78 > ffffffff81632729 > [ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e > 00000000f721dd13 > [ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e > 0000000000000282 > [ 1950.545039] Call Trace: > [ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b > [ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280 > [ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220 > [ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0 > [ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0 > [ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30 > [ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60 > [ 1950.545058] [<ffffffff813f39ff>] __device_release_driver+0x7f/0xf0 > [ 1950.545059] [<ffffffff813f3a93>] device_release_driver+0x23/0x30 > [ 1950.545061] [<ffffffff813f3208>] bus_remove_device+0x108/0x180 > [ 1950.545062] [<ffffffff813ef725>] device_del+0x135/0x1f0 > [ 1950.545064] [<ffffffff81334360>] ? cleanup_service_irqs+0x40/0x40 > [ 1950.545065] [<ffffffff813ef7fe>] device_unregister+0x1e/0x60 > [ 1950.545066] [<ffffffff81334381>] remove_iter+0x21/0x30 > [ 1950.545068] [<ffffffff813ef093>] device_for_each_child+0x53/0x90 > [ 1950.545069] [<ffffffff81334ae2>] pcie_port_device_remove+0x22/0x40 > [ 1950.545070] [<ffffffff81334e1e>] pcie_portdrv_remove+0xe/0x10 > [ 1950.545074] [<ffffffff813266eb>] pci_device_remove+0x3b/0xb0 > [ 1950.545075] [<ffffffff813f39ff>] __device_release_driver+0x7f/0xf0 > [ 1950.545077] [<ffffffff813f3a93>] device_release_driver+0x23/0x30 > [ 1950.545078] [<ffffffff8131fa54>] pci_stop_bus_device+0x94/0xa0 > [ 1950.545079] [<ffffffff8131f9fb>] pci_stop_bus_device+0x3b/0xa0 > [ 1950.545081] [<ffffffff8131fb42>] > pci_stop_and_remove_bus_device+0x12/0x20 > [ 1950.545082] [<ffffffff8133b8a0>] pciehp_unconfigure_device+0xb0/0x1b0 > [ 1950.545084] [<ffffffff8133b282>] pciehp_disable_slot+0x52/0xd0 > [ 1950.545085] [<ffffffff8133b3ea>] pciehp_power_thread+0xea/0x150 > [ 1950.545088] [<ffffffff8109c57b>] process_one_work+0x17b/0x470 > [ 1950.545089] [<ffffffff8109d34b>] worker_thread+0x11b/0x400 > [ 1950.545090] [<ffffffff8109d230>] ? rescuer_thread+0x400/0x400 > [ 1950.545093] [<ffffffff810a4a6f>] kthread+0xcf/0xe0 > [ 1950.545094] [<ffffffff810a49a0>] ? kthread_create_on_node+0x140/0x140 > [ 1950.545097] [<ffffffff81642d58>] ret_from_fork+0x58/0x90 > [ 1950.545099] [<ffffffff810a49a0>] ? kthread_create_on_node+0x140/0x140 > > > I get pretty much identical stacks for these devices as well, in this order > after the above: > [ 1959.103493] pciehp 0000:03:06.0:pcie24: Device has gone away > [ 1959.104152] pciehp 0000:03:05.0:pcie24: Device has gone away > [ 1959.104736] pciehp 0000:03:04.0:pcie24: Device has gone away > [ 1959.105265] pciehp 0000:03:03.0:pcie24: Device has gone away > [ 1959.105841] pciehp 0000:03:00.0:pcie24: Device has gone away > > > --jarod > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote: > On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote: > > > On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=99841 > > > > > > Seems like a read of all 1's from a register of a device that has gone > > > away should be taken as a sign that the device has gone away. > > > Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as > > > much with this snippet: > > > > > > |IMPLEMENTATION NOTE > > > |Data Value of All 1’s > > > |Many platforms, including those supporting RP Extensions for DPC, can > > > |return a data value of all 1’s to software when an error is associated > > > |with a PCI Express Configuration, I/O, or Memory Read Request. During > > DPC, > > > |the Downstream Port discards Requests destined for the Link and > > completes > > > |them with an error (i.e., either with an Unsupported Request (UR) or > > > |Completer Abort (CA) Completion Status). By ending a series of MMIO or > > > |configuration space operations with a read to an address with a known > > > |data value not equal to all 1’s, software may determine if a Completer > > > |has been removed or DPC has been triggered. > > > > > > I'm not sure the above is directly relevant to this case, but the same > > > principle (reading all 1's means the device is probably gone) seems to > > > hold. > > > > > > This is based on part of a debugging patch Bjorn posted in the referenced > > > bugzilla, and its required to make the HP ZBook G2 I've got here not barf > > > when disconnecting a thunderbolt ethernet adapter and corrupt memory. > > > > > > Suggested-by: Bjorn Helgaas <bhelgaas@google.com> > > > CC: Bjorn Helgaas <bhelgaas@google.com> > > > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > CC: linux-pci@vger.kernel.org > > > Signed-off-by: Jarod Wilson <jarod@redhat.com> > > > > Hi Jarod, > > > > I think there are two issues here: > > > > 1) pciehp doesn't handle all 1's correctly > > 2) use-after-free of hotplug_slot > > > > This patch is for the first issue. I think it's correct, but I still > > have a question or two. I attached an updated version of the patch > > and changelog. > > > > Here's the path I think we're taking: 03:03.0 receives pciehp > > interrupt for removal (link down and card not present), and we call > > pciehp_unconfigure_device() for 05:00.0 and everything downstream from > > it. Part of this is removing 06:00.0. I expected this would use this > > path: > > > > pcie_port_remove_service # .remove method for 06:00.0 > > dev_printk("unloading service driver ...") > > pciehp_remove # hpdriver_portdrv.remove > > pciehp_release_ctrl > > pcie_shutdown_notification > > pcie_disable_notification > > pcie_write_cmd > > pcie_do_write_cmd(..., true) # wait > > pcie_wait_cmd > > pcie_poll_cmd > > read PCI_EXP_SLTSTA # would get 0xffff > > read PCI_EXPT_SLTCTL # would get 0xffff > > > > so I added checks for ~0 data in pcie_poll_cmd() and > > pcie_do_write_cmd(). > > > > But the dmesg log shows that we were in pcie_isr(), and I don't > > understand yet how we got there. Can you help figure that out? Maybe > > put a dump_stack() in pcie_isr() or something? > > [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108 > [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change > [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3) > [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change > [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event > [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal > [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on > slot(3): already powering off > [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling > domain:bus:device=0000:05:00 > [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device: > domain:bus:dev = 0000:05:00 > [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE > will not clear MAC_TX_MODE=ffffffff > [ 1950.525986] [sched_delayed] sched: RT throttling activated > [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp > [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9 > [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command > 0x1038 (issued 19092 msec ago) > [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response > from device > [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification: > SLOTCTRL d8 write cmd 0 > [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away > [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted > 3.10.0-302.el7.hp.x86_64 #1 > [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 > Ver. 01.07 02/26/2015 > [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread > [ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78 > ffffffff81632729 > [ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e > 00000000f721dd13 > [ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e > 0000000000000282 > [ 1950.545039] Call Trace: > [ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b > [ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280 > [ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220 > [ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0 > [ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0 > [ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30 > [ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60 Do you have CONFIG_DEBUG_SHIRQ set? When CONFIG_DEBUG_SHIRQ is set, __free_irq() calls the ISR one last time. It does make sense that the driver must be prepared for the ISR to be called at any time before free_irq() returns. I just didn't see a path for the already-removed device to generate an actual interrupt. Can you try the version I posted, with the additional tests in pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from the device there, even before we free the IRQ, so we might see several messages. Maybe there's a way we can be smarter about bailing out there. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/4/2015 12:56 PM, Bjorn Helgaas wrote: > On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote: >> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> >>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote: >>>> https://bugzilla.kernel.org/show_bug.cgi?id=99841 >>>> >>>> Seems like a read of all 1's from a register of a device that has gone >>>> away should be taken as a sign that the device has gone away. >>>> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as >>>> much with this snippet: >>>> >>>> |IMPLEMENTATION NOTE >>>> |Data Value of All 1’s >>>> |Many platforms, including those supporting RP Extensions for DPC, can >>>> |return a data value of all 1’s to software when an error is associated >>>> |with a PCI Express Configuration, I/O, or Memory Read Request. During >>> DPC, >>>> |the Downstream Port discards Requests destined for the Link and >>> completes >>>> |them with an error (i.e., either with an Unsupported Request (UR) or >>>> |Completer Abort (CA) Completion Status). By ending a series of MMIO or >>>> |configuration space operations with a read to an address with a known >>>> |data value not equal to all 1’s, software may determine if a Completer >>>> |has been removed or DPC has been triggered. >>>> >>>> I'm not sure the above is directly relevant to this case, but the same >>>> principle (reading all 1's means the device is probably gone) seems to >>>> hold. >>>> >>>> This is based on part of a debugging patch Bjorn posted in the referenced >>>> bugzilla, and its required to make the HP ZBook G2 I've got here not barf >>>> when disconnecting a thunderbolt ethernet adapter and corrupt memory. >>>> >>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com> >>>> CC: Bjorn Helgaas <bhelgaas@google.com> >>>> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>> CC: linux-pci@vger.kernel.org >>>> Signed-off-by: Jarod Wilson <jarod@redhat.com> >>> >>> Hi Jarod, >>> >>> I think there are two issues here: >>> >>> 1) pciehp doesn't handle all 1's correctly >>> 2) use-after-free of hotplug_slot >>> >>> This patch is for the first issue. I think it's correct, but I still >>> have a question or two. I attached an updated version of the patch >>> and changelog. >>> >>> Here's the path I think we're taking: 03:03.0 receives pciehp >>> interrupt for removal (link down and card not present), and we call >>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from >>> it. Part of this is removing 06:00.0. I expected this would use this >>> path: >>> >>> pcie_port_remove_service # .remove method for 06:00.0 >>> dev_printk("unloading service driver ...") >>> pciehp_remove # hpdriver_portdrv.remove >>> pciehp_release_ctrl >>> pcie_shutdown_notification >>> pcie_disable_notification >>> pcie_write_cmd >>> pcie_do_write_cmd(..., true) # wait >>> pcie_wait_cmd >>> pcie_poll_cmd >>> read PCI_EXP_SLTSTA # would get 0xffff >>> read PCI_EXPT_SLTCTL # would get 0xffff >>> >>> so I added checks for ~0 data in pcie_poll_cmd() and >>> pcie_do_write_cmd(). >>> >>> But the dmesg log shows that we were in pcie_isr(), and I don't >>> understand yet how we got there. Can you help figure that out? Maybe >>> put a dump_stack() in pcie_isr() or something? >> >> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108 >> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change >> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3) >> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change >> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event >> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal >> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on >> slot(3): already powering off >> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling >> domain:bus:device=0000:05:00 >> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device: >> domain:bus:dev = 0000:05:00 >> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE >> will not clear MAC_TX_MODE=ffffffff >> [ 1950.525986] [sched_delayed] sched: RT throttling activated >> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp >> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9 >> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command >> 0x1038 (issued 19092 msec ago) >> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response >> from device >> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification: >> SLOTCTRL d8 write cmd 0 >> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away >> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted >> 3.10.0-302.el7.hp.x86_64 #1 >> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 >> Ver. 01.07 02/26/2015 >> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread >> [ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78 >> ffffffff81632729 >> [ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e >> 00000000f721dd13 >> [ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e >> 0000000000000282 >> [ 1950.545039] Call Trace: >> [ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b >> [ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280 >> [ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220 >> [ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0 >> [ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0 >> [ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30 >> [ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60 > > Do you have CONFIG_DEBUG_SHIRQ set? When CONFIG_DEBUG_SHIRQ is set, > __free_irq() calls the ISR one last time. It does make sense that the > driver must be prepared for the ISR to be called at any time before > free_irq() returns. I just didn't see a path for the already-removed > device to generate an actual interrupt. Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're hitting the action->handler(irq, dev_id) bit in __free_irq(), after we've already done a bunch of teardown/removal? > Can you try the version I posted, with the additional tests in > pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from > the device there, even before we free the IRQ, so we might see several > messages. Maybe there's a way we can be smarter about bailing out > there. The above was with your additions munged in with the older patch, I actually do see "pcie_do_write_cmd: no response from device" just two lines ahead of each "Device has gone away" message from pcie_isr(). pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 write cmd 0 pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()
On Tue, Aug 04, 2015 at 01:21:23PM -0400, Jarod Wilson wrote: > On 8/4/2015 12:56 PM, Bjorn Helgaas wrote: > >On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote: > >>On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote: > >> > >>>On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote: > >>>>https://bugzilla.kernel.org/show_bug.cgi?id=99841 > >>>> > >>>>Seems like a read of all 1's from a register of a device that has gone > >>>>away should be taken as a sign that the device has gone away. > >>>>Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as > >>>>much with this snippet: > >>>> > >>>>|IMPLEMENTATION NOTE > >>>>|Data Value of All 1’s > >>>>|Many platforms, including those supporting RP Extensions for DPC, can > >>>>|return a data value of all 1’s to software when an error is associated > >>>>|with a PCI Express Configuration, I/O, or Memory Read Request. During > >>>DPC, > >>>>|the Downstream Port discards Requests destined for the Link and > >>>completes > >>>>|them with an error (i.e., either with an Unsupported Request (UR) or > >>>>|Completer Abort (CA) Completion Status). By ending a series of MMIO or > >>>>|configuration space operations with a read to an address with a known > >>>>|data value not equal to all 1’s, software may determine if a Completer > >>>>|has been removed or DPC has been triggered. > >>>> > >>>>I'm not sure the above is directly relevant to this case, but the same > >>>>principle (reading all 1's means the device is probably gone) seems to > >>>>hold. > >>>> > >>>>This is based on part of a debugging patch Bjorn posted in the referenced > >>>>bugzilla, and its required to make the HP ZBook G2 I've got here not barf > >>>>when disconnecting a thunderbolt ethernet adapter and corrupt memory. > >>>> > >>>>Suggested-by: Bjorn Helgaas <bhelgaas@google.com> > >>>>CC: Bjorn Helgaas <bhelgaas@google.com> > >>>>CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>>>CC: linux-pci@vger.kernel.org > >>>>Signed-off-by: Jarod Wilson <jarod@redhat.com> > >>> > >>>Hi Jarod, > >>> > >>>I think there are two issues here: > >>> > >>> 1) pciehp doesn't handle all 1's correctly > >>> 2) use-after-free of hotplug_slot > >>> > >>>This patch is for the first issue. I think it's correct, but I still > >>>have a question or two. I attached an updated version of the patch > >>>and changelog. > >>> > >>>Here's the path I think we're taking: 03:03.0 receives pciehp > >>>interrupt for removal (link down and card not present), and we call > >>>pciehp_unconfigure_device() for 05:00.0 and everything downstream from > >>>it. Part of this is removing 06:00.0. I expected this would use this > >>>path: > >>> > >>> pcie_port_remove_service # .remove method for 06:00.0 > >>> dev_printk("unloading service driver ...") > >>> pciehp_remove # hpdriver_portdrv.remove > >>> pciehp_release_ctrl > >>> pcie_shutdown_notification > >>> pcie_disable_notification > >>> pcie_write_cmd > >>> pcie_do_write_cmd(..., true) # wait > >>> pcie_wait_cmd > >>> pcie_poll_cmd > >>> read PCI_EXP_SLTSTA # would get 0xffff > >>> read PCI_EXPT_SLTCTL # would get 0xffff > >>> > >>>so I added checks for ~0 data in pcie_poll_cmd() and > >>>pcie_do_write_cmd(). > >>> > >>>But the dmesg log shows that we were in pcie_isr(), and I don't > >>>understand yet how we got there. Can you help figure that out? Maybe > >>>put a dump_stack() in pcie_isr() or something? > >> > >>[ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108 > >>[ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change > >>[ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3) > >>[ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change > >>[ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event > >>[ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal > >>[ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on > >>slot(3): already powering off > >>[ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling > >>domain:bus:device=0000:05:00 > >>[ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device: > >>domain:bus:dev = 0000:05:00 > >>[ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE > >>will not clear MAC_TX_MODE=ffffffff > >>[ 1950.525986] [sched_delayed] sched: RT throttling activated > >>[ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp > >>[ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9 > >>[ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command > >>0x1038 (issued 19092 msec ago) > >>[ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response > >>from device > >>[ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification: > >>SLOTCTRL d8 write cmd 0 > >>[ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away > >>[ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted > >>3.10.0-302.el7.hp.x86_64 #1 > >>[ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 > >>Ver. 01.07 02/26/2015 > >>[ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread > >>[ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78 > >>ffffffff81632729 > >>[ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e > >>00000000f721dd13 > >>[ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e > >>0000000000000282 > >>[ 1950.545039] Call Trace: > >>[ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b > >>[ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280 > >>[ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220 > >>[ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0 > >>[ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0 > >>[ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30 > >>[ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60 > > > >Do you have CONFIG_DEBUG_SHIRQ set? When CONFIG_DEBUG_SHIRQ is set, > >__free_irq() calls the ISR one last time. It does make sense that the > >driver must be prepared for the ISR to be called at any time before > >free_irq() returns. I just didn't see a path for the already-removed > >device to generate an actual interrupt. > > Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're > hitting the action->handler(irq, dev_id) bit in __free_irq(), after > we've already done a bunch of teardown/removal? OK, good, that explains why we called pcie_isr(). > >Can you try the version I posted, with the additional tests in > >pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from > >the device there, even before we free the IRQ, so we might see several > >messages. Maybe there's a way we can be smarter about bailing out > >there. > > The above was with your additions munged in with the older patch, I > actually do see "pcie_do_write_cmd: no response from device" just > two lines ahead of each "Device has gone away" message from > pcie_isr(). > > pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device > pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 > write cmd 0 > pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr() Oh, sorry! I should have noticed that. I just wanted to make sure I didn't cause a flood of extra messages. I think I'll merge this version (with all three checks). We still have a slot lifetime issue, but that's a separate problem. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/4/2015 1:51 PM, Bjorn Helgaas wrote: > On Tue, Aug 04, 2015 at 01:21:23PM -0400, Jarod Wilson wrote: >> On 8/4/2015 12:56 PM, Bjorn Helgaas wrote: >>> On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote: >>>> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote: >>>> >>>>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote: >>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=99841 >>>>>> >>>>>> Seems like a read of all 1's from a register of a device that has gone >>>>>> away should be taken as a sign that the device has gone away. >>>>>> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as >>>>>> much with this snippet: >>>>>> >>>>>> |IMPLEMENTATION NOTE >>>>>> |Data Value of All 1’s >>>>>> |Many platforms, including those supporting RP Extensions for DPC, can >>>>>> |return a data value of all 1’s to software when an error is associated >>>>>> |with a PCI Express Configuration, I/O, or Memory Read Request. During >>>>> DPC, >>>>>> |the Downstream Port discards Requests destined for the Link and >>>>> completes >>>>>> |them with an error (i.e., either with an Unsupported Request (UR) or >>>>>> |Completer Abort (CA) Completion Status). By ending a series of MMIO or >>>>>> |configuration space operations with a read to an address with a known >>>>>> |data value not equal to all 1’s, software may determine if a Completer >>>>>> |has been removed or DPC has been triggered. >>>>>> >>>>>> I'm not sure the above is directly relevant to this case, but the same >>>>>> principle (reading all 1's means the device is probably gone) seems to >>>>>> hold. >>>>>> >>>>>> This is based on part of a debugging patch Bjorn posted in the referenced >>>>>> bugzilla, and its required to make the HP ZBook G2 I've got here not barf >>>>>> when disconnecting a thunderbolt ethernet adapter and corrupt memory. >>>>>> >>>>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com> >>>>>> CC: Bjorn Helgaas <bhelgaas@google.com> >>>>>> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>>>> CC: linux-pci@vger.kernel.org >>>>>> Signed-off-by: Jarod Wilson <jarod@redhat.com> >>>>> >>>>> Hi Jarod, >>>>> >>>>> I think there are two issues here: >>>>> >>>>> 1) pciehp doesn't handle all 1's correctly >>>>> 2) use-after-free of hotplug_slot >>>>> >>>>> This patch is for the first issue. I think it's correct, but I still >>>>> have a question or two. I attached an updated version of the patch >>>>> and changelog. >>>>> >>>>> Here's the path I think we're taking: 03:03.0 receives pciehp >>>>> interrupt for removal (link down and card not present), and we call >>>>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from >>>>> it. Part of this is removing 06:00.0. I expected this would use this >>>>> path: >>>>> >>>>> pcie_port_remove_service # .remove method for 06:00.0 >>>>> dev_printk("unloading service driver ...") >>>>> pciehp_remove # hpdriver_portdrv.remove >>>>> pciehp_release_ctrl >>>>> pcie_shutdown_notification >>>>> pcie_disable_notification >>>>> pcie_write_cmd >>>>> pcie_do_write_cmd(..., true) # wait >>>>> pcie_wait_cmd >>>>> pcie_poll_cmd >>>>> read PCI_EXP_SLTSTA # would get 0xffff >>>>> read PCI_EXPT_SLTCTL # would get 0xffff >>>>> >>>>> so I added checks for ~0 data in pcie_poll_cmd() and >>>>> pcie_do_write_cmd(). >>>>> >>>>> But the dmesg log shows that we were in pcie_isr(), and I don't >>>>> understand yet how we got there. Can you help figure that out? Maybe >>>>> put a dump_stack() in pcie_isr() or something? >>>> >>>> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108 >>>> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change >>>> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3) >>>> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change >>>> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event >>>> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal >>>> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on >>>> slot(3): already powering off >>>> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling >>>> domain:bus:device=0000:05:00 >>>> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device: >>>> domain:bus:dev = 0000:05:00 >>>> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE >>>> will not clear MAC_TX_MODE=ffffffff >>>> [ 1950.525986] [sched_delayed] sched: RT throttling activated >>>> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp >>>> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9 >>>> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command >>>> 0x1038 (issued 19092 msec ago) >>>> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response >>> >from device >>>> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification: >>>> SLOTCTRL d8 write cmd 0 >>>> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away >>>> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted >>>> 3.10.0-302.el7.hp.x86_64 #1 >>>> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 >>>> Ver. 01.07 02/26/2015 >>>> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread >>>> [ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78 >>>> ffffffff81632729 >>>> [ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e >>>> 00000000f721dd13 >>>> [ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e >>>> 0000000000000282 >>>> [ 1950.545039] Call Trace: >>>> [ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b >>>> [ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280 >>>> [ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220 >>>> [ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0 >>>> [ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0 >>>> [ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30 >>>> [ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60 >>> >>> Do you have CONFIG_DEBUG_SHIRQ set? When CONFIG_DEBUG_SHIRQ is set, >>> __free_irq() calls the ISR one last time. It does make sense that the >>> driver must be prepared for the ISR to be called at any time before >>> free_irq() returns. I just didn't see a path for the already-removed >>> device to generate an actual interrupt. >> >> Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're >> hitting the action->handler(irq, dev_id) bit in __free_irq(), after >> we've already done a bunch of teardown/removal? > > OK, good, that explains why we called pcie_isr(). > >>> Can you try the version I posted, with the additional tests in >>> pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from >>> the device there, even before we free the IRQ, so we might see several >>> messages. Maybe there's a way we can be smarter about bailing out >>> there. >> >> The above was with your additions munged in with the older patch, I >> actually do see "pcie_do_write_cmd: no response from device" just >> two lines ahead of each "Device has gone away" message from >> pcie_isr(). >> >> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device >> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 >> write cmd 0 >> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr() > > Oh, sorry! I should have noticed that. I just wanted to make sure I > didn't cause a flood of extra messages. > > I think I'll merge this version (with all three checks). We still have a > slot lifetime issue, but that's a separate problem. Sounds good to me, thanks much for your help on this. Do we really still have a slot lifetime issue though? It looks to be the path from pciehp_release_ctrl that leads to free_irq and __free_irq calling pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)", open ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read all 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see the subsequent attempt to read the freed slot.
On Tue, Aug 4, 2015 at 1:46 PM, Jarod Wilson <jarod@redhat.com> wrote: > On 8/4/2015 1:51 PM, Bjorn Helgaas wrote: >> >> On Tue, Aug 04, 2015 at 01:21:23PM -0400, Jarod Wilson wrote: >>> >>> On 8/4/2015 12:56 PM, Bjorn Helgaas wrote: >>>> >>>> On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote: >>>>> >>>>> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> >>>>> wrote: >>>>> >>>>>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote: >>>>>>> >>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=99841 >>>>>>> >>>>>>> Seems like a read of all 1's from a register of a device that has >>>>>>> gone >>>>>>> away should be taken as a sign that the device has gone away. >>>>>>> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) >>>>>>> suggests as >>>>>>> much with this snippet: >>>>>>> >>>>>>> |IMPLEMENTATION NOTE >>>>>>> |Data Value of All 1’s >>>>>>> |Many platforms, including those supporting RP Extensions for DPC, >>>>>>> can >>>>>>> |return a data value of all 1’s to software when an error is >>>>>>> associated >>>>>>> |with a PCI Express Configuration, I/O, or Memory Read Request. >>>>>>> During >>>>>> >>>>>> DPC, >>>>>>> >>>>>>> |the Downstream Port discards Requests destined for the Link and >>>>>> >>>>>> completes >>>>>>> >>>>>>> |them with an error (i.e., either with an Unsupported Request (UR) or >>>>>>> |Completer Abort (CA) Completion Status). By ending a series of MMIO >>>>>>> or >>>>>>> |configuration space operations with a read to an address with a >>>>>>> known >>>>>>> |data value not equal to all 1’s, software may determine if a >>>>>>> Completer >>>>>>> |has been removed or DPC has been triggered. >>>>>>> >>>>>>> I'm not sure the above is directly relevant to this case, but the >>>>>>> same >>>>>>> principle (reading all 1's means the device is probably gone) seems >>>>>>> to >>>>>>> hold. >>>>>>> >>>>>>> This is based on part of a debugging patch Bjorn posted in the >>>>>>> referenced >>>>>>> bugzilla, and its required to make the HP ZBook G2 I've got here not >>>>>>> barf >>>>>>> when disconnecting a thunderbolt ethernet adapter and corrupt memory. >>>>>>> >>>>>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com> >>>>>>> CC: Bjorn Helgaas <bhelgaas@google.com> >>>>>>> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>>>>> CC: linux-pci@vger.kernel.org >>>>>>> Signed-off-by: Jarod Wilson <jarod@redhat.com> >>>>>> >>>>>> >>>>>> Hi Jarod, >>>>>> >>>>>> I think there are two issues here: >>>>>> >>>>>> 1) pciehp doesn't handle all 1's correctly >>>>>> 2) use-after-free of hotplug_slot >>>>>> >>>>>> This patch is for the first issue. I think it's correct, but I still >>>>>> have a question or two. I attached an updated version of the patch >>>>>> and changelog. >>>>>> >>>>>> Here's the path I think we're taking: 03:03.0 receives pciehp >>>>>> interrupt for removal (link down and card not present), and we call >>>>>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from >>>>>> it. Part of this is removing 06:00.0. I expected this would use this >>>>>> path: >>>>>> >>>>>> pcie_port_remove_service # .remove method for 06:00.0 >>>>>> dev_printk("unloading service driver ...") >>>>>> pciehp_remove # hpdriver_portdrv.remove >>>>>> pciehp_release_ctrl >>>>>> pcie_shutdown_notification >>>>>> pcie_disable_notification >>>>>> pcie_write_cmd >>>>>> pcie_do_write_cmd(..., true) # wait >>>>>> pcie_wait_cmd >>>>>> pcie_poll_cmd >>>>>> read PCI_EXP_SLTSTA # would get 0xffff >>>>>> read PCI_EXPT_SLTCTL # would get 0xffff >>>>>> >>>>>> so I added checks for ~0 data in pcie_poll_cmd() and >>>>>> pcie_do_write_cmd(). >>>>>> >>>>>> But the dmesg log shows that we were in pcie_isr(), and I don't >>>>>> understand yet how we got there. Can you help figure that out? Maybe >>>>>> put a dump_stack() in pcie_isr() or something? >>>>> >>>>> >>>>> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108 >>>>> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change >>>>> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3) >>>>> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change >>>>> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event >>>>> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal >>>>> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on >>>>> slot(3): already powering off >>>>> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling >>>>> domain:bus:device=0000:05:00 >>>>> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device: >>>>> domain:bus:dev = 0000:05:00 >>>>> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE >>>>> will not clear MAC_TX_MODE=ffffffff >>>>> [ 1950.525986] [sched_delayed] sched: RT throttling activated >>>>> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver >>>>> pciehp >>>>> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot >>>>> = 9 >>>>> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command >>>>> 0x1038 (issued 19092 msec ago) >>>>> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no >>>>> response >>>> >>>> >from device >>>>> >>>>> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification: >>>>> SLOTCTRL d8 write cmd 0 >>>>> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away >>>>> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted >>>>> 3.10.0-302.el7.hp.x86_64 #1 >>>>> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS >>>>> M70 >>>>> Ver. 01.07 02/26/2015 >>>>> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread >>>>> [ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78 >>>>> ffffffff81632729 >>>>> [ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e >>>>> 00000000f721dd13 >>>>> [ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e >>>>> 0000000000000282 >>>>> [ 1950.545039] Call Trace: >>>>> [ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b >>>>> [ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280 >>>>> [ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220 >>>>> [ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0 >>>>> [ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0 >>>>> [ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30 >>>>> [ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60 >>>> >>>> >>>> Do you have CONFIG_DEBUG_SHIRQ set? When CONFIG_DEBUG_SHIRQ is set, >>>> __free_irq() calls the ISR one last time. It does make sense that the >>>> driver must be prepared for the ISR to be called at any time before >>>> free_irq() returns. I just didn't see a path for the already-removed >>>> device to generate an actual interrupt. >>> >>> >>> Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're >>> hitting the action->handler(irq, dev_id) bit in __free_irq(), after >>> we've already done a bunch of teardown/removal? >> >> >> OK, good, that explains why we called pcie_isr(). >> >>>> Can you try the version I posted, with the additional tests in >>>> pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from >>>> the device there, even before we free the IRQ, so we might see several >>>> messages. Maybe there's a way we can be smarter about bailing out >>>> there. >>> >>> >>> The above was with your additions munged in with the older patch, I >>> actually do see "pcie_do_write_cmd: no response from device" just >>> two lines ahead of each "Device has gone away" message from >>> pcie_isr(). >>> >>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device >>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 >>> write cmd 0 >>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr() >> >> >> Oh, sorry! I should have noticed that. I just wanted to make sure I >> didn't cause a flood of extra messages. >> >> I think I'll merge this version (with all three checks). We still have a >> slot lifetime issue, but that's a separate problem. > > > Sounds good to me, thanks much for your help on this. > > Do we really still have a slot lifetime issue though? It looks to be the > path from pciehp_release_ctrl that leads to free_irq and __free_irq calling > pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)", open > ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read all > 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see the > subsequent attempt to read the freed slot. It's possible that we avoid referencing the freed data, but I don't have warm fuzzies because it's hard to prove that by analyzing the source code. It's hard to even know what to look for -- there's no clue in the code that says "don't reference slot->hotplug_slot after this point." And it feels like a poor design to hang on to that pointer after the slot has been freed. IIRC, your initial report mentioned possible memory corruption, and I don't even have a theory about where that happened. The slot->hotplug_slot references I saw were all reads where we printed junk but shouldn't have actually corrupted anything. Anyway, I don't know what to do about it, and I don't have time right now to poke at it, but it does worry me a bit. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/4/2015 3:27 PM, Bjorn Helgaas wrote: > On Tue, Aug 4, 2015 at 1:46 PM, Jarod Wilson <jarod@redhat.com> wrote: >> On 8/4/2015 1:51 PM, Bjorn Helgaas wrote: ... >>>>> Can you try the version I posted, with the additional tests in >>>>> pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from >>>>> the device there, even before we free the IRQ, so we might see several >>>>> messages. Maybe there's a way we can be smarter about bailing out >>>>> there. >>>> >>>> >>>> The above was with your additions munged in with the older patch, I >>>> actually do see "pcie_do_write_cmd: no response from device" just >>>> two lines ahead of each "Device has gone away" message from >>>> pcie_isr(). >>>> >>>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device >>>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 >>>> write cmd 0 >>>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr() >>> >>> >>> Oh, sorry! I should have noticed that. I just wanted to make sure I >>> didn't cause a flood of extra messages. >>> >>> I think I'll merge this version (with all three checks). We still have a >>> slot lifetime issue, but that's a separate problem. >> >> >> Sounds good to me, thanks much for your help on this. >> >> Do we really still have a slot lifetime issue though? It looks to be the >> path from pciehp_release_ctrl that leads to free_irq and __free_irq calling >> pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)", open >> ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read all >> 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see the >> subsequent attempt to read the freed slot. > > It's possible that we avoid referencing the freed data, but I don't > have warm fuzzies because it's hard to prove that by analyzing the > source code. It's hard to even know what to look for -- there's no > clue in the code that says "don't reference slot->hotplug_slot after > this point." And it feels like a poor design to hang on to that > pointer after the slot has been freed. > > IIRC, your initial report mentioned possible memory corruption, and I > don't even have a theory about where that happened. The > slot->hotplug_slot references I saw were all reads where we printed > junk but shouldn't have actually corrupted anything. Looking at the output I was seeing, it looks like one of the ~0 reads is interpreted as a switch interrupt received, data link layer state change, etc., followed by "Enabling domain:bus:device=0000:0x:00" from pciehp_power_thread. Subsequently, we're calling pciehp_enable_slot, which calls board_added, and in the output I've got, its tripping over board_added's call to pciehp_check_link_status ("Failed to check link status"), which means going to err_exit and calling set_slot_off. Next up, set_slot_off is calling pciehp_power_off_slot, which does a pcie_write_cmd(). Is it possible that write might lead to memory corruption? > Anyway, I don't know what to do about it, and I don't have time right > now to poke at it, but it does worry me a bit. Definitely a bit worrying, still trying to comprehend it all here myself...
On Tue, Aug 4, 2015 at 3:22 PM, Jarod Wilson <jarod@redhat.com> wrote: > On 8/4/2015 3:27 PM, Bjorn Helgaas wrote: >> >> On Tue, Aug 4, 2015 at 1:46 PM, Jarod Wilson <jarod@redhat.com> wrote: >>> >>> On 8/4/2015 1:51 PM, Bjorn Helgaas wrote: > > ... > >>>>>> Can you try the version I posted, with the additional tests in >>>>>> pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from >>>>>> the device there, even before we free the IRQ, so we might see several >>>>>> messages. Maybe there's a way we can be smarter about bailing out >>>>>> there. >>>>> >>>>> >>>>> >>>>> The above was with your additions munged in with the older patch, I >>>>> actually do see "pcie_do_write_cmd: no response from device" just >>>>> two lines ahead of each "Device has gone away" message from >>>>> pcie_isr(). >>>>> >>>>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device >>>>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 >>>>> write cmd 0 >>>>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr() >>>> >>>> >>>> >>>> Oh, sorry! I should have noticed that. I just wanted to make sure I >>>> didn't cause a flood of extra messages. >>>> >>>> I think I'll merge this version (with all three checks). We still have >>>> a >>>> slot lifetime issue, but that's a separate problem. >>> >>> >>> >>> Sounds good to me, thanks much for your help on this. >>> >>> Do we really still have a slot lifetime issue though? It looks to be the >>> path from pciehp_release_ctrl that leads to free_irq and __free_irq >>> calling >>> pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)", >>> open >>> ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read >>> all >>> 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see >>> the >>> subsequent attempt to read the freed slot. >> >> >> It's possible that we avoid referencing the freed data, but I don't >> have warm fuzzies because it's hard to prove that by analyzing the >> source code. It's hard to even know what to look for -- there's no >> clue in the code that says "don't reference slot->hotplug_slot after >> this point." And it feels like a poor design to hang on to that >> pointer after the slot has been freed. >> >> IIRC, your initial report mentioned possible memory corruption, and I >> don't even have a theory about where that happened. The >> slot->hotplug_slot references I saw were all reads where we printed >> junk but shouldn't have actually corrupted anything. > > > Looking at the output I was seeing, it looks like one of the ~0 reads is > interpreted as a switch interrupt received, data link layer state change, > etc., followed by "Enabling domain:bus:device=0000:0x:00" from > pciehp_power_thread. Subsequently, we're calling pciehp_enable_slot, which > calls board_added, and in the output I've got, its tripping over > board_added's call to pciehp_check_link_status ("Failed to check link > status"), which means going to err_exit and calling set_slot_off. > > Next up, set_slot_off is calling pciehp_power_off_slot, which does a > pcie_write_cmd(). Is it possible that write might lead to memory corruption? I doubt it; pcie_write_cmd() by itself just writes to a bridge register. Even if the device is gone, that shouldn't corrupt memory. But I don't know what really happened, and I don't remember what led to the corruption hypothesis in the first place. I think the corrupted-looking slot name strings are just a consequence of reading memory that had already been freed. With some work, we might be able to confirm that by matching it with a poison pattern or something. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index daf54be..8f3d3cf 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -111,6 +111,12 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) while (true) { pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); + if (slot_status == (u16) ~0) { + ctrl_info(ctrl, "%s: no response from device\n", + __func__); + return 0; + } + if (slot_status & PCI_EXP_SLTSTA_CC) { pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC); @@ -186,6 +192,11 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, pcie_wait_cmd(ctrl); pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl); + if (slot_ctrl == (u16) ~0) { + ctrl_info(ctrl, "%s: no response from device\n", __func__); + goto out; + } + slot_ctrl &= ~mask; slot_ctrl |= (cmd & mask); ctrl->cmd_busy = 1; @@ -201,6 +212,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, if (wait) pcie_wait_cmd(ctrl); +out: mutex_unlock(&ctrl->ctrl_lock); } @@ -542,6 +554,11 @@ static irqreturn_t pcie_isr(int irq, void *dev_id) intr_loc = 0; do { pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &detected); + if (detected == (u16) ~0) { + ctrl_info(ctrl, "%s: no response from device\n", + __func__); + return IRQ_HANDLED; + } detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD | PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |