PCI: fix kernel oops on bridge removal

Message ID 49CB33A0.80401@jp.fujitsu.com
State Accepted, archived

Commit Message

Kenji Kaneshige March 26, 2009, 7:49 a.m. UTC
Hi,

I encountered a kernel oops when I tried removing a bridge using
Alex's logical hotplug interface on Jesse's linux-next. I'm
attaching a patch that fixes the problem. See the description
of the attached patch for details.

This patch is against Jesse's linux-next.

Thanks,
Kenji Kaneshige


Fix the following kernel oops, which happens when removing a PCI
bridge with pciehp loaded. It should also occur with any other hotplug
driver that is implemented as a bridge's driver.

[  459.997257] pciehp 0000:2f:04.0:pcie24: unloading service driver pciehp
[  459.997495] general protection fault: 0000 [#1] SMP
[  459.997737] last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/0000:2e:00.0/0000:2f:04.0/remove
[  459.997964] CPU 4
[  459.998129] Modules linked in: pciehp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
[  459.998129] Pid: 56, comm: events/4 Not tainted 2.6.29-rc8-kk #1 PRIMERGY
[  459.998129] RIP: 0010:[<ffffffff803bf047>]  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
[  459.998129] RSP: 0018:ffff88083b3bf9e0  EFLAGS: 00010246
[  459.998129] RAX: ffff88083adc5158 RBX: ffff880836c1bc80 RCX: 6b6b6b6b6b6b6b6b
[  459.998129] RDX: 0000000000000000 RSI: ffffffff803a77f0 RDI: ffff880836c1bc48
[  459.998129] RBP: ffff88083b3bfa00 R08: 0000000000000002 R09: 0000000000000000
[  459.998129] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880836c1bc48
[  459.998129] R13: ffff880836c1bc20 R14: ffff880836c1bc48 R15: ffff880836d1ec38
[  459.998129] FS:  0000000000000000(0000) GS:ffff88083ccc3770(0000) knlGS:0000000000000000
[  459.998129] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  459.998129] CR2: 00007f1562f1d558 CR3: 0000000838090000 CR4: 00000000000006e0
[  459.998129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  459.998129] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  459.998129] Process events/4 (pid: 56, threadinfo ffff88083b3be000, task ffff88083b3b3e40)
[  459.998129] Stack:
[  459.998129]  ffff880836c1bc80 ffff880836c1bc48 ffffffff80793320 ffff88083b0d0960
[  459.998129]  ffff88083b3bfa30 ffffffff803a788a ffff880836c1bc80 ffffffff803a77f0
[  459.998129]  ffff880836c1bc20 ffff880836d1ec38 ffff88083b3bfa50 ffffffff803a8ce7
[  459.998129] Call Trace:
[  459.998129]  [<ffffffff803a788a>] kobject_release+0x9a/0x290
[  459.998129]  [<ffffffff803a77f0>] ? kobject_release+0x0/0x290
[  459.998129]  [<ffffffff803a8ce7>] kref_put+0x37/0x80
[  459.998129]  [<ffffffff803a76f7>] kobject_put+0x27/0x60
[  459.998129]  [<ffffffff803bebcc>] ? pci_destroy_slot+0x3c/0xc0
[  459.998129]  [<ffffffff803bebd5>] pci_destroy_slot+0x45/0xc0
[  459.998129]  [<ffffffff803c797d>] pci_hp_deregister+0x13d/0x210
[  459.998129]  [<ffffffffa031141d>] cleanup_slots+0x2d/0x80 [pciehp]
[  459.998129]  [<ffffffffa0311735>] pciehp_remove+0x15/0x30 [pciehp]
[  459.998129]  [<ffffffff803c4c99>] pcie_port_remove_service+0x69/0x90
[  459.998129]  [<ffffffff80441da9>] __device_release_driver+0x59/0x90
[  459.998129]  [<ffffffff80441edb>] device_release_driver+0x2b/0x40
[  459.998129]  [<ffffffff804419d6>] bus_remove_device+0xa6/0x120
[  459.998129]  [<ffffffff8043e46b>] device_del+0x12b/0x190
[  459.998129]  [<ffffffff803c4d90>] ? remove_iter+0x0/0x40
[  459.998129]  [<ffffffff8043e4f6>] device_unregister+0x26/0x70
[  459.998129]  [<ffffffff803c4dbf>] remove_iter+0x2f/0x40
[  459.998129]  [<ffffffff8043ddf3>] device_for_each_child+0x33/0x60
[  459.998129]  [<ffffffff8033ee30>] ? sysfs_schedule_callback_work+0x0/0x50
[  459.998129]  [<ffffffff803c4d30>] pcie_port_device_remove+0x30/0x80
[  459.998129]  [<ffffffff803c55a1>] pcie_portdrv_remove+0x11/0x20
[  459.998129]  [<ffffffff803bfeb2>] pci_device_remove+0x32/0x70
[  459.998129]  [<ffffffff80441da9>] __device_release_driver+0x59/0x90
[  459.998129]  [<ffffffff80441edb>] device_release_driver+0x2b/0x40
[  459.998129]  [<ffffffff804419d6>] bus_remove_device+0xa6/0x120
[  459.998129]  [<ffffffff8043e46b>] device_del+0x12b/0x190
[  459.998129]  [<ffffffff8043e4f6>] device_unregister+0x26/0x70
[  459.998129]  [<ffffffff803ba969>] pci_stop_dev+0x49/0x60
[  459.998129]  [<ffffffff803baab0>] pci_remove_bus_device+0x40/0xc0
[  459.998129]  [<ffffffff803c10d9>] remove_callback+0x29/0x40
[  459.998129]  [<ffffffff8033ee4f>] sysfs_schedule_callback_work+0x1f/0x50
[  459.998129]  [<ffffffff8025769a>] run_workqueue+0x15a/0x230
[  459.998129]  [<ffffffff80257648>] ? run_workqueue+0x108/0x230
[  459.998129]  [<ffffffff8025846f>] worker_thread+0x9f/0x100
[  459.998129]  [<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40
[  459.998129]  [<ffffffff802583d0>] ? worker_thread+0x0/0x100
[  459.998129]  [<ffffffff8025b89d>] kthread+0x4d/0x80
[  459.998129]  [<ffffffff8020d4ba>] child_rip+0xa/0x20
[  459.998129]  [<ffffffff8020cebc>] ? restore_args+0x0/0x30
[  459.998129]  [<ffffffff8025b850>] ? kthread+0x0/0x80
[  459.998129]  [<ffffffff8020d4b0>] ? child_rip+0x0/0x20
[  459.998129] Code: 56 49 89 fe 41 55 4c 8d 6f d8 41 54 53 74 09 f6 05 b8 05 c7 00 08 75 72 49 8b 45 00 48 8b 48 28 eb 05 66 90 48 89 f1 49 8b 45 00 <48> 8b 31 48 83 c0 28 0f 18 0e 48 39 c1 74 1c 8b 41 38 41 0f b6
[  459.998129] RIP  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
[  459.998129]  RSP <ffff88083b3bf9e0>
[  460.018595] ---[ end trace 5a08d2095374aedc ]---

pci_remove_bus_device() removes all buses and devices under the
bridge, and then removes the bridge itself. So the remove() callback
of a hotplug driver implemented as the bridge's driver is executed
after the struct pci_bus of the bridge's secondary bus has been removed.
The remove() callback of such a driver deregisters the slot using
pci_destroy_slot(), and the slot's release callback refers to the
struct pci_bus that has already been freed. This is the cause of the
kernel oops.
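
The register dump is consistent with this diagnosis: RCX holds
0x6b6b6b6b6b6b6b6b, and 0x6b is POISON_FREE, the byte the slab
allocator writes over freed objects when slab poisoning is enabled.
A minimal userspace sketch of that check follows. The helper name is
made up for illustration and is not a kernel API.

```c
#include <stdint.h>

/* Same value as POISON_FREE in the kernel's include/linux/poison.h. */
#define POISON_FREE_BYTE 0x6b

/*
 * Hypothetical helper: return 1 if every byte of a 64-bit register value
 * equals the slab free-poison byte, 0 otherwise.
 */
int looks_like_freed_slab(uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++) {
		if (((v >> (8 * i)) & 0xff) != POISON_FREE_BYTE)
			return 0;
	}
	return 1;
}
```

Applied to the dump above, the RCX value matches the pattern while the
RAX value (an ordinary kernel pointer) does not.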

This patch solves the problem by stopping all the drivers before
removing the bridge and its child buses and devices.
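
The ordering problem and the fix can be modelled in plain userspace C.
The sketch below is not kernel code: the struct and function names are
invented for illustration, freeing is simulated with an "alive" flag so
that the stale access can be counted instead of crashing, and
model_stop_driver() stands in for pci_stop_bus_device() on the
assumption that stopping an already stopped device is a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct pci_bus, struct pci_slot and the bridge. */
struct model_bus    { int alive; };
struct model_slot   { struct model_bus *bus; int registered; };
struct model_bridge {
	struct model_bus  *secondary;   /* bridge's secondary bus */
	struct model_slot *slot;        /* slot registered by the hotplug driver */
	int driver_bound;               /* hotplug driver still attached? */
	int stale_accesses;             /* times release touched a dead bus */
};

/* Models the slot release callback, which looks at the slot's bus. */
static void model_slot_release(struct model_bridge *br)
{
	assert(br->slot != NULL && br->slot->bus != NULL);
	if (!br->slot->bus->alive)
		br->stale_accesses++;   /* in the kernel: read of freed memory */
	br->slot->registered = 0;
}

/* Models the driver's remove() callback; does nothing if already stopped. */
static void model_stop_driver(struct model_bridge *br)
{
	if (!br->driver_bound)
		return;
	model_slot_release(br);
	br->driver_bound = 0;
}

/*
 * Models bridge removal. With stop_first == 0 the secondary bus goes away
 * before the driver is stopped (the order that oopsed); with stop_first != 0
 * the driver is stopped first (the order this patch establishes).
 * Returns the number of stale bus accesses observed.
 */
int model_remove_bridge(struct model_bridge *br, int stop_first)
{
	if (stop_first)
		model_stop_driver(br);
	br->secondary->alive = 0;       /* secondary bus removed */
	model_stop_driver(br);          /* bridge itself is stopped and removed */
	return br->stale_accesses;
}
```

With stop_first set to 0 the model reports one stale access. With
stop_first nonzero it reports none, because the slot is released while
its bus is still alive.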

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

 drivers/pci/remove.c |    1 +
 1 file changed, 1 insertion(+)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexander Chiang March 26, 2009, 12:56 p.m. UTC | #1
* Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
> Hi,
> 
> I encountered the kernel oops when I tried bridge removal using
> Alex's logical hotplug interface on Jesse's linux-next. I'm
> attaching the patch to solve this problem. See the description
> of the attached patch for details.
> 
> This patch is against Jesse's linux-next.
> 
> Thanks,
> Kenji Kaneshige
> 
> 
> Fix the following kernel oops problem that happens when removing PCI
> bridge with pciehp loaded. It should also occur with other hotplug
> driver that is implemented as a bridge's driver.
> 
> [ kernel oops trace snipped; identical to the one in the commit message above ]
> 
> The pci_remove_bus_device() removes all buses and devices under the
> bridge, and then remove the bridge. So the remove() callback of the
                   removes
> hotplug drivers implemented as a bridge's driver is executed after the
> struct pci_bus of the bridge's secondary bus is removed. The remove()
> callback of those driver deregister the slot using pci_destroy_slot(),
                           unregisters
> and slot's release callback refers the struct pci_bus that was already
                              refers to the
> freed. This is the cause of the kernel oops.
> 
> This patch solves the problem by stop all the driver before removing
> the bridge and its childe bus and devices.
                     child
> 

Good catch, thank you Kenji-san. I didn't see this because I
didn't have hotplug drivers loaded during my testing. :-/

I was thinking originally of making the hotplug drivers register
a bus notifier, similar to what Trent did with his new legacy
fakephp, which is probably still necessary, but this change is a
good start.

I tested this patch on my machines and it works fine in the "no
hotplug drivers loaded" case.

Jesse, can you just clean up the changelog (and patch title)
before applying?

Thanks.

Acked-by: Alex Chiang <achiang@hp.com>

> Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> 
>  drivers/pci/remove.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-next-20090323/drivers/pci/remove.c
> ===================================================================
> --- linux-next-20090323.orig/drivers/pci/remove.c
> +++ linux-next-20090323/drivers/pci/remove.c
> @@ -95,6 +95,7 @@ EXPORT_SYMBOL(pci_remove_bus);
>   */
>  void pci_remove_bus_device(struct pci_dev *dev)
>  {
> +	pci_stop_bus_device(dev);
>  	if (dev->subordinate) {
>  		struct pci_bus *b = dev->subordinate;
>  
> 
Kenji Kaneshige March 26, 2009, 2:36 p.m. UTC | #2
Alex Chiang wrote:
> * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
>> Hi,
>>
>> I encountered the kernel oops when I tried bridge removal using
>> Alex's logical hotplug interface on Jesse's linux-next. I'm
>> attaching the patch to solve this problem. See the description
>> of the attached patch for details.
>>
>> This patch is against Jesse's linux-next.
>>
>> Thanks,
>> Kenji Kaneshige
>>
>>
>> Fix the following kernel oops problem that happens when removing PCI
>> bridge with pciehp loaded. It should also occur with other hotplug
>> driver that is implemented as a bridge's driver.
>>
>> [ kernel oops trace snipped; identical to the one in the commit message above ]
>>
>> The pci_remove_bus_device() removes all buses and devices under the
>> bridge, and then remove the bridge. So the remove() callback of the
>                    removes
>> hotplug drivers implemented as a bridge's driver is executed after the
>> struct pci_bus of the bridge's secondary bus is removed. The remove()
>> callback of those driver deregister the slot using pci_destroy_slot(),
>                            unregisters
>> and slot's release callback refers the struct pci_bus that was already
>                               refers to the
>> freed. This is the cause of the kernel oops.
>>
>> This patch solves the problem by stop all the driver before removing
>> the bridge and its childe bus and devices.
>                      child
> 
> Good catch, thank you Kenji-san. I didn't see this because I
> didn't have hotplug drivers loaded during my testing. :-/
> 
> I was thinking originally of making the hotplug drivers register
> a bus notifier, similar to what Trent did with his new legacy
> fakephp which is probably still necessary, but this change is a
> good start.
> 
> I tested this patch on my machines and it works fine in the "no
> hotplug drivers" loaded case.
> 

Thank you very much for testing.

We still have a similar kernel oops (see below) with the ACPI PCI slot
detection driver. I guess the same problem would also occur with
acpiphp, though I've not tried it yet. I haven't looked at Trent's bus
notifier approach yet, but I think we need something like it to
fix this problem.

Here are the steps to reproduce and the kernel oops message.

* Steps to reproduce

(1) Load the ACPI PCI slot detection driver
(2) Remove the parent bridge of the slot
(3) Unload the ACPI PCI slot detection driver
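
Step (2) goes through the logical hotplug interface: writing 1 to the
"remove" attribute in the bridge's sysfs directory, which is the file
named in the "last sysfs file" line of the oops. As a rough sketch,
assuming that interface, the write could be done from C as below. The
function names are hypothetical, and steps (1) and (3) (loading and
unloading the pci_slot module) are left to the usual module tools.

```c
#include <stdio.h>

/* Build "<dev_dir>/remove" in buf. Returns 0 on success, -1 if it does not fit. */
int build_remove_path(char *buf, size_t len, const char *dev_dir)
{
	int n = snprintf(buf, len, "%s/remove", dev_dir);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Ask the kernel to remove the device whose sysfs directory is dev_dir by
 * writing "1" to its remove attribute. Returns 0 on success, -1 on error.
 */
int request_device_removal(const char *dev_dir)
{
	char path[512];
	FILE *f;
	int ok;

	if (build_remove_path(path, sizeof(path), dev_dir) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = (fputs("1\n", f) >= 0);
	if (fclose(f) != 0)     /* buffered data is written out here, so check it */
		ok = 0;
	return ok ? 0 : -1;
}
```

On the machine in this report the call would presumably be
request_device_removal("/sys/devices/pci0000:00/0000:00:04.0/0000:2e:00.0/0000:2f:04.0"),
run as root.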

* Kernel oops message

[24462.585196] general protection fault: 0000 [#1] SMP
[24462.585306] last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/0000:2e:00.0/0000:2f:04.0/remove
[24462.585314] CPU 10
[24462.585314] Modules linked in: pci_slot(-) ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: pci_slot]
[24462.585314] Pid: 864, comm: rmmod Not tainted 2.6.29-rc8-kk #2 PRIMERGY
[24462.585314] RIP: 0010:[<ffffffff803bf047>]  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
[24462.585314] RSP: 0018:ffff88081fdfddc8  EFLAGS: 00010246
[24462.585314] RAX: ffff880838d72688 RBX: ffff880824428380 RCX: 6b6b6b6b6b6b6b6b
[24462.585314] RDX: 0000000000000000 RSI: ffffffff803a77f0 RDI: ffff880824428348
[24462.585314] RBP: ffff88081fdfdde8 R08: 0000000000000002 R09: 0000000000000000
[24462.585314] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880824428348
[24462.585314] R13: ffff880824428320 R14: ffff880824428348 R15: 0000000000000880
[24462.585314] FS:  00007f0414b7c6e0(0000) GS:ffff88083b16caf0(0000) knlGS:0000000000000000
[24462.585314] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[24462.585314] CR2: 0000003f15474bd0 CR3: 000000081899b000 CR4: 00000000000006e0
[24462.585314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24462.585314] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[24462.585314] Process rmmod (pid: 864, threadinfo ffff88081fdfc000, task ffff8808245e8000)
[24462.585314] Stack:
[24462.585314]  ffff880824428380 ffff880824428348 ffffffff80793320 ffff88083650c460
[24462.585314]  ffff88081fdfde18 ffffffff803a788a ffff880824428380 ffffffff803a77f0
[24462.585314]  ffff880824428320 ffff88081fdfdef8 ffff88081fdfde38 ffffffff803a8ce7
[24462.585314] Call Trace:
[24462.585314]  [<ffffffff803a788a>] kobject_release+0x9a/0x290
[24462.585314]  [<ffffffff803a77f0>] ? kobject_release+0x0/0x290
[24462.585314]  [<ffffffff803a8ce7>] kref_put+0x37/0x80
[24462.585314]  [<ffffffff803a76f7>] kobject_put+0x27/0x60
[24462.585314]  [<ffffffff803bebcc>] ? pci_destroy_slot+0x3c/0xc0
[24462.585314]  [<ffffffff803bebd5>] pci_destroy_slot+0x45/0xc0
[24462.585314]  [<ffffffffa000f05c>] acpi_pci_slot_remove+0x5c/0x91 [pci_slot]
[24462.585314]  [<ffffffff8040064b>] acpi_pci_unregister_driver+0x4b/0x62
[24462.585314]  [<ffffffffa000f5c8>] acpi_pci_slot_exit+0x10/0x12 [pci_slot]
[24462.585314]  [<ffffffff80276ce1>] sys_delete_module+0x161/0x250
[24462.585314]  [<ffffffff80567100>] ? trace_hardirqs_off_thunk+0x30/0x3c
[24462.585314]  [<ffffffff8029151a>] ? audit_syscall_entry+0x14a/0x1b0
[24462.585314]  [<ffffffff8020c3db>] system_call_fastpath+0x16/0x1b
[24462.585314] Code: 56 49 89 fe 41 55 4c 8d 6f d8 41 54 53 74 09 f6 05 b8 05 c7 00 08 75 72 49 8b 45 00 48 8b 48 28 eb 05 66 90 48 89 f1 49 8b 45 00 <48> 8b 31 48 83 c0 28 0f 18 0e 48 39 c1 74 1c 8b 41 38 41 0f b6
[24462.585314] RIP  [<ffffffff803bf047>] pci_slot_release+0x37/0x100
[24462.585314]  RSP <ffff88081fdfddc8>
[24462.592478] ---[ end trace e97c8f1f187fa2b0 ]---

Thanks,
Kenji Kaneshige


Jesse Barnes March 26, 2009, 10:55 p.m. UTC | #3
On Thu, 26 Mar 2009 16:49:52 +0900
Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> wrote:

> Hi,
> 
> I encountered the kernel oops when I tried bridge removal using
> Alex's logical hotplug interface on Jesse's linux-next. I'm
> attaching the patch to solve this problem. See the description
> of the attached patch for details.
> 
> This patch is against Jesse's linux-next.

Applied, thanks.

Patch

Index: linux-next-20090323/drivers/pci/remove.c
===================================================================
--- linux-next-20090323.orig/drivers/pci/remove.c
+++ linux-next-20090323/drivers/pci/remove.c
@@ -95,6 +95,7 @@  EXPORT_SYMBOL(pci_remove_bus);
  */
 void pci_remove_bus_device(struct pci_dev *dev)
 {
+	pci_stop_bus_device(dev);
 	if (dev->subordinate) {
 		struct pci_bus *b = dev->subordinate;