diff mbox

PCI: fix kernel oops on bridge rmoval

Message ID 20090326213839.GA1112@ldl.fc.hp.com (mailing list archive)
State Rejected, archived
Headers show

Commit Message

Alexander Chiang March 26, 2009, 9:38 p.m. UTC
* Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
> 
> Thank you very much for testing.
> 
> We still have similar kernel oops (see below) with ACPI pci slot
> detection driver. I guess the same problem would also occur with
> acpiphp though I've not tried yet. I don't look at Trent's bus
> notifier approach yet, but I think we need something like this to
> fix this problem.
> 
> Here are steps to reproduce and kernel oops message.
> 
> * Steps to reproduce
> 
> (1) Load ACPI pci slot detection driver
> (2) Remove the parent bridge of the slot
> (3) Unload ACPI pci slot detection driver

Thanks for the report.

I believe this patch will fix the case for bridges. I haven't
tested what happens if we remove an endpoint yet though.

Can you try this?

Thanks.

/ac

---
commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c
Author: Alex Chiang <achiang@hp.com>
Date:   Thu Mar 26 15:36:34 2009 -0600

    PCI: pci_create_slot / pci_destroy_slot need to grab reference to parent bus
    
    If a logical hotunplug (remove) is performed on a PCI bridge claimed by
    a hotplug or slot detection driver, and then the hotplug/detection module
    is unloaded, we will encounter an oops:
    
    Call Trace:
     [<a000000100039bc0>] die+0x1c0/0x2c0
                                    sp=e0000005062ff9e0 bsp=e0000005062f13b0
     [<a000000100039d00>] die_if_kernel+0x40/0x60
                                    sp=e0000005062ff9e0 bsp=e0000005062f1380
     [<a00000010003b590>] ia64_fault+0x1230/0x1280
                                    sp=e0000005062ff9e0 bsp=e0000005062f1300
     [<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270
                                    sp=e0000005062ffbf0 bsp=e0000005062f1300
     [<a0000001003988f0>] pci_slot_release+0x70/0x1c0
                                    sp=e0000005062ffdc0 bsp=e0000005062f12b0
     [<a0000001003694d0>] kobject_release+0x4f0/0x5e0
                                    sp=e0000005062ffdc0 bsp=e0000005062f1270
     [<a00000010036b490>] kref_put+0xd0/0x100
                                    sp=e0000005062ffdc0 bsp=e0000005062f1248
     [<a000000100368650>] kobject_put+0x90/0xc0
                                    sp=e0000005062ffdc0 bsp=e0000005062f1220
     [<a000000100399260>] pci_destroy_slot+0xa0/0xe0
                                    sp=e0000005062ffdc0 bsp=e0000005062f11f0
     [<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug]
                                    sp=e0000005062ffdc0 bsp=e0000005062f11a8
     [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 [acpiphp]
                                    sp=e0000005062ffdc0 bsp=e0000005062f1180
     [<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp]
                                    sp=e0000005062ffdc0 bsp=e0000005062f1128
     [<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp]
                                    sp=e0000005062ffdc0 bsp=e0000005062f1108
     [<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0
                                    sp=e0000005062ffdc0 bsp=e0000005062f1098
     [<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0
                                    sp=e0000005062ffdc0 bsp=e0000005062f1048
     [<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp]
                                    sp=e0000005062ffdc0 bsp=e0000005062f1028
     [<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0
                                    sp=e0000005062ffdc0 bsp=e0000005062f0fe8
     [<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp]
                                    sp=e0000005062ffdc0 bsp=e0000005062f0fd0
     [<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp]
                                    sp=e0000005062ffdc0 bsp=e0000005062f0fb8
     [<a0000001000e9310>] sys_delete_module+0x410/0x520
                                    sp=e0000005062ffdc0 bsp=e0000005062f0f38
    
    This is because pci_slot_release will access the parent PCI bus,
    which has already been released by the user's prior hot unplug.
    
    The solution is for pci_create_slot to grab a reference on the
    parent PCI bus (and pci_destroy_slot to put the reference). This
    will prevent the parent from release while the hotplug or slot
    detection driver is loaded.
    
    Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
    Signed-off-by: Alex Chiang <achiang@hp.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Kenji Kaneshige March 27, 2009, 12:02 a.m. UTC | #1
Alex Chiang wrote:
> * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
>> Thank you very much for testing.
>>
>> We still have similar kernel oops (see below) with ACPI pci slot
>> detection driver. I guess the same problem would also occur with
>> acpiphp though I've not tried yet. I don't look at Trent's bus
>> notifier approach yet, but I think we need something like this to
>> fix this problem.
>>
>> Here are steps to reproduce and kernel oops message.
>>
>> * Steps to reproduce
>>
>> (1) Load ACPI pci slot detection driver
>> (2) Remove the parent bridge of the slot
>> (3) Unload ACPI pci slot detection driver
> 
> Thanks for the report.
> 
> I believe this patch will fix the case for bridges. I haven't
> tested what happens if we remove an endpoint yet though.
> 
> Can you try this?
> 
> Thanks.
> 
> /ac
> 
> ---
> commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c
> Author: Alex Chiang <achiang@hp.com>
> Date:   Thu Mar 26 15:36:34 2009 -0600
> 
>     PCI: pci_create_slot / pci_destroy_slot need to grab reference to parent bus
>     
>     If a logical hotunplug (remove) is performed on a PCI bridge claimed by
>     a hotplug or slot detection driver, and then the hotplug/detection module
>     is unloaded, we will encounter an oops:
>     
>     Call Trace:
>      [<a000000100039bc0>] die+0x1c0/0x2c0
>                                     sp=e0000005062ff9e0 bsp=e0000005062f13b0
>      [<a000000100039d00>] die_if_kernel+0x40/0x60
>                                     sp=e0000005062ff9e0 bsp=e0000005062f1380
>      [<a00000010003b590>] ia64_fault+0x1230/0x1280
>                                     sp=e0000005062ff9e0 bsp=e0000005062f1300
>      [<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270
>                                     sp=e0000005062ffbf0 bsp=e0000005062f1300
>      [<a0000001003988f0>] pci_slot_release+0x70/0x1c0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f12b0
>      [<a0000001003694d0>] kobject_release+0x4f0/0x5e0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1270
>      [<a00000010036b490>] kref_put+0xd0/0x100
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1248
>      [<a000000100368650>] kobject_put+0x90/0xc0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1220
>      [<a000000100399260>] pci_destroy_slot+0xa0/0xe0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f11f0
>      [<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f11a8
>      [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 [acpiphp]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1180
>      [<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1128
>      [<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1108
>      [<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1098
>      [<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1048
>      [<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f1028
>      [<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0
>                                     sp=e0000005062ffdc0 bsp=e0000005062f0fe8
>      [<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f0fd0
>      [<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp]
>                                     sp=e0000005062ffdc0 bsp=e0000005062f0fb8
>      [<a0000001000e9310>] sys_delete_module+0x410/0x520
>                                     sp=e0000005062ffdc0 bsp=e0000005062f0f38
>     
>     This is because pci_slot_release will access the parent PCI bus,
>     which has already been released by the user's prior hot unplug.
>     
>     The solution is for pci_create_slot to grab a reference on the
>     parent PCI bus (and pci_destroy_slot to put the reference). This
>     will prevent the parent from release while the hotplug or slot
>     detection driver is loaded.
>     
>     Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
>     Signed-off-by: Alex Chiang <achiang@hp.com>
> 
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 2118944..459d6a2 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -248,6 +248,7 @@ placeholder:
>  		if (PCI_SLOT(dev->devfn) == slot_nr)
>  			dev->slot = slot;
>  
> +	get_device(&parent->dev);
>  	dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n",
>  		slot_nr, pci_slot_name(slot));
>  
> @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot)
>  		slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
>  
>  	down_write(&pci_bus_sem);
> +	put_device(&slot->bus->dev);
>  	kobject_put(&slot->kobj);
>  	up_write(&pci_bus_sem);
>  }
> 

I've not tried your patch yet, but I don't think it works because
pci_create_slot() can be executed by some hotplug drivers (pciehp,
shpchp, ...) before parent->dev is initialized.

Anyway, I'll try it and report the result as soon as possible.

Thanks,
Kenji Kaneshige


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kenji Kaneshige March 27, 2009, 12:47 a.m. UTC | #2
Kenji Kaneshige wrote:
> Alex Chiang wrote:
>> * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
>>> Thank you very much for testing.
>>>
>>> We still have similar kernel oops (see below) with ACPI pci slot
>>> detection driver. I guess the same problem would also occur with
>>> acpiphp though I've not tried yet. I don't look at Trent's bus
>>> notifier approach yet, but I think we need something like this to
>>> fix this problem.
>>>
>>> Here are steps to reproduce and kernel oops message.
>>>
>>> * Steps to reproduce
>>>
>>> (1) Load ACPI pci slot detection driver
>>> (2) Remove the parent bridge of the slot
>>> (3) Unload ACPI pci slot detection driver
>>
>> Thanks for the report.
>>
>> I believe this patch will fix the case for bridges. I haven't
>> tested what happens if we remove an endpoint yet though.
>>
>> Can you try this?
>>
>> Thanks.
>>
>> /ac
>>
>> ---
>> commit 557ce38e78cf06ce16aefcf273051ea0bac3d35c
>> Author: Alex Chiang <achiang@hp.com>
>> Date:   Thu Mar 26 15:36:34 2009 -0600
>>
>>     PCI: pci_create_slot / pci_destroy_slot need to grab reference to 
>> parent bus
>>         If a logical hotunplug (remove) is performed on a PCI bridge 
>> claimed by
>>     a hotplug or slot detection driver, and then the hotplug/detection 
>> module
>>     is unloaded, we will encounter an oops:
>>         Call Trace:
>>      [<a000000100039bc0>] die+0x1c0/0x2c0
>>                                     sp=e0000005062ff9e0 
>> bsp=e0000005062f13b0
>>      [<a000000100039d00>] die_if_kernel+0x40/0x60
>>                                     sp=e0000005062ff9e0 
>> bsp=e0000005062f1380
>>      [<a00000010003b590>] ia64_fault+0x1230/0x1280
>>                                     sp=e0000005062ff9e0 
>> bsp=e0000005062f1300
>>      [<a00000010000c700>] ia64_native_leave_kernel+0x0/0x270
>>                                     sp=e0000005062ffbf0 
>> bsp=e0000005062f1300
>>      [<a0000001003988f0>] pci_slot_release+0x70/0x1c0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f12b0
>>      [<a0000001003694d0>] kobject_release+0x4f0/0x5e0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1270
>>      [<a00000010036b490>] kref_put+0xd0/0x100
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1248
>>      [<a000000100368650>] kobject_put+0x90/0xc0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1220
>>      [<a000000100399260>] pci_destroy_slot+0xa0/0xe0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f11f0
>>      [<a0000002044d2c70>] pci_hp_deregister+0x510/0x560 [pci_hotplug]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f11a8
>>      [<a000000205d71aa0>] acpiphp_unregister_hotplug_slot+0x80/0x100 
>> [acpiphp]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1180
>>      [<a000000205d73d40>] cleanup_bridge+0x3a0/0x4c0 [acpiphp]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1128
>>      [<a000000205d73ee0>] cleanup_p2p_bridge+0x80/0xc0 [acpiphp]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1108
>>      [<a0000001003ed200>] acpi_ns_walk_namespace+0x160/0x2e0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1098
>>      [<a0000001003e8850>] acpi_walk_namespace+0x90/0xe0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1048
>>      [<a000000205d73f70>] remove_bridge+0x50/0xe0 [acpiphp]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f1028
>>      [<a000000100411590>] acpi_pci_unregister_driver+0x1f0/0x2a0
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f0fe8
>>      [<a000000205d759d0>] acpiphp_glue_exit+0x30/0x60 [acpiphp]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f0fd0
>>      [<a000000205d77380>] acpiphp_exit+0x20/0x40 [acpiphp]
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f0fb8
>>      [<a0000001000e9310>] sys_delete_module+0x410/0x520
>>                                     sp=e0000005062ffdc0 
>> bsp=e0000005062f0f38
>>         This is because pci_slot_release will access the parent PCI bus,
>>     which has already been released by the user's prior hot unplug.
>>         The solution is for pci_create_slot to grab a reference on the
>>     parent PCI bus (and pci_destroy_slot to put the reference). This
>>     will prevent the parent from release while the hotplug or slot
>>     detection driver is loaded.
>>         Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
>>     Signed-off-by: Alex Chiang <achiang@hp.com>
>>
>> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
>> index 2118944..459d6a2 100644
>> --- a/drivers/pci/slot.c
>> +++ b/drivers/pci/slot.c
>> @@ -248,6 +248,7 @@ placeholder:
>>          if (PCI_SLOT(dev->devfn) == slot_nr)
>>              dev->slot = slot;
>>  
>> +    get_device(&parent->dev);
>>      dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n",
>>          slot_nr, pci_slot_name(slot));
>>  
>> @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot)
>>          slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
>>  
>>      down_write(&pci_bus_sem);
>> +    put_device(&slot->bus->dev);
>>      kobject_put(&slot->kobj);
>>      up_write(&pci_bus_sem);
>>  }
>>
> 
> I've not tried your patch yet, but I don't think it works because
> pci_create_slot() can be executed by some hotplug drivers (pciehp,
> shpchp, ...) before parent->dev is initialized.
> 
> Anyway, I'll try it and report the result as soon as possible.
> 
> Thanks,
> Kenji Kaneshige
> 
> 

I tried your patch. The result is that the patch doesn't fix anything
and introduces new kernel oops (see below). I can reproduce the new
kernel oops by the following steps.

(1) Load pciehp
(2) Remove parent bridge of the slot
(3) Add parent bridge of the slot

I think the reason why your patch doesn't fix the problem is the place
of put_device() is wrong. It needs to be placed after kobject_put(),
doesn't it?

And the reason why your patch introduce the new kernel oops is
get_device(&parent->dev) is executed before parent->dev is initialized,
as I mentioned in the previous e-mail.

[  144.362098] ------------[ cut here ]------------
[  144.362239] WARNING: at lib/kref.c:43 kref_get+0x2f/0x40()
[  144.362382] Hardware name: PRIMERGY
[  144.362524] Modules linked in: pciehp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc pci_slot battery ac parport_pc lp parport mptspi mptscsih mptbase scsi_transport_spi e1000e sg sr_mod cdrom button serio_raw i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
[  144.365747] Pid: 68, comm: work_on_cpu/0 Not tainted 2.6.29-rc8-kk #3
[  144.365892] Call Trace:
[  144.366039]  [<ffffffff8024507c>] warn_slowpath+0xdc/0x110
[  144.366185]  [<ffffffff8033ef9b>] ? sysfs_add_file_mode+0x6b/0xa0
[  144.366331]  [<ffffffff8033efdc>] ? sysfs_add_file+0xc/0x10
[  144.366475]  [<ffffffff8033f00b>] ? sysfs_create_file+0x2b/0x40
[  144.366622]  [<ffffffff803a7bf1>] ? kobject_add_internal+0x141/0x250
[  144.366769]  [<ffffffff803a7e0d>] ? kobject_add_varg+0x5d/0x70
[  144.366917]  [<ffffffff80567466>] ? _spin_unlock+0x26/0x30
[  144.367070]  [<ffffffff803a8353>] ? kset_find_obj+0x23/0x90
[  144.367213]  [<ffffffff803a8395>] ? kset_find_obj+0x65/0x90
[  144.367355]  [<ffffffff803a8d5f>] kref_get+0x2f/0x40
[  144.367497]  [<ffffffff803a7a9a>] kobject_get+0x1a/0x30
[  144.367641]  [<ffffffff8043df27>] get_device+0x17/0x20
[  144.367784]  [<ffffffff803bef5f>] pci_create_slot+0x1af/0x280
[  144.367929]  [<ffffffff803c75cf>] pci_hp_register+0x8f/0x320
[  144.368087]  [<ffffffffa031aa70>] pciehp_probe+0xb0/0x4d0 [pciehp]
[  144.368234]  [<ffffffff803c5354>] pcie_port_probe_service+0x54/0x90
[  144.368381]  [<ffffffff804421de>] driver_probe_device+0xbe/0x2e0
[  144.368526]  [<ffffffff804424a0>] ? __device_attach+0x0/0x10
[  144.368670]  [<ffffffff804424a9>] __device_attach+0x9/0x10
[  144.368813]  [<ffffffff80440ee8>] bus_for_each_drv+0x68/0x90
[  144.368957]  [<ffffffff80442578>] device_attach+0x88/0x90
[  144.369110]  [<ffffffff80440e5b>] bus_attach_device+0x5b/0x80
[  144.369255]  [<ffffffff8043ea0a>] device_add+0x40a/0x690
[  144.369399]  [<ffffffff8043eca9>] device_register+0x19/0x20
[  144.369544]  [<ffffffff803c51ec>] pcie_port_device_register+0x3fc/0x510
[  144.369694]  [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20
[  144.369839]  [<ffffffff80257410>] ? do_work_for_cpu+0x0/0x20
[  144.369985]  [<ffffffff80554fc5>] pcie_portdrv_probe+0x35/0xa0
[  144.370140]  [<ffffffff803bf232>] local_pci_probe+0x12/0x20
[  144.370283]  [<ffffffff80257423>] do_work_for_cpu+0x13/0x20
[  144.371678]  [<ffffffff80257559>] ? run_workqueue+0x19/0x230
[  144.371822]  [<ffffffff8025769a>] run_workqueue+0x15a/0x230
[  144.371965]  [<ffffffff80257648>] ? run_workqueue+0x108/0x230
[  144.372114]  [<ffffffff8025846f>] worker_thread+0x9f/0x100
[  144.372258]  [<ffffffff8025bce0>] ? autoremove_wake_function+0x0/0x40
[  144.372404]  [<ffffffff802583d0>] ? worker_thread+0x0/0x100
[  144.372547]  [<ffffffff8025b89d>] kthread+0x4d/0x80
[  144.372689]  [<ffffffff8020d4ba>] child_rip+0xa/0x20
[  144.372831]  [<ffffffff8020cebc>] ? restore_args+0x0/0x30
[  144.372975]  [<ffffffff8025b850>] ? kthread+0x0/0x80
[  144.373126]  [<ffffffff8020d4b0>] ? child_rip+0x0/0x20
[  144.373269] ---[ end trace 43bc571356db47af ]---

Thanks,
Kenji Kaneshige

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Chiang March 27, 2009, 3:52 a.m. UTC | #3
* Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
> Kenji Kaneshige wrote:
>> Alex Chiang wrote:
>>> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
>>> index 2118944..459d6a2 100644
>>> --- a/drivers/pci/slot.c
>>> +++ b/drivers/pci/slot.c
>>> @@ -248,6 +248,7 @@ placeholder:
>>>          if (PCI_SLOT(dev->devfn) == slot_nr)
>>>              dev->slot = slot;
>>>  +    get_device(&parent->dev);
>>>      dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n",
>>>          slot_nr, pci_slot_name(slot));
>>>  @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot)
>>>          slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
>>>       down_write(&pci_bus_sem);
>>> +    put_device(&slot->bus->dev);
>>>      kobject_put(&slot->kobj);
>>>      up_write(&pci_bus_sem);
>>>  }
>>>
>>
>> I've not tried your patch yet, but I don't think it works because
>> pci_create_slot() can be executed by some hotplug drivers (pciehp,
>> shpchp, ...) before parent->dev is initialized.
>>
>> Anyway, I'll try it and report the result as soon as possible.
>>
>
> I tried your patch. The result is that the patch doesn't fix anything
> and introduces new kernel oops (see below). I can reproduce the new
> kernel oops by the following steps.
>
> (1) Load pciehp
> (2) Remove parent bridge of the slot
> (3) Add parent bridge of the slot
>
> I think the reason why your patch doesn't fix the problem is the place
> of put_device() is wrong. It needs to be placed after kobject_put(),
> doesn't it?

You are right.

> And the reason why your patch introduce the new kernel oops is
> get_device(&parent->dev) is executed before parent->dev is initialized,
> as I mentioned in the previous e-mail.

Sigh, you are right again. I tested with acpiphp which does not
have that issue. Sorry for the noise.

I'll try and think of something else. :-/

Thanks.

/ac

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kenji Kaneshige March 27, 2009, 4:10 a.m. UTC | #4
Alex Chiang wrote:
> * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
>> Kenji Kaneshige wrote:
>>> Alex Chiang wrote:
>>>> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
>>>> index 2118944..459d6a2 100644
>>>> --- a/drivers/pci/slot.c
>>>> +++ b/drivers/pci/slot.c
>>>> @@ -248,6 +248,7 @@ placeholder:
>>>>          if (PCI_SLOT(dev->devfn) == slot_nr)
>>>>              dev->slot = slot;
>>>>  +    get_device(&parent->dev);
>>>>      dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n",
>>>>          slot_nr, pci_slot_name(slot));
>>>>  @@ -302,6 +303,7 @@ void pci_destroy_slot(struct pci_slot *slot)
>>>>          slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
>>>>       down_write(&pci_bus_sem);
>>>> +    put_device(&slot->bus->dev);
>>>>      kobject_put(&slot->kobj);
>>>>      up_write(&pci_bus_sem);
>>>>  }
>>>>
>>> I've not tried your patch yet, but I don't think it works because
>>> pci_create_slot() can be executed by some hotplug drivers (pciehp,
>>> shpchp, ...) before parent->dev is initialized.
>>>
>>> Anyway, I'll try it and report the result as soon as possible.
>>>
>> I tried your patch. The result is that the patch doesn't fix anything
>> and introduces new kernel oops (see below). I can reproduce the new
>> kernel oops by the following steps.
>>
>> (1) Load pciehp
>> (2) Remove parent bridge of the slot
>> (3) Add parent bridge of the slot
>>
>> I think the reason why your patch doesn't fix the problem is the place
>> of put_device() is wrong. It needs to be placed after kobject_put(),
>> doesn't it?
> 
> You are right.
> 
>> And the reason why your patch introduce the new kernel oops is
>> get_device(&parent->dev) is executed before parent->dev is initialized,
>> as I mentioned in the previous e-mail.
> 
> Sigh, you are right again. I tested with acpiphp which does not
> have that issue. Sorry for the noise.
> 

No problem. Please feel free to ask me for testing.

Thanks,
Kenji Kaneshige



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 2118944..459d6a2 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -248,6 +248,7 @@  placeholder:
 		if (PCI_SLOT(dev->devfn) == slot_nr)
 			dev->slot = slot;
 
+	get_device(&parent->dev);
 	dev_dbg(&parent->dev, "dev %02x, created physical slot %s\n",
 		slot_nr, pci_slot_name(slot));
 
@@ -302,6 +303,7 @@  void pci_destroy_slot(struct pci_slot *slot)
 		slot->number, atomic_read(&slot->kobj.kref.refcount) - 1);
 
 	down_write(&pci_bus_sem);
+	put_device(&slot->bus->dev);
 	kobject_put(&slot->kobj);
 	up_write(&pci_bus_sem);
 }