
[v5,09/13] PCI: Introduce /sys/bus/pci/devices/.../remove

Message ID 49C9BBD7.4040705@jp.fujitsu.com
State Superseded, archived
Commit Message

Kenji Kaneshige March 25, 2009, 5:06 a.m. UTC
Alex Chiang wrote:
> * Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
>> I still have the following kernel error messages in testing with your
>> latest set of patches (Jesse's linux-next). The test case is removing
>> e1000e device or its parent bridge by "echo 1 > /sys/bus/pci/devices/
>> .../remove".
>>
>> [  537.379995] =============================================
>> [  537.380124] [ INFO: possible recursive locking detected ]
>> [  537.380128] 2.6.29-rc8-kk #1
>> [  537.380128] ---------------------------------------------
>> [  537.380128] events/4/56 is trying to acquire lock:
>> [  537.380128]  (events){--..}, at: [<ffffffff80257fc0>] flush_workqueue+0x0/0xa0
>> [  537.380128]
>> [  537.380128] but task is already holding lock:
>> [  537.380128]  (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>> [  537.380128]
>> [  537.380128] other info that might help us debug this:
>> [  537.380128] 3 locks held by events/4/56:
>> [  537.380128]  #0:  (events){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>> [  537.380128]  #1:  (&ss->work){--..}, at: [<ffffffff80257648>] run_workqueue+0x108/0x230
>> [  537.380128]  #2:  (pci_remove_rescan_mutex){--..}, at: [<ffffffff803c10d1>] remove_callback+0x21/0x40
> 
> I still cannot reproduce this lockdep issue, even using your
> .config with an e1000e device on an x86_64 kernel. :(
> 
> I tried removing the endpoint, an intermediate bridge device, and
> the parent bus. I don't know what I'm doing wrong...
> 

I don't know either...
It reproduces 100% of the time in my environment. The steps are
simply to boot the system and remove the device.

> Can you please try this patch though, and see if it fixes the
> warning? It applies on top of my other sysfs patch that
> introduces a mutex in sysfs_schedule_callback.

Anyway, I confirmed that the kernel error messages are gone with
the patch against sysfs. Note that I used the following patch,
which I made for testing, because your patch did not apply to
Jesse's linux-next.

Thanks,
Kenji Kaneshige
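
As a rough illustration of why the warning disappears, the following is a minimal userspace model, not kernel code. It assumes, based on the trace above, that the removal callback ends up flushing the shared "events" queue while it is itself running on that queue. All names (model_wq, flush_would_self_deadlock, run_and_flush) are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_wq {
	const char *name;
	bool running;	/* true while a work item of this queue executes */
};

/*
 * A flush waits for all work on `target` to finish. If the caller is
 * itself a work item running on `target`, it would wait on itself.
 */
static bool flush_would_self_deadlock(const struct model_wq *ctx,
				      const struct model_wq *target)
{
	return ctx != NULL && ctx == target && ctx->running;
}

/*
 * Model a callback that runs on `host` and flushes `flushed`.
 * Returns true if that flush is recursive in the sense above.
 */
static bool run_and_flush(struct model_wq *host, struct model_wq *flushed)
{
	bool recursive;

	assert(host != NULL && flushed != NULL);
	host->running = true;
	recursive = flush_would_self_deadlock(host, flushed);
	host->running = false;
	return recursive;
}
```

In this model, a callback hosted on "events" that flushes "events" is flagged, while the same callback hosted on a separate "sysfsd" queue is not, which is the effect the test patch relies on.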



 fs/sysfs/file.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexander Chiang March 25, 2009, 5:20 a.m. UTC | #1

Great, thank you for testing, Kenji-san.

/ac

Kenji Kaneshige March 25, 2009, 5:39 a.m. UTC | #2
Alex Chiang wrote:
> Great, thank you for testing Kenji-san.
>

You're welcome.

To be clear, my patch is for testing only, and it is very buggy
(the workqueue is never destroyed, module_put() is missing in the
error path, and so on). Please treat it as a test patch only.

Thanks,
Kenji Kaneshige
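
The missing module_put() mentioned above can be sketched with a small userspace model. This is an assumption-laden illustration, not the kernel code and not a proposed fix: model_module, model_module_get(), model_module_put(), model_create_wq() and model_schedule_callback() are hypothetical stand-ins. It shows only the reference-count point and does not address the missing destroy operation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a module with a reference count. */
struct model_module {
	int refcount;
};

static bool model_module_get(struct model_module *m)
{
	m->refcount++;
	return true;
}

static void model_module_put(struct model_module *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

/* Stands in for the lazily created workqueue pointer in the test patch. */
static void *model_wq;
static int model_wq_storage;

/* Pretend allocation that can be forced to fail. */
static void *model_create_wq(bool fail)
{
	return fail ? NULL : &model_wq_storage;
}

static int model_schedule_callback(struct model_module *owner,
				   bool wq_alloc_fails)
{
	if (!model_module_get(owner))
		return -ENODEV;

	if (!model_wq) {
		model_wq = model_create_wq(wq_alloc_fails);
		if (!model_wq) {
			/* Drop the reference taken above before bailing out. */
			model_module_put(owner);
			return -ENOMEM;
		}
	}

	/* In this model the reference stays held on success. */
	return 0;
}
```

With the put in place, a failed queue creation leaves the count where it started; without it, every failed call would leak one reference.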



Patch

Index: linux-next-20090323/fs/sysfs/file.c
===================================================================
--- linux-next-20090323.orig/fs/sysfs/file.c	2009-03-25 12:09:37.000000000 +0900
+++ linux-next-20090323/fs/sysfs/file.c	2009-03-25 13:40:10.000000000 +0900
@@ -677,6 +677,7 @@ 
 	kfree(ss);
 }
 
+static struct workqueue_struct *sysfsd_wq;
 /**
  * sysfs_schedule_callback - helper to schedule a callback for a kobject
  * @kobj: object we're acting for.
@@ -704,6 +705,17 @@ 
 
 	if (!try_module_get(owner))
 		return -ENODEV;
+
+	if (!sysfsd_wq) {
+		sysfsd_wq = create_workqueue("sysfsd");
+		if (!sysfsd_wq) {
+			printk(KERN_ERR
+			       "%s: Could not create workqueue\n", __func__);
+			WARN_ON(1);
+			return -ENOMEM;
+		}
+	}
+
 	ss = kmalloc(sizeof(*ss), GFP_KERNEL);
 	if (!ss) {
 		module_put(owner);
@@ -715,7 +727,7 @@ 
 	ss->data = data;
 	ss->owner = owner;
 	INIT_WORK(&ss->work, sysfs_schedule_callback_work);
-	schedule_work(&ss->work);
+	queue_work(sysfsd_wq, &ss->work);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sysfs_schedule_callback);