Message ID | 2301639.HY0nSRLtah@vostro.rjw.lan (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
2013/6/14 Rafael J. Wysocki <rjw@sisk.pl>: > What about the appended patch (on top of [1/9], untested)? > > Rafael Sorry, I have lost the track of which patches, on top of 3.10-rc5, I should include in my testing and which I shouldn't. Could you please restore my understanding? I.e., please provide a full list of LKML or Bugzilla links to patches which I should test during the next boot of the laptop. Thanks in advance. > --- > drivers/pci/hotplug/acpiphp_glue.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c > =================================================================== > --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c > +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c > @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif > return NOTIFY_OK; > } > > +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context) > +{ > + if (event == ACPI_NOTIFY_EJECT_REQUEST) { > + struct acpiphp_func *func = context; > + > + if (!acpiphp_disable_slot(func->slot)) > + acpiphp_eject_slot(func->slot); > + } else { > + handle_hotplug_event_func(handle, event, context); > + } > +} > > static const struct acpi_dock_ops acpiphp_dock_ops = { > - .handler = handle_hotplug_event_func, > + .handler = handle_dock_event_func, > }; > > /* Check whether the PCI device is managed by native PCIe hotplug driver */ > -- Alexander E. Patrakov -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, June 14, 2013 06:30:58 PM Alexander E. Patrakov wrote: > 2013/6/14 Rafael J. Wysocki <rjw@sisk.pl>: > > > What about the appended patch (on top of [1/9], untested)? > > > > Rafael > > Sorry, I have lost the track of which patches, on top of 3.10-rc5, I > should include in my testing and which I shouldn't. Could you please > restore my understanding? I.e., please provide a full list of LKML or > Bugzilla links to patches which I should test during the next boot of > the laptop. > > Thanks in advance. As far as I'm concerned, please test these two for now: https://patchwork.kernel.org/patch/2717851/ https://patchwork.kernel.org/patch/2721271/ They are targeted at the ordering problems only, however, so they won't address the resource allocation issues you're seeing. There are some other patches Jiang Liu and Yinghai are working on as far as I can say, but I'm not sure what their status is at the moment. Thanks, Rafael
On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote: > On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote: >> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote: >>> Current ACPI glue logic expects that physical devices are destroyed >>> before destroying companion ACPI devices, otherwise it will break the >>> ACPI unbind logic and cause following warning messages: >>> [ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt >>> [ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt >>> [ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt >>> [ 180.013656] port1: Oops, 'acpi_handle' corrupt >>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321 >>> for full log message. >> >> So my question is, did we have this problem before commit 3b63aaa70e1? >> >> If we did, then when did it start? Or was it present forever? >> >>> Above warning messages are caused by following scenario: >>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq >>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb() >>> ->dock_notify()-> handle_eject_request()->hotplug_dock_devices() >>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to >>> destroy physical devices, then destroys all affected ACPI devices. >>> Everything seems perfect until now. But the acpiphp dock notification >>> handler will queue another task (T2) onto kacpi_hotplug_wq to really >>> destroy affected physical devices. >> >> Would not the solution be to modify it so that it didn't spawn the other >> task (T2), but removed the affected physical devices synchronously? >> >>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have >>> been destroyed. >>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical >>> devices. >>> >>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed >>> in step 3 and physical devices are destroyed in step 5. >>> >>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> >>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> >>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>> Cc: Yinghai Lu <yinghai@kernel.org> >>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> >>> Cc: linux-pci@vger.kernel.org >>> Cc: linux-kernel@vger.kernel.org >>> Cc: stable@vger.kernel.org >>> --- >>> Hi Bjorn and Rafael, >>> The recursive lock changes haven't been tested yet, need help >>> from Alexander for testing. >> >> Well, let's just say I'm not a fan of recursive locks. Is that unavoidable >> here? > > What about the appended patch (on top of [1/9], untested)? > > Rafael It should have similar effect as patch 2/9, and it will encounter the same deadlock scenario as 2/9 too. > > > --- > drivers/pci/hotplug/acpiphp_glue.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c > =================================================================== > --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c > +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c > @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif > return NOTIFY_OK; > } > > +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context) > +{ > + if (event == ACPI_NOTIFY_EJECT_REQUEST) { > + struct acpiphp_func *func = context; > + > + if (!acpiphp_disable_slot(func->slot)) > + acpiphp_eject_slot(func->slot); > + } else { > + handle_hotplug_event_func(handle, event, context); > + } > +} > > static const struct acpi_dock_ops acpiphp_dock_ops = { > - .handler = handle_hotplug_event_func, > + .handler = handle_dock_event_func, > }; > > /* Check whether the PCI device is managed by native PCIe hotplug driver */ > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, June 14, 2013 09:57:15 PM Jiang Liu wrote: > On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote: > > On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote: > >> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote: > >>> Current ACPI glue logic expects that physical devices are destroyed > >>> before destroying companion ACPI devices, otherwise it will break the > >>> ACPI unbind logic and cause following warning messages: > >>> [ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt > >>> [ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt > >>> [ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt > >>> [ 180.013656] port1: Oops, 'acpi_handle' corrupt > >>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321 > >>> for full log message. > >> > >> So my question is, did we have this problem before commit 3b63aaa70e1? > >> > >> If we did, then when did it start? Or was it present forever? > >> > >>> Above warning messages are caused by following scenario: > >>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq > >>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb() > >>> ->dock_notify()-> handle_eject_request()->hotplug_dock_devices() > >>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to > >>> destroy physical devices, then destroys all affected ACPI devices. > >>> Everything seems perfect until now. But the acpiphp dock notification > >>> handler will queue another task (T2) onto kacpi_hotplug_wq to really > >>> destroy affected physical devices. > >> > >> Would not the solution be to modify it so that it didn't spawn the other > >> task (T2), but removed the affected physical devices synchronously? > >> > >>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have > >>> been destroyed. > >>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical > >>> devices. > >>> > >>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed > >>> in step 3 and physical devices are destroyed in step 5. > >>> > >>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> > >>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> > >>> Cc: Bjorn Helgaas <bhelgaas@google.com> > >>> Cc: Yinghai Lu <yinghai@kernel.org> > >>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> > >>> Cc: linux-pci@vger.kernel.org > >>> Cc: linux-kernel@vger.kernel.org > >>> Cc: stable@vger.kernel.org > >>> --- > >>> Hi Bjorn and Rafael, > >>> The recursive lock changes haven't been tested yet, need help > >>> from Alexander for testing. > >> > >> Well, let's just say I'm not a fan of recursive locks. Is that unavoidable > >> here? > > > > What about the appended patch (on top of [1/9], untested)? > > > > Rafael > It should have similar effect as patch 2/9, and it will encounter the > same deadlock scenario as 2/9 too. And why exactly? I'm looking at acpiphp_disable_slot() and I'm not seeing where the problematic lock is taken. Similarly for power_off_slot(). It should take the ACPI scan lock, but that's a different matter. Thanks, Rafael > > --- > > drivers/pci/hotplug/acpiphp_glue.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c > > =================================================================== > > --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c > > +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c > > @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif > > return NOTIFY_OK; > > } > > > > +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context) > > +{ > > + if (event == ACPI_NOTIFY_EJECT_REQUEST) { > > + struct acpiphp_func *func = context; > > + > > + if (!acpiphp_disable_slot(func->slot)) > > + acpiphp_eject_slot(func->slot); > > + } else { > > + handle_hotplug_event_func(handle, event, context); > > + } > > +} > > > > static const struct acpi_dock_ops acpiphp_dock_ops = { > > - .handler = handle_hotplug_event_func, > > + .handler = handle_dock_event_func, > > }; > > > > /* Check whether the PCI device is managed by native PCIe hotplug driver */ > > >
On 06/14/2013 10:12 PM, Rafael J. Wysocki wrote: > On Friday, June 14, 2013 09:57:15 PM Jiang Liu wrote: >> On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote: >>> On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote: >>>> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote: >>>>> Current ACPI glue logic expects that physical devices are destroyed >>>>> before destroying companion ACPI devices, otherwise it will break the >>>>> ACPI unbind logic and cause following warning messages: >>>>> [ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt >>>>> [ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt >>>>> [ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt >>>>> [ 180.013656] port1: Oops, 'acpi_handle' corrupt >>>>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321 >>>>> for full log message. >>>> >>>> So my question is, did we have this problem before commit 3b63aaa70e1? >>>> >>>> If we did, then when did it start? Or was it present forever? >>>> >>>>> Above warning messages are caused by following scenario: >>>>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq >>>>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb() >>>>> ->dock_notify()-> handle_eject_request()->hotplug_dock_devices() >>>>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to >>>>> destroy physical devices, then destroys all affected ACPI devices. >>>>> Everything seems perfect until now. But the acpiphp dock notification >>>>> handler will queue another task (T2) onto kacpi_hotplug_wq to really >>>>> destroy affected physical devices. >>>> >>>> Would not the solution be to modify it so that it didn't spawn the other >>>> task (T2), but removed the affected physical devices synchronously? >>>> >>>>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have >>>>> been destroyed. >>>>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical >>>>> devices. >>>>> >>>>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed >>>>> in step 3 and physical devices are destroyed in step 5. >>>>> >>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> >>>>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> >>>>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>>>> Cc: Yinghai Lu <yinghai@kernel.org> >>>>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> >>>>> Cc: linux-pci@vger.kernel.org >>>>> Cc: linux-kernel@vger.kernel.org >>>>> Cc: stable@vger.kernel.org >>>>> --- >>>>> Hi Bjorn and Rafael, >>>>> The recursive lock changes haven't been tested yet, need help >>>>> from Alexander for testing. >>>> >>>> Well, let's just say I'm not a fan of recursive locks. Is that unavoidable >>>> here? >>> >>> What about the appended patch (on top of [1/9], untested)? >>> >>> Rafael >> It should have similar effect as patch 2/9, and it will encounter the >> same deadlock scenario as 2/9 too. > > And why exactly? > > I'm looking at acpiphp_disable_slot() and I'm not seeing where the > problematic lock is taken. Similarly for power_off_slot(). > > It should take the ACPI scan lock, but that's a different matter. > > Thanks, > Rafael The deadlock scenario is the same: hotplug_dock_devices() mutex_lock(&ds->hp_lock) dd->ops->handler() destroy pci bus unregister_hotplug_dock_device() mutex_lock(&ds->hp_lock) > > >>> --- >>> drivers/pci/hotplug/acpiphp_glue.c | 13 ++++++++++++- >>> 1 file changed, 12 insertions(+), 1 deletion(-) >>> >>> Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c >>> =================================================================== >>> --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c >>> +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c >>> @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif >>> return NOTIFY_OK; >>> } >>> >>> +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context) >>> +{ >>> + if (event == ACPI_NOTIFY_EJECT_REQUEST) { >>> + struct acpiphp_func *func = context; >>> + >>> + if (!acpiphp_disable_slot(func->slot)) >>> + acpiphp_eject_slot(func->slot); >>> + } else { >>> + handle_hotplug_event_func(handle, event, context); >>> + } >>> +} >>> >>> static const struct acpi_dock_ops acpiphp_dock_ops = { >>> - .handler = handle_hotplug_event_func, >>> + .handler = handle_dock_event_func, >>> }; >>> >>> /* Check whether the PCI device is managed by native PCIe hotplug driver */ >>> >> -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, June 14, 2013 11:30:12 PM Jiang Liu wrote: > On 06/14/2013 10:12 PM, Rafael J. Wysocki wrote: > > On Friday, June 14, 2013 09:57:15 PM Jiang Liu wrote: > >> On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote: > >>> On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote: > >>>> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote: > >>>>> Current ACPI glue logic expects that physical devices are destroyed > >>>>> before destroying companion ACPI devices, otherwise it will break the > >>>>> ACPI unbind logic and cause following warning messages: > >>>>> [ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt > >>>>> [ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt > >>>>> [ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt > >>>>> [ 180.013656] port1: Oops, 'acpi_handle' corrupt > >>>>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321 > >>>>> for full log message. > >>>> > >>>> So my question is, did we have this problem before commit 3b63aaa70e1? > >>>> > >>>> If we did, then when did it start? Or was it present forever? > >>>> > >>>>> Above warning messages are caused by following scenario: > >>>>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq > >>>>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb() > >>>>> ->dock_notify()-> handle_eject_request()->hotplug_dock_devices() > >>>>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to > >>>>> destroy physical devices, then destroys all affected ACPI devices. > >>>>> Everything seems perfect until now. But the acpiphp dock notification > >>>>> handler will queue another task (T2) onto kacpi_hotplug_wq to really > >>>>> destroy affected physical devices. > >>>> > >>>> Would not the solution be to modify it so that it didn't spawn the other > >>>> task (T2), but removed the affected physical devices synchronously? > >>>> > >>>>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have > >>>>> been destroyed. > >>>>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical > >>>>> devices. > >>>>> > >>>>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed > >>>>> in step 3 and physical devices are destroyed in step 5. > >>>>> > >>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> > >>>>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> > >>>>> Cc: Bjorn Helgaas <bhelgaas@google.com> > >>>>> Cc: Yinghai Lu <yinghai@kernel.org> > >>>>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> > >>>>> Cc: linux-pci@vger.kernel.org > >>>>> Cc: linux-kernel@vger.kernel.org > >>>>> Cc: stable@vger.kernel.org > >>>>> --- > >>>>> Hi Bjorn and Rafael, > >>>>> The recursive lock changes haven't been tested yet, need help > >>>>> from Alexander for testing. > >>>> > >>>> Well, let's just say I'm not a fan of recursive locks. Is that unavoidable > >>>> here? > >>> > >>> What about the appended patch (on top of [1/9], untested)? > >>> > >>> Rafael > >> It should have similar effect as patch 2/9, and it will encounter the > >> same deadlock scenario as 2/9 too. > > > > And why exactly? > > > > I'm looking at acpiphp_disable_slot() and I'm not seeing where the > > problematic lock is taken. Similarly for power_off_slot(). > > > > It should take the ACPI scan lock, but that's a different matter. > > > > Thanks, > > Rafael > The deadlock scenario is the same: Well, you didn't answer my question. > hotplug_dock_devices() > mutex_lock(&ds->hp_lock) > dd->ops->handler() > destroy pci bus And this wasn't particularly helpful. What about mentioning acpi_pci_remove_bus()? I don't remember all changes made recently, sorry. > unregister_hotplug_dock_device() > mutex_lock(&ds->hp_lock) I see the problem. ds->hp_lock is used to make both addition and removal of hotplug devices wait for us to complete walking ds->hotplug_devices. However, if we are in the process of removing hotplug devices, we don't need removals to block on ds->hp_lock (in fact, we don't even want them to block on it). Conversely, if we are in the process of adding hotplug devices, we don't want additions to block on ds->hp_lock. So, why don't we do the following: (1) Introduce a 'hotplug_status' field into struct_dock station with possible values representing "removal", "addition" and "no action" and a wait queue associated with it. We can use ds->dd_lock to protect that field. (2) hotplug_status will be modified by hotplug_dock_devices() depending on the event. For example, on eject it will set hotplug_status to "removal". Then, after completing the walk, it will reset hotplug_status to "no action" and wake up its wait queue. (3) dock_del_hotplug_device() will check if hotplug_status is "removal" or "no_action" and if so, it will just do the removal under ds->dd_lock. If it is "addition", though, it will go to sleep (in the hotplug_status' wait queue) until hotplug_dock_devices() wakes it up. (4) Analogously for dock_add_hotplug_device(). For this to work the "addition" and "deletion" of hotplug devices should better be implemented as setting and clearing a 'hotplug' flag in struct dock_dependent_device (instead of using ds->hotplug_devices). Thanks, Rafael
Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c =================================================================== --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif return NOTIFY_OK; } +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context) +{ + if (event == ACPI_NOTIFY_EJECT_REQUEST) { + struct acpiphp_func *func = context; + + if (!acpiphp_disable_slot(func->slot)) + acpiphp_eject_slot(func->slot); + } else { + handle_hotplug_event_func(handle, event, context); + } +} static const struct acpi_dock_ops acpiphp_dock_ops = { - .handler = handle_hotplug_event_func, + .handler = handle_dock_event_func, }; /* Check whether the PCI device is managed by native PCIe hotplug driver */