diff mbox

ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock

Message ID 1551349.SyN5ciAXNe@aspire.rjw.lan (mailing list archive)
State New, archived
Headers show

Commit Message

Rafael J. Wysocki Nov. 7, 2017, 11:06 p.m. UTC
On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> <ville.syrjala@linux.intel.com> wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > by the work via acpi_pm_notify_handler(). This can deadlock.
> 
> OK, good catch!
> 
> [cut]
> 
> >
> > Cc: stable@vger.kernel.org
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > index fbcc73f7a099..18af71057b44 100644
> > --- a/drivers/acpi/device_pm.c
> > +++ b/drivers/acpi/device_pm.c
> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> >
> >  #ifdef CONFIG_PM
> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> >
> >  void acpi_pm_wakeup_event(struct device *dev)
> >  {
> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> >         if (!dev && !func)
> >                 return AE_BAD_PARAMETER;
> >
> > -       mutex_lock(&acpi_pm_notifier_lock);
> > +       mutex_lock(&acpi_pm_notifier_install_lock);
> >
> >         if (adev->wakeup.flags.notifier_present)
> >                 goto out;
> >
> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > -       adev->wakeup.context.dev = dev;
> > -       adev->wakeup.context.func = func;
> > -
> 
> But this doesn't look good to me.
> 
> notifier_present should be checked under acpi_pm_notifier_lock.
> 
> Actually, acpi_install_notify_handler() itself need not be called
> under the lock, because it does sufficient checks of its own.
> 
> So say you do
> 
> mutex_lock(&acpi_pm_notifier_lock);
> 
> if (adev->wakeup.context.dev)
>         goto out;
> 
> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> adev->wakeup.context.dev = dev;
> adev->wakeup.context.func = func;
> 
> mutex_unlock(&acpi_pm_notifier_lock);
> 
> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> >                                              acpi_pm_notify_handler, NULL);
> >         if (ACPI_FAILURE(status))
> >                 goto out;
> >
> > +       mutex_lock(&acpi_pm_notifier_lock);
> 
> And here you just set notifier_present under acpi_pm_notifier_lock.
> 
> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > +       adev->wakeup.context.dev = dev;
> > +       adev->wakeup.context.func = func;
> >         adev->wakeup.flags.notifier_present = true;
> > +       mutex_unlock(&acpi_pm_notifier_lock);
> >
> >   out:
> > -       mutex_unlock(&acpi_pm_notifier_lock);
> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
> >         return status;
> >  }
> 
> Then on removal you can clear notifier_present first and drop the lock
> around the acpi_remove_notify_handler() call and nothing bad will
> happen.
> 
> If you call acpi_add_pm_notifier() twice in parallel, the first
> instance will set context.dev and the second one will see it set and
> bail out.  The first instance will then do the rest.
> 
> If you call acpi_remove_pm_notifier() twice in a row, the first
> instance will see notifier_present set and will clear it, so the
> second one will see notifier_present unset and it will bail out.
> 
> Now, if you call acpi_remove_pm_notifier() in parallel with
> acpi_add_pm_notifier(), either the former will see notifier_present
> unset and bail out, or the latter will see context.dev unset and bail
> out.
> 
> It doesn't look like the outer lock is needed, or have I missed anything?

So something like the below (totally untested) should work too, shouldn't it?

---
 drivers/acpi/device_pm.c |   35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

Comments

Rafael J. Wysocki Nov. 8, 2017, 12:23 p.m. UTC | #1
On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
>> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
>> <ville.syrjala@linux.intel.com> wrote:
>> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> >
>> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> > holding acpi_pm_notifier_lock, and that same lock is taken by
>> > by the work via acpi_pm_notify_handler(). This can deadlock.
>>
>> OK, good catch!
>>
>> [cut]
>>
>> >
>> > Cc: stable@vger.kernel.org
>> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > Cc: Len Brown <lenb@kernel.org>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Tejun Heo <tj@kernel.org>
>> > Cc: Ingo Molnar <mingo@kernel.org>
>> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > ---
>> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> >  1 file changed, 12 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> > index fbcc73f7a099..18af71057b44 100644
>> > --- a/drivers/acpi/device_pm.c
>> > +++ b/drivers/acpi/device_pm.c
>> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>> >
>> >  #ifdef CONFIG_PM
>> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>> >
>> >  void acpi_pm_wakeup_event(struct device *dev)
>> >  {
>> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> >         if (!dev && !func)
>> >                 return AE_BAD_PARAMETER;
>> >
>> > -       mutex_lock(&acpi_pm_notifier_lock);
>> > +       mutex_lock(&acpi_pm_notifier_install_lock);
>> >
>> >         if (adev->wakeup.flags.notifier_present)
>> >                 goto out;
>> >
>> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > -       adev->wakeup.context.dev = dev;
>> > -       adev->wakeup.context.func = func;
>> > -
>>
>> But this doesn't look good to me.
>>
>> notifier_present should be checked under acpi_pm_notifier_lock.
>>
>> Actually, acpi_install_notify_handler() itself need not be called
>> under the lock, because it does sufficient checks of its own.
>>
>> So say you do
>>
>> mutex_lock(&acpi_pm_notifier_lock);
>>
>> if (adev->wakeup.context.dev)
>>         goto out;
>>
>> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> adev->wakeup.context.dev = dev;
>> adev->wakeup.context.func = func;
>>
>> mutex_unlock(&acpi_pm_notifier_lock);
>>
>> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
>> >                                              acpi_pm_notify_handler, NULL);
>> >         if (ACPI_FAILURE(status))
>> >                 goto out;
>> >
>> > +       mutex_lock(&acpi_pm_notifier_lock);
>>
>> And here you just set notifier_present under acpi_pm_notifier_lock.
>>
>> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > +       adev->wakeup.context.dev = dev;
>> > +       adev->wakeup.context.func = func;
>> >         adev->wakeup.flags.notifier_present = true;
>> > +       mutex_unlock(&acpi_pm_notifier_lock);
>> >
>> >   out:
>> > -       mutex_unlock(&acpi_pm_notifier_lock);
>> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
>> >         return status;
>> >  }
>>
>> Then on removal you can clear notifier_present first and drop the lock
>> around the acpi_remove_notify_handler() call and nothing bad will
>> happen.
>>
>> If you call acpi_add_pm_notifier() twice in parallel, the first
>> instance will set context.dev and the second one will see it set and
>> bail out.  The first instance will then do the rest.
>>
>> If you call acpi_remove_pm_notifier() twice in a row, the first
>> instance will see notifier_present set and will clear it, so the
>> second one will see notifier_present unset and it will bail out.
>>
>> Now, if you call acpi_remove_pm_notifier() in parallel with
>> acpi_add_pm_notifier(), either the former will see notifier_present
>> unset and bail out, or the latter will see context.dev unset and bail
>> out.
>>
>> It doesn't look like the outer lock is needed, or have I missed anything?
>
> So something like the below (totally untested) should work too, shouldn't it?

There is a problem if a device is removed while acpi_add_pm_notifier()
is still in progress, in which case with my patch the
acpi_remove_pm_notifier() called from the removal path may bail out
prematurely (that doesn't seem likely to happen, but still I don't see
why it cannot happen), so I'll just use your patch. :-)

Thanks,
Rafael
Ville Syrjälä Nov. 8, 2017, 12:31 p.m. UTC | #2
On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> >> <ville.syrjala@linux.intel.com> wrote:
> >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> >
> >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> >>
> >> OK, good catch!
> >>
> >> [cut]
> >>
> >> >
> >> > Cc: stable@vger.kernel.org
> >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> > Cc: Len Brown <lenb@kernel.org>
> >> > Cc: Peter Zijlstra <peterz@infradead.org>
> >> > Cc: Tejun Heo <tj@kernel.org>
> >> > Cc: Ingo Molnar <mingo@kernel.org>
> >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> > ---
> >> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
> >> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >> >
> >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> >> > index fbcc73f7a099..18af71057b44 100644
> >> > --- a/drivers/acpi/device_pm.c
> >> > +++ b/drivers/acpi/device_pm.c
> >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> >> >
> >> >  #ifdef CONFIG_PM
> >> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
> >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> >> >
> >> >  void acpi_pm_wakeup_event(struct device *dev)
> >> >  {
> >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> >> >         if (!dev && !func)
> >> >                 return AE_BAD_PARAMETER;
> >> >
> >> > -       mutex_lock(&acpi_pm_notifier_lock);
> >> > +       mutex_lock(&acpi_pm_notifier_install_lock);
> >> >
> >> >         if (adev->wakeup.flags.notifier_present)
> >> >                 goto out;
> >> >
> >> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> > -       adev->wakeup.context.dev = dev;
> >> > -       adev->wakeup.context.func = func;
> >> > -
> >>
> >> But this doesn't look good to me.
> >>
> >> notifier_present should be checked under acpi_pm_notifier_lock.
> >>
> >> Actually, acpi_install_notify_handler() itself need not be called
> >> under the lock, because it does sufficient checks of its own.
> >>
> >> So say you do
> >>
> >> mutex_lock(&acpi_pm_notifier_lock);
> >>
> >> if (adev->wakeup.context.dev)
> >>         goto out;
> >>
> >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> adev->wakeup.context.dev = dev;
> >> adev->wakeup.context.func = func;
> >>
> >> mutex_unlock(&acpi_pm_notifier_lock);
> >>
> >> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> >> >                                              acpi_pm_notify_handler, NULL);
> >> >         if (ACPI_FAILURE(status))
> >> >                 goto out;
> >> >
> >> > +       mutex_lock(&acpi_pm_notifier_lock);
> >>
> >> And here you just set notifier_present under acpi_pm_notifier_lock.
> >>
> >> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> >> > +       adev->wakeup.context.dev = dev;
> >> > +       adev->wakeup.context.func = func;
> >> >         adev->wakeup.flags.notifier_present = true;
> >> > +       mutex_unlock(&acpi_pm_notifier_lock);
> >> >
> >> >   out:
> >> > -       mutex_unlock(&acpi_pm_notifier_lock);
> >> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
> >> >         return status;
> >> >  }
> >>
> >> Then on removal you can clear notifier_present first and drop the lock
> >> around the acpi_remove_notify_handler() call and nothing bad will
> >> happen.
> >>
> >> If you call acpi_add_pm_notifier() twice in parallel, the first
> >> instance will set context.dev and the second one will see it set and
> >> bail out.  The first instance will then do the rest.
> >>
> >> If you call acpi_remove_pm_notifier() twice in a row, the first
> >> instance will see notifier_present set and will clear it, so the
> >> second one will see notifier_present unset and it will bail out.
> >>
> >> Now, if you call acpi_remove_pm_notifier() in parallel with
> >> acpi_add_pm_notifier(), either the former will see notifier_present
> >> unset and bail out, or the latter will see context.dev unset and bail
> >> out.
> >>
> >> It doesn't look like the outer lock is needed, or have I missed anything?
> >
> > So something like the below (totally untested) should work too, shouldn't it?
> 
> There is a problem if a device is removed while acpi_add_pm_notifier()
> is still in progress, in which case with my patch the
> acpi_remove_pm_notifier() called from the removal path may bail out
> prematurely (that doesn't seem likely to happen, but still I don't see
> why it cannot happen), so I'll just use your patch. :-)

OK. I was just looking at your version and was pretty much convinced
that it would work. But I'll take your word that it might not :)
Rafael J. Wysocki Nov. 8, 2017, 12:41 p.m. UTC | #3
On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
> On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > >> <ville.syrjala@linux.intel.com> wrote:
> > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >> >
> > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > >>
> > >> OK, good catch!
> > >>
> > >> [cut]
> > >>
> > >> >
> > >> > Cc: stable@vger.kernel.org
> > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >> > Cc: Len Brown <lenb@kernel.org>
> > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > >> > Cc: Tejun Heo <tj@kernel.org>
> > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >> > ---
> > >> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > >> >  1 file changed, 12 insertions(+), 9 deletions(-)
> > >> >
> > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > >> > index fbcc73f7a099..18af71057b44 100644
> > >> > --- a/drivers/acpi/device_pm.c
> > >> > +++ b/drivers/acpi/device_pm.c
> > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > >> >
> > >> >  #ifdef CONFIG_PM
> > >> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > >> >
> > >> >  void acpi_pm_wakeup_event(struct device *dev)
> > >> >  {
> > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > >> >         if (!dev && !func)
> > >> >                 return AE_BAD_PARAMETER;
> > >> >
> > >> > -       mutex_lock(&acpi_pm_notifier_lock);
> > >> > +       mutex_lock(&acpi_pm_notifier_install_lock);
> > >> >
> > >> >         if (adev->wakeup.flags.notifier_present)
> > >> >                 goto out;
> > >> >
> > >> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> > -       adev->wakeup.context.dev = dev;
> > >> > -       adev->wakeup.context.func = func;
> > >> > -
> > >>
> > >> But this doesn't look good to me.
> > >>
> > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > >>
> > >> Actually, acpi_install_notify_handler() itself need not be called
> > >> under the lock, because it does sufficient checks of its own.
> > >>
> > >> So say you do
> > >>
> > >> mutex_lock(&acpi_pm_notifier_lock);
> > >>
> > >> if (adev->wakeup.context.dev)
> > >>         goto out;
> > >>
> > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> adev->wakeup.context.dev = dev;
> > >> adev->wakeup.context.func = func;
> > >>
> > >> mutex_unlock(&acpi_pm_notifier_lock);
> > >>
> > >> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > >> >                                              acpi_pm_notify_handler, NULL);
> > >> >         if (ACPI_FAILURE(status))
> > >> >                 goto out;
> > >> >
> > >> > +       mutex_lock(&acpi_pm_notifier_lock);
> > >>
> > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > >>
> > >> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > >> > +       adev->wakeup.context.dev = dev;
> > >> > +       adev->wakeup.context.func = func;
> > >> >         adev->wakeup.flags.notifier_present = true;
> > >> > +       mutex_unlock(&acpi_pm_notifier_lock);
> > >> >
> > >> >   out:
> > >> > -       mutex_unlock(&acpi_pm_notifier_lock);
> > >> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
> > >> >         return status;
> > >> >  }
> > >>
> > >> Then on removal you can clear notifier_present first and drop the lock
> > >> around the acpi_remove_notify_handler() call and nothing bad will
> > >> happen.
> > >>
> > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > >> instance will set context.dev and the second one will see it set and
> > >> bail out.  The first instance will then do the rest.
> > >>
> > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > >> instance will see notifier_present set and will clear it, so the
> > >> second one will see notifier_present unset and it will bail out.
> > >>
> > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > >> unset and bail out, or the latter will see context.dev unset and bail
> > >> out.
> > >>
> > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > >
> > > So something like the below (totally untested) should work too, shouldn't it?
> > 
> > There is a problem if a device is removed while acpi_add_pm_notifier()
> > is still in progress, in which case with my patch the
> > acpi_remove_pm_notifier() called from the removal path may bail out
> > prematurely (that doesn't seem likely to happen, but still I don't see
> > why it cannot happen), so I'll just use your patch. :-)
> 
> OK. I was just looking at your version and was pretty much convinced
> that it would work. But I'll take your word that it might not :)

Well, you don't have to. :-)

The scenario I have in mind is as follows:

1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
   lock.  notifier_present is still unset.

2. acpi_remove_pm_notifier() checks notifier_present under the lock.
   It is (still) unset, so the function decides that there's nothing to do.

3. acpi_add_pm_notifier() continues with notifier installation and the
   device goes away at the same time.

Thanks,
Rafael
Rafael J. Wysocki Nov. 8, 2017, 12:55 p.m. UTC | #4
On Wed, Nov 8, 2017 at 1:41 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
>> On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
>> > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
>> > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
>> > >> <ville.syrjala@linux.intel.com> wrote:
>> > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > >> >
>> > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
>> > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
>> > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
>> > >>
>> > >> OK, good catch!
>> > >>
>> > >> [cut]
>> > >>
>> > >> >
>> > >> > Cc: stable@vger.kernel.org
>> > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > >> > Cc: Len Brown <lenb@kernel.org>
>> > >> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > >> > Cc: Tejun Heo <tj@kernel.org>
>> > >> > Cc: Ingo Molnar <mingo@kernel.org>
>> > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
>> > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > >> > ---
>> > >> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
>> > >> >  1 file changed, 12 insertions(+), 9 deletions(-)
>> > >> >
>> > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
>> > >> > index fbcc73f7a099..18af71057b44 100644
>> > >> > --- a/drivers/acpi/device_pm.c
>> > >> > +++ b/drivers/acpi/device_pm.c
>> > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
>> > >> >
>> > >> >  #ifdef CONFIG_PM
>> > >> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
>> > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
>> > >> >
>> > >> >  void acpi_pm_wakeup_event(struct device *dev)
>> > >> >  {
>> > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
>> > >> >         if (!dev && !func)
>> > >> >                 return AE_BAD_PARAMETER;
>> > >> >
>> > >> > -       mutex_lock(&acpi_pm_notifier_lock);
>> > >> > +       mutex_lock(&acpi_pm_notifier_install_lock);
>> > >> >
>> > >> >         if (adev->wakeup.flags.notifier_present)
>> > >> >                 goto out;
>> > >> >
>> > >> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > >> > -       adev->wakeup.context.dev = dev;
>> > >> > -       adev->wakeup.context.func = func;
>> > >> > -
>> > >>
>> > >> But this doesn't look good to me.
>> > >>
>> > >> notifier_present should be checked under acpi_pm_notifier_lock.
>> > >>
>> > >> Actually, acpi_install_notify_handler() itself need not be called
>> > >> under the lock, because it does sufficient checks of its own.
>> > >>
>> > >> So say you do
>> > >>
>> > >> mutex_lock(&acpi_pm_notifier_lock);
>> > >>
>> > >> if (adev->wakeup.context.dev)
>> > >>         goto out;
>> > >>
>> > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > >> adev->wakeup.context.dev = dev;
>> > >> adev->wakeup.context.func = func;
>> > >>
>> > >> mutex_unlock(&acpi_pm_notifier_lock);
>> > >>
>> > >> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
>> > >> >                                              acpi_pm_notify_handler, NULL);
>> > >> >         if (ACPI_FAILURE(status))
>> > >> >                 goto out;
>> > >> >
>> > >> > +       mutex_lock(&acpi_pm_notifier_lock);
>> > >>
>> > >> And here you just set notifier_present under acpi_pm_notifier_lock.
>> > >>
>> > >> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
>> > >> > +       adev->wakeup.context.dev = dev;
>> > >> > +       adev->wakeup.context.func = func;
>> > >> >         adev->wakeup.flags.notifier_present = true;
>> > >> > +       mutex_unlock(&acpi_pm_notifier_lock);
>> > >> >
>> > >> >   out:
>> > >> > -       mutex_unlock(&acpi_pm_notifier_lock);
>> > >> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
>> > >> >         return status;
>> > >> >  }
>> > >>
>> > >> Then on removal you can clear notifier_present first and drop the lock
>> > >> around the acpi_remove_notify_handler() call and nothing bad will
>> > >> happen.
>> > >>
>> > >> If you call acpi_add_pm_notifier() twice in parallel, the first
>> > >> instance will set context.dev and the second one will see it set and
>> > >> bail out.  The first instance will then do the rest.
>> > >>
>> > >> If you call acpi_remove_pm_notifier() twice in a row, the first
>> > >> instance will see notifier_present set and will clear it, so the
>> > >> second one will see notifier_present unset and it will bail out.
>> > >>
>> > >> Now, if you call acpi_remove_pm_notifier() in parallel with
>> > >> acpi_add_pm_notifier(), either the former will see notifier_present
>> > >> unset and bail out, or the latter will see context.dev unset and bail
>> > >> out.
>> > >>
>> > >> It doesn't look like the outer lock is needed, or have I missed anything?
>> > >
>> > > So something like the below (totally untested) should work too, shouldn't it?
>> >
>> > There is a problem if a device is removed while acpi_add_pm_notifier()
>> > is still in progress, in which case with my patch the
>> > acpi_remove_pm_notifier() called from the removal path may bail out
>> > prematurely (that doesn't seem likely to happen, but still I don't see
>> > why it cannot happen), so I'll just use your patch. :-)
>>
>> OK. I was just looking at your version and was pretty much convinced
>> that it would work. But I'll take your word that it might not :)
>
> Well, you don't have to. :-)
>
> The scenario I have in mind is as follows:
>
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
>    lock.  notifier_present is still unset.
>
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
>    It is (still) unset, so the function decides that there's nothing to do.
>
> 3. acpi_add_pm_notifier() continues with notifier installation and the
>    device goes away at the same time.

Of course, the outer lock doesn't help against
acpi_remove_pm_notifier() in the removal path running right before
acpi_add_pm_notifier(), so if there's no other mutual exclusion, we
still have a problem in there (need to check that).

Thanks,
Rafael
Ville Syrjälä Nov. 8, 2017, 1 p.m. UTC | #5
On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
> > On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > > >> <ville.syrjala@linux.intel.com> wrote:
> > > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> >
> > > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > > >>
> > > >> OK, good catch!
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > Cc: stable@vger.kernel.org
> > > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >> > Cc: Len Brown <lenb@kernel.org>
> > > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > > >> > Cc: Tejun Heo <tj@kernel.org>
> > > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> > ---
> > > >> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > > >> >  1 file changed, 12 insertions(+), 9 deletions(-)
> > > >> >
> > > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > > >> > index fbcc73f7a099..18af71057b44 100644
> > > >> > --- a/drivers/acpi/device_pm.c
> > > >> > +++ b/drivers/acpi/device_pm.c
> > > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > > >> >
> > > >> >  #ifdef CONFIG_PM
> > > >> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > > >> >
> > > >> >  void acpi_pm_wakeup_event(struct device *dev)
> > > >> >  {
> > > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > > >> >         if (!dev && !func)
> > > >> >                 return AE_BAD_PARAMETER;
> > > >> >
> > > >> > -       mutex_lock(&acpi_pm_notifier_lock);
> > > >> > +       mutex_lock(&acpi_pm_notifier_install_lock);
> > > >> >
> > > >> >         if (adev->wakeup.flags.notifier_present)
> > > >> >                 goto out;
> > > >> >
> > > >> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > -       adev->wakeup.context.dev = dev;
> > > >> > -       adev->wakeup.context.func = func;
> > > >> > -
> > > >>
> > > >> But this doesn't look good to me.
> > > >>
> > > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > > >>
> > > >> Actually, acpi_install_notify_handler() itself need not be called
> > > >> under the lock, because it does sufficient checks of its own.
> > > >>
> > > >> So say you do
> > > >>
> > > >> mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> if (adev->wakeup.context.dev)
> > > >>         goto out;
> > > >>
> > > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> adev->wakeup.context.dev = dev;
> > > >> adev->wakeup.context.func = func;
> > > >>
> > > >> mutex_unlock(&acpi_pm_notifier_lock);
> > > >>
> > > >> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > > >> >                                              acpi_pm_notify_handler, NULL);
> > > >> >         if (ACPI_FAILURE(status))
> > > >> >                 goto out;
> > > >> >
> > > >> > +       mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > > >>
> > > >> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > +       adev->wakeup.context.dev = dev;
> > > >> > +       adev->wakeup.context.func = func;
> > > >> >         adev->wakeup.flags.notifier_present = true;
> > > >> > +       mutex_unlock(&acpi_pm_notifier_lock);
> > > >> >
> > > >> >   out:
> > > >> > -       mutex_unlock(&acpi_pm_notifier_lock);
> > > >> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
> > > >> >         return status;
> > > >> >  }
> > > >>
> > > >> Then on removal you can clear notifier_present first and drop the lock
> > > >> around the acpi_remove_notify_handler() call and nothing bad will
> > > >> happen.
> > > >>
> > > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > > >> instance will set context.dev and the second one will see it set and
> > > >> bail out.  The first instance will then do the rest.
> > > >>
> > > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > > >> instance will see notifier_present set and will clear it, so the
> > > >> second one will see notifier_present unset and it will bail out.
> > > >>
> > > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > > >> unset and bail out, or the latter will see context.dev unset and bail
> > > >> out.
> > > >>
> > > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > > >
> > > > So something like the below (totally untested) should work too, shouldn't it?
> > > 
> > > There is a problem if a device is removed while acpi_add_pm_notifier()
> > > is still in progress, in which case with my patch the
> > > acpi_remove_pm_notifier() called from the removal path may bail out
> > > prematurely (that doesn't seem likely to happen, but still I don't see
> > > why it cannot happen), so I'll just use your patch. :-)
> > 
> > OK. I was just looking at your version and was pretty much convinced
> > that it would work. But I'll take your word that it might not :)
> 
> Well, you don't have to. :-)
> 
> The scenario I have in mind is as follows:
> 
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
>    lock.  notifier_present is still unset.
> 
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
>    It is (still) unset, so the function decides that there's nothing to do.
> 
> 3. acpi_add_pm_notifier() continues with notifier installation and the
>    device goes away at the same time.

Right. That makes sense. Though I don't know what would prevent racing
add_pm_notifier() against remove_pm_notifier() in a similar way even
with the outer lock. In that case the remove_pm_notifier() would just
have to get at the mutex first and then we'd still end up with the
notifier installed.
diff mbox

Patch

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -424,6 +424,13 @@  static void acpi_pm_notify_handler(acpi_
 	acpi_bus_put_acpi_device(adev);
 }
 
+static void acpi_dev_wakeup_cleanup(struct acpi_device *adev)
+{
+	adev->wakeup.context.func = NULL;
+	adev->wakeup.context.dev = NULL;
+	wakeup_source_unregister(adev->wakeup.ws);
+}
+
 /**
  * acpi_add_pm_notifier - Register PM notify handler for given ACPI device.
  * @adev: ACPI device to add the notify handler for.
@@ -445,17 +452,24 @@  acpi_status acpi_add_pm_notifier(struct
 
 	mutex_lock(&acpi_pm_notifier_lock);
 
-	if (adev->wakeup.flags.notifier_present)
+	if (adev->wakeup.context.dev || adev->wakeup.context.func)
 		goto out;
 
 	adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
 	adev->wakeup.context.dev = dev;
 	adev->wakeup.context.func = func;
 
+	mutex_unlock(&acpi_pm_notifier_lock);
+
 	status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
 					     acpi_pm_notify_handler, NULL);
-	if (ACPI_FAILURE(status))
+
+	mutex_lock(&acpi_pm_notifier_lock);
+
+	if (ACPI_FAILURE(status)) {
+		acpi_dev_wakeup_cleanup(adev);
 		goto out;
+	}
 
 	adev->wakeup.flags.notifier_present = true;
 
@@ -477,17 +491,22 @@  acpi_status acpi_remove_pm_notifier(stru
 	if (!adev->wakeup.flags.notifier_present)
 		goto out;
 
+	adev->wakeup.flags.notifier_present = false;
+
+	mutex_unlock(&acpi_pm_notifier_lock);
+
 	status = acpi_remove_notify_handler(adev->handle,
 					    ACPI_SYSTEM_NOTIFY,
 					    acpi_pm_notify_handler);
-	if (ACPI_FAILURE(status))
-		goto out;
 
-	adev->wakeup.context.func = NULL;
-	adev->wakeup.context.dev = NULL;
-	wakeup_source_unregister(adev->wakeup.ws);
+	mutex_lock(&acpi_pm_notifier_lock);
 
-	adev->wakeup.flags.notifier_present = false;
+	if (ACPI_FAILURE(status)) {
+		adev->wakeup.flags.notifier_present = true;
+		goto out;
+	}
+
+	acpi_dev_wakeup_cleanup(adev);
 
  out:
 	mutex_unlock(&acpi_pm_notifier_lock);