
PM-runtime: fix deadlock with ktime

Message ID 1548836194-15264-1-git-send-email-vincent.guittot@linaro.org (mailing list archive)
State New, archived
Series PM-runtime: fix deadlock with ktime

Commit Message

Vincent Guittot Jan. 30, 2019, 8:16 a.m. UTC
A deadlock has been seen when switching clocksources which use PM runtime.
The call path is:
change_clocksource
    ...
    write_seqcount_begin
    ...
    timekeeping_update
        ...
        sh_cmt_clocksource_enable
            ...
            rpm_resume
                pm_runtime_mark_last_busy
                    ktime_get
                        do
                            read_seqcount_begin
                        while read_seqcount_retry
    ....
    write_seqcount_end

Although this should be safe, because the clocksource has not actually been
changed at that point, ktime_get() cannot complete: the seqcount is already
held for writing by change_clocksource(), so the reader retries forever.

Use ktime_get_mono_fast_ns() instead, which does not spin on the seqcount and
is therefore safe to call in this case.

Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 drivers/base/power/runtime.c | 10 +++++-----
 include/linux/pm_runtime.h   |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)
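
For illustration, here is a minimal user-space model of the spin described in
the call path above. The seqcount handling is heavily simplified and all names
(model_read_seqcount_begin/retry, model_ktime_get) are stand-ins for the
kernel's API, not the real implementation; the point is only that a read-side
loop entered while the same context holds the write side can never complete.

/* Illustrative model only: a bounded retry loop stands in for the reader
 * spin, and a plain counter stands in for seqcount_t. */
#include <stdio.h>
#include <stdint.h>

static unsigned int seq;			/* models tk_core.seq */

static unsigned int model_read_seqcount_begin(void)
{
	return seq & ~1u;			/* snapshot with low bit cleared */
}

static int model_read_seqcount_retry(unsigned int start)
{
	/* retry while a writer is active (odd) or an update happened */
	return (seq & 1) || seq != start;
}

/* models ktime_get(): loops until the timekeeper update is finished */
static uint64_t model_ktime_get(void)
{
	unsigned int start;
	unsigned long retries = 0;

	do {
		start = model_read_seqcount_begin();
		if (++retries > 1000000UL) {
			printf("reader spins forever: writer holds the seqcount\n");
			return 0;
		}
	} while (model_read_seqcount_retry(start));

	return 42;				/* would be the timestamp */
}

int main(void)
{
	seq++;			/* write_seqcount_begin() in change_clocksource() */
	model_ktime_get();	/* rpm_resume() -> pm_runtime_mark_last_busy() */
	seq++;			/* write_seqcount_end(): never reached in the real deadlock */
	return 0;
}

ktime_get_mono_fast_ns() avoids this pattern because it reads a latched copy
of the timekeeper and never waits for the writer to finish, which is why the
patch switches the runtime-PM timestamps over to it.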

Comments

Geert Uytterhoeven Jan. 30, 2019, 8:21 a.m. UTC | #1
Hi Vincent,

On Wed, Jan 30, 2019 at 9:16 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> A deadlock has been seen when switching clocksources which use PM runtime.
> The call path is:
> change_clocksource
>     ...
>     write_seqcount_begin
>     ...
>     timekeeping_update
>         ...
>         sh_cmt_clocksource_enable
>             ...
>             rpm_resume
>                 pm_runtime_mark_last_busy
>                     ktime_get
>                         do
>                             read_seqcount_begin
>                         while read_seqcount_retry
>     ....
>     write_seqcount_end
>
> Although we should be safe because we haven't yet changed the clocksource
> at that time, we can't because of seqcount protection.
>
> Use ktime_get_mono_fast_ns instead which is lock safe for such case
>
> Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> Reported-by: Biju Das <biju.das@bp.renesas.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Thanks for your patch!

/**
 * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
 *
 * This timestamp is not guaranteed to be monotonic across an update.
 * The timestamp is calculated by:
 *
 *      now = base_mono + clock_delta * slope
 *
 * So if the update lowers the slope, readers who are forced to the
 * not yet updated second array are still using the old steeper slope.
 *
 * tmono
 * ^
 * |    o  n
 * |   o n
 * |  u
 * | o
 * |o
 * |12345678---> reader order
 *
 * o = old slope
 * u = update
 * n = new slope
 *
 * So reader 6 will observe time going backwards versus reader 5.
 *
 * While other CPUs are likely to be able to observe that, the only way
 * for a CPU local observation is when an NMI hits in the middle of
 * the update. Timestamps taken from that NMI context might be ahead
 * of the following timestamps. Callers need to be aware of that and
 * deal with it.
 */

As this function is not guaranteed to be monotonic, have you checked how
the Runtime PM code behaves if time goes backwards? Does it just make
a suboptimal decision or does it crash?

Thanks!

Gr{oetje,eeting}s,

                        Geert
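
A small numeric sketch of the diagram quoted above, using made-up numbers (the
real code uses mult/shift fixed-point arithmetic, not floating point): the old
array runs 2% fast, the update at cycle 100 corrects the time and lowers the
slope, and a reader that still hits the old array can report a later time than
a subsequent reader on the new array.

#include <stdio.h>

int main(void)
{
	/* old array: base 1000.0 ns at cycle 0, slope 1.02 ns/cycle (too fast) */
	double t5 = 1000.0 + 104 * 1.02;		/* reader 5, old array, cycle 104 */

	/* update at cycle 100: corrected time 1100.0 ns, slope lowered to 1.00 */
	double t6 = 1100.0 + (106 - 100) * 1.00;	/* reader 6, new array, cycle 106 */

	printf("reader 5: %.2f ns, reader 6: %.2f ns -> backwards: %s\n",
	       t5, t6, t6 < t5 ? "yes" : "no");		/* 1106.08 vs 1106.00 */
	return 0;
}
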
Vincent Guittot Jan. 30, 2019, 9:14 a.m. UTC | #2
Hi Geert,

On Wed, 30 Jan 2019 at 09:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Vincent,
>
> On Wed, Jan 30, 2019 at 9:16 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> > A deadlock has been seen when switching clocksources which use PM runtime.
> > The call path is:
> > change_clocksource
> >     ...
> >     write_seqcount_begin
> >     ...
> >     timekeeping_update
> >         ...
> >         sh_cmt_clocksource_enable
> >             ...
> >             rpm_resume
> >                 pm_runtime_mark_last_busy
> >                     ktime_get
> >                         do
> >                             read_seqcount_begin
> >                         while read_seqcount_retry
> >     ....
> >     write_seqcount_end
> >
> > Although we should be safe because we haven't yet changed the clocksource
> > at that time, we can't because of seqcount protection.
> >
> > Use ktime_get_mono_fast_ns instead which is lock safe for such case
> >
> > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> Thanks for your patch!
>
> /**
>  * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
>  *
>  * This timestamp is not guaranteed to be monotonic across an update.
>  * The timestamp is calculated by:
>  *
>  *      now = base_mono + clock_delta * slope
>  *
>  * So if the update lowers the slope, readers who are forced to the
>  * not yet updated second array are still using the old steeper slope.
>  *
>  * tmono
>  * ^
>  * |    o  n
>  * |   o n
>  * |  u
>  * | o
>  * |o
>  * |12345678---> reader order
>  *
>  * o = old slope
>  * u = update
>  * n = new slope
>  *
>  * So reader 6 will observe time going backwards versus reader 5.
>  *
>  * While other CPUs are likely to be able to observe that, the only way
>  * for a CPU local observation is when an NMI hits in the middle of
>  * the update. Timestamps taken from that NMI context might be ahead
>  * of the following timestamps. Callers need to be aware of that and
>  * deal with it.
>  */
>
> As this function is not guaranteed to be monotonic, have you checked how
> the Runtime PM code behaves if time goes backwards? Does it just make
> a suboptimal decision or does it crash?

As a worst case this will generate a suboptimal decision around the update

Regards,
Vincent

>
> Thanks!
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
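
To make the "suboptimal decision" mentioned above concrete, here is a
simplified stand-in for pm_runtime_autosuspend_expiration() (the real function
also checks dev->power.use_autosuspend and the sign of the delay; the numbers
below are invented). A timestamp that briefly reads a little behind only makes
the deadline appear to still be ahead, so the suspend timer is re-armed and
the device suspends slightly later; nothing breaks.

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* simplified model: returns 0 when the delay has expired and suspend may
 * proceed, otherwise the absolute expiration time in ns */
static uint64_t model_autosuspend_expiration(uint64_t last_busy_ns,
					     uint64_t delay_ms, uint64_t now_ns)
{
	uint64_t expires = last_busy_ns + delay_ms * NSEC_PER_MSEC;

	return expires > now_ns ? expires : 0;
}

int main(void)
{
	uint64_t last_busy = 1000 * NSEC_PER_MSEC;	/* last activity at t = 1000 ms */
	uint64_t delay_ms  = 10;			/* autosuspend delay */

	/* timestamp read normally at t = 1011 ms, just past the deadline:
	 * prints 0, the device may suspend now */
	printf("%llu\n", (unsigned long long)
	       model_autosuspend_expiration(last_busy, delay_ms, 1011 * NSEC_PER_MSEC));

	/* same moment, but the fast clock briefly reads ~2 ms behind across an
	 * update: the deadline still looks ahead, so the timer is simply
	 * re-armed and suspend happens a bit later */
	printf("%llu\n", (unsigned long long)
	       model_autosuspend_expiration(last_busy, delay_ms, 1009 * NSEC_PER_MSEC));
	return 0;
}
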
Rafael J. Wysocki Jan. 30, 2019, 9:39 a.m. UTC | #3
On Wed, Jan 30, 2019 at 10:14 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> Hi Geert,
>
> On Wed, 30 Jan 2019 at 09:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Vincent,
> >
> > On Wed, Jan 30, 2019 at 9:16 AM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > > A deadlock has been seen when switching clocksources which use PM runtime.
> > > The call path is:
> > > change_clocksource
> > >     ...
> > >     write_seqcount_begin
> > >     ...
> > >     timekeeping_update
> > >         ...
> > >         sh_cmt_clocksource_enable
> > >             ...
> > >             rpm_resume
> > >                 pm_runtime_mark_last_busy
> > >                     ktime_get
> > >                         do
> > >                             read_seqcount_begin
> > >                         while read_seqcount_retry
> > >     ....
> > >     write_seqcount_end
> > >
> > > Although we should be safe because we haven't yet changed the clocksource
> > > at that time, we can't because of seqcount protection.
> > >
> > > Use ktime_get_mono_fast_ns instead which is lock safe for such case
> > >
> > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >
> > Thanks for your patch!
> >
> > /**
> >  * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
> >  *
> >  * This timestamp is not guaranteed to be monotonic across an update.
> >  * The timestamp is calculated by:
> >  *
> >  *      now = base_mono + clock_delta * slope
> >  *
> >  * So if the update lowers the slope, readers who are forced to the
> >  * not yet updated second array are still using the old steeper slope.
> >  *
> >  * tmono
> >  * ^
> >  * |    o  n
> >  * |   o n
> >  * |  u
> >  * | o
> >  * |o
> >  * |12345678---> reader order
> >  *
> >  * o = old slope
> >  * u = update
> >  * n = new slope
> >  *
> >  * So reader 6 will observe time going backwards versus reader 5.
> >  *
> >  * While other CPUs are likely to be able to observe that, the only way
> >  * for a CPU local observation is when an NMI hits in the middle of
> >  * the update. Timestamps taken from that NMI context might be ahead
> >  * of the following timestamps. Callers need to be aware of that and
> >  * deal with it.
> >  */
> >
> > As this function is not guaranteed to be monotonic, have you checked how
> > the Runtime PM code behaves if time goes backwards? Does it just make
> > a suboptimal decision or does it crash?
>
> As a worst case this will generate a suboptimal decision around the update

So that should be explained in the changelog of the patch.  In detail,
if poss, please.
Vincent Guittot Jan. 30, 2019, 9:41 a.m. UTC | #4
On Wed, 30 Jan 2019 at 10:39, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jan 30, 2019 at 10:14 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > Hi Geert,
> >
> > On Wed, 30 Jan 2019 at 09:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > >
> > > Hi Vincent,
> > >
> > > On Wed, Jan 30, 2019 at 9:16 AM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > > A deadlock has been seen when switching clocksources which use PM runtime.
> > > > The call path is:
> > > > change_clocksource
> > > >     ...
> > > >     write_seqcount_begin
> > > >     ...
> > > >     timekeeping_update
> > > >         ...
> > > >         sh_cmt_clocksource_enable
> > > >             ...
> > > >             rpm_resume
> > > >                 pm_runtime_mark_last_busy
> > > >                     ktime_get
> > > >                         do
> > > >                             read_seqcount_begin
> > > >                         while read_seqcount_retry
> > > >     ....
> > > >     write_seqcount_end
> > > >
> > > > Although we should be safe because we haven't yet changed the clocksource
> > > > at that time, we can't because of seqcount protection.
> > > >
> > > > Use ktime_get_mono_fast_ns instead which is lock safe for such case
> > > >
> > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > >
> > > Thanks for your patch!
> > >
> > > /**
> > >  * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
> > >  *
> > >  * This timestamp is not guaranteed to be monotonic across an update.
> > >  * The timestamp is calculated by:
> > >  *
> > >  *      now = base_mono + clock_delta * slope
> > >  *
> > >  * So if the update lowers the slope, readers who are forced to the
> > >  * not yet updated second array are still using the old steeper slope.
> > >  *
> > >  * tmono
> > >  * ^
> > >  * |    o  n
> > >  * |   o n
> > >  * |  u
> > >  * | o
> > >  * |o
> > >  * |12345678---> reader order
> > >  *
> > >  * o = old slope
> > >  * u = update
> > >  * n = new slope
> > >  *
> > >  * So reader 6 will observe time going backwards versus reader 5.
> > >  *
> > >  * While other CPUs are likely to be able to observe that, the only way
> > >  * for a CPU local observation is when an NMI hits in the middle of
> > >  * the update. Timestamps taken from that NMI context might be ahead
> > >  * of the following timestamps. Callers need to be aware of that and
> > >  * deal with it.
> > >  */
> > >
> > > As this function is not guaranteed to be monotonic, have you checked how
> > > the Runtime PM code behaves if time goes backwards? Does it just make
> > > a suboptimal decision or does it crash?
> >
> > As a worst case this will generate a suboptimal decision around the update
>
> So that should be explained in the changelog of the patch.  In detail,
> if poss, please.

Ok, I'm going to update the commit message

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 457be03..708a13f 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -130,7 +130,7 @@  u64 pm_runtime_autosuspend_expiration(struct device *dev)
 {
 	int autosuspend_delay;
 	u64 last_busy, expires = 0;
-	u64 now = ktime_to_ns(ktime_get());
+	u64 now = ktime_get_mono_fast_ns();
 
 	if (!dev->power.use_autosuspend)
 		goto out;
@@ -909,7 +909,7 @@  static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 	 * If 'expires' is after the current time, we've been called
 	 * too early.
 	 */
-	if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
+	if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
 		dev->power.timer_expires = 0;
 		rpm_suspend(dev, dev->power.timer_autosuspends ?
 		    (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
@@ -928,7 +928,7 @@  static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 int pm_schedule_suspend(struct device *dev, unsigned int delay)
 {
 	unsigned long flags;
-	ktime_t expires;
+	u64 expires;
 	int retval;
 
 	spin_lock_irqsave(&dev->power.lock, flags);
@@ -945,8 +945,8 @@  int pm_schedule_suspend(struct device *dev, unsigned int delay)
 	/* Other scheduled or pending requests need to be canceled. */
 	pm_runtime_cancel_pending(dev);
 
-	expires = ktime_add(ktime_get(), ms_to_ktime(delay));
-	dev->power.timer_expires = ktime_to_ns(expires);
+	expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC;
+	dev->power.timer_expires = expires;
 	dev->power.timer_autosuspends = 0;
 	hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
 
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 54af4ee..fed5be7 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -105,7 +105,7 @@  static inline bool pm_runtime_callbacks_present(struct device *dev)
 
 static inline void pm_runtime_mark_last_busy(struct device *dev)
 {
-	WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
+	WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
 }
 
 static inline bool pm_runtime_is_irq_safe(struct device *dev)
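
For context, a generic driver skeleton showing the usual autosuspend pattern
that ends up in the helpers converted above. This is not taken from the series;
the foo_* names are placeholders and the 50 ms delay is arbitrary, but the
runtime-PM calls themselves are the standard kernel API.

#include <linux/pm_runtime.h>

/* probe-time setup: opt in to autosuspend with a 50 ms delay */
static void foo_setup_runtime_pm(struct device *dev)
{
	pm_runtime_set_autosuspend_delay(dev, 50);
	pm_runtime_use_autosuspend(dev);
	pm_runtime_enable(dev);
}

static int foo_do_io(struct device *dev)
{
	int ret;

	ret = pm_runtime_get_sync(dev);		/* resumes via rpm_resume() */
	if (ret < 0) {
		pm_runtime_put_noidle(dev);
		return ret;
	}

	/* ... talk to the hardware ... */

	/* records "now" -- ktime_get_mono_fast_ns() after this patch */
	pm_runtime_mark_last_busy(dev);
	/* may arm the suspend hrtimer via pm_runtime_autosuspend_expiration() */
	pm_runtime_put_autosuspend(dev);

	return 0;
}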