
[update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)

Message ID Pine.LNX.4.44L0.0906281642500.7576-100000@netrider.rowland.org (mailing list archive)
State RFC, archived

Commit Message

Alan Stern June 28, 2009, 9:07 p.m. UTC
On Sun, 28 Jun 2009, Rafael J. Wysocki wrote:

> It seems that if we do something like in the appended patch, then
> cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
> $subject patch slightly.

I merged your patch with my own work, leading to the patch below.

There were a bunch of things I didn't like about the existing code,
particularly cancel_delayed_work.  To start with, it seems like a large
enough routine that it shouldn't be inlined.  More importantly, it
foolishly calls del_timer_sync, resulting in the unnecessary
restriction that it cannot be used in_interrupt.  Finally, although it
will deactivate a delayed_work's timer, it doesn't even try to remove
the item from the workqueue if the timer has already expired.

Your cancel_delayed_work_dequeue is better -- so much better that I
don't see any reason to keep the original cancel_delayed_work at all.  
I got rid of it and used your routine instead.

I also changed the comments you wrote for cancel_work.  You can see 
that now they are much more explicit and complete.

The original version of __cancel_work_timer is not safe to use
in_interrupt.  If it is called from a handler whose IRQ interrupted
delayed_work_timer_fn, it can loop indefinitely.  Therefore I added a
check; if it finds that the work_struct is currently being enqueued and
it is running in_interrupt, it gives up right away.  There are a few
other improvements too.

Consequently it is now safe to call cancel_work and cancel_delayed_work
in_interrupt or while holding a spinlock.  This means you can use these
functions to cancel the various PM runtime work items whenever needed.  
As a result, you don't need two work_structs in dev_pm_info; a single
delayed_work will be enough.
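
For illustration, something along these lines should now work (the field
name is invented for the example; it isn't part of any patch):

	/* a single delayed_work in dev_pm_info: */
	struct delayed_work	pm_work;	/* invented name */

	/* ...which can be cancelled even in atomic context, e.g. from an
	 * interrupt handler or with a spinlock held: */
	int ret = cancel_delayed_work(&dev->power.pm_work);
	if (ret >= 0) {
		/* 1: request removed; 0: nothing was pending (the callback
		 * may be running).  Either way the structure can be reused. */
	} else {
		/* -1: the timer routine was caught in the middle of queuing
		 * the work, so it will still run; the caller has to cope. */
	}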

Tell me what you think.

Alan Stern





Comments

Rafael Wysocki June 29, 2009, 12:15 a.m. UTC | #1
On Sunday 28 June 2009, Alan Stern wrote:
> On Sun, 28 Jun 2009, Rafael J. Wysocki wrote:
> 
> > It seems that if we do something like in the appended patch, then
> > cancel_work() and cancel_delayed_work_dequeue() can be used to simplify the
> > $subject patch slightly.
> 
> I merged your patch with my own work, leading to the patch below.
> 
> There were a bunch of things I didn't like about the existing code,
> particularly cancel_delayed_work.  To start with, it seems like a large
> enough routine that it shouldn't be inlined.

Agreed.

> More importantly, it foolishly calls del_timer_sync, resulting in the
> unnecessary restriction that it cannot be used in_interrupt.  Finally,
> although it will deactivate a delayed_work's timer, it doesn't even try to
> remove the item from the workqueue if the timer has already expired.
> 
> Your cancel_delayed_work_dequeue is better -- so much better that I
> don't see any reason to keep the original cancel_delayed_work at all.  
> I got rid of it and used your routine instead.
> 
> I also changed the comments you wrote for cancel_work.  You can see 
> that now they are much more explicit and complete.
> 
> The original version of __cancel_work_timer is not safe to use
> in_interrupt.  If it is called from a handler whose IRQ interrupted
> delayed_work_timer_fn, it can loop indefinitely.

Right, I overlooked that.

> Therefore I added a check; if it finds that the work_struct is currently
> being enqueued and it is running in_interrupt, it gives up right away.

Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
the work to be queued and run?  Out of curiosity, what is the caller supposed
to do then?

> There are a few other improvements too.
> 
> Consequently it is now safe to call cancel_work and cancel_delayed_work
> in_interrupt or while holding a spinlock.  This means you can use these
> functions to cancel the various PM runtime work items whenever needed.  
> As a result, you don't need two work_structs in dev_pm_info; a single
> delayed_work will be enough.

Yes, I'm going to rebase the framework patch on top of this one.

> Tell me what you think.

I like the patch. :-)

Best,
Rafael

 
> Index: usb-2.6/include/linux/workqueue.h
> ===================================================================
> --- usb-2.6.orig/include/linux/workqueue.h
> +++ usb-2.6/include/linux/workqueue.h
> @@ -223,24 +223,10 @@ int execute_in_process_context(work_func
>  extern int flush_work(struct work_struct *work);
>  
>  extern int cancel_work_sync(struct work_struct *work);
> -
> -/*
> - * Kill off a pending schedule_delayed_work().  Note that the work callback
> - * function may still be running on return from cancel_delayed_work(), unless
> - * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
> - * cancel_work_sync() to wait on it.
> - */
> -static inline int cancel_delayed_work(struct delayed_work *work)
> -{
> -	int ret;
> -
> -	ret = del_timer_sync(&work->timer);
> -	if (ret)
> -		work_clear_pending(&work->work);
> -	return ret;
> -}
> +extern int cancel_work(struct work_struct *work);
>  
>  extern int cancel_delayed_work_sync(struct delayed_work *work);
> +extern int cancel_delayed_work(struct delayed_work *dwork);
>  
>  /* Obsolete. use cancel_delayed_work_sync() */
>  static inline
> Index: usb-2.6/kernel/workqueue.c
> ===================================================================
> --- usb-2.6.orig/kernel/workqueue.c
> +++ usb-2.6/kernel/workqueue.c
> @@ -465,6 +465,7 @@ static int try_to_grab_pending(struct wo
>  {
>  	struct cpu_workqueue_struct *cwq;
>  	int ret = -1;
> +	unsigned long flags;
>  
>  	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
>  		return 0;
> @@ -478,7 +479,7 @@ static int try_to_grab_pending(struct wo
>  	if (!cwq)
>  		return ret;
>  
> -	spin_lock_irq(&cwq->lock);
> +	spin_lock_irqsave(&cwq->lock, flags);
>  	if (!list_empty(&work->entry)) {
>  		/*
>  		 * This work is queued, but perhaps we locked the wrong cwq.
> @@ -491,7 +492,7 @@ static int try_to_grab_pending(struct wo
>  			ret = 1;
>  		}
>  	}
> -	spin_unlock_irq(&cwq->lock);
> +	spin_unlock_irqrestore(&cwq->lock, flags);
>  
>  	return ret;
>  }
> @@ -536,18 +537,26 @@ static void wait_on_work(struct work_str
>  		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
>  }
>  
> -static int __cancel_work_timer(struct work_struct *work,
> +static int __cancel_work_timer(struct work_struct *work, bool wait,
>  				struct timer_list* timer)
>  {
>  	int ret;
>  
> -	do {
> -		ret = (timer && likely(del_timer(timer)));
> -		if (!ret)
> -			ret = try_to_grab_pending(work);
> -		wait_on_work(work);
> -	} while (unlikely(ret < 0));
> +	if (timer && likely(del_timer(timer))) {
> +		ret = 1;
> +		goto done;
> +	}
>  
> +	for (;;) {
> +		ret = try_to_grab_pending(work);
> +		if (likely(ret >= 0))
> +			break;
> +		if (in_interrupt())
> +			return ret;
> +	}
> +	if (ret == 0 && wait)
> +		wait_on_work(work);
> + done:
>  	work_clear_pending(work);
>  	return ret;
>  }
> @@ -575,11 +584,43 @@ static int __cancel_work_timer(struct wo
>   */
>  int cancel_work_sync(struct work_struct *work)
>  {
> -	return __cancel_work_timer(work, NULL);
> +	return __cancel_work_timer(work, true, NULL);
>  }
>  EXPORT_SYMBOL_GPL(cancel_work_sync);
>  
>  /**
> + * cancel_work - try to cancel a pending work_struct.
> + * @work: the work_struct to cancel
> + *
> + * Try to cancel a pending work_struct before it starts running.
> + * Upon return, @work may safely be reused if the return value
> + * is 1 or the return value is 0 and the work callback function
> + * doesn't resubmit @work.
> + *
> + * The callback function may be running upon return if the return value
> + * is <= 0; use cancel_work_sync() to wait for the callback function
> + * to finish.
> + *
> + * There's not much point using this routine unless you can guarantee
> + * that neither the callback function nor anything else is in the
> + * process of submitting @work (or is about to do so).  The only good
> + * reason might be that optimistically trying to cancel @work has less
> + * overhead than letting it go ahead and run.
> + *
> + * This routine may be called from interrupt context.
> + *
> + * Returns: 1 if @work was removed from its workqueue,
> + *	    0 if @work was not pending (may be running),
> + *	   -1 if @work was concurrently being enqueued and we were
> + *		called in_interrupt.
> + */
> +int cancel_work(struct work_struct *work)
> +{
> +	return __cancel_work_timer(work, false, NULL);
> +}
> +EXPORT_SYMBOL_GPL(cancel_work);
> +
> +/**
>   * cancel_delayed_work_sync - reliably kill off a delayed work.
>   * @dwork: the delayed work struct
>   *
> @@ -590,10 +631,24 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
>   */
>  int cancel_delayed_work_sync(struct delayed_work *dwork)
>  {
> -	return __cancel_work_timer(&dwork->work, &dwork->timer);
> +	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
>  }
>  EXPORT_SYMBOL(cancel_delayed_work_sync);
>  
> +/**
> + * cancel_delayed_work - try to cancel a delayed_work_struct.
> + * @dwork: the delayed_work_struct to cancel
> + *
> + * Try to cancel a pending delayed_work, either by deactivating its
> + * timer or by removing it from its workqueue.  This routine is just
> + * like cancel_work() except that it handles a delayed_work.
> + */
> +int cancel_delayed_work(struct delayed_work *dwork)
> +{
> +	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
> +}
> +EXPORT_SYMBOL(cancel_delayed_work);
> +
>  static struct workqueue_struct *keventd_wq __read_mostly;
>  
>  /**
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern June 29, 2009, 3:05 a.m. UTC | #2
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > The original version of __cancel_work_timer is not safe to use
> > in_interrupt.  If it is called from a handler whose IRQ interrupted
> > delayed_work_timer_fn, it can loop indefinitely.
> 
> Right, I overlooked that.
> 
> > Therefore I added a check; if it finds that the work_struct is currently
> > being enqueued and it is running in_interrupt, it gives up right away.
> 
> Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
> the work to be queued and run?

Yes.  That's better than leaving the work queued but with the "pending" 
flag cleared.  :-)

>  Out of curiosity, what is the caller supposed
> to do then?

In the case of cancel_work, this is a simple race.  The work_struct was
being submitted at the same time as the cancellation occurred.  The end
result is the same as if the submission had been slightly later: The
work is on the queue and it will run.  If the caller can guarantee that 
the work is not in the process of being submitted (as described in the 
kerneldoc) then this situation will never arise.

In the case of cancel_delayed_work, things are more complicated.  If
the cancellation had occurred a little earlier, it would have
deactivated the timer.  If it had occurred a little later, it would
have removed the item from the workqueue.  But since it arrived at
exactly the wrong time -- while the timer routine is enqueuing the work
-- there's nothing it can do.  The caller has to cope as best he can.
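
For instance (purely illustrative -- "canceled" is a made-up driver-private
flag and all locking is omitted):

	if (cancel_delayed_work(&drv->dwork) < 0)
		drv->canceled = true;	/* lost the race; the work will run */

	...

	static void drv_work_fn(struct work_struct *work)
	{
		struct drv_data *drv =
			container_of(work, struct drv_data, dwork.work);

		if (drv->canceled)
			return;		/* cancellation came too late; bail out */
		/* normal processing */
	}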

For runtime PM this isn't a big issue.  The only delayed work we have
is a delayed autosuspend request.  These things get cancelled when:

	pm_runtime_suspend runs synchronously.  That happens in
	process context so we're okay.

	pm_runtime_suspend_atomic (not yet written!) is called.
	If the cancellation fails, we'll have to return an error.
	The suspend will happen later, when the work item runs.
	Ultimately, the best we can do is recommend that people
	don't mix pm_request_suspend with pm_runtime_suspend_atomic.


Which reminds me...  The way you've got things set up, 
pm_runtime_put_atomic queues an idle notification, right?  That's 
a little inconsistent with the naming of the other routines.

Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
that can safely be called in an atomic context -- it implies that it
will call the runtime_notify callback while holding the spinlock.  The
routine to queue an idle-notify request should be called something like
pm_request_put -- although that name isn't so great because it sounds 
like the put gets deferred instead of the notification.

Alan Stern


Rafael Wysocki June 29, 2009, 2:09 p.m. UTC | #3
On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > The original version of __cancel_work_timer is not safe to use
> > > in_interrupt.  If it is called from a handler whose IRQ interrupted
> > > delayed_work_timer_fn, it can loop indefinitely.
> > 
> > Right, I overlooked that.
> > 
> > > Therefore I added a check; if it finds that the work_struct is currently
> > > being enqueued and it is running in_interrupt, it gives up right away.
> > 
> > Hmm, it doesn't do the work_clear_pending(work) in that case, so we allow
> > the work to be queued and run?
> 
> Yes.  That's better than leaving the work queued but with the "pending" 
> flag cleared.  :-)

So there still is a problem, I'm afraid (details below).

> >  Out of curiosity, what is the caller supposed
> > to do then?
> 
> In the case of cancel_work, this is a simple race.  The work_struct was
> being submitted at the same time as the cancellation occurred.  The end
> result is the same as if the submission had been slightly later: The
> work is on the queue and it will run.  If the caller can guarantee that 
> the work is not in the process of being submitted (as described in the 
> kerneldoc) then this situation will never arise.
> 
> In the case of cancel_delayed_work, things are more complicated.  If
> the cancellation had occurred a little earlier, it would have
> deactivated the timer.  If it had occurred a little later, it would
> have removed the item from the workqueue.  But since it arrived at
> exactly the wrong time -- while the timer routine is enqueuing the work
> -- there's nothing it can do.  The caller has to cope as best he can.
> 
> For runtime PM this isn't a big issue.  The only delayed work we have
> is a delayed autosuspend request.  These things get cancelled when:
> 
> 	pm_runtime_suspend runs synchronously.  That happens in
> 	process context so we're okay.
> 
> 	pm_runtime_suspend_atomic (not yet written!) is called.

This is going to be added in a separate patch.

> 	If the cancellation fails, we'll have to return an error.
> 	The suspend will happen later, when the work item runs.
> 	Ultimately, the best we can do is recommend that people
> 	don't mix pm_request_suspend with pm_runtime_suspend_atomic.

Well, not only in those cases; in fact, this is where the actual problem is.

Namely, pm_request_suspend() and pm_request_resume() have to cancel any
pending requests in a reliable way so that the work struct can be used safely
after they've returned.

Assume for example that there's a suspend request pending while
pm_request_resume() is being called.  pm_request_resume() uses
cancel_delayed_work() to kill off the request, but that's in interrupt and it
happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
which attempts to queue up an idle notification request before the
delayed suspend request has a chance to run and bad things happen.
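
Schematically (using the names from the framework discussion, none of which
are final):

	pm_request_resume(dev);		/* cancel_delayed_work() returns -1 */
	...
	pm_runtime_put_atomic(dev);	/* wants to queue an idle notification,
					 * but the still-pending suspend request
					 * owns the only work structure */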

So, it seems, pm_request_resume() can't kill suspend requests by itself
and instead it has to queue up resume requests for this purpose, which
brings us right back to the problem of two requests queued up at a time
(a delayed suspend request and a resume request that is supposed to cancel it).

Nevertheless, using your workqueue patch we can still simplify things quite a
bit, so I think it's worth doing anyway.

> Which reminds me...  The way you've got things set up, 
> pm_runtime_put_atomic queues an idle notification, right?  That's 
> a little inconsistent with the naming of the other routines.
> 
> Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> that can safely be called in an atomic context -- it implies that it
> will call the runtime_notify callback while holding the spinlock.  The
> routine to queue an idle-notify request should be called something like
> pm_request_put -- although that name isn't so great because it sounds 
> like the put gets deferred instead of the notification.

There can be pm_request_put() and pm_request_put_sync(), for example.
Or pm_request_put_async() and pm_request_put(), depending on which version is
going to be used more often.

Thanks,
Rafael
Alan Stern June 29, 2009, 2:29 p.m. UTC | #4
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> Well, not only in those cases; in fact, this is where the actual problem is.
> 
> Namely, pm_request_suspend() and pm_request_resume() have to cancel any
> pending requests in a reliable way so that the work struct can be used safely
> after they've returned.

Right.

> Assume for example that there's a suspend request pending while
> pm_request_resume() is being called.  pm_request_resume() uses
> cancel_delayed_work() to kill off the request, but that's in interrupt and it
> happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
> which attempts to queue up an idle notification request before the
> delayed suspend request has a chance to run and bad things happen.
> 
> So, it seems, pm_request_resume() can't kill suspend requests by itself
> and instead it has to queue up resume requests for this purpose, which
> brings us right back to the problem of two requests queued up at a time
> (a delayed suspend request and a resume request that is supposed to cancel it).

No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
suspend request is pending) then rpm_request_resume doesn't need to do 
anything.  The device is already resumed!  Sure, it can try to kill the 
request and change the state to RPM_ACTIVE, but it doesn't need to.

Think about it.  Even if the suspend request were killed off, there's 
always the possibility that someone could call rpm_runtime_suspend 
right afterward.  If the driver really wants to resume the device and 
prevent it from suspending again, then the driver should call 
pm_runtime_get before pm_request_resume.  Then it won't matter if the 
suspend request runs.

> Nevertheless, using your workqueue patch we can still simplify things quite a
> bit, so I think it's worth doing anyway.

Me too.  :-)

> > Which reminds me...  The way you've got things set up, 
> > pm_runtime_put_atomic queues an idle notification, right?  That's 
> > a little inconsistent with the naming of the other routines.
> > 
> > Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> > that can safely be called in an atomic context -- it implies that it
> > will call the runtime_notify callback while holding the spinlock.  The
> > routine to queue an idle-notify request should be called something like
> > pm_request_put -- although that name isn't so great because it sounds 
> > like the put gets deferred instead of the notification.
> 
> There can be pm_request_put() and pm_request_put_sync(), for example.
> Or pm_request_put_async() and pm_request_put(), depending on which version is
> going to be used more often.

I don't follow you.  We only need one version of pm_request_put.  Did 
you mean "pm_runtime_put" and "pm_runtime_put_async"?  That would make 
sense.

If you use that (instead of pm_request_put) then would you want to
similarly rename pm_request_resume and pm_request_suspend to
pm_runtime_resume_async and pm_runtime_suspend_async?

Alan Stern

Rafael Wysocki June 29, 2009, 2:54 p.m. UTC | #5
On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Well, not only in those cases; in fact, this is where the actual problem is.
> > 
> > Namely, pm_request_suspend() and pm_request_resume() have to cancel any
> > pending requests in a reliable way so that the work struct can be used safely
> > after they've returned.
> 
> Right.
> 
> > Assume for example that there's a suspend request pending while
> > pm_request_resume() is being called.  pm_request_resume() uses
> > cancel_delayed_work() to kill off the request, but that's in interrupt and it
> > happens to return -1.  Now, there's pm_runtime_put_atomic() right after that
> > which attempts to queue up an idle notification request before the
> > delayed suspend request has a chance to run and bad things happen.
> > 
> > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > and instead it has to queue up resume requests for this purpose, which
> > brings us right back to the problem of two requests queued up at a time
> > (a delayed suspend request and a resume request that is supposed to cancel it).
> 
> No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> suspend request is pending) then rpm_request_resume doesn't need to do 
> anything.  The device is already resumed!  Sure, it can try to kill the 
> request and change the state to RPM_ACTIVE, but it doesn't need to.

I think it does need to do that, because the request may be scheduled way
in the future and we can't preserve its work structure until it runs.
pm_request_resume() doesn't know in advance when the suspend work function is
going to be queued up and run.

> Think about it.  Even if the suspend request were killed off, there's 
> always the possibility that someone could call rpm_runtime_suspend 
> right afterward.  If the driver really wants to resume the device and 
> prevent it from suspending again, then the driver should call 
> pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> suspend request runs.

No, it doesn't matter if the request runs, but it does matter if the work
structure used for queuing it up may be used for another purpose. :-)

> > Nevertheless, using your workqueue patch we can still simplify things quite a
> > bit, so I think it's worth doing anyway.
> 
> Me too.  :-)
> 
> > > Which reminds me...  The way you've got things set up, 
> > > pm_runtime_put_atomic queues an idle notification, right?  That's 
> > > a little inconsistent with the naming of the other routines.
> > > 
> > > Instead, pm_runtime_put_atomic should be a version of pm_runtime_put
> > > that can safely be called in an atomic context -- it implies that it
> > > will call the runtime_notify callback while holding the spinlock.  The
> > > routine to queue an idle-notify request should be called something like
> > > pm_request_put -- although that name isn't so great because it sounds 
> > > like the put gets deferred instead of the notification.
> > 
> > There can be pm_request_put() and pm_request_put_sync(), for example.
> > Or pm_request_put_async() and pm_request_put(), depending on which version is
> > going to be used more often.
> 
> I don't follow you.  We only need one version of pm_request_put.  Did 
> you mean "pm_runtime_put" and "pm_runtime_put_async"?  That would make 
> sense.

Yes, I did, sorry.

> If you use that (instead of pm_request_put) then would you want to
> similarly rename pm_request_resume and pm_request_suspend to
> pm_runtime_resume_async and pm_runtime_suspend_async?

Well, I think the pm_request_[suspend|resume] names are better. :-)

The problem with pm_<something>_put is that it does two things at a time,
decrements the resume counter and runs or queues up an idle notification.
Perhaps it's a good idea to call it after the second thing and change
pm_runtime_get() to pm_runtime_inuse(), so that we have:

* pm_runtime_inuse() - increment the resume counter
* pm_runtime_idle() - decrement the resume counter and run idle notification
* pm_request_idle()  - decrement the resume counter and queue idle notification

and __pm_runtime_idle() as the "bare" idle notification function?

Thanks,
Rafael
Alan Stern June 29, 2009, 3:27 p.m. UTC | #6
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > > and instead it has to queue up resume requests for this purpose, which
> > > brings us right back to the problem of two requests queued up at a time
> > > (a delayed suspend request and a resume request that is supposed to cancel it).
> > 
> > No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> > suspend request is pending) then rpm_request_resume doesn't need to do 
> > anything.  The device is already resumed!  Sure, it can try to kill the 
> > request and change the state to RPM_ACTIVE, but it doesn't need to.
> 
> I think it does need to do that, because the request may be scheduled way
> in the future and we can't preserve its work structure until it runs.
> pm_request_resume() doesn't know in advance when the suspend work function is
> going to be queued up and run.

It doesn't need to know.  All it needs to do is guarantee that the
device will be in a resumed state some time not long after the function
returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
is like calling it while the status is RPM_ACTIVE.  In neither case
does it have to do anything, because the device will already be resumed
when it returns.

Perhaps instead we should provide a way to kill a pending suspend
request?  It's not clear that anyone would need this.  The only reason
I can think of is if you wanted to change the timeout duration.  But it
wouldn't be able to run in interrupt context.

> > Think about it.  Even if the suspend request were killed off, there's 
> > always the possibility that someone could call rpm_runtime_suspend 
> > right afterward.  If the driver really wants to resume the device and 
> > prevent it from suspending again, then the driver should call 
> > pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> > suspend request runs.
> 
> No, it doesn't matter if the request runs, but it does matter if the work
> structure used for queuing it up may be used for another purpose. :-)

What else would it be used for?  If rpm_request_resume returns without 
doing anything and leaves the status set to RPM_IDLE, then the work 
structure won't be reused until the status changes.


> The problem with pm_<something>_put is that it does two things at a time,
> decrements the resume counter and runs or queues up an idle notification.
> Perhaps it's a good idea to call it after the second thing and change
> pm_runtime_get() to pm_runtime_inuse(), so that we have:
> 
> * pm_runtime_inuse() - increment the resume counter
> * pm_runtime_idle() - decrement the resume counter and run idle notification
> * pm_request_idle()  - decrement the resume counter and queue idle notification
> 
> and __pm_runtime_idle() as the "bare" idle notification function?

I could live with that, but the nice thing about "get" and "put" is
that they directly suggest a counter is being maintained and therefore
the calls have to balance.  Maybe we should just call it 
rpm_request_put and not worry that the put happens immediately.

Alan Stern

Rafael Wysocki June 29, 2009, 3:55 p.m. UTC | #7
On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > > So, it seems, pm_request_resume() can't kill suspend requests by itself
> > > > and instead it has to queue up resume requests for this purpose, which
> > > > brings us right back to the problem of two requests queued up at a time
> > > > (a delayed suspend request and a resume request that is supposed to cancel it).
> > > 
> > > No, you're trying to do too much.  If the state is RPM_IDLE (i.e., a 
> > > suspend request is pending) then rpm_request_resume doesn't need to do 
> > > anything.  The device is already resumed!  Sure, it can try to kill the 
> > > request and change the state to RPM_ACTIVE, but it doesn't need to.
> > 
> > I think it does need to do that, because the request may be scheduled way
> > in the future and we can't preserve its work structure until it runs.
> > pm_request_resume() doesn't know in advance when the suspend work function is
> > going to be queued up and run.
> 
> It doesn't need to know.  All it needs to do is guarantee that the
> device will be in a resumed state some time not long after the function
> returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> is like calling it while the status is RPM_ACTIVE.  In neither case
> does it have to do anything, because the device will already be resumed
> when it returns.

Not exactly, because RPM_IDLE prevents idle notifications from being run,
as it means a suspend has already been requested, which is not really the
case after pm_request_resume().

> Perhaps instead we should provide a way to kill a pending suspend
> request?  It's not clear that anyone would need this.  The only reason
> I can think of is if you wanted to change the timeout duration.  But it
> wouldn't be able to run in interrupt context.
> 
> > > Think about it.  Even if the suspend request were killed off, there's 
> > > always the possibility that someone could call rpm_runtime_suspend 
> > > right afterward.  If the driver really wants to resume the device and 
> > > prevent it from suspending again, then the driver should call 
> > > pm_runtime_get before pm_request_resume.  Then it won't matter if the 
> > > suspend request runs.
> > 
> > No, it doesn't matter if the request runs, but it does matter if the work
> > structure used for queuing it up may be used for another purpose. :-)
> 
> What else would it be used for?  If rpm_request_resume returns without 
> doing anything and leaves the status set to RPM_IDLE, then the work 
> structure won't be reused until the status changes.

Which is not right, because we may want to run ->runtime_idle() before
the status is changed.

That's why I think pm_request_resume() should queue up a resume request if
a suspend request is pending.

> > The problem with pm_<something>_put is that it does two things at a time,
> > decrements the resume counter and runs or queues up an idle notification.
> > Perhaps it's a good idea to call it after the second thing and change
> > pm_runtime_get() to pm_runtime_inuse(), so that we have:
> > 
> > * pm_runtime_inuse() - increment the resume counter
> > * pm_runtime_idle() - decrement the resume counter and run idle notification
> > * pm_request_idle()  - decrement the resume counter and queue idle notification
> > 
> > and __pm_runtime_idle() as the "bare" idle notification function?
> 
> I could live with that, but the nice thing about "get" and "put" is
> that they directly suggest a counter is being maintained and therefore
> the calls have to balance.  Maybe we should just call it 
> rpm_request_put and not worry that the put happens immediately.

OK

Best,
Rafael
Alan Stern June 29, 2009, 4:10 p.m. UTC | #8
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > It doesn't need to know.  All it needs to do is guarantee that the
> > device will be in a resumed state some time not long after the function
> > returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> > is like calling it while the status is RPM_ACTIVE.  In neither case
> > does it have to do anything, because the device will already be resumed
> > when it returns.
> 
> Not exactly, because RPM_IDLE prevents idle notifications from being run,
> as it means a suspend has already been requested, which is not really the
> case after pm_request_resume().

Yes it is.  A delayed suspend and an immediate resume have _both_ been
requested.  We are within our rights to say that the resume request 
gets fulfilled immediately (by doing nothing) and the suspend request 
will be fulfilled later.

> > > No, it doesn't matter if the request runs, but it does matter if the work
> > > structure used for queuing it up may be used for another purpose. :-)
> > 
> > What else would it be used for?  If rpm_request_resume returns without 
> > doing anything and leaves the status set to RPM_IDLE, then the work 
> > structure won't be reused until the status changes.
> 
> Which is not right, because we may want to run ->runtime_idle() before
> the status is changed.

If the status is RPM_IDLE then there's already a suspend request
queued.  So what reason is there for sending idle notifications?  The 
whole point of idle notifications is to let the driver know that it 
might want to initiate a suspend -- but one has already been initiated.

> That's why I think pm_request_resume() should queue up a resume request if
> a suspend request is pending.

Surely you don't mean we should suspend the device and then resume it
immediately afterward?  What would be the point?  Just leave the device 
active throughout.

As long as the behavior is documented, I think it will be okay for
pm_request_resume not to cancel a pending suspend request.

Alan Stern

Rafael Wysocki June 29, 2009, 4:39 p.m. UTC | #9
On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > It doesn't need to know.  All it needs to do is guarantee that the
> > > device will be in a resumed state some time not long after the function
> > > returns.  Thus calling rpm_request_resume while the status is RPM_IDLE
> > > is like calling it while the status is RPM_ACTIVE.  In neither case
> > > does it have to do anything, because the device will already be resumed
> > > when it returns.
> > 
> > Not exactly, because RPM_IDLE prevents idle notifications from being run,
> > as it means a suspend has already been requested, which is not really the
> > case after pm_request_resume().
> 
> Yes it is.  A delayed suspend and an immediate resume have _both_ been
> requested.  We are within our rights to say that the resume request 
> gets fulfilled immediately (by doing nothing) and the suspend request 
> will be fulfilled later.

Theoretically, we are, but practically we want to be able to use
pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
that found the device operational, but that would result in queuing a request
using the same work structure that is used by the pending suspend request.
Don't you see a problem here?

> > > > No, it doesn't matter if the request runs, but it does matter if the work
> > > > structure used for queuing it up may be used for another purpose. :-)
> > > 
> > > What else would it be used for?  If rpm_request_resume returns without 
> > > doing anything and leaves the status set to RPM_IDLE, then the work 
> > > structure won't be reused until the status changes.
> > 
> > Which is not right, because we may want to run ->runtime_idle() before
> > the status is changed.
> 
> If the status is RPM_IDLE then there's already a suspend request
> queued.  So what reason is there for sending idle notifications?  The 
> whole point of idle notifications is to let the driver know that it 
> might want to initiate a suspend -- but one has already been initiated.
> 
> > That's why I think pm_request_resume() should queue up a resume request if
> > a suspend request is pending.
> 
> Surely you don't mean we should suspend the device and then resume it
> immediately afterward?

No, I don't.

> What would be the point?  Just leave the device active throughout.

Absolutely.

> As long as the behavior is documented, I think it will be okay for
> pm_request_resume not to cancel a pending suspend request.

I could agree with that, but what about pm_runtime_resume() happening after
a suspend request has been scheduled?  Should it also ignore the pending
suspend request?

In which case it would be consistent to allow suspends to be scheduled even though
the resume counter is greater than 0.

Best,
Rafael
Alan Stern June 29, 2009, 5:29 p.m. UTC | #10
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> Theoretically, we are, but practically we want to be able to use
> pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
> that found the device operational, but that would result in queuing a request
> using the same work structure that is used by the pending suspend request.
> Don't you see a problem here?

This is a different situation.  pm_runtime_resume does have the luxury 
of killing the suspend request, and it should do so.

Let's think about it this way.  Why does a driver call
pm_request_resume in the first place?  Because an interrupt handler or
spinlocked region wants to do some I/O, so the device has to
be active.

But when will it do the I/O?  If the device is currently suspended, the
driver can do the I/O at the end of its runtime_resume callback.  But
if the status is RPM_ACTIVE, the callback won't be invoked, so the
interrupt handler will have to do the I/O directly.  The same is true
for RPM_IDLE.

Except for one problem: In RPM_IDLE, a suspend might occur at any time.  
(In theory the same thing could happen in RPM_ACTIVE.)  To prevent
this, the driver can call pm_runtime_get before pm_request_resume.  
When the I/O is all finished, it calls pm_request_put.

If the work routine starts running before the pm_request_put, it will 
see that the counter is positive so it will set the status back to 
RPM_ACTIVE.  Then the put will queue an idle notification.  If the work 
routine hasn't started running before the pm_request_put then the 
status will remain RPM_IDLE all along.

Regardless, it's not necessary for pm_request_resume to kill the 
suspend request.  And even if it did, the driver would still need to 
implement both pathways for doing the I/O.
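
In outline (a sketch only, using the names we've been discussing):

	/* in the interrupt handler / spinlocked region: */
	pm_runtime_get(dev);		/* keep a suspend from slipping in */
	pm_request_resume(dev);		/* no-op if already RPM_ACTIVE or RPM_IDLE */

	/* If the device was suspended, the I/O is started from the driver's
	 * runtime_resume callback; otherwise the handler starts it directly. */

	/* later, when the I/O has completed: */
	pm_request_put(dev);		/* drop the counter, queue an idle notification */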


> > As long as the behavior is documented, I think it will be okay for
> > pm_request_resume not to cancel a pending suspend request.
> 
> I could agree with that, but what about pm_runtime_resume() happening after
> a suspend request has been scheduled?  Should it also ignore the pending
> suspend request?

It could, but probably it shouldn't.

> In which case it would be consistent to allow suspends to be scheduled even though
> the resume counter is greater than 0.

True enough, although I'm not sure there's a good reason for it.  You 
certainly can increment the resume counter after scheduling a suspend 
request -- the effect would be the same.

Alan Stern

Rafael Wysocki June 29, 2009, 6:25 p.m. UTC | #11
On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > Theoretically, we are, but practically we want to be able to use
> > pm_runtime_put() (the asynchronous version) after a pm_runtime_resume()
> > that found the device operational, but that would result in queuing a request
> > using the same work structure that is used by the pending suspend request.
> > Don't you see a problem here?
> 
> This is a different situation.  pm_runtime_resume does have the luxury 
> of killing the suspend request, and it should do so.

I should have said pm_request_resume(), sorry.

> Let's think about it this way.  Why does a driver call
> pm_request_resume in the first place?  Because an interrupt handler or
> spinlocked region wants to do some I/O, so the device has to
> be active.

Right.

> But when will it do the I/O?  If the device is currently suspended, the
> driver can do the I/O at the end of its runtime_resume callback.  But
> if the status is RPM_ACTIVE, the callback won't be invoked, so the
> interrupt handler will have to do the I/O directly.  The same is true
> for RPM_IDLE.

I still agree.

> Except for one problem: In RPM_IDLE, a suspend might occur at any time.  
> (In theory the same thing could happen in RPM_ACTIVE.)  To prevent
> this, the driver can call pm_runtime_get before pm_request_resume.  
> When the I/O is all finished, it calls pm_request_put.

At which point pm_request_put() tries to queue up an idle notification that
uses the same work_struct as the pending suspend request.  Not good.

> If the work routine starts running before the pm_request_put, it will 
> see that the counter is positive so it will set the status back to 
> RPM_ACTIVE.  Then the put will queue an idle notification.  If the work 
> routine hasn't started running before the pm_request_put then the 
> status will remain RPM_IDLE all along.
> 
> Regardless, it's not necessary for pm_request_resume to kill the 
> suspend request.  And even if it did, the driver would still need to 
> implement both pathways for doing the I/O.

IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().
Thus, it should either queue up a request to run pm_runtime_resume() or leave
the status as though pm_runtime_resume() ran.  Anything else would be
internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
requests, pm_request_resume() should do the same or the other way around.

Now, arguably, ignoring pending suspend requests is somewhat easier from
the core's point of view, but it may not be so for drivers.

> > > As long as the behavior is documented, I think it will be okay for
> > > pm_request_resume not to cancel a pending suspend request.
> > 
> > I could agree with that, but what about pm_runtime_resume() happening after
> > a suspend request has been scheduled?  Should it also ignore the pending
> > suspend request?
> 
> It could, but probably it shouldn't.

So, IMO, pm_request_resume() shouldn't as well.

> > In which case it would be consistent to allow suspends to be scheduled even though
> > the resume counter is greater than 0.
> 
> True enough, although I'm not sure there's a good reason for it.  You 
> certainly can increment the resume counter after scheduling a suspend 
> request -- the effect would be the same.

Yes, it would.

My point is that the core should always treat pending suspend requests in the
same way.  If they are canceled by pm_runtime_resume(), then
pm_request_resume() should also cancel them and it shouldn't be possible
to schedule a suspend request when the resume counter is greater than 0.
In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
should also ignore them and there's no point in preventing pm_request_suspend()
from scheduling a suspend request if the resume counter is greater than 0.

Any other type of behavior has a potential to confuse driver writers.

Best,
Rafael
Alan Stern June 29, 2009, 7:25 p.m. UTC | #12
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().

Normal top halves don't trigger before the circumstances are
appropriate.  For example, if you enable remote wakeup on a USB device,
it won't send a wakeup signal before it has been powered down.  A
driver calling pm_request_resume while the device is still resumed is
like a USB device sending a wakeup request while it is still powered
up.  So IMO the analogy with top halves isn't a good one.

> Thus, it should either queue up a request to run pm_runtime_resume() or leave
> the status as though pm_runtime_resume() ran.  Anything else would be
> internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
> requests, pm_request_resume() should do the same or the other way around.
> 
> Now, arguably, ignoring pending suspend requests is somewhat easier from
> the core's point of view, but it may not be so for drivers.

The argument I gave in the previous email demonstrates that it doesn't
make any difference to drivers.  Either way, they have to use two I/O
pathways, they have to do a pm_runtime_get before pm_request_resume,
and they have to do a pm_request_put after the I/O is done.

Of course, this is all somewhat theoretical.  I still don't know of any 
actual drivers that do the equivalent of pm_request_resume.

> My point is that the core should always treat pending suspend requests in the
> same way.  If they are canceled by pm_runtime_resume(), then
> pm_request_resume() should also cancel them and it shouldn't be possible
> to schedule a suspend request when the resume counter is greater than 0.
> In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
> should also ignore them and there's no point in preventing pm_request_suspend()
> from scheduling a suspend request if the resume counter is greater than 0.
> 
> Any other type of behavior has a potential to confuse driver writers.

Another possible approach you could take when the call to
cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
in addition to RPM_IDLE and leave the suspend request queued.  When
__pm_runtime_suspend sees both flags are set, it should abort and set
the status directly back to RPM_ACTIVE.  At that time the idle
notifications can start up again.
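
Roughly (a sketch only; the flag and field names are just the ones we've been
using in this discussion):

	/* in pm_request_resume(), when cancel_delayed_work() returns -1: */
	dev->power.runtime_status |= RPM_WAKE;	/* leave the request queued */

	/* in __pm_runtime_suspend(), before doing anything else: */
	if ((dev->power.runtime_status & (RPM_IDLE | RPM_WAKE))
			== (RPM_IDLE | RPM_WAKE)) {
		dev->power.runtime_status = RPM_ACTIVE;	/* abort the suspend */
		/* idle notifications may be sent again from here on */
		return 0;
	}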

Is this any better?  I can't see how drivers would care, though.

Alan Stern

P.S.: What do you think should happen if there's a delayed suspend
request pending, then pm_request_resume is called (and it leaves the
request queued), and then someone calls pm_runtime_suspend?  You've got
two pending requests and a synchronous call all active at the same
time!

Rafael Wysocki June 29, 2009, 9:04 p.m. UTC | #13
On Monday 29 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > IMO one can think of pm_request_resume() as a top half of pm_runtime_resume().
> 
> Normal top halves don't trigger before the circumstances are
> appropriate.  For example, if you enable remote wakeup on a USB device,
> it won't send a wakeup signal before it has been powered down.  A
> driver calling pm_request_resume while the device is still resumed is
> like a USB device sending a wakeup request while it is still powered
> up.  So IMO the analogy with top halves isn't a good one.
> 
> > Thus, it should either queue up a request to run pm_runtime_resume() or leave
> > the status as though pm_runtime_resume() ran.  Anything else would be
> > internally inconsistent.  So, if pm_runtime_resume() cancels pending suspend
> > requests, pm_request_resume() should do the same or the other way around.
> > 
> > Now, arguably, ignoring pending suspend requests is somewhat easier from
> > the core's point of view, but it may not be so for drivers.
> 
> The argument I gave in the previous email demonstrates that it doesn't
> make any difference to drivers.  Either way, they have to use two I/O
> pathways, they have to do a pm_runtime_get before pm_request_resume,
> and they have to do a pm_request_put after the I/O is done.
> 
> Of course, this is all somewhat theoretical.  I still don't know of any 
> actual drivers that do the equivalent of pm_request_resume.
> 
> > My point is that the core should always treat pending suspend requests in the
> > same way.  If they are canceled by pm_runtime_resume(), then
> > pm_request_resume() should also cancel them and it shouldn't be possible
> > to schedule a suspend request when the resume counter is greater than 0.
> > In turn, if they are ignored by pm_runtime_resume(), then pm_request_resume()
> > should also ignore them and there's no point to prevent pm_request_suspend()
> > from scheduling a suspend request if the resume counter is greater than 0.
> > 
> > Any other type of behavior has a potential to confuse driver writers.
> 
> Another possible approach you could take when the call to
> cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> in addition to RPM_IDLE and leave the suspend request queued.  When
> __pm_runtime_suspend sees both flags are set, it should abort and set
> the status directly back to RPM_ACTIVE.  At that time the idle
> notifications can start up again.
> 
> Is this any better?  I can't see how drivers would care, though.

There still is the problem that the suspend request is occupying the
work_struct which cannot be used for any other purpose.  I don't think this
is avoidable, though.  One way or another, it is possible to have two requests
pending at a time.

Perhaps the simplest thing to do would be to simply ignore pending suspend
requests in both pm_request_resume() and pm_runtime_resume() and to allow
them to be scheduled at any time.  That shouldn't hurt anything as long as
pm_runtime_suspend() is smart enough, but it has to be anyway, because it
can be run synchronously at any time.

The only question is what pm_runtime_suspend() should do when it sees a pending
suspend request and quite frankly I think it can just ignore it as well,
leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
this.

Then, we'll need a separate work structure for suspend requests, but I have no
problem with that.

> P.S.: What do you think should happen if there's a delayed suspend
> request pending, then pm_request_resume is called (and it leaves the
> request queued), and then someone calls pm_runtime_suspend?  You've got
> two pending requests and a synchronous call all active at the same
> time!

That's easy, pm_runtime_suspend() sees a pending resume, so it quits and the
other things work out as usual.

Best,
Rafael
Alan Stern June 29, 2009, 10 p.m. UTC | #14
On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:

> > Another possible approach you could take when the call to
> > cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> > in addition to RPM_IDLE and leave the suspend request queued.  When
> > __pm_runtime_suspend sees both flags are set, it should abort and set
> > the status directly back to RPM_ACTIVE.  At that time the idle
> > notifications can start up again.
> > 
> > Is this any better?  I can't see how drivers would care, though.
> 
> There still is the problem that the suspend request is occupying the
> work_struct which cannot be used for any other purpose.

What other purpose?  We don't send idle notifications in RPM_IDLE and
resume requests don't need to be stored since (as described above) they
just set the RPM_WAKE flag.  Hence nothing else needs to use the
work_struct.

>  I don't think this
> is avoidable, though.  One way or another, it is possible to have two requests
> pending at a time.
> 
> Perhaps the simplest thing to do would be to simply ignore pending suspend
> requests in both pm_request_resume() and pm_runtime_resume() and to allow
> them to be scheduled at any time.  That shouldn't hurt anything as long as
> pm_runtime_suspend() is smart enough, but it has to be anyway, because it
> can be run synchronously at any time.
> 
> The only question is what pm_runtime_suspend() should do when it sees a pending
> suspend request and quite frankly I think it can just ignore it as well,
> leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
> be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
> this.
> 
> Then, we'll need a separate work structure for suspend requests, but I have no
> problem with that.

You seem to be thinking about these requests in a very different way
from me.  They don't form a queue or anything like that.  Instead they
mean "Change the device's power state to this value as soon as
possible" -- and they are needed only because sometimes (in atomic or
interrupt contexts) the change can't be made right away.

That's why it doesn't make any sense to have both a suspend and a 
resume request pending at the same time.  It would mean the driver is 
telling us "Change the device's power state to both low-power and 
full-power as soon as possible"!

We should settle on a general policy for how to handle it when a 
driver makes the mistake of telling us to do contradictory things.  
There are three natural policies:

	The first request takes precedence over the second;

	The second request takes precedence over the first;

	Resumes take precedence over suspends.

Any one of those would be acceptable.

Alan Stern

Rafael Wysocki June 29, 2009, 10:50 p.m. UTC | #15
On Tuesday 30 June 2009, Alan Stern wrote:
> On Mon, 29 Jun 2009, Rafael J. Wysocki wrote:
> 
> > > Another possible approach you could take when the call to
> > > cancel_delayed_work fails (which should be rare) is to turn on RPM_WAKE
> > > in addition to RPM_IDLE and leave the suspend request queued.  When
> > > __pm_runtime_suspend sees both flags are set, it should abort and set
> > > the status directly back to RPM_ACTIVE.  At that time the idle
> > > notifications can start up again.
> > > 
> > > Is this any better?  I can't see how drivers would care, though.
> > 
> > There still is the problem that the suspend request is occupying the
> > work_struct which cannot be used for any other purpose.
> 
> What other purpose?  We don't send idle notifications in RPM_IDLE

OK

> and resume requests don't need to be stored since (as described above) they
> just set the RPM_WAKE flag.  Hence nothing else needs to use the
> work_struct.

Good.  I'd go for it, then.  OK?

> >  I don't think this
> > is avoidable, though.  One way or another, it is possible to have two requests
> > pending at a time.
> > 
> > Perhaps the simplest thing to do would be to simply ignore pending suspend
> > requests in both pm_request_resume() and pm_runtime_resume() and to allow
> > them to be scheduled at any time.  That shouldn't hurt anything as long as
> > pm_runtime_suspend() is smart enough, but it has to be anyway, because it
> > can be run synchronously at any time.
> > 
> > The only question is what pm_runtime_suspend() should do when it sees a pending
> > suspend request and quite frankly I think it can just ignore it as well,
> > leaving the RPM_IDLE bit set.  In which case the name RPM_IDLE will not really
> > be adequate, so perhaps it can be renamed to RPM_REQUEST or something like
> > this.
> > 
> > Then, we'll need a separate work structure for suspend requests, but I have no
> > problem with that.
> 
> You seem to be thinking about these requests in a very different way
> from me.  They don't form a queue or anything like that.  Instead they
> mean "Change the device's power state to this value as soon as
> possible" -- and they are needed only because sometimes (in atomic or
> interrupt contexts) the change can't be made right away.
> 
> That's why it doesn't make any sense to have both a suspend and a 
> resume request pending at the same time.  It would mean the driver is 
> telling us "Change the device's power state to both low-power and 
> full-power as soon as possible"!
> 
> We should settle on a general policy for how to handle it when a 
> driver makes the mistake of telling us to do contradictory things.  
> There are three natural policies:
> 
> 	The first request takes precedence over the second;
> 
> 	The second request takes precedence over the first;
> 
> 	Resumes take precedence over suspends.
> 
> Any one of those would be acceptable.

IMO resumes should take precedence over suspends, because resume usually means
"there's I/O to process" and we usually we want the I/O to be processed as soon
as possible (deferred wake-up will usually mean deferred I/O and that would
hurt user experience).

Best,
Rafael
Alan Stern June 30, 2009, 3:10 p.m. UTC | #16
On Tue, 30 Jun 2009, Rafael J. Wysocki wrote:

> > There are three natural policies:
> > 
> > 	The first request takes precedence over the second;
> > 
> > 	The second request takes precedence over the first;
> > 
> > 	Resumes take precedence over suspends.
> > 
> > Any one of those would be acceptable.
> 
> IMO resumes should take precedence over suspends, because resume usually means
> "there's I/O to process" and we usually we want the I/O to be processed as soon
> as possible (deferred wake-up will usually mean deferred I/O and that would
> hurt user experience).

I don't know.  I gave this a lot of thought last night, and it seems
that the best approach would be to always obey the most recent request.  
(In the case of delayed suspend requests, this is ambiguous because
there are two times involved: when the request was originally submitted
and when the delay expires.  We should use the time of original
submission.)  The only exception is pm_request_put; it shouldn't
override an existing suspend request just to send an idle notification.

If this seems difficult to implement, it's an indication that the
overall design needs to be reworked.  Here's what I came up with:

The possible statuses are reduced to: RPM_ACTIVE, RPM_SUSPENDING, 
RPM_SUSPENDED, RPM_RESUMING, and RPM_ERROR.  These most directly 
correspond to the core's view of the device state.  Transitions are:

	RPM_ACTIVE <-> RPM_SUSPENDING -> RPM_SUSPENDED ->
		RPM_RESUMING -> RPM_ACTIVE ...

Failure of a suspend causes the backward transition from RPM_SUSPENDING 
to RPM_ACTIVE, and errors cause a transition to RPM_ERROR.  Otherwise 
we always go forward.
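
To illustrate, the status field would reduce to something like this (just a
sketch of the names, nothing final):

enum rpm_status {
        RPM_ACTIVE = 0,
        RPM_SUSPENDING,
        RPM_SUSPENDED,
        RPM_RESUMING,
        RPM_ERROR,
};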

Instead of a delayed_work_struct, we'll have a regular work_struct plus 
a separate timer_list structure.  That way we get more control over 
what happens when the timer expires.  The timer callback routine will 
submit the work_struct, but it will do so under the spinlock and only 
after making the proper checks.
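
Roughly like this (only a sketch -- the pm_wq workqueue and the field names
are provisional and refer to the extra fields described below):

static void pm_runtime_timer_fn(unsigned long data)
{
        struct device *dev = (struct device *)data;
        unsigned long flags;

        spin_lock_irqsave(&dev->power.lock, flags);
        /* Do nothing if the timer was cancelled or rescheduled meanwhile. */
        if (dev->power.timer_expires &&
            time_before_eq(dev->power.timer_expires, jiffies)) {
                dev->power.timer_expires = 0;
                dev->power.async_action = ASYNC_SUSPEND;
                /* Don't queue work during a transition; it will be
                 * scheduled when the suspend or resume finishes. */
                if (dev->power.status == RPM_ACTIVE &&
                    !dev->power.work_pending) {
                        dev->power.work_pending = 1;
                        queue_work(pm_wq, &dev->power.work);
                }
        }
        spin_unlock_irqrestore(&dev->power.lock, flags);
}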

There will be only one work callback routine.  It decides what to do
based on the status and a new field: async_action.  The possible values
for async_action are 0 (do nothing), ASYNC_SUSPEND, ASYNC_RESUME, and
ASYNC_NOTIFY.
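
In rough outline (names provisional; the real routine would also recheck the
status before invoking anything):

static void pm_runtime_work_fn(struct work_struct *work)
{
        struct device *dev = container_of(work, struct device, power.work);
        unsigned long flags;
        int action;

        spin_lock_irqsave(&dev->power.lock, flags);
        dev->power.work_pending = 0;
        action = dev->power.async_action;
        dev->power.async_action = 0;
        spin_unlock_irqrestore(&dev->power.lock, flags);

        switch (action) {
        case ASYNC_SUSPEND:
                __pm_runtime_suspend(dev);      /* asynchronous variant */
                break;
        case ASYNC_RESUME:
                __pm_runtime_resume(dev);
                break;
        case ASYNC_NOTIFY:
                __pm_runtime_idle(dev);
                break;
        default:
                break;                          /* nothing to do */
        }
}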

There will be a few extra fields: a work_pending flag, the timer 
expiration value (which doubles as a timer_pending flag by being set
to 0 when the timer isn't pending), and maybe some others.

There are restrictions on what can happen in each state.  The timer is
allowed to run only in RPM_RESUMING and RPM_ACTIVE, and ASYNC_NOTIFY is
allowed only in those states.  ASYNC_RESUME isn't allowed in
RPM_RESUMING or RPM_ACTIVE, and ASYNC_SUSPEND isn't allowed in
RPM_SUSPENDING or RPM_SUSPENDED.  Pending work isn't allowed in
RPM_RESUMING or RPM_SUSPENDING; if a request is submitted at such times
it merely sets async_action, and the work will be scheduled when the
resume or suspend finishes.  This is to avoid forcing the workqueue
thread to wait unnecessarily.

__pm_runtime_suspend and __pm_runtime_resume start out by cancelling 
the timer and the work_struct (if they are pending) and by setting 
async_action to 0.  The cancellations don't have to wait; if a callback 
routine is already running then when it gets the spinlock it will see 
that it has nothing to do.
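
Something like this, called with the spinlock held (the helper's name and the
fields are again provisional):

/* Cancel a pending timer or work item and forget any queued action. */
static void __pm_runtime_cancel_pending(struct device *dev)
{
        if (dev->power.timer_expires) {
                dev->power.timer_expires = 0;
                del_timer(&dev->power.suspend_timer);
        }
        if (dev->power.work_pending) {
                dev->power.work_pending = 0;
                cancel_work(&dev->power.work);
        }
        dev->power.async_action = 0;
}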

If __pm_runtime_suspend was called asynchronously and the status is
already RPM_SUSPENDING, it can return after taking these actions.  If
the status is already RPM_RESUMING, it should set async_action to
ASYNC_SUSPEND and then return.  __pm_runtime_resume behaves similarly.

The pm_request_* routines similarly cancel a pending timer and clear
async_action.  They should cancel any pending work unless they're going
to submit new work anyway.

That's enough to give you the general idea.  I think this design is 
a lot cleaner than the current one.

Alan Stern

Rafael Wysocki June 30, 2009, 10:30 p.m. UTC | #17
On Tuesday 30 June 2009, Alan Stern wrote:
... 
> That's enough to give you the general idea.  I think this design is 
> a lot cleaner than the current one.

Well, I'm not really happy with starting over, but if you think we should do
that, then let's do it.

I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
provide a mechanism by which it is possible:
(1) to schedule a delayed execution of ->runtime_suspend(), possibly from
    interrupt context,
(2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
    from interrupt context,
(3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
    synchronous way (I'm not sure about ->runtime_idle())
_and_ to ensure that these callbacks will be executed when it makes sense.
There's no other point, because the core has no information to make choices,
it can only prevent wrong things from happening, if possible.

I think you will agree that the users of the framework should be able to
prevent ->runtime_suspend() from being called and that's what the usage counter
is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
framework, when the usage counter is nonzero.

BTW, I don't think resume_count is the best name; it used to be in the version
of my patch where it was automatically incremented when ->runtime_resume() was
about to be called.  usage_count is probably better.

Next, I think that the framework should refuse to call ->runtime_suspend() and
->runtime_idle() if the children of the device are not suspended and the
"ignore children" flag is unset.  The counter of unsuspended children is used
for that.  I think the rule should be that it is decremented for the parent
whenever ->runtime_suspend() is called for a child and it is incremented
for the parent whenever ->runtime_resume() is called for a child.

Now, the question is what rules should apply to the ordering and possible
simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
->runtime_resume().  I think the following rules make sense:

  * It is forbidden to run ->runtime_suspend() twice in a row.

  * It is forbidden to run ->runtime_suspend() in parallel with another instance
    of ->runtime_suspend().

  * It is forbidden to run ->runtime_resume() twice in a row.

  * It is forbidden to run ->runtime_resume() in parallel with another instance
    of ->runtime_resume().

  * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
    ->runtime_idle(), but the latter case is preferred. 

  * It is allowed to run ->runtime_resume() after ->runtime_suspend().

  * It is forbidden to run ->runtime_resume() after ->runtime_idle().

  * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
    parallel with each other.

  * It is forbidden to run ->runtime_idle() twice in a row.

  * It is forbidden to run ->runtime_idle() in parallel with another instance
    of ->runtime_idle().

  * It is forbidden to run ->runtime_idle() after ->runtime_suspend().

  * It is allowed to run ->runtime_idle() after ->runtime_resume().

  * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
    ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
    call ->runtime_suspend() from within ->runtime_idle().

  * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
    ->runtime_suspend() is running.

  * If ->runtime_resume() is about to be called immediately after
    ->runtime_suspend(), the execution of ->runtime_suspend() should be
    prevented from happening, if possible, in which case the execution of
    ->runtime_resume() shouldn't happen.

  * If ->runtime_suspend() is about to be called immediately after
    ->runtime_resume(), the execution of ->runtime_resume() should be
    prevented from happening, if possible, in which case the execution of
    ->runtime_suspend() shouldn't happen.

[Are there any more rules related to these callbacks we should take into
account?]

Next, if we agree about the rules above, the question is what helper functions
should be provided by the core allowing these rules to be followed
automatically and what error codes should be returned by them in case it
wasn't possible to proceed without breaking the rules.

IMO, it is reasonable to provide:

  * pm_schedule_suspend(dev, delay) - schedule the execution of
    ->runtime_suspend(dev) after delay.

  * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.

  * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.

  * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
    into the run-time PM workqueue.

  * pm_runtime_get(dev) - increment the device's usage counter.

  * pm_runtime_put(dev) - decrement the device's usage counter.

  * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
    counter is zero and all of the device's children are suspended (or the
    "ignore children" flag is set).

  * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
    into the run-time PM workqueue.  The usage counter and children will be
    checked immediately before executing ->runtime_idle(dev).
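
In terms of prototypes that would be something like the following (all of this
is tentative, including the return types and the units of 'delay'):

int pm_schedule_suspend(struct device *dev, unsigned long delay);
int pm_runtime_suspend(struct device *dev);
int pm_runtime_resume(struct device *dev);
int pm_request_resume(struct device *dev);
void pm_runtime_get(struct device *dev);
void pm_runtime_put(struct device *dev);
int pm_runtime_idle(struct device *dev);
int pm_request_idle(struct device *dev);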

I'm not sure if it is really necessary to combine pm_runtime_idle() or
pm_request_idle() with pm_runtime_put().  At least right now I don't see any
real value of that.

I also am not sure what error codes should be returned by the above helper
functions and in what conditions.

Thanks,
Rafael
Alan Stern July 1, 2009, 3:35 p.m. UTC | #18
On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:

> On Tuesday 30 June 2009, Alan Stern wrote:
> ... 
> > That's enough to give you the general idea.  I think this design is 
> > a lot cleaner than the current one.
> 
> Well, I'm not really happy with starting over, but if you think we should do
> that, then let's do it.

It's not a complete restart.  Much of the existing interface and quite
a bit of code would remain the same.

> I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
> provide a mechanism by which it is possible:
> (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
>     interrupt context,
> (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
>     from interrupt context,
> (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
>     synchronous way (I'm not sure about ->runtime_idle())

Yes, runtime_idle also, for drivers that require minimal overhead.

> _and_ to ensure that these callbacks will be executed when it makes sense.

Thus if the situation changes before the callback can be made, so that
it no longer makes sense, the framework should cancel the callback.

> There's no other point, because the core has no information to make choices,
> it can only prevent wrong things from happening, if possible.

Exactly.

> I think you will agree that the users of the framework should be able to
> prevent ->runtime_suspend() from being called and that's what the usage counter
> is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
> framework, when the usage counter is nonzero.

Right, because then by definition the device is in use so it can't be 
idle.

> BTW, I don't think resume_count is the best name; it used to be in the version
> of my patch where it was automatically incremented when ->runtime_resume() was
> about to be called.  usage_count is probably better.

Fine.

> Next, I think that the framework should refuse to call ->runtime_suspend() and
> ->runtime_idle() if the children of the device are not suspended and the
> "ignore children" flag is unset.

Yes; this is part of the "makes sense" requirement.

>  The counter of unsuspended children is used
> for that.  I think the rule should be that it is decremented for the parent
> whenever ->runtime_suspend() is called for a child and it is incremented
> for the parent whenever ->runtime_resume() is called for a child.

Of course.  (Minor change: decremented when runtime_suspend _succeeds_ 
for a child.)

> Now, the question is what rules should apply to the ordering and possible
> simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> ->runtime_resume().  I think the following rules make sense:

Oh dear.  I wouldn't attempt to make a complete list of all possible 
interactions.  It's too hard to know whether you have really covered 
all the cases.

>   * It is forbidden to run ->runtime_suspend() twice in a row.
> 
>   * It is forbidden to run ->runtime_suspend() in parallel with another instance
>     of ->runtime_suspend().
> 
>   * It is forbidden to run ->runtime_resume() twice in a row.
> 
>   * It is forbidden to run ->runtime_resume() in parallel with another instance
>     of ->runtime_resume().
> 
>   * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
>     ->runtime_idle(), but the latter case is preferred. 
> 
>   * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> 
>   * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> 
>   * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
>     parallel with each other.
> 
>   * It is forbidden to run ->runtime_idle() twice in a row.
> 
>   * It is forbidden to run ->runtime_idle() in parallel with another instance
>     of ->runtime_idle().
> 
>   * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> 
>   * It is allowed to run ->runtime_idle() after ->runtime_resume().
> 
>   * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
>     ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
>     call ->runtime_suspend() from within ->runtime_idle().
> 
>   * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
>     ->runtime_suspend() is running.

We can summarize these rules as follows:

	Never allow more than one callback at a time, except that
	runtime_suspend may be invoked while runtime_idle is running.

	Don't call runtime_resume while the device is active.

	Don't call runtime_suspend or runtime_idle while the device
	is suspended.

	Don't invoke any callbacks if the device state is unknown
	(RPM_ERROR).

Implicit is the notion that the device is suspended when 
runtime_suspend returns successfully, it is active when runtime_resume 
returns successfully, and it is unknown when either returns an error.

>   * If ->runtime_resume() is about to be called immediately after
>     ->runtime_suspend(), the execution of ->runtime_suspend() should be
>     prevented from happening, if possible, in which case the execution of
>     ->runtime_resume() shouldn't happen.
> 
>   * If ->runtime_suspend() is about to be called immediately after
>     ->runtime_resume(), the execution of ->runtime_resume() should be
>     prevented from happening, if possible, in which case the execution of
>     ->runtime_suspend() shouldn't happen.

These could be considered optional optimizations.  Or if you prefer, 
they could be covered by a "New requests override previous requests" 
rule.

> [Are there any more rules related to these callbacks we should take into
> account?]

	Runtime PM callbacks are mutually exclusive with other driver
	core callbacks (probe, remove, dev_pm_ops, etc.).

	If a callback occurs asynchronously then it will be invoked
	in process context.  If it occurs as part of a synchronous
	request then it is invoked in the caller's context.

Related to this is the requirement that pm_runtime_idle,
pm_runtime_suspend, and pm_runtime_resume must always be called in
process context whereas pm_runtime_idle_atomic,
pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
in any context.

> Next, if we agree about the rules above, the question is what helper functions
> should be provided by the core allowing these rules to be followed
> automatically and what error codes should be returned by them in case it
> wasn't possible to proceed without breaking the rules.
> 
> IMO, it is reasonable to provide:
> 
>   * pm_schedule_suspend(dev, delay) - schedule the execution of
>     ->runtime_suspend(dev) after delay.
> 
>   * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> 
>   * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> 
>   * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
>     into the run-time PM workqueue.
> 
>   * pm_runtime_get(dev) - increment the device's usage counter.
> 
>   * pm_runtime_put(dev) - decrement the device's usage counter.
> 
>   * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
>     counter is zero and all of the device's children are suspended (or the
>     "ignore children" flag is set).
> 
>   * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
>     into the run-time PM workqueue.  The usage counter and children will be
>     checked immediately before executing ->runtime_idle(dev).

Should the counters also be checked when the request is submitted?  
And should the same go for pm_schedule_suspend?  These are nontrivial
questions; good arguments can be made both ways.

> I'm not sure if it is really necessary to combine pm_runtime_idle() or
> pm_request_idle() with pm_runtime_put().  At least right now I don't see any
> real value of that.

Likewise combining pm_runtime_get with pm_runtime_resume.  The only
value is to make things easier for drivers, because these will be very
common idioms.

> I also am not sure what error codes should be returned by the above helper
> functions and in what conditions.

The error codes you have been using seem okay to me, in general.

However, some of those requests would violate the rules in a trivial 
way.  For these we might return a positive value rather than a negative 
error code.  For example, calling pm_runtime_resume while the device is 
already active shouldn't be considered an error.  But it can't be 
considered a complete success either, because it won't invoke the 
runtime_resume method.

To be determined: How runtime PM will interact with system sleep.


About all I can add is the "New requests override previous requests"  
policy.  This would apply to all the non-synchronous requests, whether
they are delayed or added directly to the workqueue.  If a new request
(synchronous or not) is received before the old one has started to run,
the old one will be cancelled.  This holds even if the new request is
redundant, like a resume request received while the device is active.

There is one exception to this rule: An idle_notify request does not 
cancel a delayed or queued suspend request.

Alan Stern

Rafael Wysocki July 1, 2009, 10:19 p.m. UTC | #19
On Wednesday 01 July 2009, Alan Stern wrote:
> On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
> 
> > On Tuesday 30 June 2009, Alan Stern wrote:
> > ... 
> > > That's enough to give you the general idea.  I think this design is 
> > > a lot cleaner than the current one.
> > 
> > Well, I'm not really happy with starting over, but if you think we should do
> > that, then let's do it.
> 
> It's not a complete restart.  Much of the existing interface and quite
> a bit of code would remain the same.
> 
> > I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> > and ->runtime_resume() make sense.  Now, the role of the framework, IMO, is to
> > provide a mechanism by which it is possible:
> > (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
> >     interrupt context,
> > (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
> >     from interrupt context,
> > (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
> >     synchronous way (I'm not sure about ->runtime_idle())
> 
> Yes, runtime_idle also, for drivers that require minimal overhead.
> 
> > _and_ to ensure that these callbacks will be executed when it makes sense.
> 
> Thus if the situation changes before the callback can be made, so that
> it no longer makes sense, the framework should cancel the callback.

Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
resume request to be queued up and pm_runtime_resume() is called synchronously
exactly at the time the request's work function is started.  There are two
attempts to resume in progress, but only one of them can call
->runtime_resume(), so what's the other one supposed to do?  The asynchronous
one can just return an error code, but the caller of the synchronous
pm_runtime_resume() must know whether or not the resume was successful.
So, perhaps, if the synchronous resume happens to lose the race, it should
wait for the other one to complete, check the device's status and return 0 if
it's active?  That wouldn't cause the workqueue thread to wait.

> > There's no other point, because the core has no information to make choices,
> > it can only prevent wrong things from happening, if possible.
> 
> Exactly.
> 
> > I think you will agree that the users of the framework should be able to
> > prevent ->runtime_suspend() from being called and that's what the usage counter
> > is for.  Also, IMO it should be impossible to execute ->runtime_idle(), via the
> > framework, when the usage counter is nonzero.
> 
> Right, because then by definition the device is in use so it can't be 
> idle.
> 
> > BTW, I don't think resume_count is the best name; it used to be in the version
> > of my patch where it was automatically incremented when ->runtime_resume() was
> > about to be called.  usage_count is probably better.
> 
> Fine.
> 
> > Next, I think that the framework should refuse to call ->runtime_suspend() and
> > ->runtime_idle() if the children of the device are not suspended and the
> > "ignore children" flag is unset.
> 
> Yes; this is part of the "makes sense" requirement.
> 
> >  The counter of unsuspended children is used
> > for that.  I think the rule should be that it is decremented for the parent
> > whenever ->runtime_suspend() is called for a child and it is incremented
> > for the parent whenever ->runtime_resume() is called for a child.
> 
> Of course.  (Minor change: decremented when runtime_suspend _succeeds_ 
> for a child.)
> 
> > Now, the question is what rules should apply to the ordering and possible
> > simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> > ->runtime_resume().  I think the following rules make sense:
> 
> Oh dear.  I wouldn't attempt to make a complete list of all possible 
> interactions.  It's too hard to know whether you have really covered 
> all the cases.
> 
> >   * It is forbidden to run ->runtime_suspend() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_suspend() in parallel with another instance
> >     of ->runtime_suspend().
> > 
> >   * It is forbidden to run ->runtime_resume() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_resume() in parallel with another instance
> >     of ->runtime_resume().
> > 
> >   * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
> >     ->runtime_idle(), but the latter case is preferred. 
> > 
> >   * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> > 
> >   * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> > 
> >   * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
> >     parallel with each other.
> > 
> >   * It is forbidden to run ->runtime_idle() twice in a row.
> > 
> >   * It is forbidden to run ->runtime_idle() in parallel with another instance
> >     of ->runtime_idle().
> > 
> >   * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> > 
> >   * It is allowed to run ->runtime_idle() after ->runtime_resume().
> > 
> >   * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
> >     ->runtime_idle() is running.  In particular, it is allowed to (indirectly)
> >     call ->runtime_suspend() from within ->runtime_idle().
> > 
> >   * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
> >     ->runtime_suspend() is running.
> 
> We can summarize these rules as follows:
> 
> 	Never allow more than one callback at a time, except that
> 	runtime_suspend may be invoked while runtime_idle is running.

Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
after that resume is requested by remote wake-up, ->runtime_resume() may also
be run while ->runtime_idle() is still running.

OTOH, we need to know when ->runtime_idle() has completed, because we have to
ensure it won't still be running after run-time PM has been disabled for the
device.

IMO, we need two flags, one indicating that either ->runtime_suspend(), or
->runtime_resume() is being executed (they are mutually exclusive) and the
other one indicating that ->runtime_idle() is being executed.  For the
purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
RPM_IN_TRANSITION.

With this notation, the above rule may be translated as:

    Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
    ->runtime_idle() if RPM_IDLE_RUNNING is set.

Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

> 	Don't call runtime_resume while the device is active.
> 
> 	Don't call runtime_suspend or runtime_idle while the device
> 	is suspended.
> 
> 	Don't invoke any callbacks if the device state is unknown
> 	(RPM_ERROR).
> 
> Implicit is the notion that the device is suspended when 
> runtime_suspend returns successfully, it is active when runtime_resume 
> returns successfully, and it is unknown when either returns an error.

Yes.

There are two possible "final" states, so I'd use one flag to indicate the
current status.  Let's call it RPM_SUSPENDED for now (which means that the
device is suspended when it's set and active otherwise) and I think we can make
the rule that this flag is only changed after successful execution of
->runtime_suspend() or ->runtime_resume().

Whether the device is suspending or resuming follows from the values of
RPM_SUSPENDED and RPM_IN_TRANSITION.

> >   * If ->runtime_resume() is about to be called immediately after
> >     ->runtime_suspend(), the execution of ->runtime_suspend() should be
> >     prevented from happening, if possible, in which case the execution of
> >     ->runtime_resume() shouldn't happen.
> > 
> >   * If ->runtime_suspend() is about to be called immediately after
> >     ->runtime_resume(), the execution of ->runtime_resume() should be
> >     prevented from happening, if possible, in which case the execution of
> >     ->runtime_suspend() shouldn't happen.
> 
> These could be considered optional optimizations.  Or if you prefer, 
> they could be covered by a "New requests override previous requests" 
> rule.

I'm not sure if I agree with this rule yet.

> > [Are there any more rules related to these callbacks we should take into
> > account?]
> 
> 	Runtime PM callbacks are mutually exclusive with other driver
> 	core callbacks (probe, remove, dev_pm_ops, etc.).

OK
 
> 	If a callback occurs asynchronously then it will be invoked
> 	in process context.  If it occurs as part of a synchronous
> 	request then it is invoked in the caller's context.
> 
> Related to this is the requirement that pm_runtime_idle,
> pm_runtime_suspend, and pm_runtime_resume must always be called in
> process context whereas pm_runtime_idle_atomic,
> pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
> in any context.

OK

> > Next, if we agree about the rules above, the question is what helper functions
> > should be provided by the core allowing these rules to be followed
> > automatically and what error codes should be returned by them in case it
> > wasn't possible to proceed without breaking the rules.
> > 
> > IMO, it is reasonable to provide:
> > 
> >   * pm_schedule_suspend(dev, delay) - schedule the execution of
> >     ->runtime_suspend(dev) after delay.
> > 
> >   * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> > 
> >   * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> > 
> >   * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
> >     into the run-time PM workqueue.
> > 
> >   * pm_runtime_get(dev) - increment the device's usage counter.
> > 
> >   * pm_runtime_put(dev) - decrement the device's usage counter.
> > 
> >   * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
> >     counter is zero and all of the device's children are suspended (or the
> >     "ignore children" flag is set).
> > 
> >   * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
> >     into the run-time PM workqueue.  The usage counter and children will be
> >     checked immediately before executing ->runtime_idle(dev).
> 
> Should the counters also be checked when the request is submitted?  
> And should the same go for pm_schedule_suspend?  These are nontrivial
> questions; good arguments can be made both ways.

That's the difficult part. :-)

First, I think a delayed suspend should be treated in a special way, because
it's not really a request to suspend.  Namely, as long as the timer hasn't
triggered yet, nothing happens and there's nothing against the rules above.
A request to suspend is queued up after the timer has triggered and the timer
function is where the rules come into play.  IOW, it consists of two
operations, setting up a timer and queuing up a request to suspend when the
timer triggers.  IMO the first of them can be done at any time, while the other
one may be affected by the rules.

It implies that we should really introduce a timer and a timer function that
will queue up suspend requests, instead of using struct delayed_work.

Second, I think it may be a good idea to use the usage counter to block further
requests while submitting a resume request.

Namely, suppose that pm_request_resume() increments usage_count and returns 0,
if the resume was not necessary and the caller can do the I/O by itself, or
error code, which means that it was necessary to queue up a resume request.
If 0 is returned, the caller is supposed to do the I/O and call
pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
supposed to take care of the I/O, in which case the request's work function
should call pm_runtime_put() when done.  [If it was impossible to queue up a
request, error code is returned, but the usage counter is decremented by
pm_request_resume(), so that the caller need not handle that special case,
hopefully rare.]
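
To illustrate the convention, a driver's I/O submission path would then look
more or less like this (hypothetical driver code, only to show the idea):

static void my_driver_start_io(struct my_driver_data *drv)
{
        if (pm_request_resume(drv->dev) == 0) {
                /* Device already active: do the I/O and drop the count. */
                my_driver_do_io(drv);
                pm_runtime_put(drv->dev);
        }
        /*
         * Otherwise the caller just quits: either a resume request was
         * queued and ->runtime_resume() will take care of the I/O (the
         * work function calls pm_runtime_put() when done), or queuing
         * failed and pm_request_resume() has already dropped the count.
         */
}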

This implies that it may be a good idea to check usage_count when submitting
idle notification and suspend requests (where in case of suspend a request is
submitted by the timer function, when the timer has already triggered, so
there's no need to check the counter while setting up the timer).

The counter of unsuspended children may change after a request has been
submitted and before its work function has a chance to run, so I don't see much
point checking it when submitting requests.

So, if the above idea is adopted, idle notification and suspend requests
won't be queued up when a resume request is pending (there's the question of what
the timer function attempting to queue up a suspend request is supposed to do
in such a case) and in the other cases we can use the following rules:

    Any pending request takes precedence over a new idle notification request.

    If a new request is not an idle notification request, it takes precedence
    over the pending one, so it cancels it with the help of cancel_work().

[In the latter case, if a suspend request is canceled, we may want to set up the
timer for another one.]  For that, we're going to need a single flag, say
RPM_PENDING, which is set whenever a request is queued up.

> > I'm not sure if it is really necessary to combine pm_runtime_idle() or
> > pm_request_idle() with pm_runtime_put().  At least right now I don't see any
> > real value of that.
> 
> Likewise combining pm_runtime_get with pm_runtime_resume.  The only
> value is to make things easier for drivers, because these will be very
> common idioms.
> 
> > I also am not sure what error codes should be returned by the above helper
> > functions and in what conditions.
> 
> The error codes you have been using seem okay to me, in general.
> 
> However, some of those requests would violate the rules in a trivial 
> way.  For these we might return a positive value rather than a negative 
> error code.  For example, calling pm_runtime_resume while the device is 
> already active shouldn't be considered an error.  But it can't be 
> considered a complete success either, because it won't invoke the 
> runtime_resume method.

That need not matter from the caller's point of view, though.  In the case of
pm_runtime_resume() the caller will probably be mostly interested whether or
not it can do I/O after the function has returned.

> To be determined: How runtime PM will interact with system sleep.

Yes.  My first idea was to disable run-time PM before entering a system sleep
state, but that would involve canceling all of the pending requests.

> About all I can add is the "New requests override previous requests"  
> policy.  This would apply to all the non-synchronous requests, whether
> they are delayed or added directly to the workqueue.  If a new request
> (synchronous or not) is received before the old one has started to run,
> the old one will be cancelled.  This holds even if the new request is
> redundant, like a resume request received while the device is active.
> 
> There is one exception to this rule: An idle_notify request does not 
> cancel a delayed or queued suspend request.

I'm not sure if such a rigid rule will be really useful.

Also, as I said above, I think we shouldn't regard setting up the suspend
timer as queuing up a request, but as a totally separate operation.

Best,
Rafael
Rafael Wysocki July 2, 2009, 3:42 p.m. UTC | #20
On Thursday 02 July 2009, Rafael J. Wysocki wrote:
> On Wednesday 01 July 2009, Alan Stern wrote:
> > On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
...
> > Should the counters also be checked when the request is submitted?  
> > And should the same go for pm_schedule_suspend?  These are nontrivial
> > questions; good arguments can be made both ways.
> 
> That's the difficult part. :-)
> 
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend.  Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play.  IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers.  IMO the first of them can be done at any time, while the other
> one may be affected by the rules.
> 
> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.
> 
> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
> 
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done.  [If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]
> 
> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
> 
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.
> 
> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question of what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
>
>     Any pending request takes precedence over a new idle notification request.
> 
>     If a new request is not an idle notification request, it takes precedence
>     over the pending one, so it cancels it with the help of cancel_work().
> 
> [In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.]  For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

After some reconsideration I'd like to change that in the following way:

     Any pending request takes precedence over a new idle notification request.

     A pending request takes precedence over a new request of the same type.

    If the new request is not an idle notification request and is not of the
    same type as the pending one, it takes precedence over the pending one, so
    it cancels the pending request with the help of cancel_work().

So, instead of a single flag, I'd like to use a 2-bit field to store
information about pending requests, where the 4 values are RPM_REQ_NONE,
RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME.

Also, IMO it makes sense to queue up an idle notification or suspend request
regardless of the current status of the device, as long as the usage counter is
not greater than 0, because the status can always change after the request has been
submitted and before its work function is executed.

So, I think we can use something like this:

struct dev_pm_info {
	pm_message_t		power_state;
	unsigned			can_wakeup:1;
	unsigned			should_wakeup:1;
	enum dpm_state		status;		/* Owned by the PM core */
#ifdef CONFIG_PM_SLEEP
	struct list_head	entry;
#endif
#ifdef CONFIG_PM_RUNTIME
	struct timer_list	suspend_timer;
	wait_queue_head_t	wait_queue;
	struct work_struct	work;
	spinlock_t		lock;
	atomic_t		usage_count;
	atomic_t		child_count;
	unsigned int		ignore_children:1;
	unsigned int		enabled:1; /* 'true' if run-time PM is enabled */
	unsigned int		idle_notification:1; /* 'true' if ->runtime_idle() is running */
	unsigned int		in_transition:1; /* 'true' if ->runtime_[suspend|resume]() is running */
	unsigned int		suspended:1; /* 'true' if current status is 'suspended' */
	unsigned int		pending_request:2; /* RPM_REQ_NONE, RPM_REQ_IDLE, etc. */
	unsigned int		runtime_error:1; /* 'true' if the last transition failed */
	int			error_code; /* Error code returned by the last executed callback */
#endif
};

with the following rules regarding the (most important) helper functions:

  pm_schedule_suspend(dev, delay) is always successful.  It adds a new timer
  with pm_request_suspend() as the timer function, dev as the data and
  jiffies + delay as the expiration time.  If the timer is pending when this
  function is called, the timer is deactivated using del_timer() and replaced
  by the new timer.
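
That is, roughly (a sketch; whether 'delay' is in jiffies or milliseconds is
still to be decided):

int pm_schedule_suspend(struct device *dev, unsigned long delay)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->power.lock, flags);
        /*
         * mod_timer() deactivates a pending timer and re-arms it, which
         * covers both the "new timer" and the "replace a pending timer"
         * cases.  The timer is set up at init time with pm_request_suspend()
         * as its function and dev as its data.
         */
        mod_timer(&dev->power.suspend_timer, jiffies + delay);
        spin_unlock_irqrestore(&dev->power.lock, flags);

        return 0;
}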

  pm_request_suspend() checks if 'usage_count' is zero and returns -EAGAIN
  if not.  Next, it checks if 'pending_request' is RPM_REQ_SUSPEND and returns
  -EALREADY in that case.  Next, if 'pending_request' is RPM_REQ_IDLE,
  the request is cancelled.  Finally, a new suspend request is submitted.

  pm_runtime_suspend() checks if 'usage_count' is zero and returns -EAGAIN
  if not.  Next, it checks 'in_transition' and 'suspended' and returns 0 if the
  former is unset and the latter is set.  If 'in_transition' is set and 'suspended'
  is not set (the device is currently suspending), the behavior depends on
  whether or not the function was called synchronously, in which case it waits
  for the other suspend to finish.  If it was called via the workqueue,
  -EINPROGRESS is returned.  Next, 'in_transition' is set, ->runtime_suspend()
  is executed and 'in_transition' is unset.  If ->runtime_suspend() returned 0,
  'suspended' is set and 0 is returned.  Otherwise, if the error code was
  -EAGAIN or -EBUSY, 'suspended' is not set and the error code is returned.
  Otherwise, 'runtime_error' is set and the error code is returned ('suspended'
  is not set).

  pm_request_resume() increments 'usage_count' and checks 'suspended' and
  'in_transition'.  If both 'suspended' and 'in_transition' are not set, 0 is
  returned and the caller is supposed to decrement 'usage_count', with the
  help of pm_runtime_put().  Otherwise, the function checks if
  'pending_request' is different from zero, in which case the pending request
  is canceled.  Finally, a new resume request is submitted and -EBUSY is
  returned.  In that case, 'usage_count' will be decremented by the request's
  work function (not by pm_runtime_resume(), but by the wrapper function that
  calls it).

  pm_runtime_resume() increments 'usage_count' and checks 'in_transition' and
  'suspended'.  If both are unset, 0 is returned.  If both are set (the device
  is resuming) the behavior depends on whether or not the function was called
  synchronously, in which case it waits for the concurrent resume to complete,
  while it immediately returns -EINPROGRESS in the other case.  If 'suspended'
  is not set, but 'in_transition' is set (the device is suspending), the
  function waits for the suspend to complete and starts over.  Next,
  'in_transition' is set, ->runtime_resume() is executed and 'in_transition'
  is unset.  If ->runtime_resume() returned 0, 'suspended' is unset and 0 is
  returned.  Otherwise, 'runtime_error' is set and the error code from
  ->runtime_resume() is returned ('suspended' is not unset).  'usage_count' is
  always decremented before return, regardless of the return value.

  pm_request_idle() checks 'usage_count' and returns -EAGAIN if it's greater
  than 0.  Next, it checks 'pending_request' and immediately returns -EBUSY, if
  it's different from RPM_REQ_NONE and RPM_REQ_IDLE, or -EALREADY, if it's
  equal to RPM_REQ_IDLE.  Finally, a new idle notification request is submitted.

  pm_runtime_idle() checks 'usage_count' and returns -EAGAIN if it's greater
  than 0.  Next, it checks 'suspended' and 'in_transition' and returns -EBUSY
  if any of them is set.  Next, it checks 'idle_notification' and returns
  -EINPROGRESS if it's set.  Finally, 'idle_notification' is set,
  ->runtime_idle()  is executed and 'idle_notification' is unset.

Additionally, all of the helper functions return -EINVAL immediately if
'runtime_error' is set.
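
For example, pm_runtime_idle() would then look roughly like this (a sketch
only; it assumes the callback is reached through the bus type's pm ops and it
omits the check of the children counter):

int pm_runtime_idle(struct device *dev)
{
        unsigned long flags;
        int error = 0;

        spin_lock_irqsave(&dev->power.lock, flags);
        if (dev->power.runtime_error)
                error = -EINVAL;
        else if (atomic_read(&dev->power.usage_count) > 0)
                error = -EAGAIN;
        else if (dev->power.suspended || dev->power.in_transition)
                error = -EBUSY;
        else if (dev->power.idle_notification)
                error = -EINPROGRESS;
        else
                dev->power.idle_notification = 1;
        spin_unlock_irqrestore(&dev->power.lock, flags);

        if (error)
                return error;

        if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
                dev->bus->pm->runtime_idle(dev);

        spin_lock_irqsave(&dev->power.lock, flags);
        dev->power.idle_notification = 0;
        wake_up_all(&dev->power.wait_queue);
        spin_unlock_irqrestore(&dev->power.lock, flags);

        return 0;
}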

Best,
Rafael
Alan Stern July 2, 2009, 3:55 p.m. UTC | #21
On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > 
> > Thus if the situation changes before the callback can be made, so that
> > it no longer makes sense, the framework should cancel the callback.
> 
> Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
> resume request to be queued up and pm_runtime_resume() is called synchronously
> exactly at the time the request's work function is started.  There are two
> attempts to resume in progress, but only one of them can call
> ->runtime_resume(), so what's the other one supposed to do?  The asynchronous
> one can just return an error code, but the caller of the synchronous
> pm_runtime_resume() must know whether or not the resume was successful.
> So, perhaps, if the synchronous resume happens to lose the race, it should
> wait for the other one to complete, check the device's status and return 0 if
> it's active?  That wouldn't cause the workqueue thread to wait.

I didn't address this explicitly in the previous message, but yes.  
This is no different from the way your current version works.

Similarly, if a synchronous resume call occurs while a suspend is in 
progress, it should wait until the suspend finishes and then carry out 
a resume.

> > We can summarize these rules as follows:
> > 
> > 	Never allow more than one callback at a time, except that
> > 	runtime_suspend may be invoked while runtime_idle is running.
> 
> Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
> after that resume is requested by remote wake-up, ->runtime_resume() may also
> be run while ->runtime_idle() is still running.

Yes, I didn't think of that case.  We have to allow either of the other 
two to be invoked while runtime_idle is running.  But we can rule out 
calling runtime_idle recursively.

> OTOH, we need to know when ->runtime_idle() has completed, because we have to
> ensure it won't still be running after run-time PM has been disabled for the
> device.
> 
> IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> ->runtime_resume() is being executed (they are mutually exclusive) and the
> other one indicating that ->runtime_idle() is being executed.  For the
> purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> RPM_IN_TRANSITION.

The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
(status == RPM_SUSPENDING || status == RPM_RESUMING).

> With this notation, the above rule may be translated as:
> 
>     Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
>     ->runtime_idle() if RPM_IDLE_RUNNING is set.
> 
> Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

That is equivalent to my conclusion above.

> There are two possible "final" states, so I'd use one flag to indicate the
> current status.  Let's call it RPM_SUSPENDED for now (which means that the
> device is suspended when it's set and active otherwise) and I think we can make
> the rule that this flag is only changed after successful execution of
> ->runtime_suspend() or ->runtime_resume().
> 
> Whether the device is suspending or resuming follows from the values of
> RPM_SUSPENDED and RPM_IN_TRANSITION.

You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a 
single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).  
It doesn't make much difference which you choose.


> > Should the counters also be checked when the request is submitted?  
> > And should the same go for pm_schedule_suspend?  These are nontrivial
> > questions; good arguments can be made both ways.
> 
> That's the difficult part. :-)
> 
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend.  Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play.  IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers.  IMO the first of them can be done at any time, while the other
> one may be affected by the rules.

I don't agree.  For example, suppose the device has an active child
when the driver says: Suspend it in 30 seconds.  If the child is then
removed after only 10 seconds, does it make sense to go ahead with
suspending the parent 20 seconds later?  No -- if the parent is going
to be suspended, the decision as to when should be made at the time the
child is removed, not beforehand.

(Even more concretely, suppose there is a 30-second inactivity timeout
for autosuspend.  Removing the child counts as activity and so should
restart the timer.)

To put it another way, suppose you accept a delayed request under
inappropriate conditions.  If the conditions don't change, the whole
thing was a waste of effort.  And if the conditions do change, then the
whole delayed request should be reconsidered anyhow.  So why accept it?

> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.

Yes, this was part of my proposal.

> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
> 
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done.  [If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]

Trying to keep track of reasons for incrementing and decrementing 
usage_count is very difficult to do in the core.  What happens if 
pm_request_resume increments the count but then the driver calls 
pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
routine can run?

It's better to make the driver responsible for maintaining the counter
value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
better than having the core automatically change the counter.
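
For instance (hypothetical driver code, just to show the idiom):

/* The driver, not the core, owns its usage count. */
static void my_driver_handle_remote_wakeup(struct device *dev)
{
        pm_runtime_get(dev);    /* block suspends until the I/O is done */
        pm_request_resume(dev); /* resume, or queue a resume request */
        /* ... and the driver's I/O completion path calls pm_runtime_put(). */
}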

> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
> 
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.

As I said above, if the counters don't change then the submission was 
unnecessary, and if they do change then the submission should be 
reconsidered.  Therefore they _should_ be checked in submissions.

> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question of what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
> 
>     Any pending request takes precedence over a new idle notification request.

For pending resume requests this rule is unnecessary; it's invalid to
submit an idle notification request while a resume request is pending
(since resume requests can be pending only in the RPM_SUSPENDING and
RPM_SUSPENDED states while idle notification requests are accepted only
in RPM_RESUMING and RPM_ACTIVE).

For pending suspends, I think we should allow synchronous idle
notifications while the suspend is pending.  The runtime_idle callback
might then start its own suspend before the workqueue can get around to
it.  You're right about async idle requests though; that was the 
exception I noted below.

>     If a new request is not an idle notification request, it takes precedence
>     over the pending one, so it cancels it with the help of cancel_work().
> 
> [In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.]  For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

That's what I called work_pending in my proposal.

> > The error codes you have been using seem okay to me, in general.
> > 
> > However, some of those requests would violate the rules in a trivial 
> > way.  For these we might return a positive value rather than a negative 
> > error code.  For example, calling pm_runtime_resume while the device is 
> > already active shouldn't be considered an error.  But it can't be 
> > considered a complete success either, because it won't invoke the 
> > runtime_resume method.
> 
> That need not matter from the caller's point of view, though.  In the case of
> pm_runtime_resume() the caller will probably be mostly interested whether or
> not it can do I/O after the function has returned.

Yes.  But the driver might depend on something happening inside the
runtime_resume method, so it would need to know if a successful
pm_runtime_resume wasn't going to invoke the callback.

> > To be determined: How runtime PM will interact with system sleep.
> 
> Yes.  My first idea was to disable run-time PM before entering a system sleep
> state, but that would involve canceling all of the pending requests.

Or simply freezing the workqueue.

> > About all I can add is the "New requests override previous requests"  
> > policy.  This would apply to all the non-synchronous requests, whether
> > they are delayed or added directly to the workqueue.  If a new request
> > (synchronous or not) is received before the old one has started to run,
> > the old one will be cancelled.  This holds even if the new request is
> > redundant, like a resume request received while the device is active.
> > 
> > There is one exception to this rule: An idle_notify request does not 
> > cancel a delayed or queued suspend request.
> 
> I'm not sure if such a rigid rule will be really useful.

A rigid rule is easier to understand and apply than one with a large
number of special cases.  However, in the statement of the rule above,
I forgot to mention that this applies only if the new request is valid,
i.e., if it's not forbidden by the current status or the counter
values.

> Also, as I said above, I think we shouldn't regard setting up the suspend
> timer as queuing up a request, but as a totally separate operation.

Well, there can't be any pending resume requests when the suspend timer
is set up, so we have to consider only pending idle notifications or
pending suspends.  I agree, we would want to allow an idle notification
to remain pending when the suspend timer is set up.  As for pending
suspends, we _should_ allow the new request to override the old one.  
This will come up whenever the timeout value is changed.

Alan Stern

Rafael Wysocki July 2, 2009, 5:50 p.m. UTC | #22
On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > > 
> > > Thus if the situation changes before the callback can be made, so that
> > > it no longer makes sense, the framework should cancel the callback.
> > 
> > Yes, but there's one thing to consider.  Suppose a remote wake-up causes a
> > resume request to be queued up and pm_runtime_resume() is called synchronously
> > exactly at the time the request's work function is started.  There are two
> > attempts to resume in progress, but only one of them can call
> > ->runtime_resume(), so what's the other one supposed to do?  The asynchronous
> > one can just return an error code, but the caller of the synchronous
> > pm_runtime_resume() must know whether or not the resume was successful.
> > So, perhaps, if the synchronous resume happens to lose the race, it should
> > wait for the other one to complete, check the device's status and return 0 if
> > it's active?  That wouldn't cause the workqueue thread to wait.
> 
> I didn't address this explicitly in the previous message, but yes.  
> This is no different from the way your current version works.
> 
> Similarly, if a synchronous resume call occurs while a suspend is in 
> progress, it should wait until the suspend finishes and then carry out 
> a resume.

Agreed.

> > > We can summarize these rules as follows:
> > > 
> > > 	Never allow more than one callback at a time, except that
> > > 	runtime_suspend may be invoked while runtime_idle is running.
> > 
> > Caution here.  If ->runtime_idle() runs ->runtime_suspend() and immediately
> > after that resume is requested by remote wake-up, ->runtime_resume() may also
> > be run while ->runtime_idle() is still running.
> 
> Yes, I didn't think of that case.  We have to allow either of the other 
> two to be invoked while runtime_idle is running.  But we can rule out 
> calling runtime_idle recursively.
> 
> > OTOH, we need to know when ->runtime_idle() has completed, because we have to
> > ensure it won't still be running after run-time PM has been disabled for the
> > device.
> > 
> > IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> > ->runtime_resume() is being executed (they are mutually exclusive) and the
> > other one indicating that ->runtime_idle() is being executed.  For the
> > purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> > RPM_IN_TRANSITION.
> 
> The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> (status == RPM_SUSPENDING || status == RPM_RESUMING).

I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

> > With this notation, the above rule may be translated as:
> > 
> >     Don't run any of the callbacks if RPM_IN_TRANSITION is set.  Don't run
> >     ->runtime_idle() if RPM_IDLE_RUNNING is set.
> > 
> > Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> > set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.
> 
> That is equivalent to my conclusion above.
> 
> > There are two possible "final" states, so I'd use one flag to indicate the
> > current status.  Let's call it RPM_SUSPENDED for now (which means that the
> > device is suspended when it's set and active otherwise) and I think we can make
> > the rule that this flag is only changed after successful execution of
> > ->runtime_suspend() or ->runtime_resume().
> > 
> > Whether the device is suspending or resuming follows from the values of
> > RPM_SUSPENDED and RPM_IN_TRANSITION.
> 
> You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a 
> single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).  
> It doesn't make much difference which you choose.

No, it doesn't.

Still, the additional flag for 'idle notification is in progress' is still
necessary for the following two reasons:

(1) Idle notifications cannot be run (synchronously) when one is already in
    progress, so we need a means to determine whether or not this is the case.

(2) If run-time PM is to be disabled, the function doing that must guarantee
    that ->runtime_idle() won't be running after it's returned, so it needs to
    know how to check that.

> > > Should the counters also be checked when the request is submitted?  
> > > And should the same go for pm_schedule_suspend?  These are nontrivial
> > > questions; good arguments can be made both ways.
> > 
> > That's the difficult part. :-)
> > 
> > First, I think a delayed suspend should be treated in a special way, because
> > it's not really a request to suspend.  Namely, as long as the timer hasn't
> > triggered yet, nothing happens and there's nothing against the rules above.
> > A request to suspend is queued up after the timer has triggered and the timer
> > function is where the rules come into play.  IOW, it consists of two
> > operations, setting up a timer and queuing up a request to suspend when the
> > timer triggers.  IMO the first of them can be done at any time, while the other
> > one may be affected by the rules.
> 
> I don't agree.  For example, suppose the device has an active child
> when the driver says: Suspend it in 30 seconds.  If the child is then
> removed after only 10 seconds, does it make sense to go ahead with
> suspending the parent 20 seconds later?  No -- if the parent is going
> to be suspended, the decision as to when should be made at the time the
> child is removed, not beforehand.

There are two functions, one that sets up the timer and the other that queues
up the request.  This is the second one that makes the decision if the request
is still worth queuing up.

> (Even more concretely, suppose there is a 30-second inactivity timeout
> for autosuspend.  Removing the child counts as activity and so should
> restart the timer.)
> 
> To put it another way, suppose you accept a delayed request under
> inappropriate conditions.  If the conditions don't change, the whole
> thing was a waste of effort.  And if the conditions do change, then the
> whole delayed request should be reconsidered anyhow.

The problem is, even if you always accept a delayed request under appropriate
conditions, you still have to reconsider it before putting it into the work
queue, because the conditions might have changed.  So, you'd like to do this:

(1) Check if the conditions are appropriate, set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

while I think it will be simpler to do this:

(1) Set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
between (1) and (2), so I don't really see a practical difference.

> So why accept it?

Because that simplifies things?

For example, suppose ->runtime_resume() has been called as
a result of a remote wake-up (ie. after pm_request_resume()) and it has some
I/O to process, but it is known beforehand that the device will most likely be
inactive after the I/O is done.  So, it's tempting to call
pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
inappropriate (the device is not regarded as suspended).  However, calling
pm_schedule_suspend() with a long enough delay doesn't break any rules related
to the ->runtime_*() callbacks, so why should it be forbidden?

Next, suppose pm_schedule_suspend() is called, but it fails because the
conditions are inappropriate.  What's the caller supposed to do?  Wait for the
conditions to change and repeat?  But why should it bother if the conditions
may still change before ->runtime_suspend() is actually called?

IMO, it's the caller's problem whether or not what it does is useful or
efficient.  The core's problem is to ensure that it doesn't break things.

> > It implies that we should really introduce a timer and a timer function that
> > will queue up suspend requests, instead of using struct delayed_work.
> 
> Yes, this was part of my proposal.
> 
> > Second, I think it may be a good idea to use the usage counter to block further
> > requests while submitting a resume request.
> > 
> > Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> > if the resume was not necessary and the caller can do the I/O by itself, or
> > error code, which means that it was necessary to queue up a resume request.
> > If 0 is returned, the caller is supposed to do the I/O and call
> > pm_runtime_put() when done.  Otherwise it just quits and ->runtime_resume() is
> > supposed to take care of the I/O, in which case the request's work function
> > should call pm_runtime_put() when done.  [If it was impossible to queue up a
> > request, error code is returned, but the usage counter is decremented by
> > pm_request_resume(), so that the caller need not handle that special case,
> > hopefully rare.]
> 
> Trying to keep track of reasons for incrementing and decrementing 
> usage_count is very difficult to do in the core.  What happens if 
> pm_request_resume increments the count but then the driver calls 
> pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> routine can run?

Nothing wrong, as long as the increments and decrements are balanced (if they
aren't balanced, there is a bug in the driver anyway).  In fact, for this to
work we need the rule that a new request of the same type doesn't replace an
existing one.  Then, the submitted resume request cannot be canceled, so the
work function will run and drop the usage counter.

> It's better to make the driver responsible for maintaining the counter
> value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> better than having the core automatically change the counter.

So the caller will do:

pm_runtime_get(dev);
error = pm_request_resume(dev);
if (error)
    goto out;
<process I/O>
pm_runtime_put(dev);

but how is it supposed to ensure that pm_runtime_put() will be called after
executing the 'goto out' thing?

Anyway, we don't need to use the usage counter for that (although it's cheap).
Instead, we can make pm_request_suspend() and pm_request_idle() check if a
resume request is pending and fail if that's the case.

> > This implies that it may be a good idea to check usage_count when submitting
> > idle notification and suspend requests (where in case of suspend a request is
> > submitted by the timer function, when the timer has already triggered, so
> > there's no need to check the counter while setting up the timer).
> > 
> > The counter of unsuspended children may change after a request has been
> > submitted and before its work function has a chance to run, so I don't see much
> > point checking it when submitting requests.
> 
> As I said above, if the counters don't change then the submission was 
> unnecessary, and if they do change then the submission should be 
> reconsidered.  Therefore they _should_ be checked in submissions.

Let's put it another way.  What's the practical benefit to the caller if we
always check the counters in submissions?

> > So, if the above idea is adopted, idle notification and suspend requests
> > won't be queued up when a resume request is pending (there's the question what
> > the timer function attempting to queue up a suspend request is supposed to do
> > in such a case) and in the other cases we can use the following rules:
> > 
> >     Any pending request takes precedence over a new idle notification request.
> 
> For pending resume requests this rule is unnecessary; it's invalid to
> submit an idle notification request while a resume request is pending
> (since resume requests can be pending only in the RPM_SUSPENDING and
> RPM_SUSPENDED states while idle notification requests are accepted only
> in RPM_RESUMING and RPM_ACTIVE).

It is correct nevertheless. :-)

> For pending suspends, I think we should allow synchronous idle
> notifications while the suspend is pending.

Sure, I was talking only about requests here, where by 'request' I meant
a work item put into the workqueue.

> The runtime_idle callback might then start its own suspend before the
> workqueue can get around to it.  You're right about async idle requests
> though; that was the exception I noted below.
> 
> >     If a new request is not an idle notification request, it takes precedence
> >     over the pending one, so it cancels it with the help of cancel_work().
> > 
> > [In the latter case, if a suspend request is canceled, we may want to set up the
> > timer for another one.]  For that, we're going to need a single flag, say
> > RPM_PENDING, which is set whenever a request is queued up.
> 
> That's what I called work_pending in my proposal.

Well, after some reconsideration I think it's not enough (as I wrote in my last
message), because it generally makes sense to make the following rule:

    A pending request always takes precedence over a new request of the same
    type.

So, for example, if pm_request_resume() is called and there's a resume request
pending already, the new pm_request_resume() should just let the pending
request alone and quit.

Thus, it seems reasonable to remember what type of a request is pending
(I don't think we can figure it out from the status fields in 100% of the
cases).

> > > The error codes you have been using seem okay to me, in general.
> > > 
> > > However, some of those requests would violate the rules in a trivial 
> > > way.  For these we might return a positive value rather than a negative 
> > > error code.  For example, calling pm_runtime_resume while the device is 
> > > already active shouldn't be considered an error.  But it can't be 
> > > considered a complete success either, because it won't invoke the 
> > > runtime_resume method.
> > 
> > That need not matter from the caller's point of view, though.  In the case of
> > pm_runtime_resume() the caller will probably be mostly interested whether or
> > not it can do I/O after the function has returned.
> 
> Yes.  But the driver might depend on something happening inside the
> runtime_resume method, so it would need to know if a successful
> pm_runtime_resume wasn't going to invoke the callback.

Hmm.  That would require the driver to know that the device was suspended,
but in that case pm_runtime_resume() returning 0 would mean that _someone_
ran ->runtime_resume() for it in any case.

If the driver doesn't know if the device was suspended beforehand, it cannot
depend on the execution of ->runtime_resume().

> > > To be determined: How runtime PM will interact with system sleep.
> > 
> > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > state, but that would involve canceling all of the pending requests.
> 
> Or simply freezing the workqueue.

Well, what about the synchronous calls?  How are we going to prevent them
from happening after freezing the workqueue?

> > > About all I can add is the "New requests override previous requests"  
> > > policy.  This would apply to all the non-synchronous requests, whether
> > > they are delayed or added directly to the workqueue.  If a new request
> > > (synchronous or not) is received before the old one has started to run,
> > > the old one will be cancelled.  This holds even if the new request is
> > > redundant, like a resume request received while the device is active.
> > > 
> > > There is one exception to this rule: An idle_notify request does not 
> > > cancel a delayed or queued suspend request.
> > 
> > I'm not sure if such a rigid rule will be really useful.
> 
> A rigid rule is easier to understand and apply than one with a large
> number of special cases.  However, in the statement of the rule above,
> I forgot to mention that this applies only if the new request is valid,
> i.e., if it's not forbidden by the current status or the counter
> values.

Ah, OK.  I'd also like to add the rule about requests of the same type
(if there's one pending already, the new one is discarded).

> > Also, as I said above, I think we shouldn't regard setting up the suspend
> > timer as queuing up a request, but as a totally separate operation.
> 
> Well, there can't be any pending resume requests when the suspend timer
> is set up, so we have to consider only pending idle notifications or
> pending suspends.  I agree, we would want to allow an idle notification
> to remain pending when the suspend timer is set up.  As for pending
> suspends, we _should_ allow the new request to override the old one.  
> This will come up whenever the timeout value is changed.

Now there's a case in which allowing the suspend timer to be set up at any time
simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
is called and it sees a timer pending, it deactivates the timer with
del_timer() and sets up a new one with add_timer().  It doesn't need to worry
about whether the suspend request has been queued up already or
pm_runtime_suspend() is running or something.  Things will work themselves out
anyway eventually.

Otherwise, after calling del_timer() we'll need to check if the timer was pending
and if it wasn't, then if the suspend request has been queued up already, and
if it has, then if pm_runtime_suspend() is running (the current status is
RPM_SUSPENDING) etc.  That doesn't look particularly clean.

Best,
Rafael
Alan Stern July 2, 2009, 7:53 p.m. UTC | #23
On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> 
> I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

Okay, but hopefully you won't mind if I continue to use the old state 
names in conversation.

> Still, the additional flag for 'idle notification is in progress' is still
> necessary for the following two reasons:
> 
> (1) Idle notifications cannot be run (synchronously) when one is already in
>     progress, so we need a means to determine whether or not this is the case.
> 
> (2) If run-time PM is to be disabled, the function doing that must guarantee
>     that ->runtime_idle() won't be running after it's returned, so it needs to
>     know how to check that.

Agreed.


> > I don't agree.  For example, suppose the device has an active child
> > when the driver says: Suspend it in 30 seconds.  If the child is then
> > removed after only 10 seconds, does it make sense to go ahead with
> > suspending the parent 20 seconds later?  No -- if the parent is going
> > to be suspended, the decision as to when should be made at the time the
> > child is removed, not beforehand.
> 
> There are two functions, one that sets up the timer and the other that queues
> up the request.  This is the second one that makes the decision if the request
> is still worth queuing up.
> 
> > (Even more concretely, suppose there is a 30-second inactivity timeout
> > for autosuspend.  Removing the child counts as activity and so should
> > restart the timer.)
> > 
> > To put it another way, suppose you accept a delayed request under
> > inappropriate conditions.  If the conditions don't change, the whole
> > thing was a waste of effort.  And if the conditions do change, then the
> > whole delayed request should be reconsidered anyhow.
> 
> The problem is, even if you always accept a delayed request under appropriate
> conditions, you still have to reconsider it before putting it into the work
> queue, because the conditions might have changed.  So, you'd like to do this:
> 
> (1) Check if the conditions are appropriate, set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
> 
> while I think it will be simpler to do this:
> 
> (1) Set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
> 
> In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> between (1) and (2), so I don't really see a practical difference.

A cycle like that would cancel the timer anyway.  Maybe that's what you 
meant...

Hmm.  What sort of conditions are we talking about?  One possibility is
that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED.  It's 
completely useless to start a timer then; if the state changes the 
timer will be cancelled, and if it doesn't change then the request 
won't be queued when the timer expires.

The other possibility is that either the children or usage counter is
positive.  If the counter decrements to 0 so that a suspend is feasible 
then we would send an idle notification.  At that point the driver 
could decide what to do; the most likely response would be to 
reschedule the suspend.  In fact, it's hard to think of a situation 
where the driver would want to just let the timer keep on running.

> For example, suppose ->runtime_resume() has been called as
> a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> I/O to process, but it is known beforehand that the device will most likely be
> inactive after the I/O is done.  So, it's tempting to call
> pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> inappropriate (the device is not regarded as suspended).

??  Conditions are perfectly appropriate, since suspend requests are 
allowed in the RESUMING state.

Unless the driver also did a pm_runtime_get, of course.  But in that 
case it would have to do a pm_runtime_put eventually, at which point it 
could schedule the suspend.

>  However, calling
> pm_schedule_suspend() with a long enough delay doesn't break any rules related
> to the ->runtime_*() callbacks, so why should it be forbidden?

It isn't.

> Next, suppose pm_schedule_suspend() is called, but it fails because the
> conditions are inappropriate.  What's the caller supposed to do?  Wait for the
> conditions to change and repeat?

In a manner of speaking.  More precisely, whatever code is responsible 
for changing the conditions should call pm_schedule_suspend.  Or set up 
an idle notification, leading indirectly to pm_schedule_suspend.

>  But why should it bother if the conditions
> may still change before ->runtime_suspend() is actually called?

It should bother because conditions might _not_ change, in which case
the suspend would occur.  But for what you are proposing, if the
conditions don't change then the suspend will not occur.

> IMO, it's the caller's problem whether or not what it does is useful or
> efficient.  The core's problem is to ensure that it doesn't break things.

But what's the drawback?  The extra overhead of checking whether two
counters are positive is minuscule compared to the effort of setting up
a timer.  And it's even better when you consider that the most likely
outcome of letting the timer run is that the timer handler would fail
to queue a suspend request (because the counters are unchanged).


> > Trying to keep track of reasons for incrementing and decrementing 
> > usage_count is very difficult to do in the core.  What happens if 
> > pm_request_resume increments the count but then the driver calls 
> > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> > routine can run?
> 
> Nothing wrong, as long as the increments and decrements are balanced (if they
> aren't balanced, there is a bug in the driver anyway).

That's my point -- in this situation it's very difficult for the driver
to balance them.  There would be no decrement to balance
pm_request_resume's automatic increment, because the work routine would
never run.

>  In fact, for this to
> work we need the rule that a new request of the same type doesn't replace an
> existing one.  Then, the submitted resume request cannot be canceled, so the
> work function will run and drop the usage counter.

A new pm_schedule_suspend _should_ replace an existing one.  For 
idle_notify and resume requests, this rule is more or less a no-op.

> > It's better to make the driver responsible for maintaining the counter
> > value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> > better than having the core automatically change the counter.
> 
> So the caller will do:
> 
> pm_runtime_get(dev);
> error = pm_request_resume(dev);
> if (error)
>     goto out;
> <process I/O>
> pm_runtime_put();

["error" isn't a good name.  The return value would be 0 to indicate 
the request was accepted and queued, or 1 to indicate the device is 
already active.  Or perhaps vice versa.]

> but how is it supposed to ensure that pm_runtime_put() will be called after
> executing the 'goto out' thing?

The same way it knows that the runtime_resume method has to process the
pending I/O.  That is, the presence of I/O to process means that once
the processing is over, the driver should call pm_runtime_put.
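
In other words, roughly this pattern (just a sketch; the foo_* names are
made up for illustration, they don't come from any real driver):

	/* The submitter did pm_runtime_get() before pm_request_resume();
	 * the resume callback finishes the deferred I/O and then drops
	 * that reference.
	 */
	static int foo_runtime_resume(struct device *dev)
	{
		struct foo_priv *priv = dev_get_drvdata(dev);

		foo_process_pending_io(priv);	/* the I/O the request was queued for */
		pm_runtime_put(dev);		/* balances the earlier pm_runtime_get() */
		return 0;
	}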

> Anyway, we don't need to use the usage counter for that (although it's cheap).
> Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> resume request is pending and fail if that's the case.

But what about pm_runtime_suspend?  I think we need to use the counter.
Besides, the states in which suspend requests and idle requests are 
valid are disjoint from the states in which resume requests are valid.

> Let's put it another way.  What's the practical benefit to the caller if we
> always check the counters in submissions?

It saves the overhead of setting up and running a useless timer.  It 
avoids a race between the timer routine and pm_runtime_put.


> > >     Any pending request takes precedence over a new idle notification request.
> > 
> > For pending resume requests this rule is unnecessary; it's invalid to
> > submit an idle notification request while a resume request is pending
> > (since resume requests can be pending only in the RPM_SUSPENDING and
> > RPM_SUSPENDED states while idle notification requests are accepted only
> > in RPM_RESUMING and RPM_ACTIVE).
> 
> It is correct nevertheless. :-)

Okay, if you want.  Provided you agree that "pending request" doesn't 
include unexpired suspend timers.

> Well, after some reconsideration I think it's not enough (as I wrote in my last
> > message), because it generally makes sense to make the following rule:
> 
>     A pending request always takes precedence over a new request of the same
>     type.
> 
> So, for example, if pm_request_resume() is called and there's a resume request
> pending already, the new pm_request_resume() should just let the pending
> request alone and quit.

Do you mean we shouldn't cancel the work item and then requeue it?  I
agree.  In fact I'd go even further: If the timer routine finds an idle
request pending, it shouldn't cancel it -- instead it should simply
change async_action to ASYNC_SUSPEND.  That's a simple optimization.  
Regardless, the effect isn't visible to drivers.

> Thus, it seems reasonable to remember what type of a request is pending
> (i don't think we can figure it out from the status fields in 100% of the
> cases).

That's what the async_action field in my proposal is for.


> > Yes.  But the driver might depend on something happening inside the
> > runtime_resume method, so it would need to know if a successful
> > pm_runtime_resume wasn't going to invoke the callback.
> 
> Hmm.  That would require the driver to know that the device was suspended,
> but in that case pm_runtime_resume() returning 0 would mean that _someone_
> ran ->runtime_resume() for it in any case.
> 
> If the driver doesn't know if the device was suspended beforehand, it cannot
> depend on the execution of ->runtime_resume().

Exactly.  Therefore it needs to be told if pm_runtime_resume isn't 
going to call the runtime_resume method, so that it can take 
appropriate remedial action.


> > > > To be determined: How runtime PM will interact with system sleep.
> > > 
> > > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > > state, but that would involve canceling all of the pending requests.
> > 
> > Or simply freezing the workqueue.
> 
> Well, what about the synchronous calls?  How are we going to prevent them
> from happening after freezing the workqueue?

How about your "rpm_disabled" flag?


> Now there's a point in which allowing to set up the suspend timer at any time
> simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
> is called and it sees a timer pending, it deactivates the timer with
> del_timer() and sets up a new one with add_timer().  It doesn't need to worry
> about whether the suspend request has been queued up already or
> pm_runtime_suspend() is running or something.  Things will work themselves out
> anyway eventually.
> 
> Otherwise, after calling del_timer() we'll need to check if the timer was pending
> > and if it wasn't, then if the suspend request has been queued up already, and
> if it has, then if pm_runtime_suspend() is running (the current status is
> RPM_SUSPENDING) etc.  That doesn't look particularly clean.

It's not as bad as you think.  In pseudo code:

	ret = suspend_allowed(dev);
	if (ret)
		return ret;
	if (dev->power.timer_expiration) {
		del_timer(&dev->power.timer);
		dev->power.timer_expiration = 0;
	}
	if (dev->power.work_pending) {
		cancel_work(&dev->power.work);
		dev->power.work_pending = 0;
		dev->power.async_action = 0;
	}
	dev->power.timer_expiration = max(jiffies + delay, 1UL);
	mod_timer(&dev->power.timer, dev->power.timer_expiration);

The middle section could usefully be put in a subroutine.

Alan Stern

Rafael Wysocki July 2, 2009, 11:05 p.m. UTC | #24
On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > The RPM_IN_TRANSITION flag is unnecessary.  It would always be equal to
> > > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> > 
> > I thought of replacing the old flags with RPM_IN_TRANSITION, actually.
> 
> Okay, but hopefully you won't mind if I continue to use the old state 
> names in conversation.

Sure.

> > Still, the additional flag for 'idle notification is in progress' is still
> > necessary for the following two reasons:
> > 
> > (1) Idle notifications cannot be run (synchronously) when one is already in
> >     progress, so we need a means to determine whether or not this is the case.
> > 
> > (2) If run-time PM is to be disabled, the function doing that must guarantee
> >     that ->runtime_idle() won't be running after it's returned, so it needs to
> >     know how to check that.
> 
> Agreed.
> 
> 
> > > I don't agree.  For example, suppose the device has an active child
> > > when the driver says: Suspend it in 30 seconds.  If the child is then
> > > removed after only 10 seconds, does it make sense to go ahead with
> > > suspending the parent 20 seconds later?  No -- if the parent is going
> > > to be suspended, the decision as to when should be made at the time the
> > > child is removed, not beforehand.
> > 
> > There are two functions, one that sets up the timer and the other that queues
> > up the request.  This is the second one that makes the decision if the request
> > is still worth queuing up.
> > 
> > > (Even more concretely, suppose there is a 30-second inactivity timeout
> > > for autosuspend.  Removing the child counts as activity and so should
> > > restart the timer.)
> > > 
> > > To put it another way, suppose you accept a delayed request under
> > > inappropriate conditions.  If the conditions don't change, the whole
> > > thing was a waste of effort.  And if the conditions do change, then the
> > > whole delayed request should be reconsidered anyhow.
> > 
> > The problem is, even if you always accept a delayed request under appropriate
> > conditions, you still have to reconsider it before putting it into the work
> > queue, because the conditions might have changed.  So, you'd like to do this:
> > 
> > (1) Check if the conditions are appropriate, set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> > 
> > while I think it will be simpler to do this:
> > 
> > (1) Set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> > 
> > In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> > between (1) and (2), so I don't really see a practical difference.
> 
> A cycle like that would cancel the timer anyway.  Maybe that's what you 
> meant...

Yes.

> Hmm.  What sort of conditions are we talking about?  One possibility is
> that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED.  It's 
> completely useless to start a timer then; if the state changes the 
> timer will be cancelled, and if it doesn't change then the request 
> won't be queued when the timer expires.

OK

> The other possibility is that either the children or usage counter is
> positive.  If the counter decrements to 0 so that a suspend is feasible 
> then we would send an idle notification.  At that point the driver 
> could decide what to do; the most likely response would be to 
> reschedule the suspend.  In fact, it's hard to think of a situation 
> where the driver would want to just let the timer keep on running.

OK

> > For example, suppose ->runtime_resume() has been called as
> > a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> > I/O to process, but it is known beforehand that the device will most likely be
> > inactive after the I/O is done.  So, it's tempting to call
> > pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> > inappropriate (the device is not regarded as suspended).
> 
> ??  Conditions are perfectly appropriate, since suspend requests are 
> allowed in the RESUMING state.

OK

> Unless the driver also did a pm_runtime_get, of course.  But in that 
> case it would have to do a pm_runtime_put eventually, at which point it 
> could schedule the suspend.
> 
> >  However, calling
> > pm_schedule_suspend() with a long enough delay doesn't break any rules related
> > to the ->runtime_*() callbacks, so why should it be forbidden?
> 
> It isn't.
> 
> > Next, suppose pm_schedule_suspend() is called, but it fails because the
> > conditions are inappropriate.  What's the caller supposed to do?  Wait for the
> > conditions to change and repeat?
> 
> In a manner of speaking.  More precisely, whatever code is responsible 
> for changing the conditions should call pm_schedule_suspend.  Or set up 
> an idle notification, leading indirectly to pm_schedule_suspend.
> 
> >  But why should it bother if the conditions
> > may still change before ->runtime_suspend() is actually called?
> 
> It should bother because conditions might _not_ change, in which case
> the suspend would occur.  But for what you are proposing, if the
> conditions don't change then the suspend will not occur.
> 
> > IMO, it's the caller's problem whether or not what it does is useful or
> > efficient.  The core's problem is to ensure that it doesn't break things.
> 
> But what's the drawback?  The extra overhead of checking whether two
> counters are positive is minuscule compared to the effort of setting up
> a timer.  And it's even better when you consider that the most likely
> outcome of letting the timer run is that the timer handler would fail
> to queue a suspend request (because the counters are unchanged).
> 
> 
> > > Trying to keep track of reasons for incrementing and decrementing 
> > > usage_count is very difficult to do in the core.  What happens if 
> > > pm_request_resume increments the count but then the driver calls 
> > > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work 
> > > routine can run?
> > 
> > Nothing wrong, as long as the increments and decrements are balanced (if they
> > aren't balanced, there is a bug in the driver anyway).
> 
> That's my point -- in this situation it's very difficult for the driver
> to balance them.  There would be no decrement to balance
> pm_request_resume's automatic increment, because the work routine would
> never run.
> 
> >  In fact, for this to
> > work we need the rule that a new request of the same type doesn't replace an
> > existing one.  Then, the submitted resume request cannot be canceled, so the
> > work function will run and drop the usage counter.
> 
> A new pm_schedule_suspend _should_ replace an existing one.  For 
> idle_notify and resume requests, this rule is more or less a no-op.
> 
> > > It's better to make the driver responsible for maintaining the counter
> > > value.  Forcing the driver to do pm_runtime_get, pm_request_resume is
> > > better than having the core automatically change the counter.
> > 
> > So the caller will do:
> > 
> > pm_runtime_get(dev);
> > error = pm_request_resume(dev);
> > if (error)
> >     goto out;
> > <process I/O>
> > pm_runtime_put();
> 
> ["error" isn't a good name.  The return value would be 0 to indicate 
> the request was accepted and queued, or 1 to indicate the device is 
> already active.  Or perhaps vice versa.]

Why do you insist on using positive values?  Also, there are other situations
possible (like run-time PM is disabled etc.).

> > but how is it supposed to ensure that pm_runtime_put() will be called after
> > executing the 'goto out' thing?
> 
> The same way it knows that the runtime_resume method has to process the
> pending I/O.  That is, the presence of I/O to process means that once
> the processing is over, the driver should call pm_runtime_put.

I overlooked the fact that if pm_request_resume() returns a value indicating
that the request has been queued up, the status is such that it won't allow any
other requests to be queued up and only pm_runtime_resume() can.  The status
will still remain this way until ->runtime_resume() has returned, so the caller
can just call pm_runtime_put() right after pm_request_resume() in that case
(unless it wants to process I/O after ->runtime_resume() has returned, but
then it can increment the usage counter in ->runtime_resume()).

> > Anyway, we don't need to use the usage counter for that (although it's cheap).
> > Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> > resume request is pending and fail if that's the case.
> 
> But what about pm_runtime_suspend?  I think we need to use the counter.
> Besides, the states in which suspend requests and idle requests are 
> valid are disjoint from the states in which resume requests are valid.

That's correct.  pm_runtime_suspend() should check the counter IMO, but it
shouldn't change it.

Also, it looks like the status bits are sufficient to prevent suspend requests
or synchronous suspends from happening at wrong times, from the core's point
of view, so scratch the idea of using the usage counter to block them.

> > Let's put it another way.  What's the practical benefit to the caller if we
> > always check the counters in submissions?
> 
> It saves the overhead of setting up and running a useless timer.  It 
> avoids a race between the timer routine and pm_runtime_put.

OK

> > > >     Any pending request takes precedence over a new idle notification request.
> > > 
> > > For pending resume requests this rule is unnecessary; it's invalid to
> > > submit an idle notification request while a resume request is pending
> > > (since resume requests can be pending only in the RPM_SUSPENDING and
> > > RPM_SUSPENDED states while idle notification requests are accepted only
> > > in RPM_RESUMING and RPM_ACTIVE).
> > 
> > It is correct nevertheless. :-)
> 
> Okay, if you want.  Provided you agree that "pending request" doesn't 
> include unexpired suspend timers.

Sure.

> > Well, after some reconsideration I think it's not enough (as I wrote in my last
> > message), because it generally makes sense to make the following rule:
> > 
> >     A pending request always takes precedence over a new request of the same
> >     type.
> > 
> > So, for example, if pm_request_resume() is called and there's a resume request
> > pending already, the new pm_request_resume() should just let the pending
> > request alone and quit.
> 
> Do you mean we shouldn't cancel the work item and then requeue it?  I
> agree.  In fact I'd go even further: If the timer routine finds an idle
> request pending, it shouldn't cancel it -- instead it should simply
> change async_action to ASYNC_SUSPEND.  That's a simple optimization.  
> Regardless, the effect isn't visible to drivers.

I don't really like the async_action idea, as you might have noticed.

> > Thus, it seems reasonable to remember what type of a request is pending
> > (I don't think we can figure it out from the status fields in 100% of the
> > cases).
> 
> That's what the async_action field in my proposal is for.

Ah.  Why don't we just use a request type field instead?

In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
(RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

Additionally, we'll need an "idle notification is running" flag as we've already
agreed, but that's independent on the status and request type (except that, I
think, it should be forbidden to set the request type to RPM_REQ_IDLE if
this flag is set).

That would pretty much suffice to represent all of the possibilities.

I'd also add a "disabled" flag indicating that run-time PM of the device is
disabled, an "error" flag indicating that one of the
->runtime_[suspend/resume]() callbacks has failed to do its job
and an int field to store the error code returned by the failing callback (in
case the failure happened in an asynchronous routine).
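
Putting that together, the bookkeeping might look more or less like this
(a sketch only; the struct and field names are tentative):

	enum rpm_status { RPM_ACTIVE, RPM_SUSPENDING, RPM_SUSPENDED, RPM_RESUMING };
	enum rpm_request { RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME };

	struct dev_pm_runtime_info {
		unsigned int	runtime_status:2;	/* RPM_* */
		unsigned int	request_type:2;		/* RPM_REQ_* */
		unsigned int	idle_notification:1;	/* ->runtime_idle() running */
		unsigned int	disabled:1;		/* run-time PM disabled */
		unsigned int	runtime_error:1;	/* a callback has failed */
		int		error_code;		/* value returned by the failing callback */
	};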

> > > Yes.  But the driver might depend on something happening inside the
> > > runtime_resume method, so it would need to know if a successful
> > > pm_runtime_resume wasn't going to invoke the callback.
> > 
> > Hmm.  That would require the driver to know that the device was suspended,
> > but in that case pm_runtime_resume() returning 0 would mean that _someone_
> > ran ->runtime_resume() for it in any case.
> > 
> > If the driver doesn't know if the device was suspended beforehand, it cannot
> > depend on the execution of ->runtime_resume().
> 
> Exactly.  Therefore it needs to be told if pm_runtime_resume isn't 
> going to call the runtime_resume method, so that it can take 
> appropriate remedial action.

OK, it can return 1 if the status was already RPM_ACTIVE.

> > > > > To be determined: How runtime PM will interact with system sleep.
> > > > 
> > > > Yes.  My first idea was to disable run-time PM before entering a system sleep
> > > > state, but that would involve canceling all of the pending requests.
> > > 
> > > Or simply freezing the workqueue.
> > 
> > Well, what about the synchronous calls?  How are we going to prevent them
> > from happening after freezing the workqueue?
> 
> How about your "rpm_disabled" flag?

That's fine, we'd also need to wait for running callbacks to finish too.  And
I'm still not convinced if we should preserve requests queued up before the
system sleep.  Or keep the suspend timer running for that matter.

> > Now there's a point in which allowing to set up the suspend timer at any time
> > simplifies things quite a bit.  Namely, in that case, if pm_schedule_suspend()
> > is called and it sees a timer pending, it deactivates the timer with
> > del_timer() and sets up a new one with add_timer().  It doesn't need to worry
> > about whether the suspend request has been queued up already or
> > pm_runtime_suspend() is running or something.  Things will work themselves out
> > anyway eventually.
> > 
> > Otherwise, after calling del_timer() we'll need to check if the timer was pending
> > and if it wasn't, then if the suspend request has been queued up already, and
> > if it has, then if pm_runtime_suspend() is running (the current status is
> > RPM_SUSPENDING) etc.  That doesn't look particularly clean.
> 
> It's not as bad as you think.  In pseudo code:
> 
> 	ret = suspend_allowed(dev);
> 	if (ret)
> 		return ret;
> 	if (dev->power.timer_expiration) {
> 		del_timer(&dev->power.timer);
> 		dev->power.timer_expiration = 0;
> 	}
> 	if (dev->power.work_pending) {
> 		cancel_work(&dev->power.work);
> 		dev->power.work_pending = 0;
> 		dev->power.async_action = 0;
> 	}
> 	dev->power.timer_expiration = max(jiffies + delay, 1UL);
> 	mod_timer(&dev->power.timer, dev->power.timer_expiration);
> 
> The middle section could usefully be put in a subroutine.

Could you please remind me what timer_expiration is for?

So, at a high level, the pm_request_* and pm_schedule_* functions would work
like this (I'm omitting acquiring and releasing locks):

pm_request_idle()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
    RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
  * return -EALREADY if 'request type' is RPM_REQ_IDLE
  * return -EINPROGRESS if 'idle notification in progress' is set
  * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
    ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
    on 'request type' at the time when the work function is run)
  * return 0

pm_schedule_suspend()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
  * return -EALREADY if 'runtime status' is RPM_SUSPENDED
  * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
  * if suspend timer is pending, deactivate it
  * if 'request type' is not RPM_REQ_NONE, cancel the work
  * set up a timer to execute pm_request_suspend()
  * return 0

pm_request_suspend()
  * return if 'disabled' is set or 'runtime_error' is set
  * return if 'usage_count' > 0 or 'child_count' > 0
  * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
  * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
  * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
    execute ->runtime_suspend()

pm_request_resume()
  * return -EINVAL if 'disabled' is set or 'runtime_error' is set
  * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
  * return -EALREADY if 'request type' is RPM_REQ_RESUME
  * if suspend timer is pending, deactivate it
  * if 'request type' is not RPM_REQ_NONE, cancel the work
  * return 1 if 'runtime status' is RPM_ACTIVE
  * change 'request type' to RPM_REQ_RESUME and queue up a request to
    execute ->runtime_resume()
  * return 0

Or did I miss anything?

Rafael
Alan Stern July 3, 2009, 8:58 p.m. UTC | #25
On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:

> > ["error" isn't a good name.  The return value would be 0 to indicate 
> > the request was accepted and queued, or 1 to indicate the device is 
> > already active.  Or perhaps vice versa.]
> 
> Why do you insist on using positive values?  Also, there are other situations
> possible (like run-time PM is disabled etc.).

I think we should use positive values to indicate situations that
aren't the "nominal" case but also aren't errors.  This simplifies
error checking in drivers.  For example, you wouldn't want to print a
debugging or warning message just because the device happened to be
active already when you called pm_runtime_resume.
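
I.e. callers could get away with something as simple as this (sketch):

	int ret = pm_runtime_resume(dev);

	if (ret < 0)
		dev_err(dev, "runtime resume failed: %d\n", ret);
	/* A positive return (device already active) is not an error,
	 * so nothing needs to be printed for it.
	 */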


> I don't really like the async_action idea, as you might have noticed.

Do you mean that you don't like the field, or that you don't like its name?

> > > Thus, it seems reasonable to remember what type of a request is pending
> > > (I don't think we can figure it out from the status fields in 100% of the
> > > cases).
> > 
> > That's what the async_action field in my proposal is for.
> 
> Ah.  Why don't we just use a request type field instead?

"A rose by any other name..."

> In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.

> > Additionally, we'll need an "idle notification is running" flag as we've already
> agreed, but that's independent on the status and request type (except that, I
> think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> this flag is set).

I don't see why; we can allow drivers to queue an idle notification
from within their runtime_idle routine (even though it might seem
pointless).  What we should forbid is calling pm_runtime_idle when the
flag is set.

> That would pretty much suffice to represent all of the possibilities.
> 
> I'd also add a "disabled" flag indicating that run-time PM of the device is
> disabled, an "error" flag indicating that one of the
> ->runtime_[suspend/resume]() callbacks has failed to do its job and
> and an int field to store the error code returned by the failing callback (in
> case the failure happened in an asynchronous routine).

Sure -- those are all things in the current design which should remain.  
As well as the wait_queue.


> That's fine, we'd also need to wait for running callbacks to finish too.  And
> I'm still not convinced if we should preserve requests queued up before the
> system sleep.  Or keep the suspend timer running for that matter.

This all goes into the "to-be-determined" category.  :-)


> Could you please remind me what timer_expiration is for?

It is the jiffies value for the next timer expiration, or 0 if the
timer isn't pending.  Its purpose is to allow us to correctly
reschedule suspend requests.

Suppose the timer expires at about the same time as a new
pm_schedule_suspend call occurs.  If the timer routine hasn't queued
the work item yet then there's nothing to cancel, so how do we prevent
a suspend request from being added to the workqueue?  Answer: The timer
routine checks timer_expiration.  If the value stored there is in the
future, then the routine knows it was triggered early and it shouldn't
submit the work item.

Also (a minor benefit), before calling del_timer we can check whether
timer_expiration is nonzero.
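
In code the timer routine would amount to something like this (a sketch;
the function name is made up and locking is omitted):

	static void pm_suspend_timer_fn(unsigned long data)
	{
		struct device *dev = (struct device *)data;
		unsigned long expires = dev->power.timer_expiration;

		/* timer_expiration == 0 means the timer was cancelled; a value
		 * in the future means pm_schedule_suspend() rescheduled the
		 * suspend after this timer was already armed, so this early
		 * trigger should simply be ignored.
		 */
		if (expires > 0 && !time_after(expires, jiffies)) {
			dev->power.timer_expiration = 0;
			pm_request_suspend(dev);
		}
	}

(A real version would of course do the check and update under the device's
spinlock.)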

> So, at a high level, the pm_request_* and pm_schedule_* functions would work
> like this (I'm omitting acquiring and releasing locks):
> 
> pm_request_idle()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
>     RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0

We should allow the status to be RPM_RESUMING.

>   * return -EALREADY if 'request type' is RPM_REQ_IDLE

No, return 0.

>   * return -EINPROGRESS if 'idle notification in progress' is set

No, go ahead and schedule another idle notification.

>   * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
>     ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
>     on 'request type' at the time when the work function is run)

More simply, just queue the work item.

>   * return 0
> 
> pm_schedule_suspend()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
>   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
>   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING

The last two aren't right.  If the status is RPM_SUSPENDED or
RPM_SUSPENDING, cancel any pending work and set the type to
RPM_REQ_NONE before returning.  In other words, cancel a possible 
pending resume request.

>   * if suspend timer is pending, deactivate it

This step isn't needed here, since you're going to restart the timer 
anyway.

>   * if 'request type' is not RPM_REQ_NONE, cancel the work

Set timer_expiration = jiffies + delay.

>   * set up a timer to execute pm_request_suspend()
>   * return 0
> 
> pm_request_suspend()
>   * return if 'disabled' is set or 'runtime_error' is set
>   * return if 'usage_count' > 0 or 'child_count' > 0
>   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED

First cancel a possible pending resume request.

If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
timer (and set timer_expiration to 0).

>   * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
>   * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
>     execute ->runtime_suspend()
> 
> pm_request_resume()
>   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
>   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING

Or RPM_ACTIVE.

>   * return -EALREADY if 'request type' is RPM_REQ_RESUME

For these last two, first cancel a possible pending suspend request
and a possible timer.  Should we leave a pending idle request in place?  
And return 1, not an error code.

>   * if suspend timer is pending, deactivate it

The timer can't be pending at this point.

>   * if 'request type' is not RPM_REQ_NONE, cancel the work

At this point, 'request type' can only be RPM_REQ_NONE or 
RPM_REQ_RESUME.  In neither case do we want to cancel it.

>   * return 1 if 'runtime status' is RPM_ACTIVE

See above.

>   * change 'request type' to RPM_REQ_RESUME and queue up a request to
>     execute ->runtime_resume()

Queue the request only if the state is RPM_SUSPENDED.

>   * return 0
> 
> Or did I miss anything?

I think this is pretty close.  It'll be necessary to go back and reread 
the old email messages to make sure this really does everything we 
eventually agreed on.  :-)

Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
pm_runtime_idle.  There is an extra requirement: When a suspend or
resume is over, if 'request type' is set then schedule the work item.  
Doing things this way allows the workqueue thread to avoid waiting
around for the suspend or resume to finish.

Also, when a resume is over we should schedule an idle notification 
even if 'request type' is clear, provided the counters are 0.

Alan Stern

Rafael Wysocki July 3, 2009, 11:57 p.m. UTC | #26
On Friday 03 July 2009, Alan Stern wrote:
> On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > ["error" isn't a good name.  The return value would be 0 to indicate 
> > > the request was accepted and queued, or 1 to indicate the device is 
> > > already active.  Or perhaps vice versa.]
> > 
> > Why do you insist on using positive values?  Also, there are other situations
> > possible (like run-time PM is disabled etc.).
> 
> I think we should use positive values to indicate situations that
> aren't the "nominal" case but also aren't errors.  This simplifies
> error checking in drivers.  For example, you wouldn't want to print a
> debugging or warning message just because the device happened to be
> active already when you called pm_runtime_resume.

OK

> > I don't really like the async_action idea, as you might have noticed.
> 
> Do you mean that you don't like the field, or that you don't like its name?

The name, actually.  That's because I'd like to use the values for something
that's not 'async' in substance (more on that later).
 
> > > > Thus, it seems reasonable to remember what type of a request is pending
> > > > (i don't think we can figure it out from the status fields in 100% of the
> > > > cases).
> > > 
> > > That's what the async_action field in my proposal is for.
> > 
> > Ah.  Why don't we just use a request type field instead?
> 
> "A rose by any other name..."
> 
> > In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> > RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> > (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).
> 
> That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.
> 
> > Additionally, we'll need an "idle notification is running" flag as we've already
> > agreed, but that's independent on the status and request type (except that, I
> > think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> > this flag is set).
> 
> I don't see why; we can allow drivers to queue an idle notification
> from within their runtime_idle routine (even though it might seem
> pointless).  What we should forbid is calling pm_runtime_idle when the
> flag is set.

OK

> > That would pretty much suffice to represent all of the possibilities.
> > 
> > I'd also add a "disabled" flag indicating that run-time PM of the device is
> > disabled, an "error" flag indicating that one of the
> > ->runtime_[suspend/resume]() callbacks has failed to do its job and
> > and an int field to store the error code returned by the failing callback (in
> > case the failure happened in an asynchronous routine).
> 
> Sure -- those are all things in the current design which should remain.  
> As well as the wait_queue.
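
So, collecting all of that, the relevant part of dev_pm_info might look more
or less like this (just an illustration; the names, widths and types are not
meant to be final):

/* Illustration only -- not a proposal. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };
enum rpm_request { RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME };

struct dev_pm_info {
	/* only the new run-time PM fields are shown */
	unsigned int		runtime_status:2;	/* enum rpm_status */
	unsigned int		request_type:2;		/* enum rpm_request */
	unsigned int		idle_notification:1;	/* ->runtime_idle() running */
	unsigned int		disabled:1;		/* run-time PM disabled */
	unsigned int		runtime_error:1;	/* a callback has failed */
	int			error_code;		/* value it returned */
	unsigned long		timer_expiration;	/* jiffies, 0 if inactive */
	wait_queue_head_t	wait_queue;
	spinlock_t		lock;
};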

It occurred to me in the meantime that if we added a 'request_pending' (or
'work_pending' or whatever similar) flag to the above, we could avoid using
cancel_work().  Namely, if 'request_pending' indicates that there's a work item
queued up, we could change 'request type' to NONE in case we didn't want the
work function to do anything.  Then, the work function would just unset
'request_pending' and quit if 'request type' is NONE.
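
A sketch of what I mean (names like 'request_pending' and 'power.work' are
only placeholders):

static void pm_runtime_work(struct work_struct *work)
{
	struct device *dev = container_of(work, struct device, power.work);

	spin_lock_irq(&dev->power.lock);
	dev->power.request_pending = false;
	if (dev->power.request_type == RPM_REQ_NONE) {
		/* The request was withdrawn after this item was queued. */
		spin_unlock_irq(&dev->power.lock);
		return;
	}
	spin_unlock_irq(&dev->power.lock);

	/* ... otherwise run the idle/suspend/resume callback chosen by
	 * 'request type'. */
}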

I generally like the idea of changing 'request type' on the fly once we've
noticed that the currently pending request should be replaced by another one.
That would require us to introduce a big idle-suspend-resume function
choosing the callback to run based on 'request type', which would be quite
complicated.  But that function could also be used for the 'synchronous'
operations, so perhaps it's worth trying?

Such a function can take two arguments, dev and request, where the second
one determines the callback to run.  It can take the same values as 'request
type', where NONE means "you've been called from the workqueue, use 'request
type' from dev to check what to do", but your ASYNC_* names are not really
suitable here. :-)

> > That's fine, we'd also need to wait for running callbacks to finish too.  And
> > I'm still not convinced if we should preserve requests queued up before the
> > system sleep.  Or keep the suspend timer running for that matter.
> 
> This all goes into the "to-be-determined" category.  :-)

Well, I'd like to choose something to start with.

> > Could you please remind me what timer_expiration is for?
> 
> It is the jiffies value for the next timer expiration, or 0 if the
> timer isn't pending.  Its purpose is to allow us to correctly
> reschedule suspend requests.
> 
> Suppose the timer expires at about the same time as a new
> pm_schedule_suspend call occurs.  If the timer routine hasn't queued
> the work item yet then there's nothing to cancel, so how do we prevent
> a suspend request from being added to the workqueue?  Answer: The timer
> routine checks timer_expiration.  If the value stored there is in the
> future, then the routine knows it was triggered early and it shouldn't
> submit the work item.
> 
> Also (a minor benefit), before calling del_timer we can check whether
> timer_expiration is nonzero.

OK, thanks.
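
So the check at the top of pm_request_suspend() (which is the timer handler)
would be something like this, I suppose (a sketch; the field names are
placeholders):

static void pm_request_suspend(unsigned long data)
{
	struct device *dev = (struct device *)data;
	unsigned long flags;

	spin_lock_irqsave(&dev->power.lock, flags);
	/* Bail out if no suspend is wanted or if we fired early. */
	if (dev->power.timer_expiration == 0
	    || time_after(dev->power.timer_expiration, jiffies))
		goto out;
	dev->power.timer_expiration = 0;

	/* ... the remaining checks from the outline below go here ... */
 out:
	spin_unlock_irqrestore(&dev->power.lock, flags);
}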

> > So, at a high level, the pm_request_* and pm_schedule_* functions would work
> > like this (I'm omitting acquiring and releasing locks):
> > 
> > pm_request_idle()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
> >     RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
> 
> We should allow the status to be RPM_RESUMING.

OK

> >   * return -EALREADY if 'request type' is RPM_REQ_IDLE
> 
> No, return 0.

OK

> >   * return -EINPROGRESS if 'idle notification in progress' is set
> 
> No, go ahead and schedule another idle notification.

OK

> >   * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
> >     ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
> >     on 'request type' at the time when the work function is run)
> 
> More simply, just queue the work item.
> 
> >   * return 0
> > 
> > pm_schedule_suspend()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> 
> The last two aren't right.  If the status is RPM_SUSPENDED or
> RPM_SUSPENDING, cancel any pending work and set the type to
> RPM_REQ_NONE before returning.  In other words, cancel a possible 
> pending resume request.

Why do you think the possible pending resume request should be canceled?

I don't really agree here.  Resume request really means there's data to
process, so we shouldn't cancel pending resume requests IMO.

The driver should be given a chance to process data in ->runtime_resume()
even if it doesn't use the usage counter.  Otherwise, the usage counter would
always have to be used along with resume requests, so having
pm_request_resume() that doesn't increment the usage counter would really be
pointless.

> >   * if suspend timer is pending, deactivate it
> 
> This step isn't needed here, since you're going to restart the timer 
> anyway.

OK, restart the timer.

> >   * if 'request type' is not RPM_REQ_NONE, cancel the work
> 
> Set timer_expiration = jiffies + delay.

OK

> >   * set up a timer to execute pm_request_suspend()
> >   * return 0
> > 
> > pm_request_suspend()
> >   * return if 'disabled' is set or 'runtime_error' is set
> >   * return if 'usage_count' > 0 or 'child_count' > 0
> >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> 
> First cancel a possible pending resume request.

I disagree.

> If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> timer (and set timer_expiration to 0).

We're the timer function, so either the timer is not pending, or we've been
executed too early.

> >   * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
> >   * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
> >     execute ->runtime_suspend()
> > 
> > pm_request_resume()
> >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> 
> Or RPM_ACTIVE.

Maybe return 1 in that case?

> >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> 
> For these last two, first cancel a possible pending suspend request
> and a possible timer.

Possible timer only, I think.  If 'request type' is RESUME, there can't be a
suspend request pending.

> Should we leave a pending idle request in place?  

Probably not.  It's likely going to result in a suspend request that we would
cancel.

> And return 1, not an error code.
> 
> >   * if suspend timer is pending, deactivate it
> 
> The timer can't be pending at this point.

That's if we deactivated it earlier.  OK

> >   * if 'request type' is not RPM_REQ_NONE, cancel the work
> 
> At this point, 'request type' can only be RPM_REQ_NONE or 
> RPM_REQ_RESUME.  In neither case do we want to cancel it.
> 
> >   * return 1 if 'runtime status' is RPM_ACTIVE
> 
> See above.
> 
> >   * change 'request type' to RPM_REQ_RESUME and queue up a request to
> >     execute ->runtime_resume()
> 
> Queue the request only if the state is RPM_SUSPENDED.
> 
> >   * return 0
> > 
> > Or did I miss anything?
> 
> I think this is pretty close.  It'll be necessary to go back and reread 
> the old email messages to make sure this really does everything we 
> eventually agreed on.  :-)

I think it's sufficient if we agree on the final version. :-)

> Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> pm_runtime_idle.  There is an extra requirement: When a suspend or
> resume is over, if 'request type' is set then schedule the work item.  
> Doing things this way allows the workqueue thread to avoid waiting
> around for the suspend or resume to finish.

I agree except that I would like suspends to just fail when the status is
RPM_RESUMING.  The reason is that a sloppily written driver could enter a
busy-loop of suspending and resuming the device, without ever getting a chance
to process data, if there's full symmetry between suspend and resume.  So, I'd
like to break that symmetry and make resume operations privileged with respect
to suspend and idle notifications.

> Also, when a resume is over we should schedule an idle notification 
> even if 'request type' is clear, provided the counters are 0.

Agreed.

Best,
Rafael
Alan Stern July 4, 2009, 3:12 a.m. UTC | #27
On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > > I don't really like the async_action idea, as you might have noticed.
> > 
> > Do you mean that you don't like the field, or that you don't like its name?
> 
> The name, actually.  That's because I'd like to use the values for something
> that's not 'async' in substance (more on that later).

Okay.  I don't care about the name.

> It occurred to me in the meantime that if we added a 'request_pending' (or
> 'work_pending' or whatever similar) flag to the above, we could avoid using
> cancel_work().  Namely, if 'request_pending' indicates that there's a work item
> queued up, we could change 'request type' to NONE in case we didn't want the
> work function to do anything.  Then, the work function would just unset
> 'request_pending' and quit if 'request type' is NONE.

You mean use request_pending to decide whether to call cancel_work, 
instead of looking at request_type?  That's right.

As for whether or not we should actually call cancel_work...  Which is 
more expensive: Calling cancel_work when no work is pending, or letting 
the work item run when it doesn't have anything to do?  Probably the 
latter.
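
Illustratively (the field names are made up), the difference is just:

	/* Setting the request type to NONE is all that correctness needs;
	 * the cancel_work() call is the optimization in question. */
	dev->power.request_type = RPM_REQ_NONE;
	if (dev->power.request_pending && cancel_work(&dev->power.work) == 1)
		dev->power.request_pending = false;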

> I generally like the idea of changing 'request type' on the fly once we've
> noticed that the currently pending request should be replaced by another one.

Me too.

> That would require us to introduce a big idle-suspend-resume function
> choosing the callback to run based on 'request type', which would be quite
> complicated.

It doesn't have to be very big or complicated:

	spin_lock_irq(&dev->power.lock);
	switch (dev->power.request_type) {
	case RPM_REQ_SUSPEND:
		__pm_runtime_suspend(dev, false);
		break;
	case RPM_REQ_RESUME:
		__pm_runtime_resume(dev, false);
		break;
	case RPM_REQ_IDLE:
		__pm_runtime_idle(dev, false);
		break;
	default:
		break;
	}
	spin_unlock_irq(&dev->power.lock);

It would be necessary to change the __pm_runtime_* routines, since they
would now have to be called with the lock held.

>  But that function could also be used for the 'synchronous'
> operations, so perhaps it's worth trying?
> 
> Such a function can take two arguments, dev and request, where the second
> one determines the callback to run.  It can take the same values as 'request
> type', where NONE means "you've been called from the workqueue, use 'request
> type' from dev to check what to do", but your ASYNC_* names are not really
> suitable here. :-)

I don't see any advantage in that approach.  The pm_runtime_* functions
already know what they want to do.  Why encode it in a request argument
only to decode it again?


> > > That's fine, we'd also need to wait for running callbacks to
> > > finish too.  And I'm still not convinced if we should preserve
> > > requests queued up before the system sleep.  Or keep the suspend
> > > timer running for that matter.
> > 
> > This all goes into the "to-be-determined" category.  :-)
> 
> Well, I'd like to choose something to start with.

Pending suspends and the suspend timer don't matter much; we can cancel
them because they ought to get resubmitted after the system wakes up.  
Pending resumes are more difficult; depending on how they are treated
they could morph into immediate wakeup requests.

Perhaps even more tricky is how to handle things like the ACPI suspend
calls when the device is already runtime-suspended.  I don't know what 
we should do about that.


> > > pm_schedule_suspend()
> > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > 
> > The last two aren't right.  If the status is RPM_SUSPENDED or
> > RPM_SUSPENDING, cancel any pending work and set the type to
> > RPM_REQ_NONE before returning.  In other words, cancel a possible 
> > pending resume request.
> 
> Why do you think the possible pending resume request should be canceled?

It's part of the "most recent request wins" approach.

> I don't really agree here.  Resume request really means there's data to
> process, so we shouldn't cancel pending resume requests IMO.
> 
> The driver should be given a chance to process data in ->runtime_resume()
> even if it doesn't use the usage counter.  Otherwise, the usage counter would
> always have to be used along with resume requests, so having
> pm_request_resume() that doesn't increment the usage counter would really be
> pointless.

All right, I'll go along with this.  So instead of "most recent request 
wins", we have something like this:

	Resume requests (queued or in progress) override suspend and 
	idle requests (sync or async).

	Suspend requests (queued or in progress, but not unexpired)
	override idle requests (sync or async).

But this statement might not be precise enough, and I'm too tired to
think through all the ramifications right now.
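
Very roughly, and ignoring the "unexpired" wrinkle, the rule amounts to a
simple precedence ordering, something like (certainly not a design, just a
way to picture it; the names are invented):

static int rpm_precedence(enum rpm_request req)
{
	switch (req) {
	case RPM_REQ_RESUME:
		return 3;
	case RPM_REQ_SUSPEND:
		return 2;
	case RPM_REQ_IDLE:
		return 1;
	default:
		return 0;	/* RPM_REQ_NONE */
	}
}

/* A new request replaces a pending one only if it ranks at least as high. */
static bool rpm_request_supersedes(enum rpm_request pending, enum rpm_request new_req)
{
	return rpm_precedence(new_req) >= rpm_precedence(pending);
}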


> > > pm_request_suspend()
> > >   * return if 'disabled' is set or 'runtime_error' is set
> > >   * return if 'usage_count' > 0 or 'child_count' > 0
> > >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > 
> > First cancel a possible pending resume request.
> 
> I disagree.

This is the same as the above, right?

> > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> > timer (and set timer_expiration to 0).
> 
> We're the timer function, so either the timer is not pending, or we've been
> executed too early.

Oh, okay.  I thought perhaps you might have wanted to export
pm_request_suspend.  But this is really supposed to be just the timer 
handler.


> > > pm_request_resume()
> > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > 
> > Or RPM_ACTIVE.
> 
> Maybe return 1 in that case?

Yes.

> > >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > 
> > For these last two, first cancel a possible pending suspend request
> > and a possible timer.
> 
> Possible timer only, I think.  if 'request type' is RESUME, there can't be
> suspend request pending.

But there can be if the status is RPM_ACTIVE.  So okay, if the status 
isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME, 
then return 0.
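
Putting the agreed pieces together, pm_request_resume() would come out
roughly like this (a sketch only; 'suspend_timer' and 'rpm_queue_work' are
invented names and the exact return values are still open):

int pm_request_resume(struct device *dev)
{
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&dev->power.lock, flags);

	if (dev->power.disabled || dev->power.runtime_error) {
		ret = -EINVAL;
	} else if (dev->power.runtime_status == RPM_ACTIVE
		   || dev->power.runtime_status == RPM_RESUMING) {
		/* Withdraw a possible suspend timer and suspend/idle request. */
		dev->power.timer_expiration = 0;
		del_timer(&dev->power.suspend_timer);
		if (dev->power.request_type != RPM_REQ_RESUME)
			dev->power.request_type = RPM_REQ_NONE;
		ret = 1;		/* already active (or resuming) */
	} else if (dev->power.request_type == RPM_REQ_RESUME) {
		ret = 0;		/* a resume request is already queued */
	} else {
		dev->power.request_type = RPM_REQ_RESUME;
		if (dev->power.runtime_status == RPM_SUSPENDED)
			rpm_queue_work(dev);	/* assumed helper */
	}

	spin_unlock_irqrestore(&dev->power.lock, flags);
	return ret;
}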


> > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > pm_runtime_idle.  There is an extra requirement: When a suspend or
> > resume is over, if 'request type' is set then schedule the work item.  
> > Doing things this way allows the workqueue thread to avoid waiting
> > around for the suspend or resume to finish.
> 
> I agree except that I would like suspends to just fail when the status is
> RPM_RESUMING.  The reason is that a sloppily written driver could enter a
> busy-loop of suspending-resuming the device, without the possibility to process
> data, if there's full symmetry between suspend and resume.  So, I'd like to
> break that symmetry and make resume operations privileged with respect to
> suspend and idle notifications.  

This follows from the new precedence rule.

Alan Stern

Rafael Wysocki July 4, 2009, 9:27 p.m. UTC | #28
On Saturday 04 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > > I don't really like the async_action idea, as you might have noticed.
> > > 
> > > Do you mean that you don't like the field, or that you don't like its name?
> > 
> > The name, actually.  That's because I'd like to use the values for something
> > that's not 'async' in substance (more on that later).
> 
> Okay.  I don't care about the name.
> 
> > It occurred to me in the meantime that if we added a 'request_pending' (or
> > 'work_pending' or whatever similar) flag to the above, we could avoid using
> > cancel_work().  Namely, if 'request_pending' indicates that there's a work item
> > queued up, we could change 'request type' to NONE in case we didn't want the
> > work function to do anything.  Then, the work function would just unset
> > 'request_pending' and quit if 'request type' is NONE.
> 
> You mean use request_pending to decide whether to call cancel_work, 
> instead of looking at request_type?  That's right.
> 
> As for whether or not we should actually call cancel_work...  Which is 
> more expensive: Calling cancel_work when no work is pending, or letting 
> the work item run when it doesn't have anything to do?  Probably the 
> latter.

Agreed, but that doesn't affect functionality.  We can get the desired
functionality without the cancel_work() patch and then optimize things along
with that patch.  This way it'll be easier to demonstrate the benefit of it.

> > I generally like the idea of changing 'request type' on the fly once we've
> > noticed that the currently pending request should be replaced by another one.
> 
> Me too.
> 
> > That would require us to introduce a big idle-suspend-resume function
> > choosing the callback to run based on 'request type', which would be quite
> > complicated.
> 
> It doesn't have to be very big or complicated:
> 
> 	spin_lock_irq(&dev->power.lock);

Ah, that is the detail I overlooked: we don't need to use
spin_lock_irqsave(), because these functions will always be called with
interrupts enabled.

> 	switch (dev->power.request_type) {
> 	case RPM_REQ_SUSPEND:
> 		__pm_runtime_suspend(dev, false);
> 		break;
> 	case RPM_REQ_RESUME:
> 		__pm_runtime_resume(dev, false);
> 		break;
> 	case RPM_REQ_IDLE:
> 		__pm_runtime_idle(dev, false);
> 		break;
> 	default:
> 		break;
> 	}
> 	spin_unlock_irq(&dev->power.lock);
> 
> It would be necessary to change the __pm_runtime_* routines, since they
> would now have to be called with the lock held.
> 
> >  But that function could also be used for the 'synchronous'
> > operations, so perhaps it's worth trying?
> > 
> > Such a function can take two arguments, dev and request, where the second
> > one determines the callback to run.  It can take the same values as 'request
> > type', where NONE means "you've been called from the workqueue, use 'request
> > type' from dev to check what to do", but your ASYNC_* names are not really
> > suitable here. :-)
> 
> I don't see any advantage in that approach.  The pm_runtime_* functions
> already know what they want to do.  Why encode it in a request argument
> only to decode it again?

Well, scratch that anyway; I thought it would be necessary because of the
'irqsave' locking.

> > > > That's fine, we'd also need to wait for running callbacks to
> > > > finish too.  And I'm still not convinced if we should preserve
> > > > requests queued up before the system sleep.  Or keep the suspend
> > > > timer running for that matter.
> > > 
> > > This all goes into the "to-be-determined" category.  :-)
> > 
> > Well, I'd like to choose something to start with.
> 
> Pending suspends and the suspend timer don't matter much; we can cancel
> them because they ought to get resubmitted after the system wakes up.  
> Pending resumes are more difficult; depending on how they are treated
> they could morph into immediate wakeup requests.
> 
> Perhaps even more tricky is how to handle things like the ACPI suspend
> calls when the device is already runtime-suspended.  I don't know what 
> we should do about that.

That almost entirely depends on the bus type.  For PCI and probably PNP as well
there's a notion of ACPI low power states and there are AML methods to put the
devices into these states.  Unfortunately, the ACPI low power state to put the
device into depends on the target sleep state of the system, so these devices
will probably have to be put into D0 before system suspend anyway.

I think that the bus type can handle this as long as it knows the state the
device is in before system suspend.  So, the only thing the core should do is
to block the execution of run-time PM framework functions during system
sleep and resume.  The state it leaves the device in shouldn't matter.

So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
device, and wait for all run-time PM operations in progress on it to complete.

In the 'disabled' state the bus type or driver can modify the run-time PM
status to whatever they like anyway.  Perhaps we can provide a helper to
change 'request type' to RPM_REQ_NONE.

> > > > pm_schedule_suspend()
> > > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > >   * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > > >   * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > > >   * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > > 
> > > The last two aren't right.  If the status is RPM_SUSPENDED or
> > > RPM_SUSPENDING, cancel any pending work and set the type to
> > > RPM_REQ_NONE before returning.  In other words, cancel a possible 
> > > pending resume request.
> > 
> > Why do you think the possible pending resume request should be canceled?
> 
> It's part of the "most recent request wins" approach.
>
> > I don't really agree here.  Resume request really means there's data to
> > process, so we shouldn't cancel pending resume requests IMO.
> > 
> > The driver should be given a chance to process data in ->runtime_resume()
> > even if it doesn't use the usage counter.  Otherwise, the usage counter would
> > always have to be used along with resume requests, so having
> > pm_request_resume() that doesn't increment the usage counter would really be
> > pointless.
> 
> All right, I'll go along with this.  So instead of "most recent request 
> wins", we have something like this:
> 
> 	Resume requests (queued or in progress) override suspend and 
> 	idle requests (sync or async).
> 
> 	Suspend requests (queued or in progress, but not unexpired)
> 	override idle requests (sync or async).
> 
> But this statement might not be precise enough, and I'm too tired to
> think through all the ramifications right now.

Fair enough. :-)

> > > > pm_request_suspend()
> > > >   * return if 'disabled' is set or 'runtime_error' is set
> > > >   * return if 'usage_count' > 0 or 'child_count' > 0
> > > >   * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > > 
> > > First cancel a possible pending resume request.
> > 
> > I disagree.
> 
> This is the same as the above, right?

Yes.

> > > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending 
> > > timer (and set timer_expiration to 0).
> > 
> > We're the timer function, so either the timer is not pending, or we've been
> > executed too early.
> 
> Oh, okay.  I thought perhaps you might have wanted to export
> pm_request_suspend.  But this is really supposed to be just the timer 
> handler.

Yes, it is.

> > > > pm_request_resume()
> > > >   * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > >   * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > > 
> > > Or RPM_ACTIVE.
> > 
> > Maybe return 1 in that case?
> 
> Yes.
> 
> > > >   * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > > 
> > > For these last two, first cancel a possible pending suspend request
> > > and a possible timer.
> > 
> > Possible timer only, I think.  if 'request type' is RESUME, there can't be
> > suspend request pending.
> 
> But there can be if the status is RPM_ACTIVE.  So okay, if the status 
> isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME, 
> then return 0.
> 
> 
> > > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > > pm_runtime_idle.  There is an extra requirement: When a suspend or
> > > resume is over, if 'request type' is set then schedule the work item.  
> > > Doing things this way allows the workqueue thread to avoid waiting
> > > around for the suspend or resume to finish.
> > 
> > I agree except that I would like suspends to just fail when the status is
> > RPM_RESUMING.  The reason is that a sloppily written driver could enter a
> > busy-loop of suspending-resuming the device, without the possibility to process
> > data, if there's full symmetry between suspend and resume.  So, I'd like to
> > break that symmetry and make resume operations privileged with respect to
> > suspend and idle notifications.  
> 
> This follows from the new precedence rule.

Yes.

So, I guess we have the majority of things clarified and perhaps it's time to
write a patch for further discussion. :-)

Best,
Rafael
Alan Stern July 5, 2009, 2:50 p.m. UTC | #29
On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > As for whether or not we should actually call cancel_work...  Which is 
> > more expensive: Calling cancel_work when no work is pending, or letting 
> > the work item run when it doesn't have anything to do?  Probably the 
> > latter.
> 
> Agreed, but that doesn't affect functionality.  We can get the desired
> functionality without the cancel_work() patch and then optimize things along
> > with that patch.  This way it'll be easier to demonstrate the benefit of it.

Good idea.

> That almost entirely depends on the bus type.  For PCI and probably PNP as well
> there's a notion of ACPI low power states and there are AML methods to put the
> devices into these states.  Unfortunately, the ACPI low power state to put the
> device into depends on the target sleep state of the system, so these devices
> will probably have to be put into D0 before system suspend anyway.
> 
> I think that the bus type can handle this as long as it knows the state the
> device is in before system suspend.  So, the only thing the core should do is
> to block the execution of run-time PM framework functions during system
> sleep and resume.  The state it leaves the device in shouldn't matter.
> 
> So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> device and wait for all run-time PM operations on it in progress to complete.
> 
> In the 'disabled' state the bus type or driver can modify the run-time PM
> status to whatever they like anyway.  Perhaps we can provide a helper to
> change 'request type' to RPM_REQ_NONE.

The only modification that really makes sense is, like you said, going
back to full power in preparation for the platform suspend operation.
Therefore perhaps we should allow pm_runtime_resume to work even when
rpm_disabled is set.  And if we're going to cancel pending suspend and
idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

Which leaves only the question of what to do when a resume request is 
pending...

> > So, I guess we have the majority of things clarified and perhaps it's time to
> write a patch for further discussion. :-)

Go ahead!

Alan Stern

Rafael Wysocki July 5, 2009, 9:47 p.m. UTC | #30
On Sunday 05 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
> 
> > > As for whether or not we should actually call cancel_work...  Which is 
> > > more expensive: Calling cancel_work when no work is pending, or letting 
> > > the work item run when it doesn't have anything to do?  Probably the 
> > > latter.
> > 
> > Agreed, but that doesn't affect functionality.  We can get the desired
> > functionality without the cancel_work() patch and then optimize things along
> > with that patch.  This way it'll be easier to demontrate the benefit of it.
> 
> Good idea.
> 
> > That almost entirely depends on the bus type.  For PCI and probably PNP as well
> > there's a notion of ACPI low power states and there are AML methods to put the
> > devices into these states.  Unfortunately, the ACPI low power state to put the
> > device into depends on the target sleep state of the system, so these devices
> > will probably have to be put into D0 before system suspend anyway.
> > 
> > I think that the bus type can handle this as long as it knows the state the
> > device is in before system suspend.  So, the only thing the core should do is
> > to block the execution of run-time PM framework functions during system
> > sleep and resume.  The state it leaves the device in shouldn't matter.
> > 
> > So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> > device and wait for all run-time PM operations on it in progress to complete.
> > 
> > In the 'disabled' state the bus type or driver can modify the run-time PM
> > status to whatever they like anyway.  Perhaps we can provide a helper to
> > change 'request type' to RPM_REQ_NONE.
> 
> The only modification that really makes sense is like you said, going
> back to full power in preparation for the platform suspend operation.  
> Therefore perhaps we should allow pm_runtime_resume to work even when
> rpm_disabled is set.  And if we're going to cancel pending suspend and
> idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

After we've disabled run-time PM with pm_runtime_disable(), the bus type
and driver can do whatever they like with the device; we don't care.  However,
they need to make sure that the state of the device will match its run-time PM
status when its run-time PM is enabled again.

> Which leaves only the question of what to do when a resume request is 
> pending...

I think pm_runtime_disable() can carry out a synchronous wake-up if it
sees a pending resume request.  That would make sense in general, IMO, because
having a resume request pending usually means there's I/O to process, and it's
better to allow the device to process that I/O before its run-time PM is
disabled.

To put it differently, if there's a resume request pending, the run-time PM of
the device should be disabled while in the 'active' state rather than while in
the 'suspended' state.

Now, if we do that, the problem of run-time resume requests pending while
entering a system sleep state can be solved.  Namely, we can make
pm_runtime_disable() return -EBUSY if it finds a pending resume request and
0 otherwise.  Then, that result can be used by
dpm_prepare() to decide whether to continue suspend or to terminate it if
the device is a wake-up one.
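
In pseudo-code form, that would be something like this (a sketch only; the
helper names are invented and the details still need to be worked out):

/* Sketch of the proposed pm_runtime_disable(); not existing code. */
int pm_runtime_disable(struct device *dev)
{
	int ret = 0;

	spin_lock_irq(&dev->power.lock);

	if (dev->power.request_type == RPM_REQ_RESUME) {
		/* There's I/O to process: wake the device up first. */
		spin_unlock_irq(&dev->power.lock);
		pm_runtime_resume(dev);
		ret = -EBUSY;
		spin_lock_irq(&dev->power.lock);
	}

	dev->power.disabled = true;
	dev->power.request_type = RPM_REQ_NONE;
	spin_unlock_irq(&dev->power.lock);

	/* Wait for any callback still in progress (assumed helper). */
	rpm_wait_for_completion(dev);

	return ret;
}

dpm_prepare() could then abort the system suspend for a wake-up device
whenever pm_runtime_disable() returns -EBUSY.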

> > So, I guess we have the majority of things clarified and perhaps its time to
> > write a patch for further discussion. :-)
> 
> Go ahead!

In fact I've already done that, but I need to have a final look at it to make
sure there are no obvious mistakes in there.

Best,
Rafael
diff mbox

Patch

Index: usb-2.6/include/linux/workqueue.h
===================================================================
--- usb-2.6.orig/include/linux/workqueue.h
+++ usb-2.6/include/linux/workqueue.h
@@ -223,24 +223,10 @@  int execute_in_process_context(work_func
 extern int flush_work(struct work_struct *work);
 
 extern int cancel_work_sync(struct work_struct *work);
-
-/*
- * Kill off a pending schedule_delayed_work().  Note that the work callback
- * function may still be running on return from cancel_delayed_work(), unless
- * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
- * cancel_work_sync() to wait on it.
- */
-static inline int cancel_delayed_work(struct delayed_work *work)
-{
-	int ret;
-
-	ret = del_timer_sync(&work->timer);
-	if (ret)
-		work_clear_pending(&work->work);
-	return ret;
-}
+extern int cancel_work(struct work_struct *work);
 
 extern int cancel_delayed_work_sync(struct delayed_work *work);
+extern int cancel_delayed_work(struct delayed_work *dwork);
 
 /* Obsolete. use cancel_delayed_work_sync() */
 static inline
Index: usb-2.6/kernel/workqueue.c
===================================================================
--- usb-2.6.orig/kernel/workqueue.c
+++ usb-2.6/kernel/workqueue.c
@@ -465,6 +465,7 @@  static int try_to_grab_pending(struct wo
 {
 	struct cpu_workqueue_struct *cwq;
 	int ret = -1;
+	unsigned long flags;
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
 		return 0;
@@ -478,7 +479,7 @@  static int try_to_grab_pending(struct wo
 	if (!cwq)
 		return ret;
 
-	spin_lock_irq(&cwq->lock);
+	spin_lock_irqsave(&cwq->lock, flags);
 	if (!list_empty(&work->entry)) {
 		/*
 		 * This work is queued, but perhaps we locked the wrong cwq.
@@ -491,7 +492,7 @@  static int try_to_grab_pending(struct wo
 			ret = 1;
 		}
 	}
-	spin_unlock_irq(&cwq->lock);
+	spin_unlock_irqrestore(&cwq->lock, flags);
 
 	return ret;
 }
@@ -536,18 +537,26 @@  static void wait_on_work(struct work_str
 		wait_on_cpu_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 }
 
-static int __cancel_work_timer(struct work_struct *work,
+static int __cancel_work_timer(struct work_struct *work, bool wait,
 				struct timer_list* timer)
 {
 	int ret;
 
-	do {
-		ret = (timer && likely(del_timer(timer)));
-		if (!ret)
-			ret = try_to_grab_pending(work);
-		wait_on_work(work);
-	} while (unlikely(ret < 0));
+	if (timer && likely(del_timer(timer))) {
+		ret = 1;
+		goto done;
+	}
 
+	for (;;) {
+		ret = try_to_grab_pending(work);
+		if (likely(ret >= 0))
+			break;
+		if (in_interrupt())
+			return ret;
+	}
+	if (ret == 0 && wait)
+		wait_on_work(work);
+ done:
 	work_clear_pending(work);
 	return ret;
 }
@@ -575,11 +584,43 @@  static int __cancel_work_timer(struct wo
  */
 int cancel_work_sync(struct work_struct *work)
 {
-	return __cancel_work_timer(work, NULL);
+	return __cancel_work_timer(work, true, NULL);
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
 
 /**
+ * cancel_work - try to cancel a pending work_struct.
+ * @work: the work_struct to cancel
+ *
+ * Try to cancel a pending work_struct before it starts running.
+ * Upon return, @work may safely be reused if the return value
+ * is 1 or the return value is 0 and the work callback function
+ * doesn't resubmit @work.
+ *
+ * The callback function may be running upon return if the return value
+ * is <= 0; use cancel_work_sync() to wait for the callback function
+ * to finish.
+ *
+ * There's not much point using this routine unless you can guarantee
+ * that neither the callback function nor anything else is in the
+ * process of submitting @work (or is about to do so).  The only good
+ * reason might be that optimistically trying to cancel @work has less
+ * overhead than letting it go ahead and run.
+ *
+ * This routine may be called from interrupt context.
+ *
+ * Returns: 1 if @work was removed from its workqueue,
+ *	    0 if @work was not pending (may be running),
+ *	   -1 if @work was concurrently being enqueued and we were
+ *		called in_interrupt.
+ */
+int cancel_work(struct work_struct *work)
+{
+	return __cancel_work_timer(work, false, NULL);
+}
+EXPORT_SYMBOL_GPL(cancel_work);
+
+/**
  * cancel_delayed_work_sync - reliably kill off a delayed work.
  * @dwork: the delayed work struct
  *
@@ -590,10 +631,24 @@  EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 int cancel_delayed_work_sync(struct delayed_work *dwork)
 {
-	return __cancel_work_timer(&dwork->work, &dwork->timer);
+	return __cancel_work_timer(&dwork->work, true, &dwork->timer);
 }
 EXPORT_SYMBOL(cancel_delayed_work_sync);
 
+/**
+ * cancel_delayed_work - try to cancel a pending delayed_work.
+ * @dwork: the delayed_work_struct to cancel
+ *
+ * Try to cancel a pending delayed_work, either by deactivating its
+ * timer or by removing it from its workqueue.  This routine is just
+ * like cancel_work() except that it handles a delayed_work.
+ */
+int cancel_delayed_work(struct delayed_work *dwork)
+{
+	return __cancel_work_timer(&dwork->work, false, &dwork->timer);
+}
+EXPORT_SYMBOL(cancel_delayed_work);
+
 static struct workqueue_struct *keventd_wq __read_mostly;
 
 /**