diff mbox

[1/2] spi: Add missing pm_runtime_put_noidle() after failed get

Message ID 20180518173008.73291-2-tony@atomide.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tony Lindgren May 18, 2018, 5:30 p.m. UTC
If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
This is probably not a critical fix as we should only hit this when
things are broken elsewhere.

Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/spi/spi.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Mark Brown May 21, 2018, 3:46 p.m. UTC | #1
On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote:
> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
> This is probably not a critical fix as we should only hit this when
> things are broken elsewhere.

This feels like a bug in the runtime PM APIs to be honest - I'd really
not expect that if a function call like a get failed there'd be any
cleanup to do.  I'd expect a very high proportion of users to have the
same problem due to this.
Andy Shevchenko May 21, 2018, 8:23 p.m. UTC | #2
+Cc: Rafael

On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote:
> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote:
>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
>> This is probably not a critical fix as we should only hit this when
>> things are broken elsewhere.
>
> This feels like a bug in the runtime PM APIs to be honest - I'd really
> not expect that if a function call like a get failed there'd be any
> cleanup to do.  I'd expect a very high proportion of users to have the
> same problem due to this.

I don't remember the full and correct explanation, but there is a
rationale behind such behaviour (I suppose it's related to sync/async
agnosticism of RPM ops)
Girish Mahadevan May 21, 2018, 10:19 p.m. UTC | #3
Hi,

On 5/21/2018 2:23 PM, Andy Shevchenko wrote:
> +Cc: Rafael
> 
> On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote:
>> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote:
>>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
>>> This is probably not a critical fix as we should only hit this when
>>> things are broken elsewhere.
>>
>> This feels like a bug in the runtime PM APIs to be honest - I'd really
>> not expect that if a function call like a get failed there'd be any
>> cleanup to do.  I'd expect a very high proportion of users to have the
>> same problem due to this.
> 
> I don't remember the full and correct explanation, but there is a
> rationale behind such behaviour (I suppose it's related to sync/async
> agnosticism of RPM ops)
> 

We've seen this behavior before and have had similar code to protect
against these failures before.

The pm_runtime_get_sync() fails in certain conditions (for e.g when
called during certain portions of the system suspend/resume).

AFAIK:
Since the runtime APIs internally increment the refcount then call
rpm_resume(), in case rpm_resume() fails (for e.g in case RPM is
disabled) then the driver has to either put the refcount and not go
through with the transaction OR manually move the RPM state and call the
RPM callback APIs internally.

Best Regards
Girish
Mark Brown May 22, 2018, 9:32 a.m. UTC | #4
On Mon, May 21, 2018 at 04:19:07PM -0600, Mahadevan, Girish wrote:
> On 5/21/2018 2:23 PM, Andy Shevchenko wrote:
> > On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote:
> >> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote:

> >>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
> >>> This is probably not a critical fix as we should only hit this when
> >>> things are broken elsewhere.

> >> This feels like a bug in the runtime PM APIs to be honest - I'd really
> >> not expect that if a function call like a get failed there'd be any
> >> cleanup to do.  I'd expect a very high proportion of users to have the
> >> same problem due to this.

> > I don't remember the full and correct explanation, but there is a
> > rationale behind such behaviour (I suppose it's related to sync/async
> > agnosticism of RPM ops)

> We've seen this behavior before and have had similar code to protect
> against these failures before.

> The pm_runtime_get_sync() fails in certain conditions (for e.g when
> called during certain portions of the system suspend/resume).

> AFAIK:
> Since the runtime APIs internally increment the refcount then call
> rpm_resume(), in case rpm_resume() fails (for e.g in case RPM is
> disabled) then the driver has to either put the refcount and not go
> through with the transaction OR manually move the RPM state and call the
> RPM callback APIs internally.

That's still really confusing and surprising, it seems like if the
driver wants to manually call the callbacks there should be a specific
API that it can use to get that behaviour rather than requiring every
user to know that this might happen and manually handle it.
Wysocki, Rafael J May 22, 2018, 6:01 p.m. UTC | #5
On 5/22/2018 11:32 AM, Mark Brown wrote:
> On Mon, May 21, 2018 at 04:19:07PM -0600, Mahadevan, Girish wrote:
>> On 5/21/2018 2:23 PM, Andy Shevchenko wrote:
>>> On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote:
>>>> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote:
>>>>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
>>>>> This is probably not a critical fix as we should only hit this when
>>>>> things are broken elsewhere.
>>>> This feels like a bug in the runtime PM APIs to be honest - I'd really
>>>> not expect that if a function call like a get failed there'd be any
>>>> cleanup to do.  I'd expect a very high proportion of users to have the
>>>> same problem due to this.
>>> I don't remember the full and correct explanation, but there is a
>>> rationale behind such behaviour (I suppose it's related to sync/async
>>> agnosticism of RPM ops)
>> We've seen this behavior before and have had similar code to protect
>> against these failures before.
>> The pm_runtime_get_sync() fails in certain conditions (for e.g when
>> called during certain portions of the system suspend/resume).
>> AFAIK:
>> Since the runtime APIs internally increment the refcount then call
>> rpm_resume(), in case rpm_resume() fails (for e.g in case RPM is
>> disabled) then the driver has to either put the refcount and not go
>> through with the transaction OR manually move the RPM state and call the
>> RPM callback APIs internally.
> That's still really confusing and surprising, it seems like if the
> driver wants to manually call the callbacks there should be a specific
> API that it can use to get that behaviour rather than requiring every
> user to know that this might happen and manually handle it.


That's a design choice made a long time a go and sorry for it turning 
out to be inconvenient.

However, there are pieces of code in the kernel that don't check the 
return value of pm_runtime_get_sync() and then call pm_runtime_put() (or 
equivalent) after that unconditionally. They all would need to be 
changed to do the latter only if the preceding pm_runtime_get_sync() was 
successful.


--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Brown May 23, 2018, 8:33 a.m. UTC | #6
On Tue, May 22, 2018 at 08:01:29PM +0200, Rafael J. Wysocki wrote:
> On 5/22/2018 11:32 AM, Mark Brown wrote:

> > That's still really confusing and surprising, it seems like if the
> > driver wants to manually call the callbacks there should be a specific
> > API that it can use to get that behaviour rather than requiring every
> > user to know that this might happen and manually handle it.

> That's a design choice made a long time a go and sorry for it turning out to
> be inconvenient.

It's not so much that it's inconvenient as that it's a really surprising
and unusual which makes it error prone - both in terms of people not
being aware of the behaviour and in terms of the expected handling
looking wrong.

> However, there are pieces of code in the kernel that don't check the return
> value of pm_runtime_get_sync() and then call pm_runtime_put() (or
> equivalent) after that unconditionally. They all would need to be changed to
> do the latter only if the preceding pm_runtime_get_sync() was successful.

This does mean that we're encouraging code that ignores errors which
doesn't seem ideal.  On the other hand trying to change anything at this
point might be more trouble than it's worth if there's too many users,
perhaps it's better to just try to document this more.
diff mbox

Patch

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1213,6 +1213,7 @@  static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 	if (!was_busy && ctlr->auto_runtime_pm) {
 		ret = pm_runtime_get_sync(ctlr->dev.parent);
 		if (ret < 0) {
+			pm_runtime_put_noidle(ctlr->dev.parent);
 			dev_err(&ctlr->dev, "Failed to power device: %d\n",
 				ret);
 			mutex_unlock(&ctlr->io_mutex);