Message ID | 913f1715-bdd0-1c03-ad76-38be9d3d2298@nvidia.com (mailing list archive) |
---|---|
State | Superseded, archived |
On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>
> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> > On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> >> drivers/cpufreq/cpufreq.c | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >> index 41d148af7748..95b1c4afe14e 100644
> >> --- a/drivers/cpufreq/cpufreq.c
> >> +++ b/drivers/cpufreq/cpufreq.c
> >> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> >>  	if (!cpufreq_driver)
> >>  		return;
> >>
> >> +	if (unlikely(!cpufreq_suspended)) {
> >> +		pr_warn("%s: resume after failing suspend\n", __func__);
> >> +		return;
> >> +	}
> >>  	cpufreq_suspended = false;
> >>
> >>  	if (!has_target() && !cpufreq_driver->resume)
> >>
> > Good catch, but rather than doing this it would be better to avoid
> > calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
> Yes, I thought about that, but there is no good way to skip over it
> without introducing another flag. cpufreq_resume is called by
> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
> it's already unbalanced.
>
> One possibility is to rely on the pm_transition flag. So something like:
>
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index dc259d20c967..8469e6fc2b2c 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
>  void dpm_resume(pm_message_t state)
>  {
>  	struct device *dev;
> +	bool suspended = (pm_transition.event != PM_EVENT_ON);
>  	ktime_t starttime = ktime_get();
>
>  	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>  	async_synchronize_full();
>  	dpm_show_time(starttime, state, NULL);
>
> -	cpufreq_resume();
> +	if (likely(suspended))
> +		cpufreq_resume();
>  	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>  }

I was thinking about something else.

Anyway, I think your original patch is OK too, but without printing the
message. Just combine the cpufreq_suspended check with the cpufreq_driver
one and the unlikely() thing is not necessary.

Thanks,
Rafael
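Rafael's suggestion amounts to folding the new test into the existing early return of cpufreq_resume(). The sketch below is a user-space model under stated assumptions, not the kernel source: `struct cpufreq_driver_model`, the `governor_restarts` counter and `cpufreq_model_restarts()` are hypothetical stand-ins that only make the control flow observable. The names `cpufreq_driver`, `cpufreq_suspended`, `cpufreq_suspend()` and `cpufreq_resume()` are the ones used in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model of the cpufreq suspend/resume pair.
 * NOT kernel code: the driver struct and the restart counter are
 * hypothetical stand-ins used only to make the control flow testable.
 */
struct cpufreq_driver_model {
	const char *name;
};

static struct cpufreq_driver_model dummy_driver = { "dummy" };
static struct cpufreq_driver_model *cpufreq_driver = &dummy_driver;
static bool cpufreq_suspended;

/* Hypothetical: counts how often the body of resume actually runs. */
static int governor_restarts;

void cpufreq_suspend(void)
{
	if (!cpufreq_driver)
		return;
	/* Governors would be stopped here in the real code. */
	cpufreq_suspended = true;
}

void cpufreq_resume(void)
{
	/*
	 * One combined early return, as suggested in review: no driver,
	 * or suspend never ran, means there is nothing to undo.
	 */
	if (!cpufreq_driver || !cpufreq_suspended)
		return;

	cpufreq_suspended = false;
	governor_restarts++;	/* stands in for restarting governors */
}

int cpufreq_model_restarts(void)
{
	return governor_restarts;
}
```

Calling `cpufreq_resume()` first (the failed-suspend case) leaves the counter at 0, a `cpufreq_suspend()`/`cpufreq_resume()` pair raises it to 1, and a further unpaired resume is again a no-op. The design choice is to make the resume side tolerate an unpaired call rather than change its caller.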
On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>
>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>> drivers/cpufreq/cpufreq.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index 41d148af7748..95b1c4afe14e 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>  	if (!cpufreq_driver)
>>>>  		return;
>>>>
>>>> +	if (unlikely(!cpufreq_suspended)) {
>>>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>>>> +		return;
>>>> +	}
>>>>  	cpufreq_suspended = false;
>>>>
>>>>  	if (!has_target() && !cpufreq_driver->resume)
>>>>
>>> Good catch, but rather than doing this it would be better to avoid
>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
>> Yes, I thought about that, but there is no good way to skip over it
>> without introducing another flag. cpufreq_resume is called by
>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
>> it's already unbalanced.
>>
>> One possibility is to rely on the pm_transition flag. So something like:
>>
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index dc259d20c967..8469e6fc2b2c 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
>>  void dpm_resume(pm_message_t state)
>>  {
>>  	struct device *dev;
>> +	bool suspended = (pm_transition.event != PM_EVENT_ON);
>>  	ktime_t starttime = ktime_get();
>>
>>  	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>  	async_synchronize_full();
>>  	dpm_show_time(starttime, state, NULL);
>>
>> -	cpufreq_resume();
>> +	if (likely(suspended))
>> +		cpufreq_resume();
>>  	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>  }
>
> I was thinking about something else.
>
> Anyway, I think your original patch is OK too, but without printing the
> message. Just combine the cpufreq_suspended check with the cpufreq_driver
> one and the unlikely() thing is not necessary.
>

I'd rather have this fixed in the dpm_suspend/resume() code. This is just
masking the first issue that's being caused by unbalanced error handling.
If that means adding flags in dpm_suspend/resume() then that's what we
should do right now and clean it up later if it can be improved. Making
cpufreq more messy doesn't seem like the right answer.

Thanks,
Saravana
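Saravana's alternative leaves cpufreq untouched and moves the bookkeeping into the PM core. One possible shape is sketched below as a user-space model, assuming a single boolean in the core is enough; `cpufreq_suspend_done`, the `*_model` functions and the call counters are hypothetical names, not kernel symbols, and the `prepare_fails` parameter stands in for a failing prepare phase that makes `dpm_suspend_start()` skip `dpm_suspend()`.

```c
#include <stdbool.h>

/*
 * User-space model (not kernel code) of tracking the cpufreq call in the
 * PM core rather than in cpufreq. The flag and the *_model functions are
 * hypothetical names chosen for illustration.
 */
static bool cpufreq_suspend_done;	/* hypothetical PM-core flag */
static int cpufreq_suspend_calls;
static int cpufreq_resume_calls;

static void cpufreq_suspend_model(void) { cpufreq_suspend_calls++; }
static void cpufreq_resume_model(void) { cpufreq_resume_calls++; }

/* Stands in for dpm_suspend(): suspends cpufreq and records that it did. */
static int dpm_suspend_model(void)
{
	cpufreq_suspend_model();
	cpufreq_suspend_done = true;
	/* Device suspend callbacks would run here. */
	return 0;
}

/*
 * Stands in for dpm_suspend_start(): if the prepare phase fails,
 * dpm_suspend() is never reached.
 */
int dpm_suspend_start_model(bool prepare_fails)
{
	if (prepare_fails)
		return -1;
	return dpm_suspend_model();
}

/* Stands in for dpm_resume(): only undoes what was actually done. */
void dpm_resume_model(void)
{
	/* Device resume callbacks would run here. */
	if (cpufreq_suspend_done) {
		cpufreq_suspend_done = false;
		cpufreq_resume_model();
	}
}

int model_suspend_calls(void) { return cpufreq_suspend_calls; }
int model_resume_calls(void) { return cpufreq_resume_calls; }
```

With a failing prepare phase the resume path still runs, but both counters stay at 0; after a successful attempt they are both 1. The trade-off, as the thread goes on to discuss, is one more piece of state in the PM core instead of one more check in cpufreq.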
On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
>> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>>
>>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>>> drivers/cpufreq/cpufreq.c | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>>> index 41d148af7748..95b1c4afe14e 100644
>>>>> --- a/drivers/cpufreq/cpufreq.c
>>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>>  	if (!cpufreq_driver)
>>>>>  		return;
>>>>>
>>>>> +	if (unlikely(!cpufreq_suspended)) {
>>>>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>>>>> +		return;
>>>>> +	}
>>>>>  	cpufreq_suspended = false;
>>>>>
>>>>>  	if (!has_target() && !cpufreq_driver->resume)
>>>>>
>>>> Good catch, but rather than doing this it would be better to avoid
>>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
>>> Yes, I thought about that, but there is no good way to skip over it
>>> without introducing another flag. cpufreq_resume is called by
>>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>>> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
>>> it's already unbalanced.
>>>
>>> One possibility is to rely on the pm_transition flag. So something like:
>>>
>>>
>>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>>> index dc259d20c967..8469e6fc2b2c 100644
>>> --- a/drivers/base/power/main.c
>>> +++ b/drivers/base/power/main.c
>>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
>>>  void dpm_resume(pm_message_t state)
>>>  {
>>>  	struct device *dev;
>>> +	bool suspended = (pm_transition.event != PM_EVENT_ON);
>>>  	ktime_t starttime = ktime_get();
>>>
>>>  	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>>  	async_synchronize_full();
>>>  	dpm_show_time(starttime, state, NULL);
>>>
>>> -	cpufreq_resume();
>>> +	if (likely(suspended))
>>> +		cpufreq_resume();
>>>  	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>>  }
>>
>> I was thinking about something else.
>>
>> Anyway, I think your original patch is OK too, but without printing the
>> message. Just combine the cpufreq_suspended check with the cpufreq_driver
>> one and the unlikely() thing is not necessary.
>>
>
> I'd rather have this fixed in the dpm_suspend/resume() code. This is just
> masking the first issue that's being caused by unbalanced error handling.
> If that means adding flags in dpm_suspend/resume() then that's what we
> should do right now and clean it up later if it can be improved. Making
> cpufreq more messy doesn't seem like the right answer.
>
> Thanks,
> Saravana
>

dpm_suspend and dpm_resume by themselves are not balanced in this particular
case. As it's currently structured, dpm_resume can't be omitted even if
dpm_suspend is skipped due to earlier failure. I think checking
cpufreq_suspended flag is a reasonable compromise. If we can find a way to
make dpm_suspend/dpm_resume also balanced, that will be best.
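The imbalance Bo describes shows up clearly in a call trace. This user-space model (not kernel code) records the order in which stub functions run during one suspend attempt; the stub names, the `trace` buffer and `suspend_attempt()` are hypothetical, and in the real kernel the resume side is reached through `dpm_resume_end()` in the caller rather than inside a single function. The only property modeled is the one stated above: a failing prepare phase skips `dpm_suspend()`, while `dpm_resume()` still runs and calls `cpufreq_resume()`.

```c
#include <stdbool.h>
#include <string.h>

/*
 * User-space trace model (not kernel code) of the failure path.
 * Each stub appends its name to a trace buffer and does no real work.
 * prepare_fails is a hypothetical knob for forcing the error.
 */
static char trace[256];

static void log_call(const char *name)
{
	if (trace[0] != '\0')
		strncat(trace, " ", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

static int dpm_prepare_stub(bool prepare_fails)
{
	log_call("dpm_prepare");
	return prepare_fails ? -1 : 0;
}

static void dpm_suspend_stub(void)
{
	log_call("dpm_suspend");
	log_call("cpufreq_suspend");
}

static void dpm_resume_stub(void)
{
	log_call("dpm_resume");
	log_call("cpufreq_resume");	/* called unconditionally */
}

/* One suspend attempt: the resume side runs whether or not suspend did. */
const char *suspend_attempt(bool prepare_fails)
{
	trace[0] = '\0';
	if (dpm_prepare_stub(prepare_fails) == 0)
		dpm_suspend_stub();
	/* On success the system would sleep here. */
	dpm_resume_stub();
	return trace;
}
```

For a failing attempt the trace reads `dpm_prepare dpm_resume cpufreq_resume`: cpufreq is resumed without ever having been suspended, which is exactly the unpaired call the cpufreq_suspended check guards against.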
On 02-02-18, 13:28, Bo Yan wrote:
> On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> >I'd rather have this fixed in the dpm_suspend/resume() code. This is just
> >masking the first issue that's being caused by unbalanced error handling.
> >If that means adding flags in dpm_suspend/resume() then that's what we
> >should do right now and clean it up later if it can be improved. Making
> >cpufreq more messy doesn't seem like the right answer.

+1

> dpm_suspend and dpm_resume by themselves are not balanced in this particular
> case. As it's currently structured, dpm_resume can't be omitted even if
> dpm_suspend is skipped due to earlier failure. I think checking
> cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> make dpm_suspend/dpm_resume also balanced, that will be best.

I think cpufreq is just one of the users which broke. Others didn't break
because:

- They don't have a complicated resume part.
- Or we just don't know that they broke.

Resuming something that never suspended is just broken by design. Yeah, it's much
simpler in this particular case to fix cpufreq core but the
suspend/resume/hibernation part is really core kernel and should be fixed to
avoid such band-aids.
On Monday, February 5, 2018 5:01:18 AM CET Viresh Kumar wrote:
> On 02-02-18, 13:28, Bo Yan wrote:
> > On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> > >I'd rather have this fixed in the dpm_suspend/resume() code. This is just
> > >masking the first issue that's being caused by unbalanced error handling.
> > >If that means adding flags in dpm_suspend/resume() then that's what we
> > >should do right now and clean it up later if it can be improved. Making
> > >cpufreq more messy doesn't seem like the right answer.
>
> +1
>
> > dpm_suspend and dpm_resume by themselves are not balanced in this particular
> > case. As it's currently structured, dpm_resume can't be omitted even if
> > dpm_suspend is skipped due to earlier failure. I think checking
> > cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> > make dpm_suspend/dpm_resume also balanced, that will be best.
>
> I think cpufreq is just one of the users which broke. Others didn't break
> because:
>
> - They don't have a complicated resume part.
> - Or we just don't know that they broke.

No and no.

> Resuming something that never suspended is just broken by design. Yeah, it's much
> simpler in this particular case to fix cpufreq core but the
> suspend/resume/hibernation part is really core kernel and should be fixed to
> avoid such band-aids.

By design (which I admit may be confusing) it should be fine to call
dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
for the failure is. cpufreq_suspend/resume() don't take that into account,
everybody else does.

Thanks,
Rafael
On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> By design (which I admit may be confusing) it should be fine to call
> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> for the failure is. cpufreq_suspend/resume() don't take that into account,
> everybody else does.

Hmm, I see. Can't do much then, just fix the only broken piece of code :)
On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> On 05-02-18, 09:50, Rafael J. Wysocki wrote:
>> By design (which I admit may be confusing) it should be fine to call
>> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
>> for the failure is. cpufreq_suspend/resume() don't take that into account,
>> everybody else does.
>
> Hmm, I see. Can't do much then, just fix the only broken piece of code :)
>

Sorry for the late reply, this email didn't get filtered into the right
folder.

I think the design of dpm_suspend_start() and dpm_resume_end() generally
works fine because we seem to keep track of what devices have been
suspended so far (in the dpm_suspended_list) and call resume only for
those. So, why isn't the right fix to have cpufreq get put into that
list? Instead of just always calling it on the resume path even if it
wasn't suspended? That seems to be the real issue.

So, we should either have dpm_suspend/resume() have a flag to keep track
of whether cpufreq_suspend/resume() was called and make sure they are called
in proper pairs. Or have cpufreq register in a way that gets it put in
the suspend/resume list.

I'd still like to NACK this change.

-Saravana
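The per-device bookkeeping Saravana refers to can be modeled with a simple list: a device is added only after its suspend step succeeds, and the resume walk visits only what is on the list. This is a simplified user-space sketch; `struct dev_model`, the fixed-size array used as the list and the `fail_at` index are hypothetical, while the kernel's `dpm_suspended_list` is a linked list in drivers/base/power/main.c that carries considerably more state. cpufreq is not an entry on such a list, which is why its resume hook ran even when its suspend hook had not.

```c
#include <stddef.h>

/*
 * User-space model (not kernel code) of list-based bookkeeping: a device
 * is put on the "suspended" list only after its suspend step succeeds,
 * and resume walks that list alone. The device struct, the array-as-list
 * and fail_at are simplifications for illustration.
 */
#define MAX_DEVS 8

struct dev_model {
	const char *name;
	int resumed;	/* times resume was invoked on this device */
};

static struct dev_model *suspended_list[MAX_DEVS];
static size_t n_suspended;

/*
 * Suspend devs[0..n) in order (at most MAX_DEVS). The device at index
 * fail_at fails and stops the walk, so it and later devices never reach
 * the list; pass fail_at >= n for an attempt with no failure.
 */
int suspend_all(struct dev_model *devs, size_t n, size_t fail_at)
{
	n_suspended = 0;
	for (size_t i = 0; i < n && i < MAX_DEVS; i++) {
		if (i == fail_at)
			return -1;
		suspended_list[n_suspended++] = &devs[i];
	}
	return 0;
}

/* Resume only what made it onto the list, in reverse order. */
void resume_all(void)
{
	while (n_suspended > 0)
		suspended_list[--n_suspended]->resumed++;
}
```

If the second of three devices fails, `resume_all()` touches only the first one, so calling the resume path after a failed suspend is harmless for anything tracked this way. That is the property Rafael describes as the design intent.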
On Thursday, February 15, 2018 10:27:10 PM CET Saravana Kannan wrote:
> On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> > On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> >> By design (which I admit may be confusing) it should be fine to call
> >> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> >> for the failure is. cpufreq_suspend/resume() don't take that into account,
> >> everybody else does.
> >
> > Hmm, I see. Can't do much then, just fix the only broken piece of code :)
> >
>
> Sorry for the late reply, this email didn't get filtered into the right
> folder.
>
> I think the design of dpm_suspend_start() and dpm_resume_end() generally
> works fine because we seem to keep track of what devices have been
> suspended so far (in the dpm_suspended_list) and call resume only for
> those. So, why isn't the right fix to have cpufreq get put into that
> list?

Because it is more complicated?

> Instead of just always calling it on the resume path even if it
> wasn't suspended? That seems to be the real issue.
>
> So, we should either have dpm_suspend/resume() have a flag to keep track
> of whether cpufreq_suspend/resume() was called and make sure they are called
> in proper pairs.

Why?

> Or have cpufreq register in a way that gets it put in
> the suspend/resume list.
>
> I'd still like to NACK this change.

It's gone in already, sorry.
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
 	struct device *dev;
+	bool suspended = (pm_transition.event != PM_EVENT_ON);
 	ktime_t starttime = ktime_get();