[7/8] cpufreq: Preserve policy structure across suspend/resume

Message ID 20130711221704.547.64296.stgit@srivatsabhat.in.ibm.com (mailing list archive)
State Superseded, archived

Commit Message

Srivatsa S. Bhat July 11, 2013, 10:17 p.m. UTC
To perform light-weight cpu-init and teardown in the cpufreq subsystem
during suspend/resume, we need to separate out the 2 main functionalities
of the cpufreq CPU hotplug callbacks, as outlined below:

1. Init/tear-down of core cpufreq and CPU-specific components, which are
   critical to the correct functioning of the cpufreq subsystem.

2. Init/tear-down of cpufreq sysfs files during suspend/resume.

The first part requires accurate updates to the policy structure such as
its ->cpus and ->related_cpus masks, whereas the second part requires that
the policy->kobj structure is not released or re-initialized during
suspend/resume.

To handle both these requirements, we need to allow updates to the policy
structure throughout suspend/resume, but prevent the structure from getting
freed up. Also, we must have a mechanism by which the cpu-up callbacks can
restore the policy structure, without allocating things afresh. (That also
helps avoid memory leaks).

To achieve this, we use 2 schemes:
a. Use a fallback per-cpu storage area for preserving the policy structures
   during suspend, so that they can be restored during resume appropriately.

b. Use the 'frozen' flag to determine when to free or allocate the policy
   structure vs when to restore the policy from the saved fallback storage.
   Thus we can successfully preserve the structure across suspend/resume.

Effectively, this helps us complete the separation of the 'light-weight'
and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
this can be made use of in the suspend/resume scenario.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/cpufreq/cpufreq.c |   69 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 52 insertions(+), 17 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Viresh Kumar July 15, 2013, 9:55 a.m. UTC | #1
Hi Srivatsa,

I may be wrong, but it looks like something is wrong in this patch.

On 12 July 2013 03:47, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
>         if ((cpus == 1) && (cpufreq_driver->target))
>                 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>
> -       pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> -       cpufreq_cpu_put(data);
> +       if (!frozen) {
> +               pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> +               cpufreq_cpu_put(data);

So we don't decrement the usage count here. But we still increment it in
cpufreq_add_dev() after resume, don't we?

If so, we wouldn't be able to free the policy struct when all the CPUs of a
policy are removed, once a suspend/resume cycle has happened.
Srivatsa S. Bhat July 15, 2013, 10:05 a.m. UTC | #2
On 07/15/2013 03:25 PM, Viresh Kumar wrote:
> Hi Srivatsa,
> 
> I may be wrong but it looks something is wrong in this patch.
> 
> On 12 July 2013 03:47, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> 
>> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
>>         if ((cpus == 1) && (cpufreq_driver->target))
>>                 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>>
>> -       pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>> -       cpufreq_cpu_put(data);
>> +       if (!frozen) {
>> +               pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>> +               cpufreq_cpu_put(data);
> 
> So, we don't decrement usage count here. But we are still increasing
> counts on cpufreq_add_dev after resume, isn't it?
> 
> So, we wouldn't be able to free policy struct once all the cpus of a
> policy are removed after suspend/resume has happened once.
> 

Actually, I was wondering about this too while writing the patch, and I even
tested shutdown after multiple suspend/resume cycles to check whether the
refcount gets messed up. But surprisingly, things worked just fine.

Logically, there should have been a refcount mismatch and things should have
failed, but everything worked fine during my tests. Apart from the
suspend/resume and shutdown tests, I even tried mixing in a few regular CPU
hotplug operations (echo 0/1 to the sysfs online files), but nothing stood out.

Sorry, I forgot to document this in the patch. Either the patch is wrong or
something else is silently fixing this up. I'm not sure which.

Regards,
Srivatsa S. Bhat

Viresh Kumar July 15, 2013, 10:21 a.m. UTC | #3
On 15 July 2013 15:35, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.

What kind of system have you tested it on?
Rafael Wysocki July 15, 2013, 11:35 a.m. UTC | #4
On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
> On 07/15/2013 03:25 PM, Viresh Kumar wrote:
> > Hi Srivatsa,
> > 
> > I may be wrong but it looks something is wrong in this patch.
> > 
> > On 12 July 2013 03:47, Srivatsa S. Bhat
> > <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > 
> >> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
> >>         if ((cpus == 1) && (cpufreq_driver->target))
> >>                 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
> >>
> >> -       pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> >> -       cpufreq_cpu_put(data);
> >> +       if (!frozen) {
> >> +               pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> >> +               cpufreq_cpu_put(data);
> > 
> > So, we don't decrement usage count here. But we are still increasing
> > counts on cpufreq_add_dev after resume, isn't it?
> > 
> > So, we wouldn't be able to free policy struct once all the cpus of a
> > policy are removed after suspend/resume has happened once.
> > 
> 
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.
> 
> Logically there should've been a refcount mismatch and things should have
> failed, but everything worked fine during my tests. Apart from suspend/resume
> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
> (echo 0/1 to sysfs online files), but nothing stood out.
> 
> Sorry, I forgot to document this in the patch. Either the patch is wrong
> or something else is silently fixing this up. Not sure what is the exact
> situation.

OK, so I'm not going to queue [2-8/8] up until we find out what's going on
here (and until Toralf tells me that it doesn't break his system any more).

I've queued up [1/8] for 3.11 already.

Thanks,
Rafael
Srivatsa S. Bhat July 15, 2013, 11:53 a.m. UTC | #5
On 07/15/2013 05:05 PM, Rafael J. Wysocki wrote:
> On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
>> On 07/15/2013 03:25 PM, Viresh Kumar wrote:
>>> Hi Srivatsa,
>>>
>>> I may be wrong but it looks something is wrong in this patch.
>>>
>>> On 12 July 2013 03:47, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>
>>>> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
>>>>         if ((cpus == 1) && (cpufreq_driver->target))
>>>>                 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>>>>
>>>> -       pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>>>> -       cpufreq_cpu_put(data);
>>>> +       if (!frozen) {
>>>> +               pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>>>> +               cpufreq_cpu_put(data);
>>>
>>> So, we don't decrement usage count here. But we are still increasing
>>> counts on cpufreq_add_dev after resume, isn't it?
>>>
>>> So, we wouldn't be able to free policy struct once all the cpus of a
>>> policy are removed after suspend/resume has happened once.
>>>
>>
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
>>
>> Logically there should've been a refcount mismatch and things should have
>> failed, but everything worked fine during my tests. Apart from suspend/resume
>> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
>> (echo 0/1 to sysfs online files), but nothing stood out.
>>
>> Sorry, I forgot to document this in the patch. Either the patch is wrong
>> or something else is silently fixing this up. Not sure what is the exact
>> situation.
> 
> OK, so I'm not going to queue [2-8/8] up until we find out what's going on
> here (and until Toralf tells me that it doesn't break his system any more).
> 

Ok, that sounds good.

> I've queued up [1/8] for 3.11 already.
> 

Thank you!
 
Regards,
Srivatsa S. Bhat

Viresh Kumar July 16, 2013, 6:15 a.m. UTC | #6
On 15 July 2013 15:35, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.
>
> Logically there should've been a refcount mismatch and things should have
> failed, but everything worked fine during my tests. Apart from suspend/resume
> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
> (echo 0/1 to sysfs online files), but nothing stood out.
>
> Sorry, I forgot to document this in the patch. Either the patch is wrong
> or something else is silently fixing this up. Not sure what is the exact
> situation.

To understand it, I applied your patches to get a better view of the code
(I haven't tested them, though), and found that your code is doing the right
thing and we shouldn't get a mismatch. This is the sequence of events as I
see it:

- __cpufreq_add_dev() for the first CPU: sets the refcount to 'x', where x is
  the number of CPUs in its clock domain.
- __cpufreq_add_dev() for the other CPUs: doesn't change the refcount.

- Suspend:
  - __cpufreq_remove_dev() for all CPUs: due to the frozen flag, we don't touch
    the value of count.
- Resume:
  - __cpufreq_add_dev() for all CPUs: due to the frozen flag, we don't touch
    the value of count.

And so things work as expected. That's why, I believe, your code isn't
breaking anything.
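
The sequence above can be sketched as a small user-space model. This is not
kernel code: struct policy_model and the model_*() helpers are hypothetical
names made up for illustration, and the non-frozen branch of model_readd_cpu()
is an assumption that is not taken from the patch. The sketch only shows that
the count stays at 'x' when every remove/add pair runs with the frozen flag set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the policy's usage count; not a kernel type. */
struct policy_model {
	int refcount;
};

/* First CPU of a clock domain: the count is set up for all 'ncpus' CPUs. */
static void model_first_add(struct policy_model *p, int ncpus)
{
	assert(ncpus > 0);
	p->refcount = ncpus;
}

/* Per-CPU remove: the reference is dropped only on a full tear-down. */
static void model_remove_cpu(struct policy_model *p, bool frozen)
{
	if (!frozen)
		p->refcount--;
}

/*
 * Per-CPU add on the way back up. With 'frozen' set the count is left
 * alone; the non-frozen increment is an assumption made for symmetry.
 */
static void model_readd_cpu(struct policy_model *p, bool frozen)
{
	if (!frozen)
		p->refcount++;
}

/* Run 'cycles' suspend/resume cycles over 'ncpus' CPUs; return the count. */
static int model_suspend_resume(int ncpus, int cycles)
{
	struct policy_model p;

	model_first_add(&p, ncpus);
	for (int c = 0; c < cycles; c++) {
		for (int i = 0; i < ncpus; i++)
			model_remove_cpu(&p, true);	/* suspend */
		for (int i = 0; i < ncpus; i++)
			model_readd_cpu(&p, true);	/* resume */
	}
	return p.refcount;
}
```

Calling model_suspend_resume(4, 3) leaves the count at 4, so a later full
tear-down of all four CPUs can still bring it down to zero.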

But can the number of CPUs change between suspend and resume? Then the count
would be tricky, as we are reusing the same policy structure.
Srivatsa S. Bhat July 16, 2013, 8:56 a.m. UTC | #7
On 07/16/2013 11:45 AM, Viresh Kumar wrote:
> On 15 July 2013 15:35, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
>>
>> Logically there should've been a refcount mismatch and things should have
>> failed, but everything worked fine during my tests. Apart from suspend/resume
>> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
>> (echo 0/1 to sysfs online files), but nothing stood out.
>>
>> Sorry, I forgot to document this in the patch. Either the patch is wrong
>> or something else is silently fixing this up. Not sure what is the exact
>> situation.
> 
> To understand it I actually applied your patches to get better view of the code.
> (Haven't tested it though).. And found that your code is doing the right thing
> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
> 
> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
> the no. of
> cpus in its clock domain.
> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
> 
> - Suspend:
>  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
> of count
> - Resume:
>  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
> value of count.
>

Actually this one is tricky (I took another look). We have this code at the
beginning of __cpufreq_add_dev():


#ifdef CONFIG_SMP
        /* check whether a different CPU already registered this
         * CPU because it is in the same boat. */
        policy = cpufreq_cpu_get(cpu);
        if (unlikely(policy)) {
                cpufreq_cpu_put(policy);
                return 0;
        }

The _get() is not controlled by the frozen flag, but it still doesn't take a
refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage.
So, when __cpufreq_cpu_get() executes, it sees:

        /* get the CPU */
        data = per_cpu(cpufreq_cpu_data, cpu);

        if (!data)
                goto err_out_put_module;

Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return
silently.

Further down in __cpufreq_add_dev(), we restore the original memory, using
the frozen flag:

        if (frozen)
                /* Restore the saved policy when doing light-weight init */
                policy = cpufreq_policy_restore(cpu);
        else
                policy = cpufreq_policy_alloc();


So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
refcount while resuming :)
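
To make the mechanism concrete, here is a minimal user-space sketch of it. It
is only a model under stated assumptions: the two arrays stand in for the
per-CPU variables cpufreq_cpu_data and cpufreq_cpu_data_fallback, every
model_* name is hypothetical, and locking, the module reference and the
driver ->init() call are left out. In the real code the live per-CPU pointer
is set later in the init path; the model sets it right away for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct policy_model {
	int refcount;
};

/* Hypothetical stand-ins for the live and fallback per-CPU pointers. */
static struct policy_model *model_cpu_data[MODEL_NR_CPUS];
static struct policy_model *model_cpu_data_fallback[MODEL_NR_CPUS];

/* Like the lookup quoted above: a NULL slot means no reference is taken. */
static struct policy_model *model_cpu_get(unsigned int cpu)
{
	struct policy_model *data;

	assert(cpu < MODEL_NR_CPUS);
	data = model_cpu_data[cpu];
	if (!data)
		return NULL;
	data->refcount++;
	return data;
}

/* Light-weight tear-down: park the pointer, then clear the live slot. */
static void model_remove_frozen(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_cpu_data_fallback[cpu] = model_cpu_data[cpu];
	model_cpu_data[cpu] = NULL;
}

/* Light-weight init: the early check finds nothing, so we restore. */
static struct policy_model *model_add_frozen(unsigned int cpu)
{
	struct policy_model *policy = model_cpu_get(cpu);

	if (policy) {
		/* Already registered: drop the reference and stop here. */
		policy->refcount--;
		return policy;
	}
	policy = model_cpu_data_fallback[cpu];
	model_cpu_data[cpu] = policy;
	return policy;
}
```

After model_remove_frozen(), model_cpu_get() returns NULL and leaves the
count alone, and model_add_frozen() hands back the same structure with the
same count it had before the tear-down.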
 
> And so things work as expected. That's why your code isn't breaking anything I
> believe.
> 

Thanks a lot for the code inspection and your detailed analysis!

> But can no. of cpus change inbetween suspend/resume? Then count would be
> tricky as we are using the same policy structure.
> 

No, the number of CPUs won't change between suspend and resume. Even if
somebody tried that, it would be an eccentric case that we won't handle.
Besides, *many more* things than just cpufreq will break if somebody actually
tries that out!

Regards,
Srivatsa S. Bhat

Viresh Kumar July 16, 2013, 9:10 a.m. UTC | #8
On 16 July 2013 14:26, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 07/16/2013 11:45 AM, Viresh Kumar wrote:

>> To understand it I actually applied your patches to get better view of the code.
>> (Haven't tested it though).. And found that your code is doing the right thing
>> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
>>
>> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
>> the no. of
>> cpus in its clock domain.
>> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
>>
>> - Suspend:
>>  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
>> of count
>> - Resume:
>>  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
>> value of count.
>>
>
> Actually this one is tricky (I took a look again). So we have this code in the
> beginning of _cpufreq_add_dev():
>
>
> 1008 #ifdef CONFIG_SMP
> 1009         /* check whether a different CPU already registered this
> 1010          * CPU because it is in the same boat. */
> 1011         policy = cpufreq_cpu_get(cpu);
> 1012         if (unlikely(policy)) {
> 1013                 cpufreq_cpu_put(policy);
> 1014                 return 0;
> 1015         }
>
> The _get() is not controlled by the frozen flag, but it still doesn't take a
> refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
> NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage.
> So, when __cpufreq_cpu_get() executes, it sees:
>
>  204         /* get the CPU */
>  205         data = per_cpu(cpufreq_cpu_data, cpu);
>  206
>  207         if (!data)
>  208                 goto err_out_put_module;
>
> Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return
> silently.

Even if this hadn't happened, the net refcount wouldn't have changed, because
of this code:

> 1012         if (unlikely(policy)) {
> 1013                 cpufreq_cpu_put(policy);
> 1014                 return 0;
> 1015         }

i.e., if we get a valid policy structure, we simply put the policy again,
which decrements the refcount we just incremented.

So, even if you don't keep the fallback storage, things should work
without any issue (probably worth trying, as this would get rid of a
per-cpu variable :))
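
A tiny user-space sketch of that get/put pairing, with hypothetical names
(struct policy_model, model_get(), model_put(), model_early_check()) that are
not kernel APIs. It only illustrates that the early check is refcount-neutral
whether or not a policy is found.

```c
#include <assert.h>
#include <stddef.h>

struct policy_model {
	int refcount;
};

/* Take a reference if a policy is present; NULL means nothing to take. */
static struct policy_model *model_get(struct policy_model *p)
{
	if (!p)
		return NULL;
	p->refcount++;
	return p;
}

/* Drop a reference that was taken earlier. */
static void model_put(struct policy_model *p)
{
	assert(p && p->refcount > 0);
	p->refcount--;
}

/*
 * Models the early check at the top of the add path. Returns 1 when the
 * CPU is already covered by a policy (the add path would return early),
 * 0 otherwise. In both cases the refcount ends up where it started.
 */
static int model_early_check(struct policy_model *slot)
{
	struct policy_model *policy = model_get(slot);

	if (policy) {
		model_put(policy);
		return 1;
	}
	return 0;
}
```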

> Further down in __cpufreq_add_dev(), we restore the original memory, using
> the frozen flag:
>
> 1037         if (frozen)
> 1038                 /* Restore the saved policy when doing light-weight init */
> 1039                 policy = cpufreq_policy_restore(cpu);
> 1040         else
> 1041                 policy = cpufreq_policy_alloc();
>
>
> So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
> refcount while resuming :)
Srivatsa S. Bhat July 16, 2013, 9:29 a.m. UTC | #9
On 07/16/2013 02:40 PM, Viresh Kumar wrote:
> On 16 July 2013 14:26, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 07/16/2013 11:45 AM, Viresh Kumar wrote:
> 
>>> To understand it I actually applied your patches to get better view of the code.
>>> (Haven't tested it though).. And found that your code is doing the right thing
>>> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
>>>
>>> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
>>> the no. of
>>> cpus in its clock domain.
>>> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
>>>
>>> - Suspend:
>>>  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
>>> of count
>>> - Resume:
>>>  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
>>> value of count.
>>>
>>
>> Actually this one is tricky (I took a look again). So we have this code in the
>> beginning of _cpufreq_add_dev():
>>
>>
>> 1008 #ifdef CONFIG_SMP
>> 1009         /* check whether a different CPU already registered this
>> 1010          * CPU because it is in the same boat. */
>> 1011         policy = cpufreq_cpu_get(cpu);
>> 1012         if (unlikely(policy)) {
>> 1013                 cpufreq_cpu_put(policy);
>> 1014                 return 0;
>> 1015         }
>>
>> The _get() is not controlled by the frozen flag, but it still doesn't take a
>> refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
>> NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage.
>> So, when __cpufreq_cpu_get() executes, it sees:
>>
>>  204         /* get the CPU */
>>  205         data = per_cpu(cpufreq_cpu_data, cpu);
>>  206
>>  207         if (!data)
>>  208                 goto err_out_put_module;
>>
>> Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return
>> silently.
> 
> Even if this wouldn't have happened, refcount wouldn't have been
> touched due to this code:
> 
>> 1012         if (unlikely(policy)) {
>> 1013                 cpufreq_cpu_put(policy);
>> 1014                 return 0;
>> 1015         }
> 
> i.e. If we get a valid policy structure, we simply put the policy again
> and so decrement the incremented refcount.

Ah, yes!

> 
> So, even if you don't keep the fallback storage, things should work
> without any issue (probably worth trying as this will get rid of a per
> cpu variable :))
>

No, I already tried that and it didn't work ;-( The thing is, we need the
__cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
it finds the policy structure, it will skip all of that initialization and happily
proceed, which is precisely the cause of all the erratic behaviour we are seeing
(i.e., lack of proper initialization post-resume).

So this approach keeps the memory preserved in a fallback storage and lets the
init code run to full completion without any issues.
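
The difference between the two variants can be sketched in user space as
follows. Everything here is an illustrative assumption: struct cpu_slot_model
models one CPU's live and fallback pointers, init_calls counts how often the
driver init path would have run, and the "no fallback" variant simply leaves
the live pointer in place across suspend. None of these names exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct policy_model {
	int refcount;
	int init_calls;		/* how often the driver init path ran */
};

struct cpu_slot_model {
	struct policy_model *live;	/* models cpufreq_cpu_data */
	struct policy_model *fallback;	/* models cpufreq_cpu_data_fallback */
};

/* Suspend-time tear-down; 'use_fallback' selects the patch's scheme. */
static void model_suspend_cpu(struct cpu_slot_model *s, bool use_fallback)
{
	assert(s != NULL);
	if (use_fallback) {
		s->fallback = s->live;
		s->live = NULL;
	}
	/* Without the fallback, the live pointer is simply left in place. */
}

/* Resume-time add; returns true if the driver init path was reached. */
static bool model_resume_cpu(struct cpu_slot_model *s)
{
	assert(s != NULL);
	if (s->live) {
		/* Early check: get + put is net zero, then return early. */
		s->live->refcount++;
		s->live->refcount--;
		return false;
	}
	s->live = s->fallback;		/* restore instead of allocating */
	if (!s->live)
		return false;		/* nothing was saved */
	s->live->init_calls++;		/* the init path runs to completion */
	return true;
}
```

With use_fallback set to false, model_resume_cpu() returns early and
init_calls never changes, which mirrors the missing post-resume
initialization described above. With the fallback, the init path runs and the
refcount is still untouched.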

Perhaps we could do some _more_ code reorganization in the future to take this
issue into account, but IMHO that might be non-trivial. I'm trying to keep
this as simple and straightforward as possible as a first step, to at least get
it working properly. (Changing the order in which init is done is kinda scary,
since it's hard to comprehend what assumptions we might be breaking!)

We can perhaps revisit your idea later and optimize out the extra per-cpu data.
 
>> Further down in __cpufreq_add_dev(), we restore the original memory, using
>> the frozen flag:
>>
>> 1037         if (frozen)
>> 1038                 /* Restore the saved policy when doing light-weight init */
>> 1039                 policy = cpufreq_policy_restore(cpu);
>> 1040         else
>> 1041                 policy = cpufreq_policy_alloc();
>>
>>
>> So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
>> refcount while resuming :)
 
Regards,
Srivatsa S. Bhat

Viresh Kumar July 16, 2013, 9:35 a.m. UTC | #10
On 16 July 2013 14:59, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 07/16/2013 02:40 PM, Viresh Kumar wrote:

>> So, even if you don't keep the fallback storage, things should work
>> without any issue (probably worth trying as this will get rid of a per
>> cpu variable :))
>>
>
> No, I already tried that and it didn't work ;-( The thing is, we need the
> __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
> it finds the policy structure, it will skip all of that initialization and happily
> proceed. Which is precisely the cause of all the erratic behaviour we are seeing
> (i.e., lack of proper initialization post-resume).

I missed that point. :)

> So this approach keeps the memory preserved in a fallback storage and lets the
> init code run to full completion without any issues.
>
> Perhaps we could do some _more_ code reorganization in the future to take this
> issue into account etc., but IMHO that might be non-trivial. I'm trying to keep
> this as simple and straight-forward as possible as a first step, to at least get
> it properly working. (Changing the order in which init is done is kinda scary
> since it's hard to comprehend what assumptions we might be breaking!).
>
> We can perhaps revisit your idea later and optimize out the extra per-cpu data.

No, we don't need to optimize it that way. Current design looks good
for now.
Srivatsa S. Bhat July 16, 2013, 9:54 a.m. UTC | #11
On 07/16/2013 03:05 PM, Viresh Kumar wrote:
> On 16 July 2013 14:59, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 07/16/2013 02:40 PM, Viresh Kumar wrote:
> 
>>> So, even if you don't keep the fallback storage, things should work
>>> without any issue (probably worth trying as this will get rid of a per
>>> cpu variable :))
>>>
>>
>> No, I already tried that and it didn't work ;-( The thing is, we need the
>> __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
>> it finds the policy structure, it will skip all of that initialization and happily
>> proceed. Which is precisely the cause of all the erratic behaviour we are seeing
>> (i.e., lack of proper initialization post-resume).
> 
> I missed that point. :)
> 
>> So this approach keeps the memory preserved in a fallback storage and lets the
>> init code run to full completion without any issues.
>>
>> Perhaps we could do some _more_ code reorganization in the future to take this
>> issue into account etc., but IMHO that might be non-trivial. I'm trying to keep
>> this as simple and straight-forward as possible as a first step, to at least get
>> it properly working. (Changing the order in which init is done is kinda scary
>> since it's hard to comprehend what assumptions we might be breaking!).
>>
>> We can perhaps revisit your idea later and optimize out the extra per-cpu data.
> 
> No, we don't need to optimize it that way. Current design looks good
> for now.

Cool! Thanks :)

Regards,
Srivatsa S. Bhat


Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1128753..15ced5f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -44,6 +44,7 @@ 
  */
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
+static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
 static DEFINE_MUTEX(cpufreq_governor_lock);
 
@@ -942,6 +943,20 @@  static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
 }
 #endif
 
+static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu)
+{
+	struct cpufreq_policy *policy;
+	unsigned long flags;
+
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
+
+	policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
+
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
+	return policy;
+}
+
 static struct cpufreq_policy *cpufreq_policy_alloc(void)
 {
 	struct cpufreq_policy *policy;
@@ -1019,7 +1034,12 @@  static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
 		goto module_out;
 	}
 
-	policy = cpufreq_policy_alloc();
+	if (frozen)
+		/* Restore the saved policy when doing light-weight init */
+		policy = cpufreq_policy_restore(cpu);
+	else
+		policy = cpufreq_policy_alloc();
+
 	if (!policy)
 		goto nomem_out;
 
@@ -1199,6 +1219,10 @@  static int __cpufreq_remove_dev(struct device *dev,
 	data = per_cpu(cpufreq_cpu_data, cpu);
 	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
+	/* Save the policy somewhere when doing a light-weight tear-down */
+	if (frozen)
+		per_cpu(cpufreq_cpu_data_fallback, cpu) = data;
+
 	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!data) {
@@ -1239,29 +1263,40 @@  static int __cpufreq_remove_dev(struct device *dev,
 	if ((cpus == 1) && (cpufreq_driver->target))
 		__cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 
-	pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
-	cpufreq_cpu_put(data);
+	if (!frozen) {
+		pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
+		cpufreq_cpu_put(data);
+	}
 
 	/* If cpu is last user of policy, free policy */
 	if (cpus == 1) {
-		lock_policy_rwsem_read(cpu);
-		kobj = &data->kobj;
-		cmp = &data->kobj_unregister;
-		unlock_policy_rwsem_read(cpu);
-		kobject_put(kobj);
-
-		/* we need to make sure that the underlying kobj is actually
-		 * not referenced anymore by anybody before we proceed with
-		 * unloading.
-		 */
-		pr_debug("waiting for dropping of refcount\n");
-		wait_for_completion(cmp);
-		pr_debug("wait complete\n");
+		if (!frozen) {
+			lock_policy_rwsem_read(cpu);
+			kobj = &data->kobj;
+			cmp = &data->kobj_unregister;
+			unlock_policy_rwsem_read(cpu);
+			kobject_put(kobj);
+
+			/*
+			 * We need to make sure that the underlying kobj is
+			 * actually not referenced anymore by anybody before we
+			 * proceed with unloading.
+			 */
+			pr_debug("waiting for dropping of refcount\n");
+			wait_for_completion(cmp);
+			pr_debug("wait complete\n");
+		}
 
+		/*
+		 * Perform the ->exit() even during light-weight tear-down,
+		 * since this is a core component, and is essential for the
+		 * subsequent light-weight ->init() to succeed.
+		 */
 		if (cpufreq_driver->exit)
 			cpufreq_driver->exit(data);
 
-		cpufreq_policy_free(data);
+		if (!frozen)
+			cpufreq_policy_free(data);
 
 	} else if (cpufreq_driver->target) {
 		__cpufreq_governor(data, CPUFREQ_GOV_START);