Message ID | 52304439.3030301@linux.vnet.ibm.com (mailing list archive)
---|---
State | Superseded, archived
On 11 September 2013 15:51, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>> during suspend/resume".

Sorry Stephen, I was away on vacation and came back only yesterday..
And was badly stuck in some other cpufreq bugs until now :)

> Sure, Rafael. Thanks for CC'ing me.

Thanks for jumping in and helping us out, buddy!!

> Stephen, I went through the code and I think I found out what is going wrong.
> Can you please try the following patch?
>
> Regards,
> Srivatsa S. Bhat
>
> ----------------------------------------------------------------------------
>
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] cpufreq: Fix crash in cpufreq-stats during suspend/resume
>
> Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
> dereference during the second attempt to suspend a system. He also
> pinpointed the problem to commit 5302c3f "cpufreq: Perform light-weight
> init/teardown during suspend/resume".
>
> That commit actually ensured that the cpufreq-stats table and the
> cpufreq-stats sysfs entries are *not* torn down (i.e., not freed) during
> suspend/resume, which makes it all the more surprising. However, it turns
> out that the root cause is not that we access already-freed memory, but
> that the reference to the allocated memory gets moved around and we lose
> track of it during resume, leading to the reported crash in a subsequent
> suspend attempt.
>
> In the suspend path, during CPU offline, the value of policy->cpu is
> updated by choosing one of the surviving CPUs in that policy, as long as
> there is at least one CPU in that policy. And
> cpufreq_stats_update_policy_cpu() is invoked to update the reference to
> the stats structure by assigning it to the new CPU.
>
> However, in the resume path, during CPU online, we end up assigning a
> fresh CPU as the policy->cpu, without letting cpufreq-stats know about
> this. Thus the reference to the stats structure remains (incorrectly)
> associated with the old CPU. So, in a subsequent suspend attempt, during
> CPU offline, we end up accessing an incorrect location to get the stats
> structure, which eventually leads to the NULL pointer dereference.
>
> Fix this by letting cpufreq-stats know about the update of the
> policy->cpu during CPU online in the resume path. (Also, move the
> update_policy_cpu() function higher up in the file, so that
> __cpufreq_add_dev() can invoke it.)

The observation looks good..

> Reported-by: Stephen Warren <swarren@nvidia.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  drivers/cpufreq/cpufreq.c |   37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 5a64f66..62bdb95 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -947,6 +947,18 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
>  	kfree(policy);
>  }
>
> +static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
> +{
> +	policy->last_cpu = policy->cpu;
> +	policy->cpu = cpu;
> +
> +#ifdef CONFIG_CPU_FREQ_TABLE
> +	cpufreq_frequency_table_update_policy_cpu(policy);
> +#endif
> +	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
> +			CPUFREQ_UPDATE_POLICY_CPU, policy);
> +}
> +
>  static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>  			     bool frozen)
>  {
> @@ -1000,7 +1012,18 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>  	if (!policy)
>  		goto nomem_out;
>
> -	policy->cpu = cpu;
> +
> +	/*
> +	 * In the resume path, since we restore a saved policy, the assignment
> +	 * to policy->cpu is like an update of the existing policy, rather than
> +	 * the creation of a brand new one. So we need to perform this update
> +	 * by invoking update_policy_cpu().
> +	 */
> +	if (frozen && cpu != policy->cpu)
> +		update_policy_cpu(policy, cpu);
> +	else
> +		policy->cpu = cpu;
> +
>  	policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
>  	cpumask_copy(policy->cpus, cpumask_of(cpu));
>
> @@ -1092,18 +1115,6 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>  	return __cpufreq_add_dev(dev, sif, false);
>  }
>
> -static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
> -{
> -	policy->last_cpu = policy->cpu;
> -	policy->cpu = cpu;
> -
> -#ifdef CONFIG_CPU_FREQ_TABLE
> -	cpufreq_frequency_table_update_policy_cpu(policy);
> -#endif
> -	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
> -			CPUFREQ_UPDATE_POLICY_CPU, policy);
> -}
> -

But I would have solved it differently :)

We don't really need to call update_policy_cpu() again and again, as we
don't really need to update policy->cpu... Rather, it would be better to
just move the following inside cpufreq_policy_alloc():

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11 September 2013 16:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> But I would have solved it differently :)
>
> We don't really need to call update_policy_cpu() again and again,
> as we don't really need to update policy->cpu...
>
> Rather, it would be better to just move the following inside
> cpufreq_policy_alloc():

Half-written mail sent :(

What about moving the following to cpufreq_policy_alloc():

	policy->cpu = cpu;

??
On 09/11/2013 04:14 PM, Viresh Kumar wrote:
> On 11 September 2013 15:51, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>
>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>>> during suspend/resume".
>
> Sorry Stephen, I was away on vacation and came back only yesterday..
> And was badly stuck in some other cpufreq bugs until now :)
>
>> Sure, Rafael. Thanks for CC'ing me.
>
> Thanks for jumping in and helping us out, buddy!!
>
No problem :-) Besides, I am the one who broke it ;-(

Regards,
Srivatsa S. Bhat
On 09/11/2013 04:15 PM, Viresh Kumar wrote:
> On 11 September 2013 16:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> But I would have solved it differently :)
>>
>> We don't really need to call update_policy_cpu() again and again,
>> as we don't really need to update policy->cpu...
>>
>> Rather, it would be better to just move the following inside
>> cpufreq_policy_alloc():
>
> Half-written mail sent :(
>
> What about moving the following to cpufreq_policy_alloc():
>
> 	policy->cpu = cpu;
>
> ??
>
Hmm? The problem is not about merely updating the policy->cpu field; the
main issue is that the existing code was not letting the cpufreq-stats
code know that we updated policy->cpu under the hood. It is important for
cpufreq-stats to know this because it maintains the reference to its
stats structure by associating it with policy->cpu. So if policy->cpu
changes under the hood, it loses track of its reference. That is why we
need to keep that code informed about changes to policy->cpu, and hence
call update_policy_cpu() in the CPU online path (during resume). I don't
see how we can skip that.

Regards,
Srivatsa S. Bhat
On 11 September 2013 16:40, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 09/11/2013 04:14 PM, Viresh Kumar wrote:
>> On 11 September 2013 15:51, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>
>>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>>>> during suspend/resume".
>>
>> Sorry Stephen, I was away on vacation and came back only yesterday..
>> And was badly stuck in some other cpufreq bugs until now :)
>>
>>> Sure, Rafael. Thanks for CC'ing me.
>>
>> Thanks for jumping in and helping us out, buddy!!
>>
>
> No problem :-) Besides, I am the one who broke it ;-(

I believe you missed some part of my mail?
On 09/11/2013 04:45 PM, Viresh Kumar wrote:
> On 11 September 2013 16:40, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 09/11/2013 04:14 PM, Viresh Kumar wrote:
>>> On 11 September 2013 15:51, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>
>>>>>> Sure, it's due to 5302c3f "cpufreq: Perform light-weight init/teardown
>>>>>> during suspend/resume".
>>>
>>> Sorry Stephen, I was away on vacation and came back only yesterday..
>>> And was badly stuck in some other cpufreq bugs until now :)
>>>
>>>> Sure, Rafael. Thanks for CC'ing me.
>>>
>>> Thanks for jumping in and helping us out, buddy!!
>>>
>>
>> No problem :-) Besides, I am the one who broke it ;-(
>
> I believe you missed some part of my mail?
>
See my next reply :-) I was composing it :-)

Man, you are *fast*! ;-)

Regards,
Srivatsa S. Bhat
On 11 September 2013 16:47, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> See my next reply :-) I was composing it :-)
>
> Man, you are *fast*! ;-)

haha...
On 11 September 2013 16:44, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Hmm? The problem is not about merely updating the policy->cpu field; the
> main issue is that the existing code was not letting the cpufreq-stats
> code know that we updated policy->cpu under the hood. It is important for
> cpufreq-stats to know this because it maintains the reference to its
> stats structure by associating it with policy->cpu. So if policy->cpu
> changes under the hood, it loses track of its reference. That is why we
> need to keep that code informed about changes to policy->cpu, and hence
> call update_policy_cpu() in the CPU online path (during resume). I don't
> see how we can skip that.

Okay.. There are two different ways in which cpufreq_add_dev() works
currently..

Boot cluster (i.e. policy with the boot CPU)
--------------------------------------------

Here cpufreq_remove_dev() is never called for the boot CPU, but for all
others. And similarly, cpufreq_add_dev() is never called for the boot
CPU, but for all others.

Now policy->cpu contains a meaningful CPU at the beginning of resume and
we don't need to modify that at all.. For all the remaining CPUs we
should rather call cpufreq_add_policy_cpu()..

Non-boot cluster
----------------

All CPUs here are removed, and at the end policy->cpu contains the last
CPU removed.. So, for a cluster with CPUs 2 and 3, it will contain 3..

Now at resume we will add CPU 2 first and so need to update policy->cpu
to 2.. But for all other CPUs in this cluster we return early from
cpufreq_add_dev() and call cpufreq_add_policy_cpu(), as policy->cpus
was fixed by the call to ->init() for the first CPU of this cluster..
And so we never reach the line: policy->cpu = cpu;

For the first CPU of a non-boot cluster we need to call
update_policy_cpu(), but not for the others..

But for the boot cluster, if we could call ->init() somehow at resume
time, then things would be fairly similar in both cases..

I am running out of time now, as I need to leave the office... I hope I
made the problem more clear, or at least the way I see it :)

--
viresh
On 09/11/2013 05:29 PM, Viresh Kumar wrote:
> On 11 September 2013 16:44, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Hmm? The problem is not about merely updating the policy->cpu field; the
>> main issue is that the existing code was not letting the cpufreq-stats
>> code know that we updated policy->cpu under the hood. It is important for
>> cpufreq-stats to know this because it maintains the reference to its
>> stats structure by associating it with policy->cpu. So if policy->cpu
>> changes under the hood, it loses track of its reference. That is why we
>> need to keep that code informed about changes to policy->cpu, and hence
>> call update_policy_cpu() in the CPU online path (during resume). I don't
>> see how we can skip that.
>
> Okay.. There are two different ways in which cpufreq_add_dev() works
> currently..
>
> Boot cluster (i.e. policy with the boot CPU)
> --------------------------------------------
>
> Here cpufreq_remove_dev() is never called for the boot CPU, but for all
> others. And similarly, cpufreq_add_dev() is never called for the boot
> CPU, but for all others.
>
> Now policy->cpu contains a meaningful CPU at the beginning of resume and
> we don't need to modify that at all.. For all the remaining CPUs we
> should rather call cpufreq_add_policy_cpu()..
>
Yes, and the existing code already does that. And if you observe the
placement of the call to update_policy_cpu() in my patch, you'll find
that it won't be called in cases where we call cpufreq_add_policy_cpu().
Which is exactly what you want, IIUC.

> Non-boot cluster
> ----------------
>
> All CPUs here are removed, and at the end policy->cpu contains the last
> CPU removed.. So, for a cluster with CPUs 2 and 3, it will contain 3..
>
> Now at resume we will add CPU 2 first and so need to update policy->cpu
> to 2.. But for all other CPUs in this cluster we return early from
> cpufreq_add_dev() and call cpufreq_add_policy_cpu(), as policy->cpus
> was fixed by the call to ->init() for the first CPU of this cluster..
>
> And so we never reach the line: policy->cpu = cpu;
>
> For the first CPU of a non-boot cluster we need to call
> update_policy_cpu(), but not for the others..
>
Yes, and that's precisely why I added the call to update_policy_cpu()
only near the statement 'policy->cpu = cpu'. All the other cases don't
need to call that function, and in my patch, they indeed don't call it.

A simple way to look at this is: if policy->cpu is updated anywhere in
the resume (CPU online) path, we need to call update_policy_cpu();
otherwise, we don't. This covers both the scenarios you described above.

> But for the boot cluster, if we could call ->init() somehow at resume
> time, then things would be fairly similar in both cases..
>
> I am running out of time now, as I need to leave the office... I hope I
> made the problem more clear, or at least the way I see it :)
>
Well, your explanation matches my understanding of the scenarios, and I
had written the patch keeping those scenarios in mind; so I believe the
patch is correct as it is.

So what I didn't get is this: are you suggesting that something is wrong
in my patch, or are you just reiterating the different scenarios involved
so that I can recheck whether my patch is right? If it is the former,
please point me to the exact problem you found in my patch. If it is the
latter, I'm pretty sure my patch is right; I've already checked it :-)

Regards,
Srivatsa S. Bhat
On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>> Viresh,
>>>>>>>>>
>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
...
> Stephen, I went through the code and I think I found out what is going wrong.
> Can you please try the following patch?

Unfortunately, I still see the exact same failure/backtrace with this
patch applied.
On 09/11/2013 09:35 PM, Stephen Warren wrote:
> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>>> Viresh,
>>>>>>>>>>
>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
> ...
>> Stephen, I went through the code and I think I found out what is going wrong.
>> Can you please try the following patch?
>
> Unfortunately, I still see the exact same failure/backtrace with this
> patch applied.
>
Oh, is it? Can you please give me the map of the related CPUs on your
system? (i.e., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for
each CPU.)

I must be missing something...

Regards,
Srivatsa S. Bhat
On 09/11/2013 11:33 PM, Srivatsa S. Bhat wrote:
> On 09/11/2013 09:35 PM, Stephen Warren wrote:
>> On 09/11/2013 04:21 AM, Srivatsa S. Bhat wrote:
>>> On 09/11/2013 04:04 AM, Rafael J. Wysocki wrote:
>>>> On Tuesday, September 10, 2013 02:53:01 PM Stephen Warren wrote:
>>>>> On 09/09/2013 05:14 PM, Rafael J. Wysocki wrote:
>>>>>> On Monday, September 09, 2013 03:29:06 PM Stephen Warren wrote:
>>>>>>> On 09/09/2013 02:24 PM, Rafael J. Wysocki wrote:
>>>>>>>> On Monday, September 09, 2013 02:01:32 PM Stephen Warren wrote:
>>>>>>>>> On 09/09/2013 02:01 PM, Rafael J. Wysocki wrote:
>>>>>>>>>> On Monday, September 09, 2013 01:22:23 PM Stephen Warren wrote:
>>>>>>>>>>> Viresh,
>>>>>>>>>>>
>>>>>>>>>>> I'm seeing the crash below when suspending my system for the second time.
>> ...
>>> Stephen, I went through the code and I think I found out what is going wrong.
>>> Can you please try the following patch?
>>
>> Unfortunately, I still see the exact same failure/backtrace with this
>> patch applied.
>>
> Oh, is it? Can you please give me the map of the related CPUs on your
> system? (i.e., cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus for
> each CPU.)
>
> I must be missing something...
>
OK, I took a second look at the code, and I suspect that applying the
second patch might help. So can you try by applying both the patches
please[1][2]?

Basically, here is my hunch: say CPUs 2 and 3 are part of a policy and
3 is the policy->cpu. During suspend, CPU 2 will be taken offline first,
and we hit this code:

1199         if (cpu != policy->cpu && !frozen) {
1200                 sysfs_remove_link(&dev->kobj, "cpufreq");
1201         } else if (cpus > 1) {
1202
1203                 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);
1204                 if (new_cpu >= 0) {
1205                         WARN_ON(lock_policy_rwsem_write(cpu));
1206                         update_policy_cpu(policy, new_cpu);
1207                         unlock_policy_rwsem_write(cpu);
1208
1209                         if (!frozen) {
1210                                 pr_debug("%s: policy Kobject moved to cpu: %d "
1211                                          "from: %d\n", __func__, new_cpu, cpu);
1212                         }
1213                 }
1214         }

At this point, the first 'if' condition fails because frozen == true.
So it enters the else part. But policy->cpu is actually 3, not 2, and
hence we invoke cpufreq_nominate_new_policy_cpu() unnecessarily. That
function returns 3, since that's the only CPU remaining in the mask, and
so we call update_policy_cpu() with new_cpu = 3, while old_cpu was also
3! And that is the perfect recipe for disaster, with the current
implementation of update_policy_cpu().

And my second patch [2] tried to fix this exact problem, although I
didn't realize we actually had a case where we hit this in the current
code itself. So please try by applying both the patches and let me know
how it goes. Thanks a lot for your testing efforts!

[1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
[2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2

Regards,
Srivatsa S. Bhat
On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
...
> OK, I took a second look at the code, and I suspect that applying the
> second patch might help. So can you try by applying both the patches
> please[1][2]?
...
> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2

Yes, with both of those patches applied, the problem is solved :-)

I was going to test the second patch originally, but it sounded like it
was more of a cleanup rather than a fix for my issue, so I didn't bother
when I found the problem wasn't solved by patch 1. Sorry!

For the record, I'm testing on a 2-CPU system, so I'm not sure whether
your explanation applies; it talks about CPUs 2 and 3 whereas I only
have CPUs 0 and 1, but perhaps your explanation applies equally to any
pair of CPUs?

For the record, here's the information you requested in the other email:

# cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
0 1
0 1
Let me fix my mail first.. I was running out of time yesterday and so
couldn't frame things correctly :)

On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Okay.. There are two different ways in which cpufreq_add_dev() works
> currently..
>
> Boot cluster (i.e. policy with the boot CPU)
> --------------------------------------------
>
> Here cpufreq_remove_dev() is never called for the boot CPU, but for all
> others. And similarly, cpufreq_add_dev() is never called for the boot
> CPU, but for all others.
>
> Now policy->cpu contains a meaningful CPU at the beginning of resume and
> we don't need to modify that at all.. For all the remaining CPUs we
> should rather call cpufreq_add_policy_cpu()..

And this should be done without your patch. Or actually we will simply
return from this place, at least for systems with a single cluster, like
Tegra.

policy->related_cpus is still valid after resume and we haven't removed
the policy from cpufreq_policy_list (though there is a bug there which I
have fixed separately and sent to you).. So no change is required for a
single-cluster system..

> Non-boot cluster
> ----------------
>
> All CPUs here are removed, and at the end policy->cpu contains the last
> CPU removed.. So, for a cluster with CPUs 2 and 3, it will contain 3..
>
> Now at resume we will add CPU 2 first and so need to update policy->cpu
> to 2..

> But for all other CPUs in this cluster we return early from
> cpufreq_add_dev() and call cpufreq_add_policy_cpu(), as policy->cpus
> was fixed by the call to ->init() for the first CPU of this cluster..

This was wrong; we need a valid policy->related_cpus field, which is
always valid, and so we return early here too, but not for the first CPU
of the cluster.

> And so we never reach the line: policy->cpu = cpu;
>
> For the first CPU of a non-boot cluster we need to call
> update_policy_cpu(), but not for the others..

That's correct, though I have one more idea.. :)

> But for the boot cluster, if we could call ->init() somehow at resume
> time, then things would be fairly similar in both cases..

Not required.. it's all working already.. and so Stephen shouldn't need
your patch for Tegra, but rather my patches that fix other cpufreq bugs..

Now coming back to the idea I have: the same code would work if the
hotplug sequence were fixed a bit. Why aren't we doing the exact opposite
of suspend in resume?

We are removing CPUs (leaving the boot CPU) in ascending order and then
adding them back in the same order.. Why?

Why not remove CPUs in descending order and add them in ascending order?
Or remove in ascending order and add in descending order? That way
policy->cpu would be updated with the right CPU and your patch wouldn't
be required..

I am not saying that this can't be hacked/fixed in cpufreq, but
suspend/resume may also be fixed, and that looks logically more correct
to me..
On 12 September 2013 00:12, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> OK, I took a second look at the code, and I suspect that applying the
> second patch might help. So can you try by applying both the patches
> please[1][2]?
>
> Basically, here is my hunch: say CPUs 2 and 3 are part of a policy and
> 3 is the policy->cpu. During suspend, CPU 2 will be taken offline first,
> and we hit this code:
>
> 1199         if (cpu != policy->cpu && !frozen) {
> 1200                 sysfs_remove_link(&dev->kobj, "cpufreq");
> 1201         } else if (cpus > 1) {
> 1202
> 1203                 new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);
> 1204                 if (new_cpu >= 0) {
> 1205                         WARN_ON(lock_policy_rwsem_write(cpu));
> 1206                         update_policy_cpu(policy, new_cpu);
> 1207                         unlock_policy_rwsem_write(cpu);
> 1208
> 1209                         if (!frozen) {
> 1210                                 pr_debug("%s: policy Kobject moved to cpu: %d "
> 1211                                          "from: %d\n", __func__, new_cpu, cpu);
> 1212                         }
> 1213                 }
> 1214         }
>
> At this point, the first 'if' condition fails because frozen == true.
> So it enters the else part. But policy->cpu is actually 3, not 2, and
> hence we invoke cpufreq_nominate_new_policy_cpu() unnecessarily. That
> function returns 3, since that's the only CPU remaining in the mask, and
> so we call update_policy_cpu() with new_cpu = 3, while old_cpu was also
> 3! And that is the perfect recipe for disaster, with the current
> implementation of update_policy_cpu().

The problem here is not a wrong implementation of update_policy_cpu(),
but this check:

1199         if (cpu != policy->cpu && !frozen) {

Though I had fixed it before looking into your replies :)
On 12 September 2013 00:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
> your explanation applies; it talks about CPUs 2 and 3 whereas I only
> have CPUs 0 and 1, but perhaps your explanation applies equally to any
> pair of CPUs?

It does apply to your system as well.. We remove CPU 1 during suspend and
so try to nominate CPU 0 as policy->cpu, which it already is :)
On 09/12/2013 11:22 AM, Viresh Kumar wrote:
> Let me fix my mail first.. I was running out of time yesterday and so
> couldn't frame things correctly :)
>
> On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> Okay.. There are two different ways in which cpufreq_add_dev() works
>> currently..
>>
>> Boot cluster (i.e. policy with the boot CPU)
>> --------------------------------------------
>>
>> Here cpufreq_remove_dev() is never called for the boot CPU, but for all
>> others. And similarly, cpufreq_add_dev() is never called for the boot
>> CPU, but for all others.
>>
>> Now policy->cpu contains a meaningful CPU at the beginning of resume and
>> we don't need to modify that at all.. For all the remaining CPUs we
>> should rather call cpufreq_add_policy_cpu()..
>
> And this should be done without your patch. Or actually we will simply
> return from this place, at least for systems with a single cluster, like
> Tegra.
>
> policy->related_cpus is still valid after resume and we haven't removed
> the policy from cpufreq_policy_list (though there is a bug there which I
> have fixed separately and sent to you).. So no change is required for a
> single-cluster system..
>
>> Non-boot cluster
>> ----------------
>>
>> All CPUs here are removed, and at the end policy->cpu contains the last
>> CPU removed.. So, for a cluster with CPUs 2 and 3, it will contain 3..
>>
>> Now at resume we will add CPU 2 first and so need to update policy->cpu
>> to 2..
>
>> But for all other CPUs in this cluster we return early from
>> cpufreq_add_dev() and call cpufreq_add_policy_cpu(), as policy->cpus
>> was fixed by the call to ->init() for the first CPU of this cluster..
>
> This was wrong; we need a valid policy->related_cpus field, which is
> always valid, and so we return early here too, but not for the first CPU
> of the cluster.
>
>> And so we never reach the line: policy->cpu = cpu;
>>
>> For the first CPU of a non-boot cluster we need to call
>> update_policy_cpu(), but not for the others..
>
> That's correct, though I have one more idea.. :)
>
>> But for the boot cluster, if we could call ->init() somehow at resume
>> time, then things would be fairly similar in both cases..
>
> Not required.. it's all working already.. and so Stephen shouldn't need
> your patch for Tegra, but rather my patches that fix other cpufreq bugs..
>
> Now coming back to the idea I have: the same code would work if the
> hotplug sequence were fixed a bit. Why aren't we doing the exact opposite
> of suspend in resume?
>
> We are removing CPUs (leaving the boot CPU) in ascending order and then
> adding them back in the same order.. Why?
>
> Why not remove CPUs in descending order and add them in ascending order?
> Or remove in ascending order and add in descending order?
>
I had the same thought when solving this bug.. We have had similar issues
with CPU hotplug notifiers too: why are they invoked in the same order
during both CPU down and up, instead of reversing the order? I even had a
patchset to perform reverse-invocation of notifiers:
http://lwn.net/Articles/508072/ ... but people didn't find it very
compelling to have.

> That way policy->cpu would be updated with the right CPU and your patch
> wouldn't be required..
>
> I am not saying that this can't be hacked/fixed in cpufreq, but
> suspend/resume may also be fixed, and that looks logically more correct
> to me..
>
It does to me too, but I think the reason nobody really bothered is that
perhaps not many other subsystems care about the order in which CPUs are
torn down or brought up; they just need the total number to match..
cpufreq is one exception, as we saw with this bug.

Regards,
Srivatsa S. Bhat
On 12 September 2013 11:56, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> I had the same thought when solving this bug.. We have had similar issues with
> CPU hotplug notifiers too: why are they invoked in the same order during both
> CPU down and up, instead of reversing the order? I even had a patchset to
> perform reverse-invocation of notifiers: http://lwn.net/Articles/508072/
> ... but people didn't find that very compelling to have.
>
> It does to me too, but I think the reason nobody really bothered is that
> perhaps not many other subsystems care about the order in which CPUs are torn
> down or brought up; they just need the total number to match.. cpufreq is one
> exception, as we saw with this bug.

Probably it's time to re-spin that series and make cpufreq one of the users
of that patchset.. Resume should be just the opposite of suspend, and so that
patchset would make sense even if not many people care about it :)

On top of that, there is one more problem that I see; I don't know if it is
really a big issue..

After a suspend/resume, the value of policy->cpu may get changed... and so
the hierarchy of the sysfs cpufreq files too.. A folder that had links to
other CPUs' folders can now be an actual folder instead of a link, and vice
versa..

Don't know if this can break something??
On 09/12/2013 12:11 PM, Viresh Kumar wrote:
> On 12 September 2013 11:56, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> I had the same thought when solving this bug.. We have had similar issues with
>> CPU hotplug notifiers too: why are they invoked in the same order during both
>> CPU down and up, instead of reversing the order? I even had a patchset to
>> perform reverse-invocation of notifiers: http://lwn.net/Articles/508072/
>> ... but people didn't find that very compelling to have.
>>
>> It does to me too, but I think the reason nobody really bothered is that
>> perhaps not many other subsystems care about the order in which CPUs are torn
>> down or brought up; they just need the total number to match.. cpufreq is one
>> exception, as we saw with this bug.
>
> Probably it's time to re-spin that series and make cpufreq one of the users
> of that patchset.. Resume should be just the opposite of suspend, and so
> that patchset would make sense even if not many people care about it :)
>
> On top of that, there is one more problem that I see; I don't know if it is
> really a big issue..
>
> After a suspend/resume, the value of policy->cpu may get changed... and so
> the hierarchy of the sysfs cpufreq files too.. A folder that had links to
> other CPUs' folders can now be an actual folder instead of a link, and vice
> versa..
>
> Don't know if this can break something??
>

Interesting observation :-) But we just managed to retain sysfs file
permissions across suspend/resume with a lot of trouble and regressions.
That's probably good enough for some time to come ;-) We can retain
folders/links when somebody really finds a need to do that ;-)

Of course, if we change the suspend/resume sequence and that fixes this, that
would be like getting it for free; nobody would say no to it ;-)

Regards,
Srivatsa S.
Bhat
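The folder/symlink swap discussed above can be sketched as a toy model (my own illustration, not the actual sysfs code; the `sysfs_layout` helper is an assumption). The CPU named by policy->cpu owns the real cpufreq directory and its sibling CPUs get symlinks to it, so if policy->cpu changes across suspend/resume, the roles swap:

```python
# Hypothetical model: which CPU holds the real cpufreq sysfs folder
# (policy->cpu) versus a symlink pointing at the owner's folder.

def sysfs_layout(policy_cpu, cpus):
    """Map each CPU to 'dir' (real folder) or 'link->cpuN' (symlink)."""
    return {cpu: "dir" if cpu == policy_cpu else f"link->cpu{policy_cpu}"
            for cpu in cpus}

cpus = [0, 1, 2, 3]
before = sysfs_layout(2, cpus)  # policy->cpu was 2 before suspend
after = sysfs_layout(0, cpus)   # policy->cpu became 0 after resume

# CPU 2's entry was a real folder before, but is a symlink afterwards.
print(before[2], "->", after[2])
```

Nothing here claims userspace actually breaks on this; it only makes concrete what "folders and links swapping roles" means.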
On 12 September 2013 12:16, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Of course, if we change the suspend/resume sequence and that fixes this, that
> would be like getting it for free; nobody would say no to it ;-)

Not really :)

Take a policy with 4 CPUs, 0,1,2,3, with policy->cpu currently set to 1, 2
or 3...

We will remove CPUs in the order 3,2,1 and add them back in the order
1,2,3... or vice versa.

policy->cpu after resume is 0 :)
On 09/12/2013 12:22 PM, Viresh Kumar wrote:
> On 12 September 2013 12:16, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Of course, if we change the suspend/resume sequence and that fixes this, that
>> would be like getting it for free; nobody would say no to it ;-)
>
> Not really :)
>
> Take a policy with 4 CPUs, 0,1,2,3, with policy->cpu currently set to 1, 2
> or 3...
>
> We will remove CPUs in the order 3,2,1 and add them back in the order
> 1,2,3... or vice versa.
>
> policy->cpu after resume is 0 :)
>

Ah, great counter-example! :-)

Regards,
Srivatsa S. Bhat
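The counter-example can be checked with a toy model (my own sketch, not kernel code; the lowest-surviving-CPU nomination rule in `suspend_resume` is an assumed stand-in for cpufreq_nominate_new_policy_cpu()). Since suspend removes every non-boot CPU, the boot CPU always ends up inheriting policy->cpu, regardless of the removal order:

```python
# Toy model of policy->cpu nomination during suspend/resume hotplug.
# When the CPU being offlined is policy->cpu, one of the surviving CPUs
# in the policy is nominated (lowest-numbered here, as an assumption).

def suspend_resume(policy_cpu, remove_order, add_order):
    online = {0, 1, 2, 3}
    for cpu in remove_order:          # offline the non-boot CPUs
        online.remove(cpu)
        if cpu == policy_cpu:
            policy_cpu = min(online)  # nominate a surviving CPU
    for cpu in add_order:             # online them again; the saved
        online.add(cpu)               # policy keeps its policy->cpu
    return policy_cpu

# Start with policy->cpu == 2; try both removal orders from the example.
print(suspend_resume(2, [1, 2, 3], [1, 2, 3]))  # 0
print(suspend_resume(2, [3, 2, 1], [1, 2, 3]))  # 0
```

Both orders converge on CPU 0, which is why merely reversing the hotplug sequence cannot restore the pre-suspend policy->cpu.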
On 09/12/2013 12:26 AM, Srivatsa S. Bhat wrote:
> On 09/12/2013 11:22 AM, Viresh Kumar wrote:
...
>> Now coming back to the ideas I have...
>> The same code will work if the hotplug sequence is fixed a bit. Why aren't
>> we doing the exact opposite of suspend in resume?
>>
>> We are removing CPUs (leaving the boot CPU) in ascending order and then
>> adding them back in the same order.. Why?
>>
>> Why not remove CPUs in descending order and add in ascending order? Or
>> remove in ascending order and add in descending order?
>
> I had the same thought when solving this bug.. We have had similar issues with
> CPU hotplug notifiers too: why are they invoked in the same order during both
> CPU down and up, instead of reversing the order? I even had a patchset to
> perform reverse-invocation of notifiers: http://lwn.net/Articles/508072/
> ... but people didn't find that very compelling to have.

I'm not sure if you're talking about the order the CPUs get plugged back
in after resume, or the order of the (pair of?) notifiers that gets
called for each individual CPU.

Sorry if this is blindingly obvious, but with CPU hotplug, I can
manually unplug/re-plug CPUs in any order I feel like, and cpufreq had
better work if I do that. Given that, I don't think there's any
particular need for suspend/resume to unplug/re-plug CPUs in any
particular order for correctness at least, although perhaps it'd be nice
cosmetically for some reason?
On 09/12/2013 09:25 PM, Stephen Warren wrote:
> On 09/12/2013 12:26 AM, Srivatsa S. Bhat wrote:
>> On 09/12/2013 11:22 AM, Viresh Kumar wrote:
> ...
>>> Now coming back to the ideas I have...
>>> The same code will work if the hotplug sequence is fixed a bit. Why aren't
>>> we doing the exact opposite of suspend in resume?
>>>
>>> We are removing CPUs (leaving the boot CPU) in ascending order and then
>>> adding them back in the same order.. Why?
>>>
>>> Why not remove CPUs in descending order and add in ascending order? Or
>>> remove in ascending order and add in descending order?
>>
>> I had the same thought when solving this bug.. We have had similar issues with
>> CPU hotplug notifiers too: why are they invoked in the same order during both
>> CPU down and up, instead of reversing the order? I even had a patchset to
>> perform reverse-invocation of notifiers: http://lwn.net/Articles/508072/
>> ... but people didn't find that very compelling to have.
>
> I'm not sure if you're talking about the order the CPUs get plugged back
> in after resume, or the order of the (pair of?) notifiers that gets
> called for each individual CPU.

Well, initially we were discussing the order in which the CPUs get plugged
back in after resume; then I gave an example of a similar ordering issue in
the way the registered CPU hotplug notifiers are invoked: during *both* CPU
online and offline, the chain of notifiers is invoked in the same order,
rather than in the opposite order.

To work around this unnatural ordering, subsystems have resorted to splitting
up their callbacks into multiple callbacks, assigning appropriate priorities
to them, etc., to get the ordering right. We could have done away with all
that complexity/confusion if the core CPU hotplug code had set the ordering
right. And that's what my patchset tried to do.

Anyway, never mind; as of now, subsystems do work around this suitably, so
there is no known bug as such at present.
Just that we could probably have done it in a better way, that's all.

> Sorry if this is blindingly obvious, but with CPU hotplug, I can
> manually unplug/re-plug CPUs in any order I feel like, and cpufreq had
> better work if I do that. Given that, I don't think there's any
> particular need for suspend/resume to unplug/re-plug CPUs in any
> particular order for correctness at least, although perhaps it'd be nice
> cosmetically for some reason?
>

You're absolutely right! Regular CPU hotplug is more demanding than
suspend/resume in the context we are discussing, since any CPU can be
hotplugged at any time and put back in any order. So code like cpufreq should
be prepared to work with any ordering. So yes, it's not very compelling to
change the suspend/resume CPU hotplug ordering, since cpufreq has to deal
with the regular (and harsher) situation anyway.

That said, one subtle but key distinction between regular CPU hotplug and
suspend/resume is that the kernel guarantees that the state of the system
after resume will be exactly the same as it was before suspend (at least to
the extent that is practically possible). On the other hand, no such
guarantees are made for regular CPU hotplug, since it's almost as if the
user is mutating the system at runtime!

However, suspend/resume as a concept itself provides the above-mentioned
guarantee; in fact, the very fact that CPU hotplug is used under the hood
for suspend/resume is just an implementation detail, and should not affect
the guarantees. That's the reason we often pay special attention to CPU
hotplug handling in the suspend/resume path, such as preserving sysfs file
permissions etc.

On those lines, Viresh and I were just wondering if 'fixing' the
suspend/resume CPU hotplug sequence would yield any additional benefits to
better serve this guarantee. That was the context in which this discussion
happened.

Regards,
Srivatsa S.
Bhat
On 12 September 2013 22:56, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 09/12/2013 09:25 PM, Stephen Warren wrote:
> Anyway, never mind; as of now, subsystems do work around this suitably, so
> there is no known bug as such at present. Just that we could probably have
> done it in a better way, that's all.

Yeah, there is no bug as of now, due to the number of hacks adopted by the
different frameworks.. I believe we can still have a cleanup series to take
care of this stuff.. That would be some improvement and would be better for
the future.. Otherwise these kinds of problems would keep coming up again
and again..

> You're absolutely right! Regular CPU hotplug is more demanding than
> suspend/resume in the context we are discussing, since any CPU can be
> hotplugged at any time and put back in any order. So code like cpufreq should
> be prepared to work with any ordering.

And that part is well implemented and tested, as far as I know..
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5a64f66..62bdb95 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -947,6 +947,18 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
 	kfree(policy);
 }
 
+static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
+{
+	policy->last_cpu = policy->cpu;
+	policy->cpu = cpu;
+
+#ifdef CONFIG_CPU_FREQ_TABLE
+	cpufreq_frequency_table_update_policy_cpu(policy);
+#endif
+	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
+			CPUFREQ_UPDATE_POLICY_CPU, policy);
+}
+
 static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
 			     bool frozen)
 {
@@ -1000,7 +1012,18 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
 	if (!policy)
 		goto nomem_out;
 
-	policy->cpu = cpu;
+
+	/*
+	 * In the resume path, since we restore a saved policy, the assignment
+	 * to policy->cpu is like an update of the existing policy, rather than
+	 * the creation of a brand new one. So we need to perform this update
+	 * by invoking update_policy_cpu().
+	 */
+	if (frozen && cpu != policy->cpu)
+		update_policy_cpu(policy, cpu);
+	else
+		policy->cpu = cpu;
+
 	policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
 	cpumask_copy(policy->cpus, cpumask_of(cpu));
@@ -1092,18 +1115,6 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 	return __cpufreq_add_dev(dev, sif, false);
 }
 
-static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
-{
-	policy->last_cpu = policy->cpu;
-	policy->cpu = cpu;
-
-#ifdef CONFIG_CPU_FREQ_TABLE
-	cpufreq_frequency_table_update_policy_cpu(policy);
-#endif
-	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
-			CPUFREQ_UPDATE_POLICY_CPU, policy);
-}
-
 static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy,
 					   unsigned int old_cpu, bool frozen)
 {