
cpufreq: fix a NULL pointer dereference triggered by _PPC changed notification

Message ID 1418864919-14870-1-git-send-email-ethan.zhao@oracle.com (mailing list archive)
State Superseded, archived

Commit Message

ethan zhao Dec. 18, 2014, 1:08 a.m. UTC
If a _PPC change notification arrives before the governor has been
initialized during kernel boot, a NULL pointer dereference is triggered:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
 IP: [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
 PGD 0
 Oops: 0000 [#1] SMP
 ... ...
 RIP: 0010:[<ffffffff81470453>]  [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
 RSP: 0018:ffff881fcfbcfbb8  EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff881fd11b3980 RCX: ffff88407fc20000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd11b3980
 RBP: ffff881fcfbcfbd8 R08: 0000000000000000 R09: 000000000000000f
 R10: ffffffff818068d0 R11: 0000000000000043 R12: 0000000000000004
 R13: 0000000000000000 R14: ffffffff8196cae0 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000030 CR3: 00000000018ae000 CR4: 00000000000407f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process kworker/0:3 (pid: 750, threadinfo ffff881fcfbce000, task ffff881fcf556400)
 Stack:
  ffff881fffc17d00 ffff881fcfbcfc18 ffff881fd11b3980 0000000000000000
  ffff881fcfbcfc08 ffffffff81470d08 ffff881fd11b3980 0000000000000007
  ffff881fcfbcfc18 ffff881fffc17d00 ffff881fcfbcfd28 ffffffff81472e9a
 Call Trace:
  [<ffffffff81470d08>] __cpufreq_set_policy+0x1b8/0x2e0
  [<ffffffff81472e9a>] cpufreq_update_policy+0xca/0x150
  [<ffffffff81472f20>] ? cpufreq_update_policy+0x150/0x150
  [<ffffffff81324a96>] acpi_processor_ppc_has_changed+0x71/0x7b
  [<ffffffff81320bcd>] acpi_processor_notify+0x55/0x115
  [<ffffffff812f9c29>] acpi_device_notify+0x19/0x1b
  [<ffffffff813084ca>] acpi_ev_notify_dispatch+0x41/0x5f
  [<ffffffff812f64a4>] acpi_os_execute_deferred+0x27/0x34

The root cause is a race condition: the cpufreq core and the acpi-cpufreq
driver had been initialized, but the cpufreq governor had not, when the _PPC
change notification arrived. __cpufreq_governor() was then called in the
acpi_os_execute_deferred kernel thread context.

To fix this panic, check policy->governor for NULL in __cpufreq_governor()
before it is dereferenced.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
---
 drivers/cpufreq/cpufreq.c | 5 +++++
 1 file changed, 5 insertions(+)
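
The fault address in the oops (0000000000000030, also in CR2) appears
consistent with reading a field through a NULL policy->governor. The sketch
below illustrates the arithmetic only. struct demo_gov_layout and
latency_offset() are hypothetical stand-ins, and the field order is an
assumption modeled on the governor structure rather than taken from this
patch. On a typical 64-bit (LP64) build, where pointers and ssize_t are
8 bytes, the offset works out to 0x30.

```c
#include <stddef.h>
#include <sys/types.h>

struct demo_policy;	/* opaque; only used in function pointer types */

/*
 * Hypothetical stand-in for the governor structure.
 * ASSUMPTION: the field order and types below are illustrative.
 */
struct demo_gov_layout {
	char name[16];
	int initialized;
	int (*governor)(struct demo_policy *policy, unsigned int event);
	ssize_t (*show_setspeed)(struct demo_policy *policy, char *buf);
	int (*store_setspeed)(struct demo_policy *policy, unsigned int freq);
	unsigned int max_transition_latency;
};

/*
 * Offset of the field that is read through the governor pointer.
 * With a NULL base pointer, this offset is the faulting address.
 */
static size_t latency_offset(void)
{
	return offsetof(struct demo_gov_layout, max_transition_latency);
}
```

If the governor pointer is NULL, a read of max_transition_latency under this
assumed layout faults at address 0 plus that offset.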

Comments

Viresh Kumar Dec. 18, 2014, 4:26 a.m. UTC | #1
On 18 December 2014 at 06:38, Ethan Zhao <ethan.zhao@oracle.com> wrote:
> If a _PPC change notification arrives before the governor has been
> initialized during kernel boot, a NULL pointer dereference is triggered:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
>  IP: [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
>  PGD 0
>  Oops: 0000 [#1] SMP
>  ... ...
>  RIP: 0010:[<ffffffff81470453>]  [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
>  RSP: 0018:ffff881fcfbcfbb8  EFLAGS: 00010286
>  RAX: 0000000000000000 RBX: ffff881fd11b3980 RCX: ffff88407fc20000
>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd11b3980
>  RBP: ffff881fcfbcfbd8 R08: 0000000000000000 R09: 000000000000000f
>  R10: ffffffff818068d0 R11: 0000000000000043 R12: 0000000000000004
>  R13: 0000000000000000 R14: ffffffff8196cae0 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000030 CR3: 00000000018ae000 CR4: 00000000000407f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Process kworker/0:3 (pid: 750, threadinfo ffff881fcfbce000, task ffff881fcf556400)
>  Stack:
>   ffff881fffc17d00 ffff881fcfbcfc18 ffff881fd11b3980 0000000000000000
>   ffff881fcfbcfc08 ffffffff81470d08 ffff881fd11b3980 0000000000000007
>   ffff881fcfbcfc18 ffff881fffc17d00 ffff881fcfbcfd28 ffffffff81472e9a
>  Call Trace:
>   [<ffffffff81470d08>] __cpufreq_set_policy+0x1b8/0x2e0
>   [<ffffffff81472e9a>] cpufreq_update_policy+0xca/0x150
>   [<ffffffff81472f20>] ? cpufreq_update_policy+0x150/0x150
>   [<ffffffff81324a96>] acpi_processor_ppc_has_changed+0x71/0x7b
>   [<ffffffff81320bcd>] acpi_processor_notify+0x55/0x115
>   [<ffffffff812f9c29>] acpi_device_notify+0x19/0x1b
>   [<ffffffff813084ca>] acpi_ev_notify_dispatch+0x41/0x5f
>   [<ffffffff812f64a4>] acpi_os_execute_deferred+0x27/0x34
>
> The root cause is a race condition: the cpufreq core and the acpi-cpufreq
> driver had been initialized, but the cpufreq governor had not, when the _PPC
> change notification arrived. __cpufreq_governor() was then called in the
> acpi_os_execute_deferred kernel thread context.
>
> To fix this panic, check policy->governor for NULL in __cpufreq_governor()
> before it is dereferenced.
>
> Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
> ---
>  drivers/cpufreq/cpufreq.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4473eba..b75735c 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2021,6 +2021,11 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>         /* Don't start any governor operations if we are entering suspend */
>         if (cpufreq_suspended)
>                 return 0;
> +       /* Governor might not be initiated here if _PPC changed notification
> +          happened, check it.
> +       */

Please use the correct multi-line comment style here.

> +       if (!policy->governor)
> +               return -EINVAL;

And yet another band-aid to get things going...

We really need to sort things out here; it's not getting us anywhere.
The cpufreq core's state machine is in really bad shape right now.

Okay, let me make this a higher priority and get things fixed here.
There are unattended bugs floating around because band-aids aren't
working anymore.

Until then, you can get this one pushed for the current rc.

After the comment fix, Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
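
For illustration, the check with the comment in the requested multi-line
style might look roughly like the sketch below. It is a standalone userspace
model, not the kernel code: struct demo_governor, struct demo_policy and
demo_governor_event() are hypothetical names, and only the early-exit path
is modeled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct demo_governor {
	unsigned int max_transition_latency;
};

struct demo_policy {
	struct demo_governor *governor;	/* NULL until a governor is set */
};

/* Models only the NULL check added at the top of __cpufreq_governor(). */
static int demo_governor_event(struct demo_policy *policy)
{
	/*
	 * The governor may not be set yet if a _PPC change notification
	 * arrives early during boot, so bail out instead of dereferencing
	 * a NULL pointer.
	 */
	if (!policy->governor)
		return -EINVAL;

	/* From here on the governor pointer is safe to dereference. */
	return policy->governor->max_transition_latency ? 1 : 0;
}
```

A caller with no governor attached gets -EINVAL back instead of faulting.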

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba..b75735c 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2021,6 +2021,11 @@  static int __cpufreq_governor(struct cpufreq_policy *policy,
 	/* Don't start any governor operations if we are entering suspend */
 	if (cpufreq_suspended)
 		return 0;
+	/* Governor might not be initiated here if _PPC changed notification
+	   happened, check it.
+	*/
+	if (!policy->governor)
+		return -EINVAL;
 
 	if (policy->governor->max_transition_latency &&
 	    policy->cpuinfo.transition_latency >