diff mbox

ftrace/x86: fix x86-32 triple fault with graph tracing and suspend-to-ram

Message ID 20170327145441.aybim6rmc6nxelij@treble (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Josh Poimboeuf March 27, 2017, 2:54 p.m. UTC
On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable
function graph tracing and then suspend to RAM, it will triple fault and
reboot when it resumes.

The first fault happens when booting a secondary CPU:

startup_32_smp()
  load_ucode_ap()
    prepare_ftrace_return()
      ftrace_graph_is_dead()
        (accesses 'kill_ftrace_graph')

The early head_32.S code calls into load_ucode_ap(), which has an an
ftrace hook, so it calls prepare_ftrace_return(), which calls
ftrace_graph_is_dead(), which tries to access the global
'kill_ftrace_graph' variable with a virtual address, causing a fault
because the CPU is still in real mode.

The fix is to add a check in prepare_ftrace_return() to make sure it's
running in protected mode before continuing.  The check makes sure the
stack pointer is a virtual kernel address.  It's a bit of a hack, but
it's not very intrusive and it works well enough.

For reference, here are a few other ways this could have potentially
been fixed:

- Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
  is enabled.  (No idea what that would break.)

- Track down load_ucode_ap()'s entire callee tree and mark all the
  functions 'notrace'.  (Probably not realistic.)

- Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
  or __cpu_up(), and ensure that the pause facility can be queried from
  real mode.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/kernel/ftrace.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Paul Menzel March 27, 2017, 3:01 p.m. UTC | #1
Dear Josh,


On 03/27/17 16:54, Josh Poimboeuf wrote:
> On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable
> function graph tracing and then suspend to RAM, it will triple fault and
> reboot when it resumes.
>
> The first fault happens when booting a secondary CPU:
>
> startup_32_smp()
>   load_ucode_ap()
>     prepare_ftrace_return()
>       ftrace_graph_is_dead()
>         (accesses 'kill_ftrace_graph')
>
> The early head_32.S code calls into load_ucode_ap(), which has an an
> ftrace hook, so it calls prepare_ftrace_return(), which calls
> ftrace_graph_is_dead(), which tries to access the global
> 'kill_ftrace_graph' variable with a virtual address, causing a fault
> because the CPU is still in real mode.
>
> The fix is to add a check in prepare_ftrace_return() to make sure it's
> running in protected mode before continuing.  The check makes sure the
> stack pointer is a virtual kernel address.  It's a bit of a hack, but
> it's not very intrusive and it works well enough.
>
> For reference, here are a few other ways this could have potentially
> been fixed:
>
> - Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
>   is enabled.  (No idea what that would break.)
>
> - Track down load_ucode_ap()'s entire callee tree and mark all the
>   functions 'notrace'.  (Probably not realistic.)
>
> - Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
>   or __cpu_up(), and ensure that the pause facility can be queried from
>   real mode.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/kernel/ftrace.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Thank you for debugging this. It’s great that you were able to reproduce 
this in QEMU. Hopefully, that’ll make for an easy test case. ;-)

> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 8f3d9cf..1c5c4e2 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -983,6 +983,17 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
>  	unsigned long return_hooker = (unsigned long)
>  				&return_to_handler;
>
> +	/*
> +	 * When resuming from suspend-to-ram, this function can be indirectly
> +	 * called from early CPU startup code while the CPU is in real mode,
> +	 * which would fail miserably.  Make sure the stack pointer is a
> +	 * virtual address.
> +	 *
> +	 * This check isn't as accurate as virt_addr_valid(), but it should be
> +	 * good enough for this purpose, and it's fast.
> +	 */
> +	if (unlikely((long)__builtin_frame_address(0) >= 0)) return;

The coding style requires the `return;` to be on a separate line.

> +
>  	if (unlikely(ftrace_graph_is_dead()))
>  		return;

I’ll test your change this evening.


Kind regards,

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt March 27, 2017, 3:24 p.m. UTC | #2
On Mon, 27 Mar 2017 17:01:53 +0200
Paul Menzel <pmenzel@molgen.mpg.de> wrote:

> > +	/*
> > +	 * When resuming from suspend-to-ram, this function can be indirectly
> > +	 * called from early CPU startup code while the CPU is in real mode,
> > +	 * which would fail miserably.  Make sure the stack pointer is a
> > +	 * virtual address.
> > +	 *
> > +	 * This check isn't as accurate as virt_addr_valid(), but it should be
> > +	 * good enough for this purpose, and it's fast.
> > +	 */
> > +	if (unlikely((long)__builtin_frame_address(0) >= 0)) return;  
> 
> The coding style requires the `return;` to be on a separate line.

Correct, and new versions of a patch should always start a new thread
(unless it's a single update of a patch in a long patch series).
Otherwise they get ignored. (hint hint)

-- Steve



> 
> > +
> >  	if (unlikely(ftrace_graph_is_dead()))
> >  		return;  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Menzel March 28, 2017, 9:51 a.m. UTC | #3
Dear Josh,


On 03/27/17 17:01, Paul Menzel wrote:

> On 03/27/17 16:54, Josh Poimboeuf wrote:
>> On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable
>> function graph tracing and then suspend to RAM, it will triple fault and
>> reboot when it resumes.
>>
>> The first fault happens when booting a secondary CPU:
>>
>> startup_32_smp()
>>   load_ucode_ap()
>>     prepare_ftrace_return()
>>       ftrace_graph_is_dead()
>>         (accesses 'kill_ftrace_graph')
>>
>> The early head_32.S code calls into load_ucode_ap(), which has an an
>> ftrace hook, so it calls prepare_ftrace_return(), which calls
>> ftrace_graph_is_dead(), which tries to access the global
>> 'kill_ftrace_graph' variable with a virtual address, causing a fault
>> because the CPU is still in real mode.
>>
>> The fix is to add a check in prepare_ftrace_return() to make sure it's
>> running in protected mode before continuing.  The check makes sure the
>> stack pointer is a virtual kernel address.  It's a bit of a hack, but
>> it's not very intrusive and it works well enough.
>>
>> For reference, here are a few other ways this could have potentially
>> been fixed:
>>
>> - Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
>>   is enabled.  (No idea what that would break.)
>>
>> - Track down load_ucode_ap()'s entire callee tree and mark all the
>>   functions 'notrace'.  (Probably not realistic.)
>>
>> - Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
>>   or __cpu_up(), and ensure that the pause facility can be queried from
>>   real mode.
>>
>> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
>> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
>> ---
>>  arch/x86/kernel/ftrace.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>
> Thank you for debugging this. It’s great that you were able to reproduce
> this in QEMU. Hopefully, that’ll make for an easy test case. ;-)
>
>> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
>> index 8f3d9cf..1c5c4e2 100644
>> --- a/arch/x86/kernel/ftrace.c
>> +++ b/arch/x86/kernel/ftrace.c
>> @@ -983,6 +983,17 @@ void prepare_ftrace_return(unsigned long
>> self_addr, unsigned long *parent,
>>      unsigned long return_hooker = (unsigned long)
>>                  &return_to_handler;
>>
>> +    /*
>> +     * When resuming from suspend-to-ram, this function can be
>> indirectly
>> +     * called from early CPU startup code while the CPU is in real mode,
>> +     * which would fail miserably.  Make sure the stack pointer is a
>> +     * virtual address.
>> +     *
>> +     * This check isn't as accurate as virt_addr_valid(), but it
>> should be
>> +     * good enough for this purpose, and it's fast.
>> +     */
>> +    if (unlikely((long)__builtin_frame_address(0) >= 0)) return;
>
> The coding style requires the `return;` to be on a separate line.
>
>> +
>>      if (unlikely(ftrace_graph_is_dead()))
>>          return;
>
> I’ll test your change this evening.

With both patches applied `./analyze_suspend.py -config 
suspend-callgraph.cfg -filter i915` succeeds on a Lenovo X60t, so 
suspend and resume work perfectly, when tracing is enabled.

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

It’d be awesome, if you could tag both patches for inclusion into the 
stable Linux Kernel series.


Kind regards,

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt March 28, 2017, 3:39 p.m. UTC | #4
On Tue, 28 Mar 2017 11:51:45 +0200
Paul Menzel <pmenzel@molgen.mpg.de> wrote:

> With both patches applied `./analyze_suspend.py -config 
> suspend-callgraph.cfg -filter i915` succeeds on a Lenovo X60t, so 
> suspend and resume work perfectly, when tracing is enabled.
> 
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> 
> It’d be awesome, if you could tag both patches for inclusion into the 
> stable Linux Kernel series.

As long as they are not dependent on my patch series, I'm fine with
these going to stable.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Poimboeuf March 28, 2017, 3:55 p.m. UTC | #5
On Tue, Mar 28, 2017 at 11:39:41AM -0400, Steven Rostedt wrote:
> On Tue, 28 Mar 2017 11:51:45 +0200
> Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> 
> > With both patches applied `./analyze_suspend.py -config 
> > suspend-callgraph.cfg -filter i915` succeeds on a Lenovo X60t, so 
> > suspend and resume work perfectly, when tracing is enabled.
> > 
> > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > 
> > It’d be awesome, if you could tag both patches for inclusion into the 
> > stable Linux Kernel series.
> 
> As long as they are not dependent on my patch series, I'm fine with
> these going to stable.

Stable sounds fine to me too.  Both patches are independent of your
x86-32 fentry patch set.
Rafael J. Wysocki March 28, 2017, 9:12 p.m. UTC | #6
On Tuesday, March 28, 2017 10:55:46 AM Josh Poimboeuf wrote:
> On Tue, Mar 28, 2017 at 11:39:41AM -0400, Steven Rostedt wrote:
> > On Tue, 28 Mar 2017 11:51:45 +0200
> > Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > 
> > > With both patches applied `./analyze_suspend.py -config 
> > > suspend-callgraph.cfg -filter i915` succeeds on a Lenovo X60t, so 
> > > suspend and resume work perfectly, when tracing is enabled.
> > > 
> > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > 
> > > It’d be awesome, if you could tag both patches for inclusion into the 
> > > stable Linux Kernel series.
> > 
> > As long as they are not dependent on my patch series, I'm fine with
> > these going to stable.
> 
> Stable sounds fine to me too.  Both patches are independent of your
> x86-32 fentry patch set.

Does https://patchwork.kernel.org/patch/9628301/ need to go into any particular
-stable series or just all of them?

Or should a Fixes: tag be added to it?

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Poimboeuf March 28, 2017, 9:42 p.m. UTC | #7
On Tue, Mar 28, 2017 at 11:12:42PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, March 28, 2017 10:55:46 AM Josh Poimboeuf wrote:
> > On Tue, Mar 28, 2017 at 11:39:41AM -0400, Steven Rostedt wrote:
> > > On Tue, 28 Mar 2017 11:51:45 +0200
> > > Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > 
> > > > With both patches applied `./analyze_suspend.py -config 
> > > > suspend-callgraph.cfg -filter i915` succeeds on a Lenovo X60t, so 
> > > > suspend and resume work perfectly, when tracing is enabled.
> > > > 
> > > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > 
> > > > It’d be awesome, if you could tag both patches for inclusion into the 
> > > > stable Linux Kernel series.
> > > 
> > > As long as they are not dependent on my patch series, I'm fine with
> > > these going to stable.
> > 
> > Stable sounds fine to me too.  Both patches are independent of your
> > x86-32 fentry patch set.
> 
> Does https://patchwork.kernel.org/patch/9628301/ need to go into any particular
> -stable series or just all of them?
> 
> Or should a Fixes: tag be added to it?

As far as I can tell this issue has been around since the function_graph
tracer was introduced in 2008:

  15e6cb3673ea ("tracing: add a tracer to catch execution time of kernel functions")

(Though only for gcc >= 4.4.)

Not sure if it's overkill to specify 'Fixes' for an 8+ year old bug?  I
guess it can't hurt anything.

I think it can go in all of the stable branches.
Rafael J. Wysocki March 28, 2017, 9:47 p.m. UTC | #8
On Tuesday, March 28, 2017 04:42:18 PM Josh Poimboeuf wrote:
> On Tue, Mar 28, 2017 at 11:12:42PM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, March 28, 2017 10:55:46 AM Josh Poimboeuf wrote:
> > > On Tue, Mar 28, 2017 at 11:39:41AM -0400, Steven Rostedt wrote:
> > > > On Tue, 28 Mar 2017 11:51:45 +0200
> > > > Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > 
> > > > > With both patches applied `./analyze_suspend.py -config 
> > > > > suspend-callgraph.cfg -filter i915` succeeds on a Lenovo X60t, so 
> > > > > suspend and resume work perfectly, when tracing is enabled.
> > > > > 
> > > > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > > 
> > > > > It’d be awesome, if you could tag both patches for inclusion into the 
> > > > > stable Linux Kernel series.
> > > > 
> > > > As long as they are not dependent on my patch series, I'm fine with
> > > > these going to stable.
> > > 
> > > Stable sounds fine to me too.  Both patches are independent of your
> > > x86-32 fentry patch set.
> > 
> > Does https://patchwork.kernel.org/patch/9628301/ need to go into any particular
> > -stable series or just all of them?
> > 
> > Or should a Fixes: tag be added to it?
> 
> As far as I can tell this issue has been around since the function_graph
> tracer was introduced in 2008:
> 
>   15e6cb3673ea ("tracing: add a tracer to catch execution time of kernel functions")
> 
> (Though only for gcc >= 4.4.)
> 
> Not sure if it's overkill to specify 'Fixes' for an 8+ year old bug?  I
> guess it can't hurt anything.
> 
> I think it can go in all of the stable branches.

OK, thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8f3d9cf..1c5c4e2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -983,6 +983,17 @@  void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	unsigned long return_hooker = (unsigned long)
 				&return_to_handler;
 
+	/*
+	 * When resuming from suspend-to-ram, this function can be indirectly
+	 * called from early CPU startup code while the CPU is in real mode,
+	 * which would fail miserably.  Make sure the stack pointer is a
+	 * virtual address.
+	 *
+	 * This check isn't as accurate as virt_addr_valid(), but it should be
+	 * good enough for this purpose, and it's fast.
+	 */
+	if (unlikely((long)__builtin_frame_address(0) >= 0)) return;
+
 	if (unlikely(ftrace_graph_is_dead()))
 		return;