[v1] x86/power/64: Restore processor state before using per-cpu variables
diff mbox

Message ID 1470952169-39061-1-git-send-email-thgarnie@google.com
State New
Headers show

Commit Message

Thomas Garnier Aug. 11, 2016, 9:49 p.m. UTC
Restore the processor state before calling any other function to ensure
per-cpu variables can be used with KASLR memory randomization.

Tracing functions use per-cpu variables (gs based) and one was called
just before restoring the processor state fully. It resulted in a double
fault when both the tracing & the exception handler functions tried to
use a per-cpu variable.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160808

Thanks to Rafael, Jiri & Borislav in tracking down this bug.
---
 kernel/power/hibernate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Borislav Petkov Aug. 12, 2016, 5:49 a.m. UTC | #1
On Thu, Aug 11, 2016 at 02:49:29PM -0700, Thomas Garnier wrote:
> Restore the processor state before calling any other function to ensure
> per-cpu variables can be used with KASLR memory randomization.
> 
> Tracing functions use per-cpu variables (gs based) and one was called
> just before restoring the processor state fully. It resulted in a double
> fault when both the tracing & the exception handler functions tried to
> use a per-cpu variable.
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160808

Ok, I believe before I test this, I need to apply another patch from
Rafael. I think it is the "Always create temporary identity mapping
correctly" thing.

Yes, no?

Rafael, can you please apply everything on a test branch for us to run?

Thanks.
Jiri Kosina Aug. 12, 2016, 6:01 a.m. UTC | #2
On Thu, 11 Aug 2016, Thomas Garnier wrote:

> Restore the processor state before calling any other function to ensure
> per-cpu variables can be used with KASLR memory randomization.
> 
> Tracing functions use per-cpu variables (gs based) and one was called
> just before restoring the processor state fully. It resulted in a double
> fault when both the tracing & the exception handler functions tried to
> use a per-cpu variable.
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>

Alright, this did the trick, thanks :) Feel free to add

	Reported-by: Jiri Kosina <jkosina@suse.cz>
	Tested-by: Jiri Kosina <jkosina@suse.cz>

One thing is still beyond me though ... how the heck this doesn't happen 
without DEBUG_LOCK_ALLOC? The percpu area pointer should be corrupted 
nevertheless, shouldn't it?

Thanks,
Pavel Machek Aug. 12, 2016, 6:29 a.m. UTC | #3
Hi!

> Restore the processor state before calling any other function to ensure
> per-cpu variables can be used with KASLR memory randomization.
> 
> Tracing functions use per-cpu variables (gs based) and one was called
> just before restoring the processor state fully. It resulted in a double
> fault when both the tracing & the exception handler functions tried to
> use a per-cpu variable.
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160808
> 
> Thanks to Rafael, Jiri & Borislav in tracking down this bug.
> ---
>  kernel/power/hibernate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index a881c6a..33c79b6 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -300,12 +300,12 @@ static int create_image(int platform_mode)
>  	save_processor_state();
>  	trace_suspend_resume(TPS("machine_suspend"), PM_EVENT_HIBERNATE, true);
>  	error = swsusp_arch_suspend();
> +	/* Restore control flow magically appears here */
> +	restore_processor_state();
>  	trace_suspend_resume(TPS("machine_suspend"), PM_EVENT_HIBERNATE, false);
>  	if (error)
>  		printk(KERN_ERR "PM: Error %d creating hibernation image\n",
>  			error);
> -	/* Restore control flow magically appears here */
> -	restore_processor_state();
>  	if (!in_suspend)
>  		events_check_enabled = false;
>  

Ugh. Plus it also fixes very confusing situation where /* Restore
control flow magically appears here */ comment was 4 lines away from
where it _actually_ magically appeared. Good catch.

Acked-by: Pavel Machek <pavel@ucw.cz>
Jiri Kosina Aug. 12, 2016, 9:23 a.m. UTC | #4
On Fri, 12 Aug 2016, Jiri Kosina wrote:

> One thing is still beyond me though ... how the heck this doesn't happen 
> without DEBUG_LOCK_ALLOC? The percpu area pointer should be corrupted 
> nevertheless, shouldn't it?

The reason is that turning DEBUG_LOCK_ALLOC changing 
trace_suspend_resume() from

ffffffff810c7280 <trace_suspend_resume>:
ffffffff810c7280:       55                      push   %rbp
ffffffff810c7281:       48 89 e5                mov    %rsp,%rbp
ffffffff810c7284:       41 56                   push   %r14
ffffffff810c7286:       41 55                   push   %r13
ffffffff810c7288:       41 54                   push   %r12
ffffffff810c728a:       53                      push   %rbx
ffffffff810c728b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffff810c7290:       65 8b 05 59 2f f4 7e    mov    %gs:0x7ef42f59(%rip),%eax        # a1f0 <cpu_number>
ffffffff810c7297:       89 c0                   mov    %eax,%eax
ffffffff810c7299:       48 0f a3 05 9f 2f c4    bt     %rax,0xc42f9f(%rip)        # ffffffff81d0a240 <__cpu_online_mask>

to

ffffffff810ad150 <trace_suspend_resume>:
ffffffff810ad150:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffff810ad155:       c3                      retq   
ffffffff810ad156:       65 8b 05 93 d0 f5 7e    mov    %gs:0x7ef5d093(%rip),%eax        # a1f0 <cpu_number>
ffffffff810ad15d:       89 c0                   mov    %eax,%eax
ffffffff810ad15f:       48 0f a3 05 59 0b c4    bt     %rax,0xc40b59(%rip)        # ffffffff81cedcc0 <__cpu_online_mask>
ffffffff810ad166:       00

IOW, with the config change, trace_suspend_resume() returns immediately 
without trying to get the current SMP id. And it's because of 
DEBUG_LOCK_ALLOC implies LOCKDEP, and that does this to __DECLARE_TRACE()

	 * When lockdep is enabled, we make sure to always do the RCU portions of
	 * the tracepoint code, regardless of whether tracing is on. However,
	 * don't check if the condition is false, due to interaction with idle
	 * instrumentation. This lets us find RCU issues triggered with tracepoints
	 * even when this tracepoint is off. This code has no purpose other than
	 * poking RCU a bit.
	 */
	#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
		extern struct tracepoint __tracepoint_##name;                   \
		static inline void trace_##name(proto)                          \
		{                                                               \
			if (static_key_false(&__tracepoint_##name.key))         \
				__DO_TRACE(&__tracepoint_##name,                \
					TP_PROTO(data_proto),                   \
					TP_ARGS(data_args),                     \
					TP_CONDITION(cond),,);                  \
			if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {             \
				rcu_read_lock_sched_notrace();                  \
				rcu_dereference_sched(__tracepoint_##name.funcs);\
				rcu_read_unlock_sched_notrace();                \
			}                                                       \
		} 

That's pretty nasty, as turning LOCKDEP on has sideffects on the code 
that'd normally not be expected to be run at all (tracepoint off).

Oh well.

Thanks again,
Rafael J. Wysocki Aug. 12, 2016, 11:14 a.m. UTC | #5
On Fri, Aug 12, 2016 at 7:49 AM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, Aug 11, 2016 at 02:49:29PM -0700, Thomas Garnier wrote:
>> Restore the processor state before calling any other function to ensure
>> per-cpu variables can be used with KASLR memory randomization.
>>
>> Tracing functions use per-cpu variables (gs based) and one was called
>> just before restoring the processor state fully. It resulted in a double
>> fault when both the tracing & the exception handler functions tried to
>> use a per-cpu variable.
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160808
>
> Ok, I believe before I test this, I need to apply another patch from
> Rafael. I think it is the "Always create temporary identity mapping
> correctly" thing.
>
> Yes, no?

Yes.

> Rafael, can you please apply everything on a test branch for us to run?

You can simply test my linux-next branch:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
linux-next

That's 4.8-rc1 plus 3 fixes on top of it.

Thanks,
Rafael
Thomas Garnier Aug. 12, 2016, 4:03 p.m. UTC | #6
On Fri, Aug 12, 2016 at 2:23 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Fri, 12 Aug 2016, Jiri Kosina wrote:
>
> That's pretty nasty, as turning LOCKDEP on has sideffects on the code
> that'd normally not be expected to be run at all (tracepoint off).
>
> Oh well.

Thanks for the analysis, I didn't got that far so I had no idea how
everything was connected.

> Thanks again,

Thanks you and Borislav for finding it.
Thomas Garnier Aug. 12, 2016, 4:03 p.m. UTC | #7
On Fri, Aug 12, 2016 at 4:14 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Aug 12, 2016 at 7:49 AM, Borislav Petkov <bp@suse.de> wrote:
>> On Thu, Aug 11, 2016 at 02:49:29PM -0700, Thomas Garnier wrote:
>>> Restore the processor state before calling any other function to ensure
>>> per-cpu variables can be used with KASLR memory randomization.
>>>
>>> Tracing functions use per-cpu variables (gs based) and one was called
>>> just before restoring the processor state fully. It resulted in a double
>>> fault when both the tracing & the exception handler functions tried to
>>> use a per-cpu variable.
>>>
>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>> ---
>>> Based on next-20160808
>>
>> Ok, I believe before I test this, I need to apply another patch from
>> Rafael. I think it is the "Always create temporary identity mapping
>> correctly" thing.
>>
>> Yes, no?
>
> Yes.
>
>> Rafael, can you please apply everything on a test branch for us to run?
>
> You can simply test my linux-next branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> linux-next
>
> That's 4.8-rc1 plus 3 fixes on top of it.

Borislav, let me know once you tested it and I will send a v2 with acked/tested.
Borislav Petkov Aug. 12, 2016, 5:45 p.m. UTC | #8
On Fri, Aug 12, 2016 at 09:03:52AM -0700, Thomas Garnier wrote:
> Borislav, let me know once you tested it and I will send a v2 with
> acked/tested.

Just did 5 s2d runs, back-to-back, with Rafael's linux-next branch.
Looks good so far, no hickups. I'll watch it the coming days if there
are any changes.

Reported-and-tested-by: Borislav Petkov <bp@suse.de>

Btw, I'm running with:

CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0x0

Thanks guys for the fixes.

Patch
diff mbox

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index a881c6a..33c79b6 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -300,12 +300,12 @@  static int create_image(int platform_mode)
 	save_processor_state();
 	trace_suspend_resume(TPS("machine_suspend"), PM_EVENT_HIBERNATE, true);
 	error = swsusp_arch_suspend();
+	/* Restore control flow magically appears here */
+	restore_processor_state();
 	trace_suspend_resume(TPS("machine_suspend"), PM_EVENT_HIBERNATE, false);
 	if (error)
 		printk(KERN_ERR "PM: Error %d creating hibernation image\n",
 			error);
-	/* Restore control flow magically appears here */
-	restore_processor_state();
 	if (!in_suspend)
 		events_check_enabled = false;