x86/boot: Fix early exception handling with CONFIG_PERF_COUNTERS

Message ID 20200415173911.13286-1-andrew.cooper3@citrix.com
State New

Commit Message

Andrew Cooper April 15, 2020, 5:39 p.m. UTC
The PERFC_INCR() macro uses current->processor, but current is not valid
during early boot.  This causes the following crash to occur if
e.g. rdmsr_safe() has to recover from a #GP fault.

  (XEN) Early fatal page fault at e008:ffff82d0803b1a39 (cr2=0000000000000004, ec=0000)
  (XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d0803b1a39>] x86_64/entry.S#handle_exception_saved+0x64/0xb8
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d0803b1a39>] R x86_64/entry.S#handle_exception_saved+0x64/0xb8
  (XEN)    [<ffff82d0806394fe>] F __start_xen+0x2cd/0x2980
  (XEN)    [<ffff82d0802000ec>] F __high_start+0x4c/0x4e

Furthermore, the PERFC_INCR() macro is wildly inefficient.  There has been a
single caller for many releases now, so inline it and delete the macro
completely.

For the assembly, move entry_vector from %eax into %ecx.  There is no encoding
benefit for movzbl, and it frees up %eax to be used inside the
CONFIG_PERF_COUNTERS block where there is an encoding benefit.

There is no need to reference current at all.  What is actually needed is the
per_cpu_offset which can be obtained directly from the top-of-stack block.
This simplifies the counter handling to 3 instructions and no spilling to the
stack at all.

The same breakage from above is now handled properly:

  (XEN) traps.c:1591: GPF (0000): ffff82d0806394fe [__start_xen+0x2cd/0x2980] -> ffff82d0803b3bfb

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 xen/arch/x86/x86_64/entry.S       | 10 +++++++---
 xen/include/asm-x86/asm_defns.h   | 16 ----------------
 3 files changed, 8 insertions(+), 19 deletions(-)

Comments

Jan Beulich April 16, 2020, 7:19 a.m. UTC | #1
On 15.04.2020 19:39, Andrew Cooper wrote:
> The PERFC_INCR() macro uses current->processor, but current is not valid
> during early boot.  This causes the following crash to occur if
> e.g. rdmsr_safe() has to recover from a #GP fault.
> 
>   (XEN) Early fatal page fault at e008:ffff82d0803b1a39 (cr2=0000000000000004, ec=0000)
>   (XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
>   (XEN) CPU:    0
>   (XEN) RIP:    e008:[<ffff82d0803b1a39>] x86_64/entry.S#handle_exception_saved+0x64/0xb8
>   ...
>   (XEN) Xen call trace:
>   (XEN)    [<ffff82d0803b1a39>] R x86_64/entry.S#handle_exception_saved+0x64/0xb8
>   (XEN)    [<ffff82d0806394fe>] F __start_xen+0x2cd/0x2980
>   (XEN)    [<ffff82d0802000ec>] F __high_start+0x4c/0x4e
> 
> Furthermore, the PERFC_INCR() macro is wildly inefficient.  There has been a
> single caller for many releases now, so inline it and delete the macro
> completely.
> 
> For the assembly, move entry_vector from %eax into %ecx.  There is no encoding
> benefit for movzbl, and it frees up %eax to be used inside the
> CONFIG_PERF_COUNTERS block where there is an encoding benefit.

I don't mind the change in register use, but I'd certainly like to
understand the supposed "encoding benefit". Afaics ...

> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -677,10 +677,14 @@ handle_exception_saved:
>  #endif /* CONFIG_PV */
>          sti
>  1:      movq  %rsp,%rdi
> -        movzbl UREGS_entry_vector(%rsp),%eax
> +        movzbl UREGS_entry_vector(%rsp), %ecx
>          leaq  exception_table(%rip),%rdx
> -        PERFC_INCR(exceptions, %rax, %rbx)
> -        mov   (%rdx, %rax, 8), %rdx
> +#ifdef CONFIG_PERF_COUNTERS
> +        lea   per_cpu__perfcounters(%rip), %rax
> +        add   STACK_CPUINFO_FIELD(per_cpu_offset)(%r14), %rax
> +        incl  ASM_PERFC_exceptions * 4(%rax, %rcx, 4)
> +#endif

... all insns inside the block use a ModR/M byte, and there's also
no special SIB encoding form used (i.e. one where the disp size
would increase because of register choice).

With this clarified, i.e. possibly the commit message updated
accordingly, and possibly even code churn reduced by undoing the
change of register used,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
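Jan's encoding observation can be sanity-checked with a small model. The helper below is a simplified reading of the x86-64 ModR/M+SIB rules (not a full assembler): with a SIB-form memory operand and a nonzero displacement, swapping which general register serves as base or index changes only bit fields, never the instruction length, so there is no "encoding benefit" to collect here.

```python
def mem_op_len(base: str, index: str, disp: int) -> int:
    """Bytes for a (base, index, scale) memory operand, per a
    simplified reading of the ModR/M + SIB displacement rules."""
    modrm, sib = 1, 1                      # SIB form is used throughout
    if disp == 0 and base not in ("rbp", "r13"):
        d = 0                              # mod=00: no displacement
    elif -128 <= disp <= 127:
        d = 1                              # mod=01: disp8
    else:
        d = 4                              # mod=10: disp32
    return modrm + sib + d

# Jan's point: %rax vs %rbx vs %rcx (etc.) as base or index never
# changes the operand length once a displacement is present.
lengths = {mem_op_len(b, i, 0x40)
           for b in ("rax", "rbx", "rcx", "rdi")
           for i in ("rax", "rbx", "rcx")}
assert len(lengths) == 1

# The only register-dependent size effect is the %rbp/%r13-base
# special case, which forces a displacement byte even for disp 0.
assert mem_op_len("rax", "rcx", 0) == 2
assert mem_op_len("rbp", "rcx", 0) == 3
```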
Andrew Cooper April 16, 2020, 8:19 a.m. UTC | #2
On 16/04/2020 08:19, Jan Beulich wrote:
> On 15.04.2020 19:39, Andrew Cooper wrote:
>> The PERFC_INCR() macro uses current->processor, but current is not valid
>> during early boot.  This causes the following crash to occur if
>> e.g. rdmsr_safe() has to recover from a #GP fault.
>>
>>   (XEN) Early fatal page fault at e008:ffff82d0803b1a39 (cr2=0000000000000004, ec=0000)
>>   (XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
>>   (XEN) CPU:    0
>>   (XEN) RIP:    e008:[<ffff82d0803b1a39>] x86_64/entry.S#handle_exception_saved+0x64/0xb8
>>   ...
>>   (XEN) Xen call trace:
>>   (XEN)    [<ffff82d0803b1a39>] R x86_64/entry.S#handle_exception_saved+0x64/0xb8
>>   (XEN)    [<ffff82d0806394fe>] F __start_xen+0x2cd/0x2980
>>   (XEN)    [<ffff82d0802000ec>] F __high_start+0x4c/0x4e
>>
>> Furthermore, the PERFC_INCR() macro is wildly inefficient.  There has been a
>> single caller for many releases now, so inline it and delete the macro
>> completely.
>>
>> For the assembly, move entry_vector from %eax into %ecx.  There is no encoding
>> benefit for movzbl, and it frees up %eax to be used inside the
>> CONFIG_PERF_COUNTERS block where there is an encoding benefit.
> I don't mind the change in register use, but I'd certainly like to
> understand the supposed "encoding benefit". Afaics ...
>
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -677,10 +677,14 @@ handle_exception_saved:
>>  #endif /* CONFIG_PV */
>>          sti
>>  1:      movq  %rsp,%rdi
>> -        movzbl UREGS_entry_vector(%rsp),%eax
>> +        movzbl UREGS_entry_vector(%rsp), %ecx
>>          leaq  exception_table(%rip),%rdx
>> -        PERFC_INCR(exceptions, %rax, %rbx)
>> -        mov   (%rdx, %rax, 8), %rdx
>> +#ifdef CONFIG_PERF_COUNTERS
>> +        lea   per_cpu__perfcounters(%rip), %rax
>> +        add   STACK_CPUINFO_FIELD(per_cpu_offset)(%r14), %rax
>> +        incl  ASM_PERFC_exceptions * 4(%rax, %rcx, 4)
>> +#endif
> ... all insns inside the block use a ModR/M byte, and there's also
> no special SIB encoding form used (i.e. one where the disp size
> would increase because of register choice).

Hmm - I suppose it is stale, written for an older version of the logic
before I spotted a further optimisation.

> With this clarified, i.e. possibly the commit message updated
> accordingly, and possibly even code churn reduced by undoing the
> change of register used,

Leaving %eax as was, and using %rdi for the single temporary looks like:

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 997c481ecb..1984bb7948 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -679,7 +679,11 @@ handle_exception_saved:
 1:      movq  %rsp,%rdi
         movzbl UREGS_entry_vector(%rsp),%eax
         leaq  exception_table(%rip),%rdx
-        PERFC_INCR(exceptions, %rax, %rbx)
+#ifdef CONFIG_PERF_COUNTERS
+        lea   per_cpu__perfcounters(%rip), %rdi
+        add   STACK_CPUINFO_FIELD(per_cpu_offset)(%r14), %rdi
+        incl  ASM_PERFC_exceptions * 4(%rdi, %rax, 4)
+#endif
         mov   (%rdx, %rax, 8), %rdx
         INDIRECT_CALL %rdx
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)


If you're happy with that?

~Andrew
Julien Grall April 16, 2020, 8:33 a.m. UTC | #3
Hi Andrew,

On 15/04/2020 18:39, Andrew Cooper wrote:
> The PERFC_INCR() macro uses current->processor, but current is not valid
> during early boot.  This causes the following crash to occur if
> e.g. rdmsr_safe() has to recover from a #GP fault.
> 
>    (XEN) Early fatal page fault at e008:ffff82d0803b1a39 (cr2=0000000000000004, ec=0000)
>    (XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
>    (XEN) CPU:    0
>    (XEN) RIP:    e008:[<ffff82d0803b1a39>] x86_64/entry.S#handle_exception_saved+0x64/0xb8
>    ...
>    (XEN) Xen call trace:
>    (XEN)    [<ffff82d0803b1a39>] R x86_64/entry.S#handle_exception_saved+0x64/0xb8
>    (XEN)    [<ffff82d0806394fe>] F __start_xen+0x2cd/0x2980
>    (XEN)    [<ffff82d0802000ec>] F __high_start+0x4c/0x4e
> 
> Furthermore, the PERFC_INCR() macro is wildly inefficient.  There has been a
> single caller for many releases now, so inline it and delete the macro
> completely.
> 
> For the assembly, move entry_vector from %eax into %ecx.  There is no encoding
> benefit for movzbl, and it frees up %eax to be used inside the
> CONFIG_PERF_COUNTERS block where there is an encoding benefit.
> 
> There is no need to reference current at all.  What is actually needed is the
> per_cpu_offset which can be obtained directly from the top-of-stack block.
> This simplifies the counter handling to 3 instructions and no spilling to the
> stack at all.
> 
> The same breakage from above is now handled properly:
> 
>    (XEN) traps.c:1591: GPF (0000): ffff82d0806394fe [__start_xen+0x2cd/0x2980] -> ffff82d0803b3bfb
> 
> Reported-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

I can confirm the crash I have seen has now disappeared.

Tested-by: Julien Grall <jgrall@amazon.com>

Cheers,
Jan Beulich April 16, 2020, 8:35 a.m. UTC | #4
On 16.04.2020 10:19, Andrew Cooper wrote:
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -679,7 +679,11 @@ handle_exception_saved:
>  1:      movq  %rsp,%rdi
>          movzbl UREGS_entry_vector(%rsp),%eax
>          leaq  exception_table(%rip),%rdx
> -        PERFC_INCR(exceptions, %rax, %rbx)
> +#ifdef CONFIG_PERF_COUNTERS
> +        lea   per_cpu__perfcounters(%rip), %rdi
> +        add   STACK_CPUINFO_FIELD(per_cpu_offset)(%r14), %rdi
> +        incl  ASM_PERFC_exceptions * 4(%rdi, %rax, 4)
> +#endif
>          mov   (%rdx, %rax, 8), %rdx
>          INDIRECT_CALL %rdx
>          mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
> 
> 
> If you're happy with that?

I'm afraid I'm not - you can't use %rdi here, it already holds the
called function's argument. I'd be fine with %rcx used instead.

Jan
Andrew Cooper April 16, 2020, 8:47 a.m. UTC | #5
On 16/04/2020 09:35, Jan Beulich wrote:
> On 16.04.2020 10:19, Andrew Cooper wrote:
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -679,7 +679,11 @@ handle_exception_saved:
>>  1:      movq  %rsp,%rdi
>>          movzbl UREGS_entry_vector(%rsp),%eax
>>          leaq  exception_table(%rip),%rdx
>> -        PERFC_INCR(exceptions, %rax, %rbx)
>> +#ifdef CONFIG_PERF_COUNTERS
>> +        lea   per_cpu__perfcounters(%rip), %rdi
>> +        add   STACK_CPUINFO_FIELD(per_cpu_offset)(%r14), %rdi
>> +        incl  ASM_PERFC_exceptions * 4(%rdi, %rax, 4)
>> +#endif
>>          mov   (%rdx, %rax, 8), %rdx
>>          INDIRECT_CALL %rdx
>>          mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
>>
>>
>> If you're happy with that?
> I'm afraid I'm not - you can't use %rdi here, it already holds the
> called function's argument. I'd be fine with %rcx used instead.

Bah - serves me right for being lazy and not retesting.

%rcx it is.

~Andrew

Patch

diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index b8e8510439..500df7a3e7 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -112,6 +112,7 @@  void __dummy__(void)
     OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
     OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
     OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
+    OFFSET(CPUINFO_per_cpu_offset, struct cpu_info, per_cpu_offset);
     OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
     OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
     OFFSET(CPUINFO_pv_cr3, struct cpu_info, pv_cr3);
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 997c481ecb..52bd41d9f6 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -677,10 +677,14 @@  handle_exception_saved:
 #endif /* CONFIG_PV */
         sti
 1:      movq  %rsp,%rdi
-        movzbl UREGS_entry_vector(%rsp),%eax
+        movzbl UREGS_entry_vector(%rsp), %ecx
         leaq  exception_table(%rip),%rdx
-        PERFC_INCR(exceptions, %rax, %rbx)
-        mov   (%rdx, %rax, 8), %rdx
+#ifdef CONFIG_PERF_COUNTERS
+        lea   per_cpu__perfcounters(%rip), %rax
+        add   STACK_CPUINFO_FIELD(per_cpu_offset)(%r14), %rax
+        incl  ASM_PERFC_exceptions * 4(%rax, %rcx, 4)
+#endif
+        mov   (%rdx, %rcx, 8), %rdx
         INDIRECT_CALL %rdx
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         mov   %r13b, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index bc9d9fcdb2..b42a19b654 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -346,22 +346,6 @@  static always_inline void stac(void)
 
 #endif
 
-#ifdef CONFIG_PERF_COUNTERS
-#define PERFC_INCR(_name,_idx,_cur)             \
-        pushq _cur;                             \
-        movslq VCPU_processor(_cur),_cur;       \
-        pushq %rdx;                             \
-        leaq __per_cpu_offset(%rip),%rdx;       \
-        movq (%rdx,_cur,8),_cur;                \
-        leaq per_cpu__perfcounters(%rip),%rdx;  \
-        addq %rdx,_cur;                         \
-        popq %rdx;                              \
-        incl ASM_PERFC_##_name*4(_cur,_idx,4);  \
-        popq _cur
-#else
-#define PERFC_INCR(_name,_idx,_cur)
-#endif
-
 /* Work around AMD erratum #88 */
 #define safe_swapgs                             \
         "mfence; swapgs;"