diff mbox series

[v2] x86/pv: Inject #UD for missing SYSCALL callbacks

Message ID 20201009115301.19516-1-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series [v2] x86/pv: Inject #UD for missing SYSCALL callbacks | expand

Commit Message

Andrew Cooper Oct. 9, 2020, 11:53 a.m. UTC
Despite appearing to be a deliberate design choice of early PV64, the
resulting behaviour for unregistered SYSCALL callbacks creates an untenable
testability problem for Xen.  Furthermore, the behaviour is undocumented,
bizarre, and inconsistent with related behaviour in Xen, and very liable
introduce a security vulnerability into a PV guest if the author hasn't
studied Xen's assembly code in detail.

There are two different bugs here.

1) The current logic confuses the registered entrypoints, and may deliver a
   SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
   entrypoint is registered.

   This has been the case ever since 2007 (c/s cd75d47348b) but up until
   2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
   a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.

   Xen would malfunction under these circumstances, if it were a PV guest.
   Linux would as well, but PVOps has always registered both entrypoints and
   discarded the Xen-provided selectors.  NetBSD really does malfunction as a
   consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).

2) In the case that neither SYSCALL callbacks are registered, the guest will
   be crashed when userspace executes a SYSCALL instruction, which is a
   userspace => kernel DoS.

   This has been the case ever since the introduction of 64bit PV support, but
   behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
   #GP/#UD in userspace before the callback is registered, and are therefore
   safe by default.

This change does constitute a change in the PV ABI, for corner cases of a PV
guest kernel registering neither callback, or not registering the 32bit
callback when running on AMD/Hygon hardware.

It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
SYSENTER (safe by default, until explicitly enabled), as well as native
hardware (always delivered to the single applicable callback).

Most importantly however, and the primary reason for the change, is that it
lets us sensibly test the fast system call entrypoints under all states a PV
guest can construct, to prove correct behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Manuel Bouyer <bouyer@antioche.eu.org>

v2:
 * Drop unnecessary instruction suffixes
 * Don't truncate #UD entrypoint to 32 bits

Manuel: This will result in a corner case change for NetBSD.

At the moment on native, 32bit userspace on 64bit NetBSD will get #UD (Intel,
etc), or an explicit -ENOSYS (AMD, etc) when trying to execute a 32bit SYSCALL
instruction.

After this change, a 64bit PV VM will consistently see #UD (like on Intel, etc
hardware) even when running on AMD/Hygon hardware (as Xsyscall32 isn't
registered with Xen), rather than following Xsyscall into the proper system
call path.
---
 xen/arch/x86/x86_64/entry.S | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

Comments

Manuel Bouyer Oct. 9, 2020, 12:40 p.m. UTC | #1
On Fri, Oct 09, 2020 at 12:53:01PM +0100, Andrew Cooper wrote:
> Despite appearing to be a deliberate design choice of early PV64, the
> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
> testability problem for Xen.  Furthermore, the behaviour is undocumented,
> bizarre, and inconsistent with related behaviour in Xen, and very liable
> introduce a security vulnerability into a PV guest if the author hasn't
> studied Xen's assembly code in detail.
> 
> There are two different bugs here.
> 
> 1) The current logic confuses the registered entrypoints, and may deliver a
>    SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
>    entrypoint is registered.
> 
>    This has been the case ever since 2007 (c/s cd75d47348b) but up until
>    2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
>    a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.
> 
>    Xen would malfunction under these circumstances, if it were a PV guest.
>    Linux would as well, but PVOps has always registered both entrypoints and
>    discarded the Xen-provided selectors.  NetBSD really does malfunction as a
>    consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).

What do you mean with «malfunction» ? A 64bits guest can run 32bit code
just fine, this is part of our daily regression tests.
Andrew Cooper Oct. 9, 2020, 12:50 p.m. UTC | #2
On 09/10/2020 13:40, Manuel Bouyer wrote:
> On Fri, Oct 09, 2020 at 12:53:01PM +0100, Andrew Cooper wrote:
>> Despite appearing to be a deliberate design choice of early PV64, the
>> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
>> testability problem for Xen.  Furthermore, the behaviour is undocumented,
>> bizarre, and inconsistent with related behaviour in Xen, and very liable
>> introduce a security vulnerability into a PV guest if the author hasn't
>> studied Xen's assembly code in detail.
>>
>> There are two different bugs here.
>>
>> 1) The current logic confuses the registered entrypoints, and may deliver a
>>    SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
>>    entrypoint is registered.
>>
>>    This has been the case ever since 2007 (c/s cd75d47348b) but up until
>>    2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
>>    a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.
>>
>>    Xen would malfunction under these circumstances, if it were a PV guest.
>>    Linux would as well, but PVOps has always registered both entrypoints and
>>    discarded the Xen-provided selectors.  NetBSD really does malfunction as a
>>    consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
> What do you mean with «malfunction» ? A 64bits guest can run 32bit code
> just fine, this is part of our daily regression tests.

Right, but your 32bit code never executes the SYSCALL instruction,
because it is hardwired as -ENOSYS on native, and doesn't work on Intel
hardware at all.

Under Xen, this enters the regular syscall path (buggy but benign), and
before the selector fix two years ago, would (AFAICT) eventually try to
HYPERCALL_iret with the bogus selectors, and hit the failsafe callback,
which is a straight panic().

~Andrew
Roger Pau Monne Oct. 14, 2020, 2:16 p.m. UTC | #3
On Fri, Oct 09, 2020 at 12:53:01PM +0100, Andrew Cooper wrote:
> Despite appearing to be a deliberate design choice of early PV64, the
> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
> testability problem for Xen.  Furthermore, the behaviour is undocumented,
> bizarre, and inconsistent with related behaviour in Xen, and very liable
> introduce a security vulnerability into a PV guest if the author hasn't
> studied Xen's assembly code in detail.
> 
> There are two different bugs here.
> 
> 1) The current logic confuses the registered entrypoints, and may deliver a
>    SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
>    entrypoint is registered.
> 
>    This has been the case ever since 2007 (c/s cd75d47348b) but up until
>    2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
>    a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.
> 
>    Xen would malfunction under these circumstances, if it were a PV guest.
>    Linux would as well, but PVOps has always registered both entrypoints and
>    discarded the Xen-provided selectors.  NetBSD really does malfunction as a
>    consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
> 
> 2) In the case that neither SYSCALL callbacks are registered, the guest will
>    be crashed when userspace executes a SYSCALL instruction, which is a
>    userspace => kernel DoS.
> 
>    This has been the case ever since the introduction of 64bit PV support, but
>    behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
>    #GP/#UD in userspace before the callback is registered, and are therefore
>    safe by default.

This seems fairly reasonable, as it turns a guest crash into an #UD
AFAICT.

> This change does constitute a change in the PV ABI, for corner cases of a PV
> guest kernel registering neither callback, or not registering the 32bit
> callback when running on AMD/Hygon hardware.

Is there any place suitable to document this behavior?

> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled), as well as native
> hardware (always delivered to the single applicable callback).
> 
> Most importantly however, and the primary reason for the change, is that it
> lets us sensibly test the fast system call entrypoints under all states a PV
> guest can construct, to prove correct behaviour.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: Manuel Bouyer <bouyer@antioche.eu.org>
> 
> v2:
>  * Drop unnecessary instruction suffixes
>  * Don't truncate #UD entrypoint to 32 bits
> 
> Manuel: This will result in a corner case change for NetBSD.
> 
> At the moment on native, 32bit userspace on 64bit NetBSD will get #UD (Intel,
> etc), or an explicit -ENOSYS (AMD, etc) when trying to execute a 32bit SYSCALL
> instruction.
> 
> After this change, a 64bit PV VM will consistently see #UD (like on Intel, etc
> hardware) even when running on AMD/Hygon hardware (as Xsyscall32 isn't
> registered with Xen), rather than following Xsyscall into the proper system
> call path.

Would this result in a regression for NetBSD then? Is it fine to see
#UD regardless of the platform? It's not clear to me from the text
above whether this change will cause issues with NetBSD.

Roger.
Manuel Bouyer Oct. 14, 2020, 2:20 p.m. UTC | #4
On Wed, Oct 14, 2020 at 04:16:20PM +0200, Roger Pau Monné wrote:
> [...]
> Would this result in a regression for NetBSD then? Is it fine to see
> #UD regardless of the platform? It's not clear to me from the text
> above whether this change will cause issues with NetBSD.

AFAIK this should not cause any issue. If I understand it properly,
SYSCALL in a 32bit context would not work in any case on Intel CPUs.
The patch just makes if fail on AMD cpus the same way it fails on Intel.
Andrew Cooper Oct. 14, 2020, 2:26 p.m. UTC | #5
On 14/10/2020 15:20, Manuel Bouyer wrote:
> On Wed, Oct 14, 2020 at 04:16:20PM +0200, Roger Pau Monné wrote:
>> [...]
>> Would this result in a regression for NetBSD then? Is it fine to see
>> #UD regardless of the platform? It's not clear to me from the text
>> above whether this change will cause issues with NetBSD.
> AFAIK this should not cause any issue. If I understand it properly,
> SYSCALL in a 32bit context would not work in any case on Intel CPUs.
> The patch just makes if fail on AMD cpus the same way it fails on Intel.

Correct.

~Andrew
Andrew Cooper Oct. 14, 2020, 3:17 p.m. UTC | #6
On 14/10/2020 15:16, Roger Pau Monné wrote:
>> This change does constitute a change in the PV ABI, for corner cases of a PV
>> guest kernel registering neither callback, or not registering the 32bit
>> callback when running on AMD/Hygon hardware.
> Is there any place suitable to document this behavior?

In the short term, my XTF test which will eventually get into CI.

Longer term, my theoretical future where I've described some of this
stuff in docs/guest-guide/

~Andrew
Roger Pau Monne Oct. 14, 2020, 4:28 p.m. UTC | #7
On Fri, Oct 09, 2020 at 12:53:01PM +0100, Andrew Cooper wrote:
> Despite appearing to be a deliberate design choice of early PV64, the
> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
> testability problem for Xen.  Furthermore, the behaviour is undocumented,
> bizarre, and inconsistent with related behaviour in Xen, and very liable
> introduce a security vulnerability into a PV guest if the author hasn't
> studied Xen's assembly code in detail.
> 
> There are two different bugs here.
> 
> 1) The current logic confuses the registered entrypoints, and may deliver a
>    SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
>    entrypoint is registered.
> 
>    This has been the case ever since 2007 (c/s cd75d47348b) but up until
>    2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
>    a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.
> 
>    Xen would malfunction under these circumstances, if it were a PV guest.
>    Linux would as well, but PVOps has always registered both entrypoints and
>    discarded the Xen-provided selectors.  NetBSD really does malfunction as a
>    consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
> 
> 2) In the case that neither SYSCALL callbacks are registered, the guest will
>    be crashed when userspace executes a SYSCALL instruction, which is a
>    userspace => kernel DoS.
> 
>    This has been the case ever since the introduction of 64bit PV support, but
>    behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
>    #GP/#UD in userspace before the callback is registered, and are therefore
>    safe by default.
> 
> This change does constitute a change in the PV ABI, for corner cases of a PV
> guest kernel registering neither callback, or not registering the 32bit
> callback when running on AMD/Hygon hardware.
> 
> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled), as well as native
> hardware (always delivered to the single applicable callback).
> 
> Most importantly however, and the primary reason for the change, is that it
> lets us sensibly test the fast system call entrypoints under all states a PV
> guest can construct, to prove correct behaviour.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: Manuel Bouyer <bouyer@antioche.eu.org>
> 
> v2:
>  * Drop unnecessary instruction suffixes
>  * Don't truncate #UD entrypoint to 32 bits
> 
> Manuel: This will result in a corner case change for NetBSD.
> 
> At the moment on native, 32bit userspace on 64bit NetBSD will get #UD (Intel,
> etc), or an explicit -ENOSYS (AMD, etc) when trying to execute a 32bit SYSCALL
> instruction.
> 
> After this change, a 64bit PV VM will consistently see #UD (like on Intel, etc
> hardware) even when running on AMD/Hygon hardware (as Xsyscall32 isn't
> registered with Xen), rather than following Xsyscall into the proper system
> call path.
> ---
>  xen/arch/x86/x86_64/entry.S | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
> index 000eb9722b..aaf8402f93 100644
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -26,18 +26,30 @@
>  /* %rbx: struct vcpu */
>  ENTRY(switch_to_kernel)
>          leaq  VCPU_trap_bounce(%rbx),%rdx
> -        /* TB_eip = (32-bit syscall && syscall32_addr) ?
> -         *          syscall32_addr : syscall_addr */
> -        xor   %eax,%eax
> +
> +        /* TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr */
> +        mov   VCPU_syscall32_addr(%rbx), %ecx

This being an unsigned long field, shouldn't you use %rcx here?

Roger.
Andrew Cooper Oct. 14, 2020, 5:41 p.m. UTC | #8
On 14/10/2020 17:28, Roger Pau Monné wrote:
> On Fri, Oct 09, 2020 at 12:53:01PM +0100, Andrew Cooper wrote:
>> Despite appearing to be a deliberate design choice of early PV64, the
>> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
>> testability problem for Xen.  Furthermore, the behaviour is undocumented,
>> bizarre, and inconsistent with related behaviour in Xen, and very liable
>> introduce a security vulnerability into a PV guest if the author hasn't
>> studied Xen's assembly code in detail.
>>
>> There are two different bugs here.
>>
>> 1) The current logic confuses the registered entrypoints, and may deliver a
>>    SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
>>    entrypoint is registered.
>>
>>    This has been the case ever since 2007 (c/s cd75d47348b) but up until
>>    2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
>>    a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along.
>>
>>    Xen would malfunction under these circumstances, if it were a PV guest.
>>    Linux would as well, but PVOps has always registered both entrypoints and
>>    discarded the Xen-provided selectors.  NetBSD really does malfunction as a
>>    consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
>>
>> 2) In the case that neither SYSCALL callbacks are registered, the guest will
>>    be crashed when userspace executes a SYSCALL instruction, which is a
>>    userspace => kernel DoS.
>>
>>    This has been the case ever since the introduction of 64bit PV support, but
>>    behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
>>    #GP/#UD in userspace before the callback is registered, and are therefore
>>    safe by default.
>>
>> This change does constitute a change in the PV ABI, for corner cases of a PV
>> guest kernel registering neither callback, or not registering the 32bit
>> callback when running on AMD/Hygon hardware.
>>
>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
>> SYSENTER (safe by default, until explicitly enabled), as well as native
>> hardware (always delivered to the single applicable callback).
>>
>> Most importantly however, and the primary reason for the change, is that it
>> lets us sensibly test the fast system call entrypoints under all states a PV
>> guest can construct, to prove correct behaviour.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Andy Lutomirski <luto@kernel.org>
>> CC: Manuel Bouyer <bouyer@antioche.eu.org>
>>
>> v2:
>>  * Drop unnecessary instruction suffixes
>>  * Don't truncate #UD entrypoint to 32 bits
>>
>> Manuel: This will result in a corner case change for NetBSD.
>>
>> At the moment on native, 32bit userspace on 64bit NetBSD will get #UD (Intel,
>> etc), or an explicit -ENOSYS (AMD, etc) when trying to execute a 32bit SYSCALL
>> instruction.
>>
>> After this change, a 64bit PV VM will consistently see #UD (like on Intel, etc
>> hardware) even when running on AMD/Hygon hardware (as Xsyscall32 isn't
>> registered with Xen), rather than following Xsyscall into the proper system
>> call path.
>> ---
>>  xen/arch/x86/x86_64/entry.S | 26 +++++++++++++++++++-------
>>  1 file changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
>> index 000eb9722b..aaf8402f93 100644
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -26,18 +26,30 @@
>>  /* %rbx: struct vcpu */
>>  ENTRY(switch_to_kernel)
>>          leaq  VCPU_trap_bounce(%rbx),%rdx
>> -        /* TB_eip = (32-bit syscall && syscall32_addr) ?
>> -         *          syscall32_addr : syscall_addr */
>> -        xor   %eax,%eax
>> +
>> +        /* TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr */
>> +        mov   VCPU_syscall32_addr(%rbx), %ecx
> This being an unsigned long field, shouldn't you use %rcx here?

Yes I should.  Sorry - thought I'd fixed all of these.  I'll ad higher
half handlers to the XTF test.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 000eb9722b..aaf8402f93 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -26,18 +26,30 @@ 
 /* %rbx: struct vcpu */
 ENTRY(switch_to_kernel)
         leaq  VCPU_trap_bounce(%rbx),%rdx
-        /* TB_eip = (32-bit syscall && syscall32_addr) ?
-         *          syscall32_addr : syscall_addr */
-        xor   %eax,%eax
+
+        /* TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr */
+        mov   VCPU_syscall32_addr(%rbx), %ecx
+        mov   VCPU_syscall_addr(%rbx), %rax
         cmpw  $FLAT_USER_CS32,UREGS_cs(%rsp)
-        cmoveq VCPU_syscall32_addr(%rbx),%rax
-        testq %rax,%rax
-        cmovzq VCPU_syscall_addr(%rbx),%rax
-        movq  %rax,TRAPBOUNCE_eip(%rdx)
+        cmove %rcx, %rax
+
         /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
         btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
         setc  %cl
         leal  (,%rcx,TBF_INTERRUPT),%ecx
+
+        test  %rax, %rax
+UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
+        mov   VCPU_trap_ctxt(%rbx), %rdi
+        movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
+        subl  $2, UREGS_rip(%rsp)
+        mov   X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_eip(%rdi), %rax
+        testb $4, X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_flags(%rdi)
+        setnz %cl
+        lea   TBF_EXCEPTION(, %rcx, TBF_INTERRUPT), %ecx
+UNLIKELY_END(syscall_no_callback)
+
+        movq  %rax,TRAPBOUNCE_eip(%rdx)
         movb  %cl,TRAPBOUNCE_flags(%rdx)
         call  create_bounce_frame
         andl  $~X86_EFLAGS_DF,UREGS_eflags(%rsp)