diff mbox series

[kvm-unit-tests,2/2] x86: replace `int 0x20` with `syscall`

Message ID 20220424070951.106990-2-darcy.sh@antgroup.com (mailing list archive)
State New, archived
Headers show
Series [kvm-unit-tests,1/2] x86: replace `push` `pop` with callee-clobbered list | expand

Commit Message

SU Hang April 24, 2022, 7:09 a.m. UTC
Signed-off-by: SU Hang <darcy.sh@antgroup.com>
---
 lib/x86/usermode.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Sean Christopherson April 25, 2022, 3:58 p.m. UTC | #1
On Sun, Apr 24, 2022, SU Hang wrote:

Why?  As gross as it is, I actually think INTn is a better option because it
doesn't require writing multiple MSRs, and can work for both 64-bit and 32-bit KUT.
The latter is currently a moot point since this code is 64-bit only, but the UMIP
test _does_ support 32-bit, and it's do_ring3() should really be rolled into this
framework.

Furthermore, we really should have a test to verify that KVM correctly emulates
SYSCALL at CPL3 with EFER.SCE=0, and forcing EFER.SCE=1 just to get to CPL3 would
make it impossible to utilize this framework for such a test.

> Signed-off-by: SU Hang <darcy.sh@antgroup.com>
> ---
>  lib/x86/usermode.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/x86/usermode.c b/lib/x86/usermode.c
> index 477cb9f..e4cb899 100644
> --- a/lib/x86/usermode.c
> +++ b/lib/x86/usermode.c
> @@ -12,7 +12,6 @@
>  #include <stdint.h>
>  
>  #define USERMODE_STACK_SIZE	0x2000
> -#define RET_TO_KERNEL_IRQ	0x20
>  
>  static jmp_buf jmpbuf;
>  
> @@ -40,9 +39,11 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>  	static unsigned char user_stack[USERMODE_STACK_SIZE];
>  
>  	*raised_vector = 0;
> -	set_idt_entry(RET_TO_KERNEL_IRQ, &ret_to_kernel, 3);
>  	handle_exception(fault_vector,
>  			restore_exec_to_jmpbuf_exception_handler);
> +	wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_SCE);

Tangentially related, KUT should really explicitly initialize EFER.SCE during boot.
cstart64.S leaves it alone, which means KUT has no defined value for EFER.SCE.
One thought would be to set EFER.SCE=1 during boot (in a separate patch) and remove
the code from the syscall test that forces EFER.SCE.  AFAICT, no existing test
verifies that KVM injects #UD on SYSCALL without EFER.SCE set, though it would be
nice to add one.

> +	wrmsr(MSR_STAR, ((u64)(USER_CS32 << 16) | KERNEL_CS) << 32);

It doesn't matter at this time because this framework doesn't ses SYSRET, but this
should be USER_CS or USER_CS64.

> +	wrmsr(MSR_LSTAR, (u64)&ret_to_kernel);
>  
>  	if (setjmp(jmpbuf) != 0) {
>  		*raised_vector = 1;
> @@ -73,7 +74,7 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>  			"mov %[arg4], %%rcx\n\t"
>  			"call *%[func]\n\t"
>  			/* Return to kernel via system call */
> -			"int %[kernel_entry_vector]\n\t"
> +			"syscall\n\t"
>  			/* Kernel Mode */
>  			"ret_to_kernel:\n\t"
>  			"mov %[rsp0], %%rsp\n\t"
> @@ -89,8 +90,7 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>  			[user_ds]"i"(USER_DS),
>  			[user_cs]"i"(USER_CS),
>  			[user_stack_top]"r"(user_stack +
> -					sizeof(user_stack)),
> -			[kernel_entry_vector]"i"(RET_TO_KERNEL_IRQ)
> +					sizeof(user_stack))
>  			:
>  			"rsi", "rdi", "rbx", "rcx", "rdx", "r8", "r9", "r10", "r11");
>  
> -- 
> 2.32.0.3.g01195cf9f
>
Sean Christopherson April 27, 2022, 1:07 a.m. UTC | #2
On Mon, Apr 25, 2022, Sean Christopherson wrote:
> On Sun, Apr 24, 2022, SU Hang wrote:
> 
> Why?  As gross as it is, I actually think INTn is a better option because it
> doesn't require writing multiple MSRs, and can work for both 64-bit and 32-bit KUT.
> The latter is currently a moot point since this code is 64-bit only, but the UMIP
> test _does_ support 32-bit, and it's do_ring3() should really be rolled into this
> framework.
> 
> Furthermore, we really should have a test to verify that KVM correctly emulates
> SYSCALL at CPL3 with EFER.SCE=0, and forcing EFER.SCE=1 just to get to CPL3 would
> make it impossible to utilize this framework for such a test.

And a concrete reason not to apply this patch: it causes the nVMX #AC test to fail:

  FAIL: #AC handled by L2
  FAIL: #AC handled by L1
SU Hang April 27, 2022, 9:27 a.m. UTC | #3
> Why?
We are implementing a para-virtualization hypervisor, which doesn't allow guest
to trigger soft interrupt > 0x20(but `int 0x80` works fine), so I want to
replace `int 0x20` with a more common `syscall`.


> it's do_ring3() should really be rolled into this framework.
Yes, it is worth working on it, I'll do it in my part-time.


> no existing test verifies that KVM injects #UD on SYSCALL without EFER.SCE
> set, though it would be nice to add one.
I am also interested in it, maybe do it later.


> > + wrmsr(MSR_STAR, ((u64)(USER_CS32 << 16) | KERNEL_CS) << 32);
> It doesn't matter at this time because this framework doesn't ses SYSRET, but
> this should be USER_CS or USER_CS64.
Oops, intel SDM vol.3 <chap 5.8.8> says:
"""
When SYSRET transfers control to 64-bit mode user code using REX.W, the
processor gets the privilege level 3 target code segment, instruction pointer,
          stack segment, and flags as follows:
    • Target code segment — Reads a non-NULL selector from IA32_STAR[63:48] + 16.
    • Stack segment — IA32_STAR[63:48] + 8.
"""

Since the value of USER_CS is 0x4b in 64 bit mode, SS register points to 0x53 =
0x4b + 8, (offset is 0x50) But `gdt + offset(0x50)` hasn't been setup(so does DS
register).
> refs: https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/blob/master/lib/x86/desc.c#L34
> refs: https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/blob/master/x86/syscall.c#L68

Linux also does so, the reason is to reuse user segment descriptor in both 32/64 bit.
> refs1: https://github.com/torvalds/linux/blob/46cf2c613f4b10eb12f749207b0fd2c1bfae3088/arch/x86/kernel/cpu/common.c#L1942
> refs2: https://github.com/torvalds/linux/blob/46cf2c613f4b10eb12f749207b0fd2c1bfae3088/arch/x86/include/asm/segment.h#L211
> refs3: https://github.com/torvalds/linux/blob/46cf2c613f4b10eb12f749207b0fd2c1bfae3088/arch/x86/kernel/cpu/common.c#L216

> And a concrete reason not to apply this patch: it causes the nVMX #AC test to fail:
It's awkward, some KUT test cases results diffs on my different machines, which
makes me don't know which result I could trust, so I only pay attention to the
test cases that I care about. I'll keep an eye on the rest cases in the
future.
Sean Christopherson April 28, 2022, 3:58 p.m. UTC | #4
On Wed, Apr 27, 2022, SU Hang wrote:
> > It doesn't matter at this time because this framework doesn't ses SYSRET, but
> > this should be USER_CS or USER_CS64.
> Oops, intel SDM vol.3 <chap 5.8.8> says:
> """
> When SYSRET transfers control to 64-bit mode user code using REX.W, the
> processor gets the privilege level 3 target code segment, instruction pointer,
>           stack segment, and flags as follows:
>     • Target code segment — Reads a non-NULL selector from IA32_STAR[63:48] + 16.

Ugh, missed that detail.

>     • Stack segment — IA32_STAR[63:48] + 8.
> """
diff mbox series

Patch

diff --git a/lib/x86/usermode.c b/lib/x86/usermode.c
index 477cb9f..e4cb899 100644
--- a/lib/x86/usermode.c
+++ b/lib/x86/usermode.c
@@ -12,7 +12,6 @@ 
 #include <stdint.h>
 
 #define USERMODE_STACK_SIZE	0x2000
-#define RET_TO_KERNEL_IRQ	0x20
 
 static jmp_buf jmpbuf;
 
@@ -40,9 +39,11 @@  uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
 	static unsigned char user_stack[USERMODE_STACK_SIZE];
 
 	*raised_vector = 0;
-	set_idt_entry(RET_TO_KERNEL_IRQ, &ret_to_kernel, 3);
 	handle_exception(fault_vector,
 			restore_exec_to_jmpbuf_exception_handler);
+	wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_SCE);
+	wrmsr(MSR_STAR, ((u64)(USER_CS32 << 16) | KERNEL_CS) << 32);
+	wrmsr(MSR_LSTAR, (u64)&ret_to_kernel);
 
 	if (setjmp(jmpbuf) != 0) {
 		*raised_vector = 1;
@@ -73,7 +74,7 @@  uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
 			"mov %[arg4], %%rcx\n\t"
 			"call *%[func]\n\t"
 			/* Return to kernel via system call */
-			"int %[kernel_entry_vector]\n\t"
+			"syscall\n\t"
 			/* Kernel Mode */
 			"ret_to_kernel:\n\t"
 			"mov %[rsp0], %%rsp\n\t"
@@ -89,8 +90,7 @@  uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
 			[user_ds]"i"(USER_DS),
 			[user_cs]"i"(USER_CS),
 			[user_stack_top]"r"(user_stack +
-					sizeof(user_stack)),
-			[kernel_entry_vector]"i"(RET_TO_KERNEL_IRQ)
+					sizeof(user_stack))
 			:
 			"rsi", "rdi", "rbx", "rcx", "rdx", "r8", "r9", "r10", "r11");