
[kvm-unit-tests,14/16] svm: rewrite vm entry macros

Message ID 20221020152404.283980-15-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series kvm-unit-tests: set of fixes and new tests

Commit Message

Maxim Levitsky Oct. 20, 2022, 3:24 p.m. UTC
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 lib/x86/svm_lib.h | 58 +++++++++++++++++++++++++++++++++++++++
 x86/svm.c         | 51 ++++++++++------------------------
 x86/svm.h         | 70 ++---------------------------------------------
 x86/svm_tests.c   | 24 ++++++++++------
 4 files changed, 91 insertions(+), 112 deletions(-)

Comments

Sean Christopherson Oct. 20, 2022, 6:55 p.m. UTC | #1
On Thu, Oct 20, 2022, Maxim Levitsky wrote:

Changelog please.  This patch in particular is extremely difficult to review
without some explanation of what is being done, and why.

If it's not too much trouble, splitting this over multiple patches would be nice.

> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  lib/x86/svm_lib.h | 58 +++++++++++++++++++++++++++++++++++++++
>  x86/svm.c         | 51 ++++++++++------------------------
>  x86/svm.h         | 70 ++---------------------------------------------
>  x86/svm_tests.c   | 24 ++++++++++------
>  4 files changed, 91 insertions(+), 112 deletions(-)
> 
> diff --git a/lib/x86/svm_lib.h b/lib/x86/svm_lib.h
> index 27c3b137..59db26de 100644
> --- a/lib/x86/svm_lib.h
> +++ b/lib/x86/svm_lib.h
> @@ -71,4 +71,62 @@ u8* svm_get_io_bitmap(void);
>  #define MSR_BITMAP_SIZE 8192
>  
>  
> +struct svm_extra_regs

Why not just svm_gprs?  This could even include RAX by grabbing it from the VMCB
after VMRUN.

> +{
> +    u64 rbx;
> +    u64 rcx;
> +    u64 rdx;
> +    u64 rbp;
> +    u64 rsi;
> +    u64 rdi;
> +    u64 r8;
> +    u64 r9;
> +    u64 r10;
> +    u64 r11;
> +    u64 r12;
> +    u64 r13;
> +    u64 r14;
> +    u64 r15;

Tab instead of spaces.

> +};
> +
> +#define SWAP_GPRS(reg) \
> +		"xchg %%rcx, 0x08(%%" reg ")\n\t"       \

No need for 2-tab indentation.

> +		"xchg %%rdx, 0x10(%%" reg ")\n\t"       \
> +		"xchg %%rbp, 0x18(%%" reg ")\n\t"       \
> +		"xchg %%rsi, 0x20(%%" reg ")\n\t"       \
> +		"xchg %%rdi, 0x28(%%" reg ")\n\t"       \
> +		"xchg %%r8,  0x30(%%" reg ")\n\t"       \
> +		"xchg %%r9,  0x38(%%" reg ")\n\t"       \
> +		"xchg %%r10, 0x40(%%" reg ")\n\t"       \
> +		"xchg %%r11, 0x48(%%" reg ")\n\t"       \
> +		"xchg %%r12, 0x50(%%" reg ")\n\t"       \
> +		"xchg %%r13, 0x58(%%" reg ")\n\t"       \
> +		"xchg %%r14, 0x60(%%" reg ")\n\t"       \
> +		"xchg %%r15, 0x68(%%" reg ")\n\t"       \
> +		\

Extra line.

> +		"xchg %%rbx, 0x00(%%" reg ")\n\t"       \

Why is RBX last here, but first in the struct?  Ah, because the initial swap uses
RBX as the scratch register.  Why use RAX for the post-VMRUN swap?  AFAICT, that's
completely arbitrary.

> +
> +

> +#define __SVM_VMRUN(vmcb, regs, label)          \
> +		asm volatile (                          \

Unnecessarily deep indentation.

> +			"vmload %%rax\n\t"                  \
> +			"push %%rax\n\t"                    \
> +			"push %%rbx\n\t"                    \
> +			SWAP_GPRS("rbx")                    \
> +			".global " label "\n\t"             \
> +			label ": vmrun %%rax\n\t"           \
> +			"vmsave %%rax\n\t"                  \
> +			"pop %%rax\n\t"                     \
> +			SWAP_GPRS("rax")                    \
> +			"pop %%rax\n\t"                     \
> +			:                                   \
> +			: "a" (virt_to_phys(vmcb)),         \
> +			  "b"(regs)                         \
> +			/* clobbers*/                       \
> +			: "memory"                          \
> +		);

If we're going to rewrite this, why not turn it into a proper assembly routine?
E.g. the whole test_run() noinline thing just so that vmrun_rip isn't redefined
is gross.
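
E.g. something with this rough shape (hypothetical names, sketch only):

	/* x86/svm.h */
	extern void svm_vmrun_asm(u64 vmcb_pa, struct svm_extra_regs *regs);
	extern u8 vmrun_rip;	/* defined exactly once, inside svm_vmrun_asm */

so that vmrun_rip has a single definition and test_run() can be a plain function.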

> diff --git a/x86/svm.c b/x86/svm.c
> index 37b4cd38..9484a6d1 100644
> --- a/x86/svm.c
> +++ b/x86/svm.c
> @@ -76,11 +76,11 @@ static void test_thunk(struct svm_test *test)
>  	vmmcall();
>  }
>  
> -struct regs regs;
> +struct svm_extra_regs regs;
>  
> -struct regs get_regs(void)
> +struct svm_extra_regs* get_regs(void)
>  {
> -	return regs;
> +	return &regs;

This isn't strictly necessary, is it?  I.e. avoiding the copy can be done in a
separate patch, no?

> @@ -2996,7 +2998,7 @@ static void svm_lbrv_test1(void)
>  
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
>  	DO_BRANCH(host_branch1);
> -	SVM_BARE_VMRUN;
> +	SVM_VMRUN(vmcb,regs);

Space after the comma.  Multiple cases below too.

>  	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
>  
>  	if (vmcb->control.exit_code != SVM_EXIT_VMMCALL) {
> @@ -3011,6 +3013,8 @@ static void svm_lbrv_test1(void)
>  
>  static void svm_lbrv_test2(void)
>  {
> +	struct svm_extra_regs* regs = get_regs();
> +
>  	report(true, "Test that without LBRV enabled, guest LBR state does 'leak' to the host(2)");
>  
>  	vmcb->save.rip = (ulong)svm_lbrv_test_guest2;
> @@ -3019,7 +3023,7 @@ static void svm_lbrv_test2(void)
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
>  	DO_BRANCH(host_branch2);
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
> -	SVM_BARE_VMRUN;
> +	SVM_VMRUN(vmcb,regs);
>  	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
>  
> @@ -3035,6 +3039,8 @@ static void svm_lbrv_test2(void)
>  
>  static void svm_lbrv_nested_test1(void)
>  {
> +	struct svm_extra_regs* regs = get_regs();
> +
>  	if (!lbrv_supported()) {
>  		report_skip("LBRV not supported in the guest");
>  		return;
> @@ -3047,7 +3053,7 @@ static void svm_lbrv_nested_test1(void)
>  
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
>  	DO_BRANCH(host_branch3);
> -	SVM_BARE_VMRUN;
> +	SVM_VMRUN(vmcb,regs);
>  	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
>  
> @@ -3068,6 +3074,8 @@ static void svm_lbrv_nested_test1(void)
>  
>  static void svm_lbrv_nested_test2(void)
>  {
> +	struct svm_extra_regs* regs = get_regs();
> +
>  	if (!lbrv_supported()) {
>  		report_skip("LBRV not supported in the guest");
>  		return;
> @@ -3083,7 +3091,7 @@ static void svm_lbrv_nested_test2(void)
>  
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
>  	DO_BRANCH(host_branch4);
> -	SVM_BARE_VMRUN;
> +	SVM_VMRUN(vmcb,regs);
>  	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
>  	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
>  
> -- 
> 2.26.3
>
Maxim Levitsky Oct. 24, 2022, 12:45 p.m. UTC | #2
On Thu, 2022-10-20 at 18:55 +0000, Sean Christopherson wrote:
> On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> 
> Changelog please.  This patch in particular is extremely difficult to review
> without some explanation of what is being done, and why.
> 
> If it's not too much trouble, splitting this over multiple patches would be nice.
> 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  lib/x86/svm_lib.h | 58 +++++++++++++++++++++++++++++++++++++++
> >  x86/svm.c         | 51 ++++++++++------------------------
> >  x86/svm.h         | 70 ++---------------------------------------------
> >  x86/svm_tests.c   | 24 ++++++++++------
> >  4 files changed, 91 insertions(+), 112 deletions(-)
> > 
> > diff --git a/lib/x86/svm_lib.h b/lib/x86/svm_lib.h
> > index 27c3b137..59db26de 100644
> > --- a/lib/x86/svm_lib.h
> > +++ b/lib/x86/svm_lib.h
> > @@ -71,4 +71,62 @@ u8* svm_get_io_bitmap(void);
> >  #define MSR_BITMAP_SIZE 8192
> >  
> >  
> > +struct svm_extra_regs
> 
> Why not just svm_gprs?  This could even include RAX by grabbing it from the VMCB
> after VMRUN.

I prefer to have a single source of truth - if I grab it from the vmcb, then
it will have to be synced to the vmcb on each vmrun, like KVM does,
but KVM also has a dirty registers bitmap and such. I prefer to keep it simple.

Plus there is also RSP in the vmcb, and RFLAGS, and even RIP is to some extent a GPR.
To call this struct svm_gprs, I would have to include them there as well.
And there are also segment registers, etc, etc.

So instead of pretending that this struct contains all the GPRs of the guest
(or of the host while the guest is running), I renamed it to state that it contains
only the GPRs that SVM doesn't context switch.

> 
> > +{
> > +    u64 rbx;
> > +    u64 rcx;
> > +    u64 rdx;
> > +    u64 rbp;
> > +    u64 rsi;
> > +    u64 rdi;
> > +    u64 r8;
> > +    u64 r9;
> > +    u64 r10;
> > +    u64 r11;
> > +    u64 r12;
> > +    u64 r13;
> > +    u64 r14;
> > +    u64 r15;
> 
> Tab instead of spaces.
> 
> > +};
> > +
> > +#define SWAP_GPRS(reg) \
> > +               "xchg %%rcx, 0x08(%%" reg ")\n\t"       \
> 
> No need for 2-tab indentation.
> 
> > +               "xchg %%rdx, 0x10(%%" reg ")\n\t"       \
> > +               "xchg %%rbp, 0x18(%%" reg ")\n\t"       \
> > +               "xchg %%rsi, 0x20(%%" reg ")\n\t"       \
> > +               "xchg %%rdi, 0x28(%%" reg ")\n\t"       \
> > +               "xchg %%r8,  0x30(%%" reg ")\n\t"       \
> > +               "xchg %%r9,  0x38(%%" reg ")\n\t"       \
> > +               "xchg %%r10, 0x40(%%" reg ")\n\t"       \
> > +               "xchg %%r11, 0x48(%%" reg ")\n\t"       \
> > +               "xchg %%r12, 0x50(%%" reg ")\n\t"       \
> > +               "xchg %%r13, 0x58(%%" reg ")\n\t"       \
> > +               "xchg %%r14, 0x60(%%" reg ")\n\t"       \
> > +               "xchg %%r15, 0x68(%%" reg ")\n\t"       \
> > +               \
> 
> Extra line.
> 
> > +               "xchg %%rbx, 0x00(%%" reg ")\n\t"       \
> 
> Why is RBX last here, but first in the struct?  Ah, because the initial swap uses
> RBX as the scratch register.  Why use RAX for the post-VMRUN swap?  AFAICT, that's
> completely arbitrary.

Let me explain:

On entry to the guest the code has to save the host GPRs and then load the guest GPRs.

Host RAX and RBX are set by gcc as I requested with the "a" and "b" constraints, but even
these must not be changed by the assembly code from the values set as input.
(At least I haven't found a way to mark a register as both an input and a clobber.)

Now RAX is the hardcoded input to VMRUN, thus I leave it alone and use RBX as the regs pointer,
which is replaced with the guest value (and the host value stored in regs) by the end of SWAP_GPRS.

I could have used another GPR for the regs pointer, but not RAX, because if I were to use RAX
I would then need to restore it to the vmcb pointer before vmrun, which would complicate the code.

That is what the kernel VMRUN code does, though; however, it mostly doesn't preserve the host
GPRs, relying on the function ABI to preserve only the registers that the ABI requires.
IMHO all of this just doesn't matter much, as long as it works.

Now after the VMRUN, all we have is a useless RAX (it points to the VMCB) and RSP, still pointing to the stack.
The guest values of these were stored to the VMCB, and the host values were restored from the host save area by the CPU.

So after VMRUN no register but RAX can be touched; you can't even pop values from the stack,
since that would overwrite the guest value held in the destination register.

So on restore I pop the regs pointer from the stack into RAX, use it to swap the guest and host GPRs,
and then restore RAX to its VMCB pointer value.

I hope that explains why on entry I use RBX and on exit I use RAX.
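
To spell it out, here is the same sequence with annotations (illustration only,
mirroring the asm blob above):

	vmload %rax		# RAX = VMCB phys ("a" input), RBX = regs ("b" input)
	push %rax		# save the VMCB pointer
	push %rbx		# save the regs pointer for use after VMRUN
	SWAP_GPRS("rbx")	# registers now hold guest GPRs, *regs holds host GPRs;
				# the final xchg swaps RBX itself
	vmrun %rax
	vmsave %rax
	pop %rax		# RAX = regs pointer - the only register we may touch
	SWAP_GPRS("rax")	# guest GPRs go back to *regs, host GPRs are reloaded
	pop %rax		# RAX = VMCB pointer again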

If I switch to a full-blown assembly function for this, then I could do it.

Note though that my LBR tests do still need this as a macro because they must not do
any extra jumps/calls as these clobber the LBR registers.

> 
> > +
> > +
> 
> > +#define __SVM_VMRUN(vmcb, regs, label)          \
> > +               asm volatile (                          \
> 
> Unnecessarily deep indentation.
> 
> > +                       "vmload %%rax\n\t"                  \
> > +                       "push %%rax\n\t"                    \
> > +                       "push %%rbx\n\t"                    \
> > +                       SWAP_GPRS("rbx")                    \
> > +                       ".global " label "\n\t"             \
> > +                       label ": vmrun %%rax\n\t"           \
> > +                       "vmsave %%rax\n\t"                  \
> > +                       "pop %%rax\n\t"                     \
> > +                       SWAP_GPRS("rax")                    \
> > +                       "pop %%rax\n\t"                     \
> > +                       :                                   \
> > +                       : "a" (virt_to_phys(vmcb)),         \
> > +                         "b"(regs)                         \
> > +                       /* clobbers*/                       \
> > +                       : "memory"                          \
> > +               );
> 
> If we're going to rewrite this, why not turn it into a proper assembly routine?
> E.g. the whole test_run() noinline thing just so that vmrun_rip isn't redefined
> is gross.

I had limited time to work on this, but yes, it makes sense.
I'll see if I can find time to do it.


> 
> > diff --git a/x86/svm.c b/x86/svm.c
> > index 37b4cd38..9484a6d1 100644
> > --- a/x86/svm.c
> > +++ b/x86/svm.c
> > @@ -76,11 +76,11 @@ static void test_thunk(struct svm_test *test)
> >         vmmcall();
> >  }
> >  
> > -struct regs regs;
> > +struct svm_extra_regs regs;
> >  
> > -struct regs get_regs(void)
> > +struct svm_extra_regs* get_regs(void)
> >  {
> > -       return regs;
> > +       return &regs;
> 
> This isn't strictly necessary, is it?  I.e. avoiding the copy can be done in a
> separate patch, no?
Yes.
> 
> > @@ -2996,7 +2998,7 @@ static void svm_lbrv_test1(void)
> >  
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
> >         DO_BRANCH(host_branch1);
> > -       SVM_BARE_VMRUN;
> > +       SVM_VMRUN(vmcb,regs);
> 
> Space after the comma.  Multiple cases below too.
> 
> >         dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
> >  
> >         if (vmcb->control.exit_code != SVM_EXIT_VMMCALL) {
> > @@ -3011,6 +3013,8 @@ static void svm_lbrv_test1(void)
> >  
> >  static void svm_lbrv_test2(void)
> >  {
> > +       struct svm_extra_regs* regs = get_regs();
> > +
> >         report(true, "Test that without LBRV enabled, guest LBR state does 'leak' to the host(2)");
> >  
> >         vmcb->save.rip = (ulong)svm_lbrv_test_guest2;
> > @@ -3019,7 +3023,7 @@ static void svm_lbrv_test2(void)
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
> >         DO_BRANCH(host_branch2);
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
> > -       SVM_BARE_VMRUN;
> > +       SVM_VMRUN(vmcb,regs);
> >         dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
> >  
> > @@ -3035,6 +3039,8 @@ static void svm_lbrv_test2(void)
> >  
> >  static void svm_lbrv_nested_test1(void)
> >  {
> > +       struct svm_extra_regs* regs = get_regs();
> > +
> >         if (!lbrv_supported()) {
> >                 report_skip("LBRV not supported in the guest");
> >                 return;
> > @@ -3047,7 +3053,7 @@ static void svm_lbrv_nested_test1(void)
> >  
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
> >         DO_BRANCH(host_branch3);
> > -       SVM_BARE_VMRUN;
> > +       SVM_VMRUN(vmcb,regs);
> >         dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
> >  
> > @@ -3068,6 +3074,8 @@ static void svm_lbrv_nested_test1(void)
> >  
> >  static void svm_lbrv_nested_test2(void)
> >  {
> > +       struct svm_extra_regs* regs = get_regs();
> > +
> >         if (!lbrv_supported()) {
> >                 report_skip("LBRV not supported in the guest");
> >                 return;
> > @@ -3083,7 +3091,7 @@ static void svm_lbrv_nested_test2(void)
> >  
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
> >         DO_BRANCH(host_branch4);
> > -       SVM_BARE_VMRUN;
> > +       SVM_VMRUN(vmcb,regs);
> >         dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
> >         wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
> >  
> > -- 
> > 2.26.3
> > 
> 

Best regards,
	Maxim Levitsky
Sean Christopherson Oct. 24, 2022, 7:56 p.m. UTC | #3
On Mon, Oct 24, 2022, Maxim Levitsky wrote:
> On Thu, 2022-10-20 at 18:55 +0000, Sean Christopherson wrote:
> > On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> > 
> > Changelog please.  This patch in particular is extremely difficult to review
> > without some explanation of what is being done, and why.
> > 
> > If it's not too much trouble, splitting this over multiple patches would be nice.
> > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  lib/x86/svm_lib.h | 58 +++++++++++++++++++++++++++++++++++++++
> > >  x86/svm.c         | 51 ++++++++++------------------------
> > >  x86/svm.h         | 70 ++---------------------------------------------
> > >  x86/svm_tests.c   | 24 ++++++++++------
> > >  4 files changed, 91 insertions(+), 112 deletions(-)
> > > 
> > > diff --git a/lib/x86/svm_lib.h b/lib/x86/svm_lib.h
> > > index 27c3b137..59db26de 100644
> > > --- a/lib/x86/svm_lib.h
> > > +++ b/lib/x86/svm_lib.h
> > > @@ -71,4 +71,62 @@ u8* svm_get_io_bitmap(void);
> > >  #define MSR_BITMAP_SIZE 8192
> > >  
> > >  
> > > +struct svm_extra_regs
> > 
> > Why not just svm_gprs?  This could even include RAX by grabbing it from the VMCB
> > after VMRUN.
> 
> I prefer to have a single source of truth - if I grab it from vmcb, then
> it will have to be synced to vmcb on each vmrun, like the KVM does,
> but it also has dirty registers bitmap and such.

KUT doesn't need a dirty registers bitmap.  That's purely a performance optimization
for VMX so that KVM can avoid unnecessary VMWRITEs for RIP and RSP.  E.g. SVM
ignores the dirty bitmap entirely:

  static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
  {
	struct vcpu_svm *svm = to_svm(vcpu);

	trace_kvm_entry(vcpu);

	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];

	...

  }

And even for VMX, I can't imagine a nVMX test will ever be so performance
sensitive that an extra VMWRITE for RSP will be a problem.

> I prefer to keep it simple.

The issue is that simplifying the assembly code increases the complexity for its users.
E.g. users and readers need to understand what "extra regs" means, which means documenting
what is included and what's not.  On the other hand, the assembly is already quite
complex; adding a few lines to swap RAX and RSP doesn't really change the overall
complexity of that low-level code.

The other bit of complexity is that if a test wants to access all GPRs, it needs
both this struct and the VMCB.  RSP is unlikely to be problematic, but I can see
guest.RAX being something a test wants access to.

> Plus there is also RSP in vmcb, and RFLAGS, and even RIP to some extent is a GPR.

RIP is definitely not a GPR; it has no assigned index.  RFLAGS is also not a GPR.

> To call this struct svm_gprs, I would have to include them there as well.

RAX and RSP are the only GPRs that need to be moved to/from the VMCB.  

> And also there is segment registers, etc, etc.

Which aren't GPRs.

> So instead of pretending that this struct contains all the GPRs of the guest
> (or host while guest is running) I renamed it to state that it contains only
> some gprs that SVM doesn't context switch.

...

> > > +               "xchg %%rdx, 0x10(%%" reg ")\n\t"       \
> > > +               "xchg %%rbp, 0x18(%%" reg ")\n\t"       \
> > > +               "xchg %%rsi, 0x20(%%" reg ")\n\t"       \
> > > +               "xchg %%rdi, 0x28(%%" reg ")\n\t"       \
> > > +               "xchg %%r8,  0x30(%%" reg ")\n\t"       \
> > > +               "xchg %%r9,  0x38(%%" reg ")\n\t"       \
> > > +               "xchg %%r10, 0x40(%%" reg ")\n\t"       \
> > > +               "xchg %%r11, 0x48(%%" reg ")\n\t"       \
> > > +               "xchg %%r12, 0x50(%%" reg ")\n\t"       \
> > > +               "xchg %%r13, 0x58(%%" reg ")\n\t"       \
> > > +               "xchg %%r14, 0x60(%%" reg ")\n\t"       \
> > > +               "xchg %%r15, 0x68(%%" reg ")\n\t"       \
> > > +               \
> > 
> > Extra line.
> > 
> > > +               "xchg %%rbx, 0x00(%%" reg ")\n\t"       \
> > 
> > Why is RBX last here, but first in the struct?  Ah, because the initial swap uses
> > RBX as the scratch register.  Why use RAX for the post-VMRUN swap?  AFAICT, that's
> > completely arbitrary.
> 
> Let me explain:
> 
> On entry to the guest the code has to save the host GPRs and then load the guest GPRs.
> 
> Host RAX and RBX are set by the gcc as I requested with "a" and "b"
> modifiers, but even these should not be changed by the assembly code from the
> values set in the input.
> (At least I haven't found a way to mark a register as both input and clobber)

The way to achieve input+clobber is to use input+output, i.e. "+b" (regs), but I
think that's a moot point...
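
For completeness, a tiny illustration of the "+b" form (illustrative only,
not the actual macro):

	unsigned long x = 1;

	/* "+b" makes RBX both an input and an output, so gcc assumes the
	 * asm may modify it - no explicit clobber needed.
	 */
	asm volatile("shl $1, %0" : "+b"(x));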

> Now RAX is the hardcoded input to VMRUN, thus I leave it alone, and use RBX
> as regs pointer, which is restored to the guest value (and host value stored
> in the regs) at the end of SWAP_GPRs.

...because SWAP_GPRS isn't the end of the asm blob.  As long as RBX holds the
same value (regs) at the end of the asm blob, no clobbering is necessary even if
RBX is changed within the blob.

> If I switch to full blown assembly function for this, then I could do it.
> 
> Note though that my LBR tests do still need this as a macro because they must
> not do any extra jumps/calls as these clobber the LBR registers.

Shouldn't it be fairly easy to account for the CALL in the asm routine?  Taking
on that sort of dependency is quite gross, but it'd likely be less maintenance
in the long run than an inline asm blob.
Maxim Levitsky Oct. 27, 2022, 12:07 p.m. UTC | #4
On Mon, 2022-10-24 at 19:56 +0000, Sean Christopherson wrote:
> On Mon, Oct 24, 2022, Maxim Levitsky wrote:
> > On Thu, 2022-10-20 at 18:55 +0000, Sean Christopherson wrote:
> > > On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> > > 
> > > Changelog please.  This patch in particular is extremely difficult to review
> > > without some explanation of what is being done, and why.
> > > 
> > > If it's not too much trouble, splitting this over multiple patches would be nice.
> > > 
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >  lib/x86/svm_lib.h | 58 +++++++++++++++++++++++++++++++++++++++
> > > >  x86/svm.c         | 51 ++++++++++------------------------
> > > >  x86/svm.h         | 70 ++---------------------------------------------
> > > >  x86/svm_tests.c   | 24 ++++++++++------
> > > >  4 files changed, 91 insertions(+), 112 deletions(-)
> > > > 
> > > > diff --git a/lib/x86/svm_lib.h b/lib/x86/svm_lib.h
> > > > index 27c3b137..59db26de 100644
> > > > --- a/lib/x86/svm_lib.h
> > > > +++ b/lib/x86/svm_lib.h
> > > > @@ -71,4 +71,62 @@ u8* svm_get_io_bitmap(void);
> > > >  #define MSR_BITMAP_SIZE 8192
> > > >  
> > > >  
> > > > +struct svm_extra_regs
> > > 
> > > Why not just svm_gprs?  This could even include RAX by grabbing it from the VMCB
> > > after VMRUN.
> > 
> > I prefer to have a single source of truth - if I grab it from vmcb, then
> > it will have to be synced to vmcb on each vmrun, like the KVM does,
> > but it also has dirty registers bitmap and such.
> 
> KUT doesn't need a dirty registers bitmap.  That's purely a performance optimization
> for VMX so that KVM can avoid unnecessary VMWRITEs for RIP and RSP.  E.g. SVM
> ignores the dirty bitmap entirely:
I know that.

> 
>   static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   {
> 	struct vcpu_svm *svm = to_svm(vcpu);
> 
> 	trace_kvm_entry(vcpu);
> 
> 	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
> 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
> 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
> 
> 	...
> 
>   }
> 
> And even for VMX, I can't imagine a nVMX test will ever be so performance
> sensitive that an extra VMWRITE for RSP will be a problem.
> 
> > I prefer to keep it simple.

Me too. So the only other more or less clean way is to copy RAX and RSP from the vmcb to
svm_gprs on exit, and vice versa on VM entry. Is this what you mean?
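
I.e. something like this (untested sketch, assuming svm_gprs grows rax/rsp fields):

	/* before VMRUN */
	vmcb->save.rax = regs->rax;
	vmcb->save.rsp = regs->rsp;

	/* after #VMEXIT */
	regs->rax = vmcb->save.rax;
	regs->rsp = vmcb->save.rsp;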

> 
> The issue is simplifying the assembly code increases the complexity of the users.
> E.g. users and readers need to understand what "extra regs", which means documenting
> what is included and what's not.  On the other hand, the assembly is already quite
> complex, adding a few lines to swap RAX and RSP doesn't really change the overall
> of complexity of that low level code.
> 
> The other bit of complexity is that if a test wants to access all GPRs, it needs
> both this struct and the VMCB.  RSP is unlikely to be problematic, but I can see
> guest.RAX being something a test wants access to.
> 
> > Plus there is also RSP in vmcb, and RFLAGS, and even RIP to some extent is a GPR.
> 
> RIP is definitely not a GPR, it has no assigned index.  RFLAGS is also not a GPR.
> 
> > To call this struct svm_gprs, I would have to include them there as well.
> 
> RAX and RSP are the only GPRs that need to be moved to/from the VMCB.  
> 
> > And also there is segment registers, etc, etc.
> 
> Which aren't GPRs.

But users may want to use them too.

> 
> > So instead of pretending that this struct contains all the GPRs of the guest
> > (or host while guest is running) I renamed it to state that it contains only
> > some gprs that SVM doesn't context switch.
> 
> ...
> 
> > > > +               "xchg %%rdx, 0x10(%%" reg ")\n\t"       \
> > > > +               "xchg %%rbp, 0x18(%%" reg ")\n\t"       \
> > > > +               "xchg %%rsi, 0x20(%%" reg ")\n\t"       \
> > > > +               "xchg %%rdi, 0x28(%%" reg ")\n\t"       \
> > > > +               "xchg %%r8,  0x30(%%" reg ")\n\t"       \
> > > > +               "xchg %%r9,  0x38(%%" reg ")\n\t"       \
> > > > +               "xchg %%r10, 0x40(%%" reg ")\n\t"       \
> > > > +               "xchg %%r11, 0x48(%%" reg ")\n\t"       \
> > > > +               "xchg %%r12, 0x50(%%" reg ")\n\t"       \
> > > > +               "xchg %%r13, 0x58(%%" reg ")\n\t"       \
> > > > +               "xchg %%r14, 0x60(%%" reg ")\n\t"       \
> > > > +               "xchg %%r15, 0x68(%%" reg ")\n\t"       \
> > > > +               \
> > > 
> > > Extra line.
> > > 
> > > > +               "xchg %%rbx, 0x00(%%" reg ")\n\t"       \
> > > 
> > > Why is RBX last here, but first in the struct?  Ah, because the initial swap uses
> > > RBX as the scratch register.  Why use RAX for the post-VMRUN swap?  AFAICT, that's
> > > completely arbitrary.
> > 
> > Let me explain:
> > 
> > On entry to the guest the code has to save the host GPRs and then load the guest GPRs.
> > 
> > Host RAX and RBX are set by the gcc as I requested with "a" and "b"
> > modifiers, but even these should not be changed by the assembly code from the
> > values set in the input.
> > (At least I haven't found a way to mark a register as both input and clobber)
> 
> The way to achive input+clobber is to use input+output, i.e. "+b" (regs), but I
> think that's a moot point...
I'll try that.

> 
> > Now RAX is the hardcoded input to VMRUN, thus I leave it alone, and use RBX
> > as regs pointer, which is restored to the guest value (and host value stored
> > in the regs) at the end of SWAP_GPRs.
> 
> ...because SWAP_GPRs isn't the end of the asm blob.  As long as RBX holds the
> same value (regs) at the end of the asm blob, no clobbering is necessary even if
> RBX is changed within the blob.
Exactly - I preserved it via the stack, but if I can tell gcc that my macro
clobbers it, then I won't need to.


> 
> > If I switch to full blown assembly function for this, then I could do it.
> > 
> > Note though that my LBR tests do still need this as a macro because they must
> > not do any extra jumps/calls as these clobber the LBR registers.
> 
> Shouldn't it be fairly easy to account for the CALL in the asm routine?  Taking
> on that sort of dependency is quite gross, but it'd likely be less maintenance
> in the long run than an inline asm blob.

That is not possible - SVM has just one LBR - so doing a call will erase it.

I'll think of something; I also do want to turn this into a function.

Best regards,
	Maxim Levitsky

>
Sean Christopherson Oct. 27, 2022, 7:39 p.m. UTC | #5
On Thu, Oct 27, 2022, Maxim Levitsky wrote:
> On Mon, 2022-10-24 at 19:56 +0000, Sean Christopherson wrote:
> > > And also there is segment registers, etc, etc.
> > 
> > Which aren't GPRs.
> 
> But user can want to use them too.

My point is that they don't need to be handled in the VM-Entry/VM-Exit path
as both VMX and SVM context switch all segment information through the VMCS/VMCB.
In other words, if we want to provide easy, generic access to segment information,
that can be done completely separately from this code and in a separate struct.
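
E.g. a test that wants segment state can read it straight out of the VMCB
(sketch, using the existing struct vmcb_seg):

	struct vmcb_seg guest_cs = vmcb->save.cs;	/* selector/attrib/limit/base */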

> > > Note though that my LBR tests do still need this as a macro because they must
> > > not do any extra jumps/calls as these clobber the LBR registers.
> > 
> > Shouldn't it be fairly easy to account for the CALL in the asm routine?  Taking
> > on that sort of dependency is quite gross, but it'd likely be less maintenance
> > in the long run than an inline asm blob.
> 
> That is not possible - the SVM has just one LBR - so doing call will erase it.

Ugh, that's a pain.  

> I'll think of something, I also do want to turn this into a function.

Actually, IIUC, there's no need to preserve the LBR across the call to a VMRUN
subroutine.  When checking that the host value is preserved, LBRs are disabled
before the call.  When checking that the guest value leaks back into the host,
the host value is irrelevant; the only thing that matters is that the LBR is
pre-filled with something other than the guest value, and that functionality is
provided by the call into the VMRUN subroutine.

LBR side topic #1, sequences like this should really be a single asm blob:

	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
	DO_BRANCH(...);
	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);

as there is nothing that prevents the compiler from inserting a branch between
DO_BRANCH() and the wrmsr().  It's extremely unlikely, but technically possible.
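
E.g. something along these lines (untested sketch, with the DO_BRANCH folded
into the blob):

	u32 lo = DEBUGCTLMSR_LBR, hi = 0;

	asm volatile("wrmsr\n\t"		/* DEBUGCTL = DEBUGCTLMSR_LBR */
		     "jmp 1f\n"			/* the branch being recorded */
		     "1:\n\t"
		     "xor %%eax, %%eax\n\t"
		     "xor %%edx, %%edx\n\t"
		     "wrmsr"			/* DEBUGCTL = 0 */
		     : "+a"(lo), "+d"(hi)
		     : "c"(MSR_IA32_DEBUGCTLMSR)
		     : "memory");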

LBR side topic #2, the tests are broken on our Milan systems.  I've poked around
a few times, but haven't dug in deep yet (and probably won't have cycles to do so
anytime soon).

PASS: Basic LBR test
PASS: Test that without LBRV enabled, guest LBR state does 'leak' to the host(1)
PASS: Test that without LBRV enabled, guest LBR state does 'leak' to the host(2)
PASS: Test that with LBRV enabled, guest LBR state doesn't leak (1)
Unhandled exception 6 #UD at ip 000000000040175c
error_code=0000      rflags=00010086      cs=00000008
rax=00000000004016e7 rcx=00000000000001dc rdx=80000000004016e7 rbx=0000000000414920
rbp=000000000042fa38 rsi=0000000000000000 rdi=0000000000414d98
 r8=00000000004176f9  r9=00000000000003f8 r10=000000000000000d r11=0000000000000000
r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000
cr0=0000000080010011 cr2=0000000000000000 cr3=00000000010bf000 cr4=0000000000040020
cr8=0000000000000000

Patch

diff --git a/lib/x86/svm_lib.h b/lib/x86/svm_lib.h
index 27c3b137..59db26de 100644
--- a/lib/x86/svm_lib.h
+++ b/lib/x86/svm_lib.h
@@ -71,4 +71,62 @@  u8* svm_get_io_bitmap(void);
 #define MSR_BITMAP_SIZE 8192
 
 
+struct svm_extra_regs
+{
+    u64 rbx;
+    u64 rcx;
+    u64 rdx;
+    u64 rbp;
+    u64 rsi;
+    u64 rdi;
+    u64 r8;
+    u64 r9;
+    u64 r10;
+    u64 r11;
+    u64 r12;
+    u64 r13;
+    u64 r14;
+    u64 r15;
+};
+
+#define SWAP_GPRS(reg) \
+		"xchg %%rcx, 0x08(%%" reg ")\n\t"       \
+		"xchg %%rdx, 0x10(%%" reg ")\n\t"       \
+		"xchg %%rbp, 0x18(%%" reg ")\n\t"       \
+		"xchg %%rsi, 0x20(%%" reg ")\n\t"       \
+		"xchg %%rdi, 0x28(%%" reg ")\n\t"       \
+		"xchg %%r8,  0x30(%%" reg ")\n\t"       \
+		"xchg %%r9,  0x38(%%" reg ")\n\t"       \
+		"xchg %%r10, 0x40(%%" reg ")\n\t"       \
+		"xchg %%r11, 0x48(%%" reg ")\n\t"       \
+		"xchg %%r12, 0x50(%%" reg ")\n\t"       \
+		"xchg %%r13, 0x58(%%" reg ")\n\t"       \
+		"xchg %%r14, 0x60(%%" reg ")\n\t"       \
+		"xchg %%r15, 0x68(%%" reg ")\n\t"       \
+		\
+		"xchg %%rbx, 0x00(%%" reg ")\n\t"       \
+
+
+#define __SVM_VMRUN(vmcb, regs, label)          \
+		asm volatile (                          \
+			"vmload %%rax\n\t"                  \
+			"push %%rax\n\t"                    \
+			"push %%rbx\n\t"                    \
+			SWAP_GPRS("rbx")                    \
+			".global " label "\n\t"             \
+			label ": vmrun %%rax\n\t"           \
+			"vmsave %%rax\n\t"                  \
+			"pop %%rax\n\t"                     \
+			SWAP_GPRS("rax")                    \
+			"pop %%rax\n\t"                     \
+			:                                   \
+			: "a" (virt_to_phys(vmcb)),         \
+			  "b"(regs)                         \
+			/* clobbers*/                       \
+			: "memory"                          \
+		);
+
+#define SVM_VMRUN(vmcb, regs) \
+		__SVM_VMRUN(vmcb, regs, "vmrun_dummy_label_%=")
+
 #endif /* SRC_LIB_X86_SVM_LIB_H_ */
diff --git a/x86/svm.c b/x86/svm.c
index 37b4cd38..9484a6d1 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -76,11 +76,11 @@  static void test_thunk(struct svm_test *test)
 	vmmcall();
 }
 
-struct regs regs;
+struct svm_extra_regs regs;
 
-struct regs get_regs(void)
+struct svm_extra_regs* get_regs(void)
 {
-	return regs;
+	return &regs;
 }
 
 // rax handled specially below
@@ -97,13 +97,7 @@  int __svm_vmrun(u64 rip)
 	vmcb->save.rsp = (ulong)(guest_stack + ARRAY_SIZE(guest_stack));
 	regs.rdi = (ulong)v2_test;
 
-	asm volatile (
-		      ASM_PRE_VMRUN_CMD
-		      "vmrun %%rax\n\t"               \
-		      ASM_POST_VMRUN_CMD
-		      :
-		      : "a" (virt_to_phys(vmcb))
-		      : "memory", "r15");
+	SVM_VMRUN(vmcb, &regs);
 
 	return (vmcb->control.exit_code);
 }
@@ -113,12 +107,8 @@  int svm_vmrun(void)
 	return __svm_vmrun((u64)test_thunk);
 }
 
-extern u8 vmrun_rip;
-
 static noinline void test_run(struct svm_test *test)
 {
-	u64 vmcb_phys = virt_to_phys(vmcb);
-
 	irq_disable();
 	vmcb_ident(vmcb);
 
@@ -128,28 +118,17 @@  static noinline void test_run(struct svm_test *test)
 	vmcb->save.rsp = (ulong)(guest_stack + ARRAY_SIZE(guest_stack));
 	regs.rdi = (ulong)test;
 	do {
-		struct svm_test *the_test = test;
-		u64 the_vmcb = vmcb_phys;
-		asm volatile (
-			      "clgi;\n\t" // semi-colon needed for LLVM compatibility
-			      "sti \n\t"
-			      "call *%c[PREPARE_GIF_CLEAR](%[test]) \n \t"
-			      "mov %[vmcb_phys], %%rax \n\t"
-			      ASM_PRE_VMRUN_CMD
-			      ".global vmrun_rip\n\t"		\
-			      "vmrun_rip: vmrun %%rax\n\t"    \
-			      ASM_POST_VMRUN_CMD
-			      "cli \n\t"
-			      "stgi"
-			      : // inputs clobbered by the guest:
-				"=D" (the_test),            // first argument register
-				"=b" (the_vmcb)             // callee save register!
-			      : [test] "0" (the_test),
-				[vmcb_phys] "1"(the_vmcb),
-				[PREPARE_GIF_CLEAR] "i" (offsetof(struct svm_test, prepare_gif_clear))
-			      : "rax", "rcx", "rdx", "rsi",
-				"r8", "r9", "r10", "r11" , "r12", "r13", "r14", "r15",
-				"memory");
+
+		clgi();
+		sti();
+
+		test->prepare_gif_clear(test);
+
+		__SVM_VMRUN(vmcb, &regs, "vmrun_rip");
+
+		cli();
+		stgi();
+
 		++test->exits;
 	} while (!test->finished(test));
 	irq_enable();
diff --git a/x86/svm.h b/x86/svm.h
index 623f2b36..8d4515f0 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -23,26 +23,6 @@  struct svm_test {
 	bool on_vcpu_done;
 };
 
-struct regs {
-	u64 rax;
-	u64 rbx;
-	u64 rcx;
-	u64 rdx;
-	u64 cr2;
-	u64 rbp;
-	u64 rsi;
-	u64 rdi;
-	u64 r8;
-	u64 r9;
-	u64 r10;
-	u64 r11;
-	u64 r12;
-	u64 r13;
-	u64 r14;
-	u64 r15;
-	u64 rflags;
-};
-
 typedef void (*test_guest_func)(struct svm_test *);
 
 int run_svm_tests(int ac, char **av, struct svm_test *svm_tests);
@@ -55,7 +35,7 @@  bool default_finished(struct svm_test *test);
 int get_test_stage(struct svm_test *test);
 void set_test_stage(struct svm_test *test, int s);
 void inc_test_stage(struct svm_test *test);
-struct regs get_regs(void);
+struct svm_extra_regs * get_regs(void);
 int __svm_vmrun(u64 rip);
 void __svm_bare_vmrun(void);
 int svm_vmrun(void);
@@ -63,51 +43,5 @@  void test_set_guest(test_guest_func func);
 u64* get_npt_pte(u64 *pml4, u64 guest_addr, int level);
 
 extern struct vmcb *vmcb;
-
-
-#define SAVE_GPR_C                              \
-        "xchg %%rbx, regs+0x8\n\t"              \
-        "xchg %%rcx, regs+0x10\n\t"             \
-        "xchg %%rdx, regs+0x18\n\t"             \
-        "xchg %%rbp, regs+0x28\n\t"             \
-        "xchg %%rsi, regs+0x30\n\t"             \
-        "xchg %%rdi, regs+0x38\n\t"             \
-        "xchg %%r8, regs+0x40\n\t"              \
-        "xchg %%r9, regs+0x48\n\t"              \
-        "xchg %%r10, regs+0x50\n\t"             \
-        "xchg %%r11, regs+0x58\n\t"             \
-        "xchg %%r12, regs+0x60\n\t"             \
-        "xchg %%r13, regs+0x68\n\t"             \
-        "xchg %%r14, regs+0x70\n\t"             \
-        "xchg %%r15, regs+0x78\n\t"
-
-#define LOAD_GPR_C      SAVE_GPR_C
-
-#define ASM_PRE_VMRUN_CMD                       \
-                "vmload %%rax\n\t"              \
-                "mov regs+0x80, %%r15\n\t"      \
-                "mov %%r15, 0x170(%%rax)\n\t"   \
-                "mov regs, %%r15\n\t"           \
-                "mov %%r15, 0x1f8(%%rax)\n\t"   \
-                LOAD_GPR_C                      \
-
-#define ASM_POST_VMRUN_CMD                      \
-                SAVE_GPR_C                      \
-                "mov 0x170(%%rax), %%r15\n\t"   \
-                "mov %%r15, regs+0x80\n\t"      \
-                "mov 0x1f8(%%rax), %%r15\n\t"   \
-                "mov %%r15, regs\n\t"           \
-                "vmsave %%rax\n\t"              \
-
-
-
-#define SVM_BARE_VMRUN \
-	asm volatile ( \
-		ASM_PRE_VMRUN_CMD \
-                "vmrun %%rax\n\t"               \
-		ASM_POST_VMRUN_CMD \
-		: \
-		: "a" (virt_to_phys(vmcb)) \
-		: "memory", "r15") \
-
+extern struct svm_test svm_tests[];
 #endif
diff --git a/x86/svm_tests.c b/x86/svm_tests.c
index 57b5b572..475a40d0 100644
--- a/x86/svm_tests.c
+++ b/x86/svm_tests.c
@@ -398,7 +398,7 @@  static bool msr_intercept_finished(struct svm_test *test)
 		 * RCX holds the MSR index.
 		 */
 		printf("%s 0x%lx #GP exception\n",
-		       exit_info_1 ? "WRMSR" : "RDMSR", get_regs().rcx);
+		       exit_info_1 ? "WRMSR" : "RDMSR", get_regs()->rcx);
 	}
 
 	/* Jump over RDMSR/WRMSR instruction */
@@ -414,9 +414,9 @@  static bool msr_intercept_finished(struct svm_test *test)
 	 */
 	if (exit_info_1)
 		test->scratch =
-			((get_regs().rdx << 32) | (vmcb->save.rax & 0xffffffff));
+			((get_regs()->rdx << 32) | (vmcb->save.rax & 0xffffffff));
 	else
-		test->scratch = get_regs().rcx;
+		test->scratch = get_regs()->rcx;
 
 	return false;
 }
@@ -1851,7 +1851,7 @@  static volatile bool host_rflags_set_tf = false;
 static volatile bool host_rflags_set_rf = false;
 static u64 rip_detected;
 
-extern u64 *vmrun_rip;
+extern u64 vmrun_rip;
 
 static void host_rflags_db_handler(struct ex_regs *r)
 {
@@ -2989,6 +2989,8 @@  static void svm_lbrv_test0(void)
 
 static void svm_lbrv_test1(void)
 {
+	struct svm_extra_regs* regs = get_regs();
+
 	report(true, "Test that without LBRV enabled, guest LBR state does 'leak' to the host(1)");
 
 	vmcb->save.rip = (ulong)svm_lbrv_test_guest1;
@@ -2996,7 +2998,7 @@  static void svm_lbrv_test1(void)
 
 	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
 	DO_BRANCH(host_branch1);
-	SVM_BARE_VMRUN;
+	SVM_VMRUN(vmcb,regs);
 	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
 
 	if (vmcb->control.exit_code != SVM_EXIT_VMMCALL) {
@@ -3011,6 +3013,8 @@  static void svm_lbrv_test1(void)
 
 static void svm_lbrv_test2(void)
 {
+	struct svm_extra_regs* regs = get_regs();
+
 	report(true, "Test that without LBRV enabled, guest LBR state does 'leak' to the host(2)");
 
 	vmcb->save.rip = (ulong)svm_lbrv_test_guest2;
@@ -3019,7 +3023,7 @@  static void svm_lbrv_test2(void)
 	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
 	DO_BRANCH(host_branch2);
 	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
-	SVM_BARE_VMRUN;
+	SVM_VMRUN(vmcb,regs);
 	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
 	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
 
@@ -3035,6 +3039,8 @@  static void svm_lbrv_test2(void)
 
 static void svm_lbrv_nested_test1(void)
 {
+	struct svm_extra_regs* regs = get_regs();
+
 	if (!lbrv_supported()) {
 		report_skip("LBRV not supported in the guest");
 		return;
@@ -3047,7 +3053,7 @@  static void svm_lbrv_nested_test1(void)
 
 	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
 	DO_BRANCH(host_branch3);
-	SVM_BARE_VMRUN;
+	SVM_VMRUN(vmcb,regs);
 	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
 	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
 
@@ -3068,6 +3074,8 @@  static void svm_lbrv_nested_test1(void)
 
 static void svm_lbrv_nested_test2(void)
 {
+	struct svm_extra_regs* regs = get_regs();
+
 	if (!lbrv_supported()) {
 		report_skip("LBRV not supported in the guest");
 		return;
@@ -3083,7 +3091,7 @@  static void svm_lbrv_nested_test2(void)
 
 	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
 	DO_BRANCH(host_branch4);
-	SVM_BARE_VMRUN;
+	SVM_VMRUN(vmcb,regs);
 	dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
 	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);