diff mbox series

[1/2] KVM: x86: Allow userspace to opt out of hypercall patching

Message ID 20220316005538.2282772-2-oupton@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Allow opt out of guest hypercall patching | expand

Commit Message

Oliver Upton March 16, 2022, 12:55 a.m. UTC
KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
both of these instructions really should #UD when executed on the wrong
vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
guest's instruction with the appropriate instruction for the vendor.
Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
do not patch in the appropriate instruction using alternatives, likely
motivating KVM's intervention.

Add a quirk allowing userspace to opt out of hypercall patching. If the
quirk is disabled, KVM synthesizes a #UD in the guest.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst  |  9 +++++++++
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/include/uapi/asm/kvm.h | 11 ++++++-----
 arch/x86/kvm/x86.c              | 11 +++++++++++
 4 files changed, 28 insertions(+), 6 deletions(-)

Comments

Sean Christopherson March 24, 2022, 5:44 p.m. UTC | #1
On Wed, Mar 16, 2022, Oliver Upton wrote:
> KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> both of these instructions really should #UD when executed on the wrong
> vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> guest's instruction with the appropriate instruction for the vendor.
> Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> do not patch in the appropriate instruction using alternatives, likely
> motivating KVM's intervention.
> 
> Add a quirk allowing userspace to opt out of hypercall patching.

A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
intentional.

https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com

> If the quirk is disabled, KVM synthesizes a #UD in the guest.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d3a9ce07a565..685c4bc453b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9291,6 +9291,17 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
>  	char instruction[3];
>  	unsigned long rip = kvm_rip_read(vcpu);
>  
> +	/*
> +	 * If the quirk is disabled, synthesize a #UD and let the guest pick up
> +	 * the pieces.
> +	 */
> +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> +		ctxt->exception.error_code_valid = false;
> +		ctxt->exception.vector = UD_VECTOR;
> +		ctxt->have_exception = true;
> +		return X86EMUL_PROPAGATE_FAULT;

This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
will also end up generating a #UD in most cases, but will play nice with
KVM_CAP_EXIT_ON_EMULATION_FAILURE.
Paolo Bonzini March 24, 2022, 5:57 p.m. UTC | #2
On 3/24/22 18:44, Sean Christopherson wrote:
> On Wed, Mar 16, 2022, Oliver Upton wrote:
>> KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
>> both of these instructions really should #UD when executed on the wrong
>> vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
>> guest's instruction with the appropriate instruction for the vendor.
>> Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
>> use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
>> do not patch in the appropriate instruction using alternatives, likely
>> motivating KVM's intervention.
>>
>> Add a quirk allowing userspace to opt out of hypercall patching.
> 
> A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> intentional.
> 
> https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com

It's intentional, but the days of the patching part are over.

KVM should just call emulate_hypercall if called with the wrong opcode 
(which in turn can be quirked away).  That however would be more complex 
to implement because the hypercall path wants to e.g. inject a #UD with 
kvm_queue_exception().

All this makes me want to just apply Oliver's patch.

>> +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
>> +		ctxt->exception.error_code_valid = false;
>> +		ctxt->exception.vector = UD_VECTOR;
>> +		ctxt->have_exception = true;
>> +		return X86EMUL_PROPAGATE_FAULT;
> 
> This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> will also end up generating a #UD in most cases, but will play nice with
> KVM_CAP_EXIT_ON_EMULATION_FAILURE.

Hmm, not sure about that.  This is not an emulation failure in the sense 
that we don't know what to do.  We know that for this x86 vendor the 
right thing to do is to growl at the guest.

KVM_CAP_EXIT_ON_EMULATION_FAILURE would not have a way to ask KVM to 
invoke the hypercall, anyway, so it's not really possible for userspace 
to do the emulation.

Paolo
Oliver Upton March 24, 2022, 7:05 p.m. UTC | #3
On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> On 3/24/22 18:44, Sean Christopherson wrote:
> > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > both of these instructions really should #UD when executed on the wrong
> > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > guest's instruction with the appropriate instruction for the vendor.
> > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > do not patch in the appropriate instruction using alternatives, likely
> > > motivating KVM's intervention.
> > > 
> > > Add a quirk allowing userspace to opt out of hypercall patching.
> > 
> > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> > intentional.
> > 
> > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com
> 
> It's intentional, but the days of the patching part are over.
> 
> KVM should just call emulate_hypercall if called with the wrong opcode
> (which in turn can be quirked away).  That however would be more complex to
> implement because the hypercall path wants to e.g. inject a #UD with
> kvm_queue_exception().
> 
> All this makes me want to just apply Oliver's patch.
> 
> > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > +		ctxt->exception.error_code_valid = false;
> > > +		ctxt->exception.vector = UD_VECTOR;
> > > +		ctxt->have_exception = true;
> > > +		return X86EMUL_PROPAGATE_FAULT;
> > 
> > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > will also end up generating a #UD in most cases, but will play nice with
> > KVM_CAP_EXIT_ON_EMULATION_FAILURE.

Sean and I were looking at this together right now, and it turns out
that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
bounce out to userspace.

x86_decode_emulated_instruction() would need to be the spot we bail to
guard these exits with the CAP.

> Hmm, not sure about that.  This is not an emulation failure in the sense
> that we don't know what to do.  We know that for this x86 vendor the right
> thing to do is to growl at the guest.
> 
> KVM_CAP_EXIT_ON_EMULATION_FAILURE would not have a way to ask KVM to invoke
> the hypercall, anyway, so it's not really possible for userspace to do the
> emulation.

Userspace could theoretically patch the hypercall itself and retry execution.
But IMO, userspace should just leave the quirk enabled and accept the default
KVM behavior if it still wants hypercall patching.

--
Thanks,
Oliver
Sean Christopherson March 25, 2022, 11:53 p.m. UTC | #4
On Thu, Mar 24, 2022, Oliver Upton wrote:
> On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> > On 3/24/22 18:44, Sean Christopherson wrote:
> > > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > > both of these instructions really should #UD when executed on the wrong
> > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > > guest's instruction with the appropriate instruction for the vendor.
> > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > > do not patch in the appropriate instruction using alternatives, likely
> > > > motivating KVM's intervention.
> > > > 
> > > > Add a quirk allowing userspace to opt out of hypercall patching.
> > > 
> > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> > > intentional.
> > > 
> > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com
> > 
> > It's intentional, but the days of the patching part are over.
> > 
> > KVM should just call emulate_hypercall if called with the wrong opcode
> > (which in turn can be quirked away).  That however would be more complex to
> > implement because the hypercall path wants to e.g. inject a #UD with
> > kvm_queue_exception().
> > 
> > All this makes me want to just apply Oliver's patch.
> > 
> > > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > > +		ctxt->exception.error_code_valid = false;
> > > > +		ctxt->exception.vector = UD_VECTOR;
> > > > +		ctxt->have_exception = true;
> > > > +		return X86EMUL_PROPAGATE_FAULT;
> > > 
> > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > > will also end up generating a #UD in most cases, but will play nice with
> > > KVM_CAP_EXIT_ON_EMULATION_FAILURE.
> 
> Sean and I were looking at this together right now, and it turns out
> that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
> bounce out to userspace.
> 
> x86_decode_emulated_instruction() would need to be the spot we bail to
> guard these exits with the CAP.

I've spent waaay too much time thinking about this...

TL;DR: I'm in favor of applying the patch as-is.

My objection to manually injecting the #UD is that there's no guarantee that KVM
is actually handling a #UD.  It's largely a moot point, as KVM barfs on VMCALL/VMMCALL
if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is
emulating due to lack of unrestricted guest.  Forcing #UD is probably a net positive
in that case, as it will cause KVM to ultimately fail with "emulation error" and
bail to userspace.

It is possible to handle this in decode (idea below), but it will set us up for
pain later.  If KVM ever does gain support for truly emulating hypercall, then as
Paolo said, the question of whether to emulate the "wrong" hypercall is orthogonal
to whether or not to patch.  The correct way to check if the "wrong" hypercall
should be emulated would be to query the vCPU vendor via guest's CPUID, at which
point we _do_ want to get into em_hypercall() on UD to do the CPUID check even if
the quirk is disabled.  So I agree with Paolo, make it a quirk that guards the
patching, and document that because of KVM's deficiencies, that also happens to
mean that encountering the "wrong" hypercall is fatal to the guest.  Maybe fodder
for KVM's new vCPU errata? :-)  If we fix that erratum, then the quirk can be
modified to only skip the patching.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index be83c9c8482d..29abeaf605e2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5248,9 +5248,15 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int

        ctxt->execute = opcode.u.execute;

-       if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
-           likely(!(ctxt->d & EmulateOnUD)))
-               return EMULATION_FAILED;
+       if (unlikely(emulation_type & EMULTYPE_TRAP_UD)) {
+               if (likely(!(ctxt->d & EmulateOnUD)))
+                       return EMULATION_FAILED;
+
+               /* blah blah blah */
+               if (ctxt->execute == em_hypercall &&
+                   !ctxt->has_quirk_fix_hypercall)
+                       return EMULATION_FAILED;
+       }

        if (unlikely(ctxt->d &
            (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm|NearBranch|
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 1cbd46cf71f9..35d2f20d368c 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -305,6 +305,8 @@ struct x86_emulate_ctxt {
        void *vcpu;
        const struct x86_emulate_ops *ops;

+       bool has_quirk_fix_hypercall;
+
        /* Register state before/after emulation. */
        unsigned long eflags;
        unsigned long eip; /* eip before instruction emulation */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 685c4bc453b4..832f85e72978 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7915,6 +7915,8 @@ static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu)

        ctxt->vcpu = vcpu;
        ctxt->ops = &emulate_ops;
+       ctxt->has_quirk_fix_hypercall =
+               kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN);
        vcpu->arch.emulate_ctxt = ctxt;

        return ctxt;
@@ -9291,16 +9293,8 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
        char instruction[3];
        unsigned long rip = kvm_rip_read(vcpu);

-       /*
-        * If the quirk is disabled, synthesize a #UD and let the guest pick up
-        * the pieces.
-        */
-       if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
-               ctxt->exception.error_code_valid = false;
-               ctxt->exception.vector = UD_VECTOR;
-               ctxt->have_exception = true;
-               return X86EMUL_PROPAGATE_FAULT;
-       }
+       if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN))
+               return X86EMUL_CONTINUE;

        static_call(kvm_x86_patch_hypercall)(vcpu, instruction);
Oliver Upton March 28, 2022, 5:28 p.m. UTC | #5
On Fri, Mar 25, 2022 at 11:53:05PM +0000, Sean Christopherson wrote:
> On Thu, Mar 24, 2022, Oliver Upton wrote:
> > On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> > > On 3/24/22 18:44, Sean Christopherson wrote:
> > > > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > > > both of these instructions really should #UD when executed on the wrong
> > > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > > > guest's instruction with the appropriate instruction for the vendor.
> > > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > > > do not patch in the appropriate instruction using alternatives, likely
> > > > > motivating KVM's intervention.
> > > > > 
> > > > > Add a quirk allowing userspace to opt out of hypercall patching.
> > > > 
> > > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> > > > intentional.
> > > > 
> > > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com
> > > 
> > > It's intentional, but the days of the patching part are over.
> > > 
> > > KVM should just call emulate_hypercall if called with the wrong opcode
> > > (which in turn can be quirked away).  That however would be more complex to
> > > implement because the hypercall path wants to e.g. inject a #UD with
> > > kvm_queue_exception().
> > > 
> > > All this makes me want to just apply Oliver's patch.
> > > 
> > > > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > > > +		ctxt->exception.error_code_valid = false;
> > > > > +		ctxt->exception.vector = UD_VECTOR;
> > > > > +		ctxt->have_exception = true;
> > > > > +		return X86EMUL_PROPAGATE_FAULT;
> > > > 
> > > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > > > will also end up generating a #UD in most cases, but will play nice with
> > > > KVM_CAP_EXIT_ON_EMULATION_FAILURE.
> > 
> > Sean and I were looking at this together right now, and it turns out
> > that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
> > bounce out to userspace.
> > 
> > x86_decode_emulated_instruction() would need to be the spot we bail to
> > guard these exits with the CAP.
> 
> I've spent waaay too much time thinking about this...
> 
> TL;DR: I'm in favor of applying the patch as-is.
> 
> My objection to manually injecting the #UD is that there's no guarantee that KVM
> is actually handling a #UD.  It's largely a moot point, as KVM barfs on VMCALL/VMMCALL
> if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is
> emulating due to lack of unrestricted guest.  Forcing #UD is probably a net positive
> in that case, as it will cause KVM to ultimately fail with "emulation error" and
> bail to userspace.
> 
> It is possible to handle this in decode (idea below), but it will set us up for
> pain later.  If KVM ever does gain support for truly emulating hypercall

There was another annoyance that motivated me to sidestep emulation
altogether.

'Correct' emulation (or whatever we decide to call what KVM does) of the hypercall
instruction would require that we actually inform the emulator about nested for
both vendor calls. And by that I mean both {svm,vmx}_check_intercept would need
to correctly handle both VMCALL/VMMCALL. The one nice thing about hypercall
patching is that we could keep L1 oblivious, as we would have already rewritten
the instruction before reflecting the exit to L1.

While I was looking at #UD under nested for this issue, I noticed:

Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
on #UD, even though it is possible that L0 was emulating an instruction not
present in hardware (like RDPID). If L1 passed through RDPID the #UD
should not be reflected to L1. I believe this would require that we make
the emulator aware of nVMX which sounds like a science project on its
own.

Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)

--
Thanks,
Oliver
Sean Christopherson March 28, 2022, 6:28 p.m. UTC | #6
On Mon, Mar 28, 2022, Oliver Upton wrote:
> While I was looking at #UD under nested for this issue, I noticed:
> 
> Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> on #UD, even though it is possible that L0 was emulating an instruction not
> present in hardware (like RDPID). If L1 passed through RDPID the #UD
> should not be reflected to L1.

Yes, it's a known bug.

> I believe this would require that we make the emulator aware of nVMX which
> sounds like a science project on its own.

I don't think it would require any new awareness in the emulator proper, KVM
would "just" need to ensure it properly morphs the resulting reflected #UD to a
nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
Just Work...

> Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)

I don't think we write it off entirely, but it's definitely on the backburner
because there are so precious few cases where KVM emulates on #UD.  And for good
reason, e.g. the RDPID case takes an instruction that exists purely to optimize
certain flows and turns them into dreadfully sloooow paths.
Maxim Levitsky Aug. 24, 2022, 9:34 a.m. UTC | #7
On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote:
> On Mon, Mar 28, 2022, Oliver Upton wrote:
> > While I was looking at #UD under nested for this issue, I noticed:
> > 
> > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> > on #UD, even though it is possible that L0 was emulating an instruction not
> > present in hardware (like RDPID). If L1 passed through RDPID the #UD
> > should not be reflected to L1.
> 
> Yes, it's a known bug.
> 
> > I believe this would require that we make the emulator aware of nVMX which
> > sounds like a science project on its own.
> 
> I don't think it would require any new awareness in the emulator proper, KVM
> would "just" need to ensure it properly morphs the resulting reflected #UD to a
> nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
> Just Work...
> 
> > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)
> 
> I don't think we write it off entirely, but it's definitely on the backburner
> because there are so precious few cases where KVM emulates on #UD.  And for good
> reason, e.g. the RDPID case takes an instruction that exists purely to optimize
> certain flows and turns them into dreadfully sloooow paths.
> 

I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
that L0 patches the hypercall before L1 sees it so it can't really do anything
about it.

Do you think we can always stop patching hypercalls for the nested guest regardless
of the quirk, or that too will be considered breaking backwards compatability?

Best regards,
	Maxim Levitsky
Sean Christopherson Aug. 24, 2022, 2:43 p.m. UTC | #8
On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote:
> > On Mon, Mar 28, 2022, Oliver Upton wrote:
> > > While I was looking at #UD under nested for this issue, I noticed:
> > > 
> > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> > > on #UD, even though it is possible that L0 was emulating an instruction not
> > > present in hardware (like RDPID). If L1 passed through RDPID the #UD
> > > should not be reflected to L1.
> > 
> > Yes, it's a known bug.
> > 
> > > I believe this would require that we make the emulator aware of nVMX which
> > > sounds like a science project on its own.
> > 
> > I don't think it would require any new awareness in the emulator proper, KVM
> > would "just" need to ensure it properly morphs the resulting reflected #UD to a
> > nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
> > Just Work...
> > 
> > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)
> > 
> > I don't think we write it off entirely, but it's definitely on the backburner
> > because there are so precious few cases where KVM emulates on #UD.  And for good
> > reason, e.g. the RDPID case takes an instruction that exists purely to optimize
> > certain flows and turns them into dreadfully sloooow paths.
> > 
> 
> I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
> that L0 patches the hypercall before L1 sees it so it can't really do anything
> about it.
> 
> Do you think we can always stop patching hypercalls for the nested guest regardless
> of the quirk, or that too will be considered breaking backwards compatability?

Heh, go run it on Intel, problem solved ;-)

As discussed last year[*], it's impossible to get this right in all cases, ignoring
the fact that patching in the first place is arguably wrong.  E.g. if KVM is running
on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's
responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL
in L2 is spurious.

Regardless of what path we take, I do think we should align VMX and SVM on exception
intercept behavior.

[*] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com
Maxim Levitsky Aug. 24, 2022, 3:06 p.m. UTC | #9
On Wed, 2022-08-24 at 14:43 +0000, Sean Christopherson wrote:
> On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> > On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote:
> > > On Mon, Mar 28, 2022, Oliver Upton wrote:
> > > > While I was looking at #UD under nested for this issue, I noticed:
> > > > 
> > > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> > > > on #UD, even though it is possible that L0 was emulating an instruction not
> > > > present in hardware (like RDPID). If L1 passed through RDPID the #UD
> > > > should not be reflected to L1.
> > > 
> > > Yes, it's a known bug.
> > > 
> > > > I believe this would require that we make the emulator aware of nVMX which
> > > > sounds like a science project on its own.
> > > 
> > > I don't think it would require any new awareness in the emulator proper, KVM
> > > would "just" need to ensure it properly morphs the resulting reflected #UD to a
> > > nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
> > > Just Work...
> > > 
> > > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)
> > > 
> > > I don't think we write it off entirely, but it's definitely on the backburner
> > > because there are so precious few cases where KVM emulates on #UD.  And for good
> > > reason, e.g. the RDPID case takes an instruction that exists purely to optimize
> > > certain flows and turns them into dreadfully sloooow paths.
> > > 
> > 
> > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
> > that L0 patches the hypercall before L1 sees it so it can't really do anything
> > about it.
> > 
> > Do you think we can always stop patching hypercalls for the nested guest regardless
> > of the quirk, or that too will be considered breaking backwards compatability?
> 
> Heh, go run it on Intel, problem solved ;-)
> 
> As discussed last year[*], it's impossible to get this right in all cases, ignoring
> the fact that patching in the first place is arguably wrong.  E.g. if KVM is running
> on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's
> responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL
> in L2 is spurious.
> 
> Regardless of what path we take, I do think we should align VMX and SVM on exception
> intercept behavior.

Maybe then we should at least skip the unit test if running nested (should be easy to check the hypervisor
cpuid)?

Oh well, I do understand you that the whole 'patching' thing is one big mess :(

I wonder how hard it will be to ask Qemu to disable this quirk....

Best regards,
	Maxim Levitsky

> 
> [*] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com
>
Paolo Bonzini Aug. 24, 2022, 5:15 p.m. UTC | #10
On 8/24/22 17:06, Maxim Levitsky wrote:
> I wonder how hard it will be to ask Qemu to disable this quirk....

I think we should just do it.  Send a patch. :)

Paolo
Sean Christopherson Aug. 24, 2022, 6:40 p.m. UTC | #11
On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-24 at 14:43 +0000, Sean Christopherson wrote:
> > On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> > > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
> > > that L0 patches the hypercall before L1 sees it so it can't really do anything
> > > about it.
> > > 
> > > Do you think we can always stop patching hypercalls for the nested guest regardless
> > > of the quirk, or that too will be considered breaking backwards compatability?
> > 
> > Heh, go run it on Intel, problem solved ;-)
> > 
> > As discussed last year[*], it's impossible to get this right in all cases, ignoring
> > the fact that patching in the first place is arguably wrong.  E.g. if KVM is running
> > on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's
> > responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL
> > in L2 is spurious.
> > 
> > Regardless of what path we take, I do think we should align VMX and SVM on exception
> > intercept behavior.
> 
> Maybe then we should at least skip the unit test if running nested (should be
> easy to check the hypervisor cpuid)?

My preference is to keep the test as is.  It's not completely useless in a VM,
e.g. Intel works (currently), non-KVM VMs may or may not work, and VMMs that disable
the quirk in L0 will also work.

max_guest_memory_test is in a similar boat.  Running that in L1 is not recommended
as KVM's shadow paging just can't keep up.  But that doesn't mean that the test
should _never_ be run in L1.

If we have the test skip itself, then opting in requires a code change.  On the
other hand, skipping the test in whatever script is being used to run selftests
is easy enough, e.g. `grep hypervisor /proc/cpuinfo`.  IMO running test via `make`
is a complete mess and should be avoided anyways :-)
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 507be67746b0..862314e156ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7233,6 +7233,15 @@  The valid bits in cap.args[0] are:
                                     Additionally, when this quirk is disabled,
                                     KVM clears CPUID.01H:ECX[bit 3] if
                                     IA32_MISC_ENABLE[bit 18] is cleared.
+
+ KVM_X86_QUIRK_FIX_HYPERCALL_INSN   By default, KVM rewrites guest
+                                    VMMCALL/VMCALL instructions to match the
+                                    vendor's hypercall instruction for the
+                                    system. When this quirk is disabled, KVM
+                                    will no longer rewrite invalid guest
+                                    hypercall instructions. Executing the
+                                    incorrect hypercall instruction will
+                                    generate a #UD within the guest.
 =================================== ============================================
 
 8. Other capabilities.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9694dd5e6ccc..832e9af24a85 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1996,6 +1996,7 @@  int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_CD_NW_CLEARED |		\
 	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
 	 KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
-	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
+	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT |	\
+	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN)
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index bf6e96011dfe..21614807a2cb 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -428,11 +428,12 @@  struct kvm_sync_regs {
 	struct kvm_vcpu_events events;
 };
 
-#define KVM_X86_QUIRK_LINT0_REENABLED	   (1 << 0)
-#define KVM_X86_QUIRK_CD_NW_CLEARED	   (1 << 1)
-#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	   (1 << 2)
-#define KVM_X86_QUIRK_OUT_7E_INC_RIP	   (1 << 3)
-#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
+#define KVM_X86_QUIRK_LINT0_REENABLED		(1 << 0)
+#define KVM_X86_QUIRK_CD_NW_CLEARED		(1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE		(1 << 2)
+#define KVM_X86_QUIRK_OUT_7E_INC_RIP		(1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT	(1 << 4)
+#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN	(1 << 5)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3a9ce07a565..685c4bc453b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9291,6 +9291,17 @@  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 	char instruction[3];
 	unsigned long rip = kvm_rip_read(vcpu);
 
+	/*
+	 * If the quirk is disabled, synthesize a #UD and let the guest pick up
+	 * the pieces.
+	 */
+	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
+		ctxt->exception.error_code_valid = false;
+		ctxt->exception.vector = UD_VECTOR;
+		ctxt->have_exception = true;
+		return X86EMUL_PROPAGATE_FAULT;
+	}
+
 	static_call(kvm_x86_patch_hypercall)(vcpu, instruction);
 
 	return emulator_write_emulated(ctxt, rip, instruction, 3,