Message ID | 20220316005538.2282772-2-oupton@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: Allow opt out of guest hypercall patching | expand |
On Wed, Mar 16, 2022, Oliver Upton wrote: > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though > both of these instructions really should #UD when executed on the wrong > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the > guest's instruction with the appropriate instruction for the vendor. > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") > do not patch in the appropriate instruction using alternatives, likely > motivating KVM's intervention. > > Add a quirk allowing userspace to opt out of hypercall patching. A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is intentional. https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com > If the quirk is disabled, KVM synthesizes a #UD in the guest. ... > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index d3a9ce07a565..685c4bc453b4 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -9291,6 +9291,17 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) > char instruction[3]; > unsigned long rip = kvm_rip_read(vcpu); > > + /* > + * If the quirk is disabled, synthesize a #UD and let the guest pick up > + * the pieces. > + */ > + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { > + ctxt->exception.error_code_valid = false; > + ctxt->exception.vector = UD_VECTOR; > + ctxt->have_exception = true; > + return X86EMUL_PROPAGATE_FAULT; This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD. That will also end up generating a #UD in most cases, but will play nice with KVM_CAP_EXIT_ON_EMULATION_FAILURE.
On 3/24/22 18:44, Sean Christopherson wrote: > On Wed, Mar 16, 2022, Oliver Upton wrote: >> KVM handles the VMCALL/VMMCALL instructions very strangely. Even though >> both of these instructions really should #UD when executed on the wrong >> vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the >> guest's instruction with the appropriate instruction for the vendor. >> Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: >> use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") >> do not patch in the appropriate instruction using alternatives, likely >> motivating KVM's intervention. >> >> Add a quirk allowing userspace to opt out of hypercall patching. > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is > intentional. > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com It's intentional, but the days of the patching part are over. KVM should just call emulate_hypercall if called with the wrong opcode (which in turn can be quirked away). That however would be more complex to implement because the hypercall path wants to e.g. inject a #UD with kvm_queue_exception(). All this makes me want to just apply Oliver's patch. >> + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { >> + ctxt->exception.error_code_valid = false; >> + ctxt->exception.vector = UD_VECTOR; >> + ctxt->have_exception = true; >> + return X86EMUL_PROPAGATE_FAULT; > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD. That > will also end up generating a #UD in most cases, but will play nice with > KVM_CAP_EXIT_ON_EMULATION_FAILURE. Hmm, not sure about that. This is not an emulation failure in the sense that we don't know what to do. We know that for this x86 vendor the right thing to do is to growl at the guest. KVM_CAP_EXIT_ON_EMULATION_FAILURE would not have a way to ask KVM to invoke the hypercall, anyway, so it's not really possible for userspace to do the emulation. Paolo
On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote: > On 3/24/22 18:44, Sean Christopherson wrote: > > On Wed, Mar 16, 2022, Oliver Upton wrote: > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though > > > both of these instructions really should #UD when executed on the wrong > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the > > > guest's instruction with the appropriate instruction for the vendor. > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") > > > do not patch in the appropriate instruction using alternatives, likely > > > motivating KVM's intervention. > > > > > > Add a quirk allowing userspace to opt out of hypercall patching. > > > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is > > intentional. > > > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com > > It's intentional, but the days of the patching part are over. > > KVM should just call emulate_hypercall if called with the wrong opcode > (which in turn can be quirked away). That however would be more complex to > implement because the hypercall path wants to e.g. inject a #UD with > kvm_queue_exception(). > > All this makes me want to just apply Oliver's patch. > > > > + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { > > > + ctxt->exception.error_code_valid = false; > > > + ctxt->exception.vector = UD_VECTOR; > > > + ctxt->have_exception = true; > > > + return X86EMUL_PROPAGATE_FAULT; > > > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD. That > > will also end up generating a #UD in most cases, but will play nice with > > KVM_CAP_EXIT_ON_EMULATION_FAILURE. Sean and I were looking at this together right now, and it turns out that returning X86EMUL_UNHANDLEABLE at this point will unconditionally bounce out to userspace. x86_decode_emulated_instruction() would need to be the spot we bail to guard these exits with the CAP. > Hmm, not sure about that. This is not an emulation failure in the sense > that we don't know what to do. We know that for this x86 vendor the right > thing to do is to growl at the guest. > > KVM_CAP_EXIT_ON_EMULATION_FAILURE would not have a way to ask KVM to invoke > the hypercall, anyway, so it's not really possible for userspace to do the > emulation. Userspace could theoretically patch the hypercall itself and retry execution. But IMO, userspace should just leave the quirk enabled and accept the default KVM behavior if it still wants hypercall patching. -- Thanks, Oliver
On Thu, Mar 24, 2022, Oliver Upton wrote: > On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote: > > On 3/24/22 18:44, Sean Christopherson wrote: > > > On Wed, Mar 16, 2022, Oliver Upton wrote: > > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though > > > > both of these instructions really should #UD when executed on the wrong > > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the > > > > guest's instruction with the appropriate instruction for the vendor. > > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: > > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") > > > > do not patch in the appropriate instruction using alternatives, likely > > > > motivating KVM's intervention. > > > > > > > > Add a quirk allowing userspace to opt out of hypercall patching. > > > > > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is > > > intentional. > > > > > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com > > > > It's intentional, but the days of the patching part are over. > > > > KVM should just call emulate_hypercall if called with the wrong opcode > > (which in turn can be quirked away). That however would be more complex to > > implement because the hypercall path wants to e.g. inject a #UD with > > kvm_queue_exception(). > > > > All this makes me want to just apply Oliver's patch. > > > > > > + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { > > > > + ctxt->exception.error_code_valid = false; > > > > + ctxt->exception.vector = UD_VECTOR; > > > > + ctxt->have_exception = true; > > > > + return X86EMUL_PROPAGATE_FAULT; > > > > > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD. That > > > will also end up generating a #UD in most cases, but will play nice with > > > KVM_CAP_EXIT_ON_EMULATION_FAILURE. > > Sean and I were looking at this together right now, and it turns out > that returning X86EMUL_UNHANDLEABLE at this point will unconditionally > bounce out to userspace. > > x86_decode_emulated_instruction() would need to be the spot we bail to > guard these exits with the CAP. I've spent waaay too much time thinking about this... TL;DR: I'm in favor of applying the patch as-is. My objection to manually injecting the #UD is that there's no guarantee that KVM is actually handling a #UD. It's largely a moot point, as KVM barfs on VMCALL/VMMCALL if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is emulating due to lack of unrestricted guest. Forcing #UD is probably a net positive in that case, as it will cause KVM to ultimately fail with "emulation error" and bail to userspace. It is possible to handle this in decode (idea below), but it will set us up for pain later. If KVM ever does gain support for truly emulating hypercall, then as Paolo said, the question of whether to emulate the "wrong" hypercall is orthogonal to whether or not to patch. The correct way to check if the "wrong" hypercall should be emulated would be to query the vCPU vendor via guest's CPUID, at which point we _do_ want to get into em_hypercall() on UD to do the CPUID check even if the quirk is disabled. So I agree with Paolo, make it a quirk that guards the patching, and document that because of KVM's deficiencies, that also happens to mean that encountering the "wrong" hypercall is fatal to the guest. Maybe fodder for KVM's new vCPU errata? :-) If we fix that erratum, then the quirk can be modified to only skip the patching. diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index be83c9c8482d..29abeaf605e2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -5248,9 +5248,15 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int ctxt->execute = opcode.u.execute; - if (unlikely(emulation_type & EMULTYPE_TRAP_UD) && - likely(!(ctxt->d & EmulateOnUD))) - return EMULATION_FAILED; + if (unlikely(emulation_type & EMULTYPE_TRAP_UD)) { + if (likely(!(ctxt->d & EmulateOnUD))) + return EMULATION_FAILED; + + /* blah blah blah */ + if (ctxt->execute == em_hypercall && + !ctxt->has_quirk_fix_hypercall) + return EMULATION_FAILED; + } if (unlikely(ctxt->d & (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm|NearBranch| diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h index 1cbd46cf71f9..35d2f20d368c 100644 --- a/arch/x86/kvm/kvm_emulate.h +++ b/arch/x86/kvm/kvm_emulate.h @@ -305,6 +305,8 @@ struct x86_emulate_ctxt { void *vcpu; const struct x86_emulate_ops *ops; + bool has_quirk_fix_hypercall; + /* Register state before/after emulation. */ unsigned long eflags; unsigned long eip; /* eip before instruction emulation */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 685c4bc453b4..832f85e72978 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7915,6 +7915,8 @@ static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu) ctxt->vcpu = vcpu; ctxt->ops = &emulate_ops; + ctxt->has_quirk_fix_hypercall = + kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN); vcpu->arch.emulate_ctxt = ctxt; return ctxt; @@ -9291,16 +9293,8 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) char instruction[3]; unsigned long rip = kvm_rip_read(vcpu); - /* - * If the quirk is disabled, synthesize a #UD and let the guest pick up - * the pieces. - */ - if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { - ctxt->exception.error_code_valid = false; - ctxt->exception.vector = UD_VECTOR; - ctxt->have_exception = true; - return X86EMUL_PROPAGATE_FAULT; - } + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) + return X86EMUL_CONTINUE; static_call(kvm_x86_patch_hypercall)(vcpu, instruction);
On Fri, Mar 25, 2022 at 11:53:05PM +0000, Sean Christopherson wrote: > On Thu, Mar 24, 2022, Oliver Upton wrote: > > On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote: > > > On 3/24/22 18:44, Sean Christopherson wrote: > > > > On Wed, Mar 16, 2022, Oliver Upton wrote: > > > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though > > > > > both of these instructions really should #UD when executed on the wrong > > > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the > > > > > guest's instruction with the appropriate instruction for the vendor. > > > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: > > > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") > > > > > do not patch in the appropriate instruction using alternatives, likely > > > > > motivating KVM's intervention. > > > > > > > > > > Add a quirk allowing userspace to opt out of hypercall patching. > > > > > > > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is > > > > intentional. > > > > > > > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com > > > > > > It's intentional, but the days of the patching part are over. > > > > > > KVM should just call emulate_hypercall if called with the wrong opcode > > > (which in turn can be quirked away). That however would be more complex to > > > implement because the hypercall path wants to e.g. inject a #UD with > > > kvm_queue_exception(). > > > > > > All this makes me want to just apply Oliver's patch. > > > > > > > > + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { > > > > > + ctxt->exception.error_code_valid = false; > > > > > + ctxt->exception.vector = UD_VECTOR; > > > > > + ctxt->have_exception = true; > > > > > + return X86EMUL_PROPAGATE_FAULT; > > > > > > > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD. That > > > > will also end up generating a #UD in most cases, but will play nice with > > > > KVM_CAP_EXIT_ON_EMULATION_FAILURE. > > > > Sean and I were looking at this together right now, and it turns out > > that returning X86EMUL_UNHANDLEABLE at this point will unconditionally > > bounce out to userspace. > > > > x86_decode_emulated_instruction() would need to be the spot we bail to > > guard these exits with the CAP. > > I've spent waaay too much time thinking about this... > > TL;DR: I'm in favor of applying the patch as-is. > > My objection to manually injecting the #UD is that there's no guarantee that KVM > is actually handling a #UD. It's largely a moot point, as KVM barfs on VMCALL/VMMCALL > if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is > emulating due to lack of unrestricted guest. Forcing #UD is probably a net positive > in that case, as it will cause KVM to ultimately fail with "emulation error" and > bail to userspace. > > It is possible to handle this in decode (idea below), but it will set us up for > pain later. If KVM ever does gain support for truly emulating hypercall There was another annoyance that motivated me to sidestep emulation altogether. 'Correct' emulation (or whatever we decide to call what KVM does) of the hypercall instruction would require that we actually inform the emulator about nested for both vendor calls. And by that I mean both {svm,vmx}_check_intercept would need to correctly handle both VMCALL/VMMCALL. The one nice thing about hypercall patching is that we could keep L1 oblivious, as we would have already rewritten the instruction before reflecting the exit to L1. While I was looking at #UD under nested for this issue, I noticed: Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs on #UD, even though it is possible that L0 was emulating an instruction not present in hardware (like RDPID). If L1 passed through RDPID the #UD should not be reflected to L1. I believe this would require that we make the emulator aware of nVMX which sounds like a science project on its own. Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :) -- Thanks, Oliver
On Mon, Mar 28, 2022, Oliver Upton wrote: > While I was looking at #UD under nested for this issue, I noticed: > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs > on #UD, even though it is possible that L0 was emulating an instruction not > present in hardware (like RDPID). If L1 passed through RDPID the #UD > should not be reflected to L1. Yes, it's a known bug. > I believe this would require that we make the emulator aware of nVMX which > sounds like a science project on its own. I don't think it would require any new awareness in the emulator proper, KVM would "just" need to ensure it properly morphs the resulting reflected #UD to a nested VM-Exit if the emulator doesn't "handle" the #UD. In theory, that should Just Work... > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :) I don't think we write it off entirely, but it's definitely on the backburner because there are so precious few cases where KVM emulates on #UD. And for good reason, e.g. the RDPID case takes an instruction that exists purely to optimize certain flows and turns them into dreadfully sloooow paths.
On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote: > On Mon, Mar 28, 2022, Oliver Upton wrote: > > While I was looking at #UD under nested for this issue, I noticed: > > > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs > > on #UD, even though it is possible that L0 was emulating an instruction not > > present in hardware (like RDPID). If L1 passed through RDPID the #UD > > should not be reflected to L1. > > Yes, it's a known bug. > > > I believe this would require that we make the emulator aware of nVMX which > > sounds like a science project on its own. > > I don't think it would require any new awareness in the emulator proper, KVM > would "just" need to ensure it properly morphs the resulting reflected #UD to a > nested VM-Exit if the emulator doesn't "handle" the #UD. In theory, that should > Just Work... > > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :) > > I don't think we write it off entirely, but it's definitely on the backburner > because there are so precious few cases where KVM emulates on #UD. And for good > reason, e.g. the RDPID case takes an instruction that exists purely to optimize > certain flows and turns them into dreadfully sloooow paths. > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is that L0 patches the hypercall before L1 sees it so it can't really do anything about it. Do you think we can always stop patching hypercalls for the nested guest regardless of the quirk, or that too will be considered breaking backwards compatability? Best regards, Maxim Levitsky
On Wed, Aug 24, 2022, Maxim Levitsky wrote: > On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote: > > On Mon, Mar 28, 2022, Oliver Upton wrote: > > > While I was looking at #UD under nested for this issue, I noticed: > > > > > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs > > > on #UD, even though it is possible that L0 was emulating an instruction not > > > present in hardware (like RDPID). If L1 passed through RDPID the #UD > > > should not be reflected to L1. > > > > Yes, it's a known bug. > > > > > I believe this would require that we make the emulator aware of nVMX which > > > sounds like a science project on its own. > > > > I don't think it would require any new awareness in the emulator proper, KVM > > would "just" need to ensure it properly morphs the resulting reflected #UD to a > > nested VM-Exit if the emulator doesn't "handle" the #UD. In theory, that should > > Just Work... > > > > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :) > > > > I don't think we write it off entirely, but it's definitely on the backburner > > because there are so precious few cases where KVM emulates on #UD. And for good > > reason, e.g. the RDPID case takes an instruction that exists purely to optimize > > certain flows and turns them into dreadfully sloooow paths. > > > > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is > that L0 patches the hypercall before L1 sees it so it can't really do anything > about it. > > Do you think we can always stop patching hypercalls for the nested guest regardless > of the quirk, or that too will be considered breaking backwards compatability? Heh, go run it on Intel, problem solved ;-) As discussed last year[*], it's impossible to get this right in all cases, ignoring the fact that patching in the first place is arguably wrong. E.g. if KVM is running on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL in L2 is spurious. Regardless of what path we take, I do think we should align VMX and SVM on exception intercept behavior. [*] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com
On Wed, 2022-08-24 at 14:43 +0000, Sean Christopherson wrote: > On Wed, Aug 24, 2022, Maxim Levitsky wrote: > > On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote: > > > On Mon, Mar 28, 2022, Oliver Upton wrote: > > > > While I was looking at #UD under nested for this issue, I noticed: > > > > > > > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs > > > > on #UD, even though it is possible that L0 was emulating an instruction not > > > > present in hardware (like RDPID). If L1 passed through RDPID the #UD > > > > should not be reflected to L1. > > > > > > Yes, it's a known bug. > > > > > > > I believe this would require that we make the emulator aware of nVMX which > > > > sounds like a science project on its own. > > > > > > I don't think it would require any new awareness in the emulator proper, KVM > > > would "just" need to ensure it properly morphs the resulting reflected #UD to a > > > nested VM-Exit if the emulator doesn't "handle" the #UD. In theory, that should > > > Just Work... > > > > > > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :) > > > > > > I don't think we write it off entirely, but it's definitely on the backburner > > > because there are so precious few cases where KVM emulates on #UD. And for good > > > reason, e.g. the RDPID case takes an instruction that exists purely to optimize > > > certain flows and turns them into dreadfully sloooow paths. > > > > > > > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is > > that L0 patches the hypercall before L1 sees it so it can't really do anything > > about it. > > > > Do you think we can always stop patching hypercalls for the nested guest regardless > > of the quirk, or that too will be considered breaking backwards compatability? > > Heh, go run it on Intel, problem solved ;-) > > As discussed last year[*], it's impossible to get this right in all cases, ignoring > the fact that patching in the first place is arguably wrong. E.g. if KVM is running > on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's > responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL > in L2 is spurious. > > Regardless of what path we take, I do think we should align VMX and SVM on exception > intercept behavior. Maybe then we should at least skip the unit test if running nested (should be easy to check the hypervisor cpuid)? Oh well, I do understand you that the whole 'patching' thing is one big mess :( I wonder how hard it will be to ask Qemu to disable this quirk.... Best regards, Maxim Levitsky > > [*] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com >
On 8/24/22 17:06, Maxim Levitsky wrote:
> I wonder how hard it will be to ask Qemu to disable this quirk....
I think we should just do it. Send a patch. :)
Paolo
On Wed, Aug 24, 2022, Maxim Levitsky wrote: > On Wed, 2022-08-24 at 14:43 +0000, Sean Christopherson wrote: > > On Wed, Aug 24, 2022, Maxim Levitsky wrote: > > > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is > > > that L0 patches the hypercall before L1 sees it so it can't really do anything > > > about it. > > > > > > Do you think we can always stop patching hypercalls for the nested guest regardless > > > of the quirk, or that too will be considered breaking backwards compatability? > > > > Heh, go run it on Intel, problem solved ;-) > > > > As discussed last year[*], it's impossible to get this right in all cases, ignoring > > the fact that patching in the first place is arguably wrong. E.g. if KVM is running > > on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's > > responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL > > in L2 is spurious. > > > > Regardless of what path we take, I do think we should align VMX and SVM on exception > > intercept behavior. > > Maybe then we should at least skip the unit test if running nested (should be > easy to check the hypervisor cpuid)? My preference is to keep the test as is. It's not completely useless in a VM, e.g. Intel works (currently), non-KVM VMs may or may not work, and VMMs that disable the quirk in L0 will also work. max_guest_memory_test is in a similar boat. Running that in L1 is not recommended as KVM's shadow paging just can't keep up. But that doesn't mean that the test should _never_ be run in L1. If we have the test skip itself, then opting in requires a code change. On the other hand, skipping the test in whatever script is being used to run selftests is easy enough, e.g. `grep hypervisor /proc/cpuinfo`. IMO running test via `make` is a complete mess and should be avoided anyways :-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 507be67746b0..862314e156ae 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7233,6 +7233,15 @@ The valid bits in cap.args[0] are: Additionally, when this quirk is disabled, KVM clears CPUID.01H:ECX[bit 3] if IA32_MISC_ENABLE[bit 18] is cleared. + + KVM_X86_QUIRK_FIX_HYPERCALL_INSN By default, KVM rewrites guest + VMMCALL/VMCALL instructions to match the + vendor's hypercall instruction for the + system. When this quirk is disabled, KVM + will no longer rewrite invalid guest + hypercall instructions. Executing the + incorrect hypercall instruction will + generate a #UD within the guest. =================================== ============================================ 8. Other capabilities. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9694dd5e6ccc..832e9af24a85 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1996,6 +1996,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages); KVM_X86_QUIRK_CD_NW_CLEARED | \ KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \ KVM_X86_QUIRK_OUT_7E_INC_RIP | \ - KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) + KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \ + KVM_X86_QUIRK_FIX_HYPERCALL_INSN) #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index bf6e96011dfe..21614807a2cb 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -428,11 +428,12 @@ struct kvm_sync_regs { struct kvm_vcpu_events events; }; -#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0) -#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1) -#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2) -#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3) -#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4) +#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0) +#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1) +#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2) +#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3) +#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4) +#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5) #define KVM_STATE_NESTED_FORMAT_VMX 0 #define KVM_STATE_NESTED_FORMAT_SVM 1 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d3a9ce07a565..685c4bc453b4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9291,6 +9291,17 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) char instruction[3]; unsigned long rip = kvm_rip_read(vcpu); + /* + * If the quirk is disabled, synthesize a #UD and let the guest pick up + * the pieces. + */ + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) { + ctxt->exception.error_code_valid = false; + ctxt->exception.vector = UD_VECTOR; + ctxt->have_exception = true; + return X86EMUL_PROPAGATE_FAULT; + } + static_call(kvm_x86_patch_hypercall)(vcpu, instruction); return emulator_write_emulated(ctxt, rip, instruction, 3,
KVM handles the VMCALL/VMMCALL instructions very strangely. Even though both of these instructions really should #UD when executed on the wrong vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the guest's instruction with the appropriate instruction for the vendor. Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") do not patch in the appropriate instruction using alternatives, likely motivating KVM's intervention. Add a quirk allowing userspace to opt out of hypercall patching. If the quirk is disabled, KVM synthesizes a #UD in the guest. Signed-off-by: Oliver Upton <oupton@google.com> --- Documentation/virt/kvm/api.rst | 9 +++++++++ arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/include/uapi/asm/kvm.h | 11 ++++++----- arch/x86/kvm/x86.c | 11 +++++++++++ 4 files changed, 28 insertions(+), 6 deletions(-)