
[2/3] KVM: vmx, svm, mmu: Process MMIO during event delivery

Message ID 20240927161657.68110-3-iorlov@amazon.com (mailing list archive)
State New
Series Handle MMIO during event delivery error on SVM

Commit Message

Ivan Orlov Sept. 27, 2024, 4:16 p.m. UTC
Currently, the situation when the guest accesses MMIO during event
delivery is handled differently in VMX and SVM: on VMX, KVM returns an
internal error with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV, while
SVM simply goes into an infinite loop trying to deliver the event again
and again.

Such a situation could happen when an exception occurs while the guest
IDTR (or GDTR) descriptor base points to an MMIO address.
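For illustration, a rough (hypothetical) guest-side sketch of one way
to trigger this, assuming an identity-mapped guest and an MMIO_GPA
constant naming some address that is not backed by a memslot:

	/*
	 * Point the IDT base at an MMIO address and raise an exception.
	 * Delivering the exception forces the CPU to read the IDT entry
	 * from MMIO, so the resulting NPF / EPT exit happens while event
	 * delivery is in progress.
	 */
	struct __attribute__((packed)) {
		uint16_t limit;
		uint64_t base;
	} idtr = {
		.limit = 0xfff,
		.base  = MMIO_GPA,	/* assumed MMIO / unbacked GPA */
	};

	asm volatile("lidt %0" :: "m"(idtr));
	asm volatile("ud2");	/* #UD delivery now faults on MMIO */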

Even with fixes for infinite loops on TDP failures applied, the problem
still exists on SVM.

Eliminate the SVM/VMX difference by returning a KVM internal error with
suberror = KVM_INTERNAL_ERROR_DELIVERY_EV for both SVM and VMX. As we
don't have a reliable way to detect an MMIO operation on SVM before
actually looking at the GPA, move the problem detection into the common
KVM x86 layer (into the kvm_mmu_page_fault function) and add the
PFERR_EVT_DELIVERY flag, which gets set in the SVM/VMX-specific vmexit
handler to signal that we are in the middle of event delivery.

This way we won't introduce much overhead on the VMX platform either,
as a guest accessing MMIO during event delivery is a rare situation.

Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/mmu/mmu.c          | 15 ++++++++++++++-
 arch/x86/kvm/svm/svm.c          |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 21 +++++++++------------
 4 files changed, 33 insertions(+), 13 deletions(-)

Comments

Sean Christopherson Oct. 12, 2024, 12:05 a.m. UTC | #1
Same complaints on the shortlog scope.

On Fri, Sep 27, 2024, Ivan Orlov wrote:
> Currently, the situation when guest accesses MMIO during event delivery
> is handled differently in VMX and SVM: on VMX KVM returns internal error
> with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV, when SVM simply goes
> into infinite loop trying to deliver an event again and again.
> 
> Such a situation could happen when the exception occurs with guest IDTR
> (or GDTR) descriptor base pointing to an MMIO address.
> 
> Even with fixes for infinite loops on TDP failures applied, the problem
> still exists on SVM.
> 
> Eliminate the SVM/VMX difference by returning a KVM internal error with
> suberror = KVM_INTERNAL_ERROR_DELIVERY_EV for both SVM and VMX. As we
> don't have a reliable way to detect MMIO operation on SVM before
> actually looking at the GPA, move the problem detection into the common
> KVM x86 layer (into the kvm_mmu_page_fault function) and add the
> PFERR_EVT_DELIVERY flag which gets set in the SVM/VMX specific vmexit
> handler to signal that we are in the middle of the event delivery.
> 
> This way we won't introduce much overhead for VMX platform either, as
> the situation when the guest accesses MMIO during event delivery is
> quite rare and shouldn't happen frequently.
> 
> Signed-off-by: Ivan Orlov <iorlov@amazon.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/mmu/mmu.c          | 15 ++++++++++++++-
>  arch/x86/kvm/svm/svm.c          |  4 ++++
>  arch/x86/kvm/vmx/vmx.c          | 21 +++++++++------------
>  4 files changed, 33 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 348daba424dd..a1088239c9f5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -282,6 +282,12 @@ enum x86_intercept_stage;
>  #define PFERR_PRIVATE_ACCESS   BIT_ULL(49)
>  #define PFERR_SYNTHETIC_MASK   (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
>  
> +/*
> + * EVT_DELIVERY is a KVM-defined flag used to indicate that a fault occurred
> + * during event delivery.
> + */
> +#define PFERR_EVT_DELIVERY     BIT_ULL(50)

This is very clearly a synthetic flag.  See the mask above, and the warnings
that fire.  I think it's a moot point though (see below).
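(For reference, keeping the flag would presumably also mean adding it
to the synthetic mask right above it, roughly:

	#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS | \
					 PFERR_PRIVATE_ACCESS | \
					 PFERR_EVT_DELIVERY)

so the collision WARNs cover the new bit.  But again, moot.)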

> +
>  /* apic attention bits */
>  #define KVM_APIC_CHECK_VAPIC	0
>  /*
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e081f785fb23..36e25a6ba364 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6120,8 +6120,21 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  			return -EFAULT;
>  
>  		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> -		if (r == RET_PF_EMULATE)
> +		if (r == RET_PF_EMULATE) {
> +			/*
> +			 * Check if the guest is accessing MMIO during event delivery. For
> +			 * instance, it could happen if the guest sets IDT / GDT descriptor
> +			 * base to point to an MMIO address. We can't deliver such an event

Too many pronouns.

> +			 * without VMM intervention, so return a corresponding internal error
> +			 * instead (otherwise, vCPU will fall into infinite loop trying to
> +			 * deliver the event again and again).
> +			 */
> +			if (error_code & PFERR_EVT_DELIVERY) {

Hmm, I'm 99% certain handling error in this location is wrong, and I'm also pretty
sure it's unnecessary.  Or rather, the synthetic error code is unnecessary.

It's wrong because this path specifically handles "cached" MMIO, i.e. emulated
MMIO that is triggered by a special MMIO SPTE.  KVM should punt to userspace on
_any_ MMIO emulation.  KVM has gotten away with the flaw because SVM is completely
broken, and VMX can always generate reserved EPTEs.  But with SVM, on CPUs with
MAXPHYADDR=52, KVM can't generate a reserved #PF, i.e. can't do cached MMIO, and
so I'm pretty sure your test would fail on those CPUs since they'll never come
down this path.

Heh, though I bet the introduction of RET_PF_WRITE_PROTECTED has regressed shadow
paging on CPUs with PA52.

Anyways, the synthetic PFERR flag is unnecessary because the information is readily
available to {vmx,svm}_check_emulate_instruction().  Ha!  And EMULTYPE_WRITE_PF_TO_SP
means vendor code can even precisely identify MMIO.

I think another X86EMUL_* return type is needed, but that's better than a synthetic
#PF error code flag.

> +				kvm_prepare_ev_delivery_failure_exit(vcpu, cr2_or_gpa, true);
> +				return 0;
> +			}
>  			goto emulate;
> +		}
>  	}
>  
>  	if (r == RET_PF_INVALID) {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9df3e1e5ae81..93ce8c3d02f3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2059,6 +2059,10 @@ static int npf_interception(struct kvm_vcpu *vcpu)
>  	u64 fault_address = svm->vmcb->control.exit_info_2;
>  	u64 error_code = svm->vmcb->control.exit_info_1;
>  
> +	/* Check if we have events awaiting delivery */

No "we".  And this is inaccurate.  The event isn't _awaiting_ delivery, the CPU
was smack dab in the middle of delivering the event.

> +	if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_TYPE_MASK)
> +		error_code |= PFERR_EVT_DELIVERY;
> +
>  	/*
>  	 * WARN if hardware generates a fault with an error code that collides
>  	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index afd785e7f3a3..bbe1126c3c7d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5828,6 +5828,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  {
>  	gpa_t gpa;
> +	u64 error_code = PFERR_RSVD_MASK;
> +
> +	/* Do we have events awaiting delivery? */
> +	error_code |= (to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK)
> +		     ? PFERR_EVT_DELIVERY : 0;
>  
>  	if (vmx_check_emulate_instruction(vcpu, EMULTYPE_PF, NULL, 0))
>  		return 1;
> @@ -5843,7 +5848,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  		return kvm_skip_emulated_instruction(vcpu);
>  	}
>  
> -	return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
> +	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
>  }
>  
>  static int handle_nmi_window(struct kvm_vcpu *vcpu)
> @@ -6536,24 +6541,16 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
>  		return 0;
>  	}
>  
> -	/*
> -	 * Note:
> -	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
> -	 * delivery event since it indicates guest is accessing MMIO.
> -	 * The vm-exit can be triggered again after return to guest that
> -	 * will cause infinite loop.
> -	 */
>  	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
>  	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
>  	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
>  	     exit_reason.basic != EXIT_REASON_PML_FULL &&
>  	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
>  	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
> -	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
> +	     exit_reason.basic != EXIT_REASON_NOTIFY &&
> +	     exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) {

Changing the behavior of EPT_MISCONFIG belongs in a separate commit.

Huh, and that's technically a bug fix.  If userspace _creates_ a memslot, KVM
doesn't eagerly zap MMIO SPTEs and instead relies on vcpu_match_mmio_gen() to
force kvm_mmu_page_fault() down the actual page fault path.  If the guest somehow
manages to generate an access to the new page while vectoring an event, KVM will
spuriously exit to userspace instead of trying to fault-in the new page.

It's _ridiculously_ contrived, but technically a bug.

Ugh, and the manual call to vmx_check_emulate_instruction() in handle_ept_misconfig()
is similarly flawed, though encountering that is even more contrived as that only
affects accesses from SGX enclaves.

Hmm, and looking at all of this, SVM doesn't take advantage of KVM_FAST_MMIO_BUS.
Unless I'm forgetting some fundamental incompatibility, SVM can do fast MMIO so
long as next_rip is valid.
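E.g. maybe something as simple as this in npf_interception() would do
(untested, purely a sketch; nrips/GPA validity checks omitted):

	/*
	 * Sketch: if next_rip is valid, try the fast MMIO bus before
	 * falling back to full emulation.
	 */
	if (svm->vmcb->control.next_rip &&
	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, fault_address, 0, NULL))
		return kvm_skip_emulated_instruction(vcpu);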

Anyways, no need to deal with vmx_check_emulate_instruction() or fast MMIO, I'll
tackle that in a separate series.  But for this series, please do the EPT misconfig
in a separate patch from fixing SVM.  E.g. extract the helper, convert VMX to the
new flow, and then teach SVM to do the same.

>  		gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -		bool is_mmio = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG;
> -

Blank newline after variable declarations.

> -		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, is_mmio);
> +		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, false);
>  		return 0;
>  	}

All in all, I think this is the basic gist?  Definitely feel free to propose a
better name than X86EMUL_UNHANDLEABLE_VECTORING.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..7a96bf5af015 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4800,6 +4800,10 @@ static int svm_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
        bool smep, smap, is_user;
        u64 error_code;
 
+       if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_TYPE_MASK &&
+           is_emul_type_mmio(emul_type))
+               return X86EMUL_UNHANDLEABLE_VECTORING;
+
        /* Emulation is always possible when KVM has access to all guest state. */
        if (!sev_guest(vcpu->kvm))
                return X86EMUL_CONTINUE;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1b9cc1032010..43bc73b60c2b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1693,6 +1693,10 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 int vmx_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
                                  void *insn, int insn_len)
 {
+       if (to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK &&
+           is_emul_type_mmio(emul_type))
+               return X86EMUL_UNHANDLEABLE_VECTORING;
+
        /*
         * Emulation of instructions in SGX enclaves is impossible as RIP does
         * not point at the failing instruction, and even if it did, the code
@@ -6536,13 +6540,6 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
                return 0;
        }
 
-       /*
-        * Note:
-        * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
-        * delivery event since it indicates guest is accessing MMIO.
-        * The vm-exit can be triggered again after return to guest that
-        * will cause infinite loop.
-        */
        if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
            (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
             exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
@@ -6550,10 +6547,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
             exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
             exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
             exit_reason.basic != EXIT_REASON_NOTIFY)) {
-               gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-               bool is_mmio = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG;
-
-               kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, is_mmio);
+               kvm_prepare_ev_delivery_failure_exit(vcpu, INVALID_GPA);
                return 0;
        }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ee67fc23e5d..d5cd254aaca7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9122,6 +9122,11 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                if (r == X86EMUL_RETRY_INSTR || r == X86EMUL_PROPAGATE_FAULT)
                        return 1;
 
+               if (r == X86EMUL_UNHANDLEABLE_EVENT_VECTORING) {
+                       kvm_prepare_event_vectoring_exit(vcpu, cr2_or_gpa);
+                       return 0;
+               }
+
                WARN_ON_ONCE(r != X86EMUL_UNHANDLEABLE);
                return handle_emulation_failure(vcpu, emulation_type);
        }
Ivan Orlov Oct. 16, 2024, 10:53 p.m. UTC | #2
On 10/12/24 01:05, Sean Christopherson wrote:
> 
>> +			 * without VMM intervention, so return a corresponding internal error
>> +			 * instead (otherwise, vCPU will fall into infinite loop trying to
>> +			 * deliver the event again and again).
>> +			 */
>> +			if (error_code & PFERR_EVT_DELIVERY) {
> 
> Hmm, I'm 99% certain handling error in this location is wrong, and I'm also pretty
> sure it's unnecessary.  Or rather, the synthetic error code is unnecessary.
> 
> It's wrong because this path specifically handles "cached" MMIO, i.e. emulated
> MMIO that is triggered by a special MMIO SPTE.  KVM should punt to userspace on
> _any_ MMIO emulation.  KVM has gotten away with the flaw because SVM is completely
> broken, and VMX can always generate reserved EPTEs.  But with SVM, on CPUs with
> MAXPHYADDR=52, KVM can't generate a reserved #PF, i.e. can't do cached MMIO, and
> so I'm pretty sure your test would fail on those CPUs since they'll never come
> down this path.
> 

Ah, alright, I see... Probably, I need to test the next version with 
enable_mmio_caching=false as well.

> Heh, though I bet the introduction of RET_PF_WRITE_PROTECTED has regressed shadow
> paging on CPUs with PA52.
> 

Is it because it doesn't process write-protected gfn correctly if it is 
in MMIO range when mmio caching is disabled?

> Anyways, the synthetic PFERR flag is unnecessary because the information is readily
> available to {vmx,svm}_check_emulate_instruction().  Ha!  And EMULTYPE_WRITE_PF_TO_SP
> means vendor code can even precisely identify MMIO.
> 

Hmm, do you mean EMULTYPE_PF? It looks like EMULTYPE_WRITE_PF_TO_SP has 
nothing to do with MMIO...

I thought about processing the error in check_emulate_instruction as it 
seems logical, however I hadn't found a way to detect MMIO without page 
walking on SVM. I'll validate that EMULTYPE_PF gets set in all of the 
MMIO cases and move the handling into this function in V2 if it works.

> I think another X86EMUL_* return type is needed, but that's better than a synthetic
> #PF error code flag.
> 

If I understand correctly, you suggest returning this new 
X86EMUL_<something> code from {svm,vmx}_check_emulate_instruction and 
processing it in the common code, right? I agree that it's much better than 
handling the error in MMU code. We are gonna return this return type 
from vendor code and handle it in the common code this way, which is neat!

>>   
>> -	/*
>> -	 * Note:
>> -	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
>> -	 * delivery event since it indicates guest is accessing MMIO.
>> -	 * The vm-exit can be triggered again after return to guest that
>> -	 * will cause infinite loop.
>> -	 */
>>   	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
>>   	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
>>   	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
>>   	     exit_reason.basic != EXIT_REASON_PML_FULL &&
>>   	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
>>   	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
>> -	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
>> +	     exit_reason.basic != EXIT_REASON_NOTIFY &&
>> +	     exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) {
> 
> Changing the behavior of EPT_MISCONFIG belongs in a separate commit.
> 

I will extract the vmx-specific changes into separate commit in V2, thanks!

> Huh, and that's technically a bug fix.  If userspace _creates_ a memslot, KVM
> doesn't eagerly zap MMIO SPTEs and instead relies on vcpu_match_mmio_gen() to
> force kvm_mmu_page_fault() down the actual page fault path.  If the guest somehow
> manages to generate an access to the new page while vectoring an event, KVM will
> spuriously exit to userspace instead of trying to fault-in the new page.
> 
> It's _ridiculously_ contrived, but technically a bug.
> 

That's amazing, I finally introduced an unintentional bugfix (usually 
it's the other way around) :D

> Ugh, and the manual call to vmx_check_emulate_instruction() in handle_ept_misconfig()
> is similarly flawed, though encountering that is even more contrived as that only
> affects accesses from SGX enclaves.
> 
> Hmm, and looking at all of this, SVM doesn't take advantage of KVM_FAST_MMIO_BUS.
> Unless I'm forgetting some fundamental incompatibility, SVM can do fast MMIO so
> long as next_rip is valid.
> 
> Anyways, no need to deal with vmx_check_emulate_instruction() or fast MMIO, I'll
> tackle that in a separate series.  But for this series, please do the EPT misconfig
> in a separate patch from fixing SVM.  E.g. extract the helper, convert VMX to the
> new flow, and then teach SVM to do the same.
> 

Hmm, implementing KVM_FAST_MMIO_BUS for SVM sounds like an interesting 
thing to do, please let me know if I could help. By the way, why can't 
we move the call to kvm_io_bus_write into the common code (e.g. MMU)? It 
would remove the need to implement KVM_FAST_MMIO_BUS specifically for 
each vendor.

>>   		gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>> -		bool is_mmio = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG;
>> -
> 
> Blank newline after variable declarations.
> 
>> -		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, is_mmio);
>> +		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, false);
>>   		return 0;
>>   	}
> 
> All in all, I think this is the basic gist?  Definitely feel free to propose a
> better name than X86EMUL_UNHANDLEABLE_VECTORING.
> 

It sounds OK, but maybe something more precise would work, like 
X86EMUL_VECTORING_IO_NEEDED (by analogy with X86EMUL_IO_NEEDED)?

Thanks a lot for the review.
Sean Christopherson Oct. 17, 2024, 4:20 p.m. UTC | #3
On Wed, Oct 16, 2024, Ivan Orlov wrote:
> On 10/12/24 01:05, Sean Christopherson wrote:
> > 
> > > +			 * without VMM intervention, so return a corresponding internal error
> > > +			 * instead (otherwise, vCPU will fall into infinite loop trying to
> > > +			 * deliver the event again and again).
> > > +			 */
> > > +			if (error_code & PFERR_EVT_DELIVERY) {
> > 
> > Hmm, I'm 99% certain handling error in this location is wrong, and I'm also pretty
> > sure it's unnecessary.  Or rather, the synthetic error code is unnecessary.
> > 
> > It's wrong because this path specifically handles "cached" MMIO, i.e. emulated
> > MMIO that is triggered by a special MMIO SPTE.  KVM should punt to userspace on
> > _any_ MMIO emulation.  KVM has gotten away with the flaw because SVM is completely
> > broken, and VMX can always generate reserved EPTEs.  But with SVM, on CPUs with
> > MAXPHYADDR=52, KVM can't generate a reserved #PF, i.e. can't do cached MMIO, and
> > so I'm pretty sure your test would fail on those CPUs since they'll never come
> > down this path.
> > 
> 
> Ah, alright, I see... Probably, I need to test the next version with
> enable_mmio_caching=false as well.
> 
> > Heh, though I bet the introduction of RET_PF_WRITE_PROTECTED has regressed
> > shadow paging on CPUs with PA52.
> 
> Is it because it doesn't process write-protected gfn correctly if it is in
> MMIO range when mmio caching is disabled?

Ignore this, I was thinking lack of cached MMIO SPTEs would result in no
EPT Misconfig, but it's shadow paging, there is no EPT.

> > Anyways, the synthetic PFERR flag is unnecessary because the information is readily
> > available to {vmx,svm}_check_emulate_instruction().  Ha!  And EMULTYPE_WRITE_PF_TO_SP
> > means vendor code can even precisely identify MMIO.
> 
> Hmm, do you mean EMULTYPE_PF? It looks like EMULTYPE_WRITE_PF_TO_SP has
> nothing to do with MMIO...

Nope.  Well, both.  EMULTYPE_PF is set if *any* page fault triggers emulation.
EMULTYPE_WRITE_PF_TO_SP is set if emulation was triggered by a write protection
violation due to write-tracking, and write-tracking requires an underlying memslot.
I.e. if EMULTYPE_WRITE_PF_TO_SP is set, then emulation *can't* be for emulated
MMIO.
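
FWIW, the is_emul_type_mmio() helper my earlier sketch assumes could be
something as simple as (per the above, treating any page-fault-triggered
emulation that isn't due to write-tracking as potential MMIO):

	static inline bool is_emul_type_mmio(int emul_type)
	{
		return (emul_type & EMULTYPE_PF) &&
		       !(emul_type & EMULTYPE_WRITE_PF_TO_SP);
	}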

> I thought about processing the error in check_emulate_instruction as it
> seems logical, however I hadn't found a way to detect MMIO without page
> walking on SVM. I'll validate that EMULTYPE_PF gets set in all of the MMIO
> cases and move the handling into this function in V2 if it works.
> 
> > I think another X86EMUL_* return type is needed, but that's better than a synthetic
> > #PF error code flag.
> > 
> 
> If I understand correctly, you suggest returning this new
> X86EMUL_<something> code from {svm,vmx}_check_emulate_instruction and
> process it in the common code, right? I agree that it's much better than
> handling the error in MMU code. We are gonna return this return type from
> vendor code and handle it in the common code this way, which is neat!

Yep, exactly.

> > Ugh, and the manual call to vmx_check_emulate_instruction() in handle_ept_misconfig()
> > is similarly flawed, though encountering that is even more contrived as that only
> > affects accesses from SGX enclaves.
> > 
> > Hmm, and looking at all of this, SVM doesn't take advantage of KVM_FAST_MMIO_BUS.
> > Unless I'm forgetting some fundamental incompatibility, SVM can do fast MMIO so
> > long as next_rip is valid.
> > 
> > Anyways, no need to deal with vmx_check_emulate_instruction() or fast MMIO, I'll
> > tackle that in a separate series.  But for this series, please do the EPT misconfig
> > in a separate patch from fixing SVM.  E.g. extract the helper, convert VMX to the
> > new flow, and then teach SVM to do the same.
> > 
> 
> Hmm, implementing KVM_FAST_MMIO_BUS for SVM sounds like an interesting thing
> to do, please let me know if I could help. By the way, why can't we move the
> call to kvm_io_bus_write into the common code (e.g. MMU)? It would remove
> the need to implement KVM_FAST_MMIO_BUS specifically for each vendor.

That would work too, but vendor code needs to be aware of "fast" MMIO no matter
what, because there are vendor specific conditions that make fast MMIO impossible
(KVM needs the CPU to provide the instruction length, otherwise KVM needs to
emulate the instruction in order to decode it, which makes fast MMIO not-fast).

> > >   		gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> > > -		bool is_mmio = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG;
> > > -
> > 
> > Blank newline after variable declarations.
> > 
> > > -		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, is_mmio);
> > > +		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, false);
> > >   		return 0;
> > >   	}
> > 
> > All in all, I think this is the basic gist?  Definitely feel free to propose a
> > better name than X86EMUL_UNHANDLEABLE_VECTORING.
> > 
> 
> It sounds OK, but maybe something more precise would work, like
> X86EMUL_VECTORING_IO_NEEDED (by analogy with X86EMUL_IO_NEEDED)?

Hmm, I definitely want X86EMUL_UNHANDLEABLE_XXX, so that it's super clear that
emulation failed.  Maybe X86EMUL_UNHANDLEABLE_IO_VECTORING?  Sadly, I think we
need VECTORING (or similar) in the name, because it will be morphed to
KVM_INTERNAL_ERROR_DELIVERY_EV.  E.g. simply X86EMUL_UNHANDLEABLE_IO would be
nice, but is wildly misleading as there are other scenarios where KVM can't
handle emulated MMIO during instruction emulation.  :-/

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 348daba424dd..a1088239c9f5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -282,6 +282,12 @@  enum x86_intercept_stage;
 #define PFERR_PRIVATE_ACCESS   BIT_ULL(49)
 #define PFERR_SYNTHETIC_MASK   (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
 
+/*
+ * EVT_DELIVERY is a KVM-defined flag used to indicate that a fault occurred
+ * during event delivery.
+ */
+#define PFERR_EVT_DELIVERY     BIT_ULL(50)
+
 /* apic attention bits */
 #define KVM_APIC_CHECK_VAPIC	0
 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e081f785fb23..36e25a6ba364 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6120,8 +6120,21 @@  int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 			return -EFAULT;
 
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
-		if (r == RET_PF_EMULATE)
+		if (r == RET_PF_EMULATE) {
+			/*
+			 * Check if the guest is accessing MMIO during event delivery. For
+			 * instance, it could happen if the guest sets IDT / GDT descriptor
+			 * base to point to an MMIO address. We can't deliver such an event
+			 * without VMM intervention, so return a corresponding internal error
+			 * instead (otherwise, vCPU will fall into infinite loop trying to
+			 * deliver the event again and again).
+			 */
+			if (error_code & PFERR_EVT_DELIVERY) {
+				kvm_prepare_ev_delivery_failure_exit(vcpu, cr2_or_gpa, true);
+				return 0;
+			}
 			goto emulate;
+		}
 	}
 
 	if (r == RET_PF_INVALID) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..93ce8c3d02f3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2059,6 +2059,10 @@  static int npf_interception(struct kvm_vcpu *vcpu)
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
+	/* Check if we have events awaiting delivery */
+	if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_TYPE_MASK)
+		error_code |= PFERR_EVT_DELIVERY;
+
 	/*
 	 * WARN if hardware generates a fault with an error code that collides
 	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index afd785e7f3a3..bbe1126c3c7d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5828,6 +5828,11 @@  static int handle_ept_violation(struct kvm_vcpu *vcpu)
 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 {
 	gpa_t gpa;
+	u64 error_code = PFERR_RSVD_MASK;
+
+	/* Do we have events awaiting delivery? */
+	error_code |= (to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK)
+		     ? PFERR_EVT_DELIVERY : 0;
 
 	if (vmx_check_emulate_instruction(vcpu, EMULTYPE_PF, NULL, 0))
 		return 1;
@@ -5843,7 +5848,7 @@  static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 		return kvm_skip_emulated_instruction(vcpu);
 	}
 
-	return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
+	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
 static int handle_nmi_window(struct kvm_vcpu *vcpu)
@@ -6536,24 +6541,16 @@  static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 		return 0;
 	}
 
-	/*
-	 * Note:
-	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
-	 * delivery event since it indicates guest is accessing MMIO.
-	 * The vm-exit can be triggered again after return to guest that
-	 * will cause infinite loop.
-	 */
 	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
 	    (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI &&
 	     exit_reason.basic != EXIT_REASON_EPT_VIOLATION &&
 	     exit_reason.basic != EXIT_REASON_PML_FULL &&
 	     exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
 	     exit_reason.basic != EXIT_REASON_TASK_SWITCH &&
-	     exit_reason.basic != EXIT_REASON_NOTIFY)) {
+	     exit_reason.basic != EXIT_REASON_NOTIFY &&
+	     exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) {
 		gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-		bool is_mmio = exit_reason.basic == EXIT_REASON_EPT_MISCONFIG;
-
-		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, is_mmio);
+		kvm_prepare_ev_delivery_failure_exit(vcpu, gpa, false);
 		return 0;
 	}