diff mbox series

[v2,02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12

Message ID 20221129193717.513824-3-mlevitsk@redhat.com (mailing list archive)
State New, archived
Headers show
Series SVM: vNMI (with my fixes) | expand

Commit Message

Maxim Levitsky Nov. 29, 2022, 7:37 p.m. UTC
the V_IRQ and v_TPR bits don't exist when virtual interrupt
masking is not enabled, therefore the KVM should not copy these
bits regardless of V_IRQ intercept.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

Comments

Sean Christopherson Jan. 28, 2023, 12:37 a.m. UTC | #1
On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> the V_IRQ and v_TPR bits don't exist when virtual interrupt
> masking is not enabled, therefore the KVM should not copy these
> bits regardless of V_IRQ intercept.

Hmm, the APM disagrees:

 The APIC's TPR always controls the task priority for physical interrupts, and the
 V_TPR always controls virtual interrupts.

   While running a guest with V_INTR_MASKING cleared to 0:
     • Writes to CR8 affect both the APIC's TPR and the V_TPR register.


 ...

 The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
 is a virtual interrupt pending, and, if so, what its vector number and priority are.

IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
to take the V_IRQ.  At least, that's my reading of things.

> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 37af0338da7c32..aad3145b2f62fe 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
>   */
>  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  {
> -	u32 mask;
> +	u32 mask = 0;
>  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
>  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
>  
> -	/* Only a few fields of int_ctl are written by the processor.  */
> -	mask = V_IRQ_MASK | V_TPR_MASK;
> -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> -		/*
> -		 * In order to request an interrupt window, L0 is usurping
> -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> -		 * even if it was clear in L1's VMCB.  Restoring it would be
> -		 * wrong.  However, in this case V_IRQ will remain true until
> -		 * interrupt_window_interception calls svm_clear_vintr and
> -		 * restores int_ctl.  We can just leave it aside.
> -		 */
> -		mask &= ~V_IRQ_MASK;
> -	}
> +	/*
> +	 * Only a few fields of int_ctl are written by the processor.
> +	 * Copy back only the bits that are passed through to the L2.

Just "L2", not "the L2".

> +	 */
> +

Unnecessary newline.

> +	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> +		mask = V_IRQ_MASK | V_TPR_MASK;
>  
>  	if (nested_vgif_enabled(svm))
>  		mask |= V_GIF_MASK;
> -- 
> 2.26.3
>
Sean Christopherson Jan. 31, 2023, 1:44 a.m. UTC | #2
On Sat, Jan 28, 2023, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > the V_IRQ and v_TPR bits don't exist when virtual interrupt
> > masking is not enabled, therefore the KVM should not copy these
> > bits regardless of V_IRQ intercept.
> 
> Hmm, the APM disagrees:
> 
>  The APIC's TPR always controls the task priority for physical interrupts, and the
>  V_TPR always controls virtual interrupts.
> 
>    While running a guest with V_INTR_MASKING cleared to 0:
>      • Writes to CR8 affect both the APIC's TPR and the V_TPR register.
> 
> 
>  ...
> 
>  The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
>  is a virtual interrupt pending, and, if so, what its vector number and priority are.
> 
> IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
> TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
> to take the V_IRQ.  At least, that's my reading of things.
>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
> >  1 file changed, 8 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 37af0338da7c32..aad3145b2f62fe 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> >   */
> >  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> >  {
> > -	u32 mask;
> > +	u32 mask = 0;
> >  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> >  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> >  
> > -	/* Only a few fields of int_ctl are written by the processor.  */
> > -	mask = V_IRQ_MASK | V_TPR_MASK;
> > -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> > -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> > -		/*
> > -		 * In order to request an interrupt window, L0 is usurping
> > -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> > -		 * even if it was clear in L1's VMCB.  Restoring it would be
> > -		 * wrong.  However, in this case V_IRQ will remain true until
> > -		 * interrupt_window_interception calls svm_clear_vintr and
> > -		 * restores int_ctl.  We can just leave it aside.
> > -		 */
> > -		mask &= ~V_IRQ_MASK;

Argh! *shakes fist at KVM and SVM*

This is ridiculously convoluted, and I'm pretty sure there are existing bugs.  If
L1 runs L2 with V_IRQ=1 and V_INTR_MASKING=1, and KVM requests an interrupt window,
then KVM will overwrite vmcb02's int_vector and int_ctl, i.e. clobber L1's V_IRQ,
but then silently clear INTERCEPT_VINTR in recalc_intercepts() and thus prevent
svm_clear_vintr() from being reached, i.e. prevent restoring L1's V_IRQ.

Bug #1 is that KVM shouldn't clobber the V_IRQ fields if KVM ultimately decides
not to open an interrupt window.  Bug #2 is that KVM needs to open an interrupt
window if save.RFLAGS.IF=1, as interrupts may become unblocked in that case,
e.g. if L2 is in an interrupt shadow.

So I think this over two patches?

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 05d38944a6c0..ad1e70ac8669 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
        if (g->int_ctl & V_INTR_MASKING_MASK) {
                /*
-                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
-                * does not affect any interrupt we may want to inject;
-                * therefore, writes to CR8 are irrelevant to L0, as are
-                * interrupt window vmexits.
+                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
+                * disable intercept of CR8 writes as L2's CR8 does not affect
+                * any interrupt KVM may want to inject.
+                *
+                * Similarly, disable intercept of virtual interrupts (used to
+                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
+                * the effective RFLAGS.IF for L1 interrupts will never be set
+                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
                 */
                vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
-               vmcb_clr_intercept(c, INTERCEPT_VINTR);
+               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
+                       vmcb_clr_intercept(c, INTERCEPT_VINTR);
        }
 
        /*
@@ -416,18 +421,18 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 
        /* Only a few fields of int_ctl are written by the processor.  */
        mask = V_IRQ_MASK | V_TPR_MASK;
-       if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
-           svm_is_intercept(svm, INTERCEPT_VINTR)) {
-               /*
-                * In order to request an interrupt window, L0 is usurping
-                * svm->vmcb->control.int_ctl and possibly setting V_IRQ
-                * even if it was clear in L1's VMCB.  Restoring it would be
-                * wrong.  However, in this case V_IRQ will remain true until
-                * interrupt_window_interception calls svm_clear_vintr and
-                * restores int_ctl.  We can just leave it aside.
-                */
+
+       /*
+        * Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
+        * virtual interrupts in order to request an interrupt window, as KVM
+        * has usurped vmcb02's int_ctl.  If an interrupt window opens before
+        * the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
+        * If no window opens, V_IRQ will be correctly preserved in vmcb12's
+        * int_ctl (because it was never recognized while L2 was running).
+        */
+       if (svm_is_intercept(svm, INTERCEPT_VINTR) &&
+           !test_bit(INTERCEPT_VINTR, (unsigned long *)svm->nested.ctl.intercepts))
                mask &= ~V_IRQ_MASK;
-       }
 
        if (nested_vgif_enabled(svm))
                mask |= V_GIF_MASK;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b103fe7cbc82..59d2891662ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 
        svm_set_intercept(svm, INTERCEPT_VINTR);
 
+       /*
+        * Recalculating intercepts may have clear the VINTR intercept.  If
+        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
+        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
+        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
+        * interrupts will never be unblocked while L2 is running.
+        */
+       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
+               return;
+
        /*
         * This is just a dummy VINTR to actually cause a vmexit to happen.
         * Actual injection of virtual interrupts happens through EVENTINJ.
Maxim Levitsky Feb. 24, 2023, 2:38 p.m. UTC | #3
On Tue, 2023-01-31 at 01:44 +0000, Sean Christopherson wrote:
> On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > > the V_IRQ and v_TPR bits don't exist when virtual interrupt
> > > masking is not enabled, therefore the KVM should not copy these
> > > bits regardless of V_IRQ intercept.
> > 
> > Hmm, the APM disagrees:

Yes, my apologies, after re-reading the APM I agree with you.


> > 
> >  The APIC's TPR always controls the task priority for physical interrupts, and the
> >  V_TPR always controls virtual interrupts.
> > 
> >    While running a guest with V_INTR_MASKING cleared to 0:
> >      • Writes to CR8 affect both the APIC's TPR and the V_TPR register.
> > 
> > 
> >  ...
> > 
> >  The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
> >  is a virtual interrupt pending, and, if so, what its vector number and priority are.
> > 
> > IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
> > TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
> > to take the V_IRQ.  At least, that's my reading of things.

Yes, this is how I understand it as well.


> > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
> > >  1 file changed, 8 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index 37af0338da7c32..aad3145b2f62fe 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> > >   */
> > >  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> > >  {
> > > -	u32 mask;
> > > +	u32 mask = 0;
> > >  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> > >  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> > >  
> > > -	/* Only a few fields of int_ctl are written by the processor.  */
> > > -	mask = V_IRQ_MASK | V_TPR_MASK;
> > > -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> > > -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> > > -		/*
> > > -		 * In order to request an interrupt window, L0 is usurping
> > > -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> > > -		 * even if it was clear in L1's VMCB.  Restoring it would be
> > > -		 * wrong.  However, in this case V_IRQ will remain true until
> > > -		 * interrupt_window_interception calls svm_clear_vintr and
> > > -		 * restores int_ctl.  We can just leave it aside.
> > > -		 */
> > > -		mask &= ~V_IRQ_MASK;
> 
> Argh! *shakes fist at KVM and SVM*
> 
> This is ridiculously convoluted, and I'm pretty sure there are existing bugs.  If
> L1 runs L2 with V_IRQ=1 and V_INTR_MASKING=1


Note that there are two cases when we need an interrupt window in nested case:

- If the L1 doesn't intercept interrupts, which is what we are taking about here.
- If the L1 does intercept interrupts, but let L2 control the L1's EFLAGS.IF and/or
  L1's GIF 
  (that is V_INTR_MASKING_MASK is not set, and/or L1 doesn't intercept STGI/CLGI).

  In this case a 'real' interrupt will be converted to a VM exit but only
  when both L1's EFLAGS.IF is true and L1's GIF is true.



> , and KVM requests an interrupt window,
> then KVM will overwrite vmcb02's int_vector and int_ctl, i.e. clobber L1's V_IRQ,
> but then silently clear INTERCEPT_VINTR in recalc_intercepts() and thus prevent
> svm_clear_vintr() from being reached, i.e. prevent restoring L1's V_IRQ.


> 
> Bug #1 is that KVM shouldn't clobber the V_IRQ fields if KVM ultimately decides
> not to open an interrupt window.  Bug #2 is that KVM needs to open an interrupt
> window if save.RFLAGS.IF=1, as interrupts may become unblocked in that case,
> e.g. if L2 is in an interrupt shadow.


> 
> So I think this over two patches?
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 05d38944a6c0..ad1e70ac8669 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
>  
>         if (g->int_ctl & V_INTR_MASKING_MASK) {
>                 /*
> -                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
> -                * does not affect any interrupt we may want to inject;
> -                * therefore, writes to CR8 are irrelevant to L0, as are
> -                * interrupt window vmexits.
> +                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
> +                * disable intercept of CR8 writes as L2's CR8 does not affect
> +                * any interrupt KVM may want to inject.
> +                *
> +                * Similarly, disable intercept of virtual interrupts (used to
> +                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
> +                * the effective RFLAGS.IF for L1 interrupts will never be set
> +                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
>                  */
>                 vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
> -               vmcb_clr_intercept(c, INTERCEPT_VINTR);
> +               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
> +                       vmcb_clr_intercept(c, INTERCEPT_VINTR);

How about instead moving this code to svm_set_vintr?

That is, in the guest mode, if the guest has V_INTR_MASKING_MASK, then
then a nested VM exit is the next point the interrupt window could open,
thus we don't set VINTR)

Or even better put the logic in svm_enable_irq_window (that is avoid
calling svm_set_vintr in the first place).

I also think that it worth it to add a warning that 'svm_set_intercept'
didn't work, that is didn't really set an intercept. 
In theory that can result in nasty CVEs in addition to logic bugs as you found.


>         }
>  
>         /*
> @@ -416,18 +421,18 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  
>         /* Only a few fields of int_ctl are written by the processor.  */
>         mask = V_IRQ_MASK | V_TPR_MASK;
> -       if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> -           svm_is_intercept(svm, INTERCEPT_VINTR)) {
> -               /*
> -                * In order to request an interrupt window, L0 is usurping
> -                * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> -                * even if it was clear in L1's VMCB.  Restoring it would be
> -                * wrong.  However, in this case V_IRQ will remain true until
> -                * interrupt_window_interception calls svm_clear_vintr and
> -                * restores int_ctl.  We can just leave it aside.
> -                */
> +
> +       /*
> +        * Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
> +        * virtual interrupts in order to request an interrupt window, as KVM
> +        * has usurped vmcb02's int_ctl.  If an interrupt window opens before
> +        * the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
> +        * If no window opens, V_IRQ will be correctly preserved in vmcb12's
> +        * int_ctl (because it was never recognized while L2 was running).
> +        */
> +       if (svm_is_intercept(svm, INTERCEPT_VINTR) &&
> +           !test_bit(INTERCEPT_VINTR, (unsigned long *)svm->nested.ctl.intercepts))
>                 mask &= ~V_IRQ_MASK;

This makes sense.



> -       }
>  
>         if (nested_vgif_enabled(svm))
>                 mask |= V_GIF_MASK;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b103fe7cbc82..59d2891662ef 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
>  
>         svm_set_intercept(svm, INTERCEPT_VINTR);
>  
> +       /*
> +        * Recalculating intercepts may have clear the VINTR intercept.  If
> +        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
> +        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
> +        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
> +        * interrupts will never be unblocked while L2 is running.
> +        */
> +       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
> +               return;

This won't be needed if we don't call the svm_set_vintr in the first place.

> +
>         /*
>          * This is just a dummy VINTR to actually cause a vmexit to happen.
>          * Actual injection of virtual interrupts happens through EVENTINJ.
> 



With all this said, I also want to note that this patch has *nothing* to do with VNMI,
I only added it due to some refactoring, so feel free to drop it from vNMI queue,
and deal with those bugs separately.

Best regards,
	Maxim Levitsky
Sean Christopherson Feb. 24, 2023, 4:48 p.m. UTC | #4
On Fri, Feb 24, 2023, Maxim Levitsky wrote:
> On Tue, 2023-01-31 at 01:44 +0000, Sean Christopherson wrote:
> > On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > So I think this over two patches?
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 05d38944a6c0..ad1e70ac8669 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >  
> >         if (g->int_ctl & V_INTR_MASKING_MASK) {
> >                 /*
> > -                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
> > -                * does not affect any interrupt we may want to inject;
> > -                * therefore, writes to CR8 are irrelevant to L0, as are
> > -                * interrupt window vmexits.
> > +                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
> > +                * disable intercept of CR8 writes as L2's CR8 does not affect
> > +                * any interrupt KVM may want to inject.
> > +                *
> > +                * Similarly, disable intercept of virtual interrupts (used to
> > +                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
> > +                * the effective RFLAGS.IF for L1 interrupts will never be set
> > +                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
> >                  */
> >                 vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
> > -               vmcb_clr_intercept(c, INTERCEPT_VINTR);
> > +               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
> > +                       vmcb_clr_intercept(c, INTERCEPT_VINTR);
> 
> How about instead moving this code to svm_set_vintr?

I considered that, but it doesn't handle the case where INTERCEPT_VINTR was set
in vmcb01 before nested VMRUN, i.e. KVM is waiting for an interrupt window at
the time of L1, and L1 doesn't set RFLAGS.IF=1 prior to VMRUN.

> That is, in the guest mode, if the guest has V_INTR_MASKING_MASK, then
> then a nested VM exit is the next point the interrupt window could open,
> thus we don't set VINTR)
> 
> Or even better put the logic in svm_enable_irq_window (that is avoid
> calling svm_set_vintr in the first place).
> 
> I also think that it worth it to add a warning that 'svm_set_intercept'
> didn't work, that is didn't really set an intercept.

Heh, I had coded that up too, but switched to bailing from svm_set_vintr() if the
intercept was disabled because of the aforementioned scenario.

> In theory that can result in nasty CVEs in addition to logic bugs as you found.

I don't think this can result in a CVE, at least not without even worse bugs in
L1.  KVM uses INTERCEPT_VINTR purely to detect interrupt windows, and failure to
configure an IRQ window would at worst cause KVM to delay an IRQ.  If a missing
or late IRQ lets L2 extract data from L1, then L1 has problems of its own.

> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index b103fe7cbc82..59d2891662ef 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
> >  
> >         svm_set_intercept(svm, INTERCEPT_VINTR);
> >  
> > +       /*
> > +        * Recalculating intercepts may have clear the VINTR intercept.  If
> > +        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
> > +        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
> > +        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
> > +        * interrupts will never be unblocked while L2 is running.
> > +        */
> > +       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
> > +               return;
> 
> This won't be needed if we don't call the svm_set_vintr in the first place.
> 
> > +
> >         /*
> >          * This is just a dummy VINTR to actually cause a vmexit to happen.
> >          * Actual injection of virtual interrupts happens through EVENTINJ.
> > 
> 
> 
> 
> With all this said, I also want to note that this patch has *nothing* to do with VNMI,
> I only added it due to some refactoring, so feel free to drop it from vNMI queue,
> and deal with those bugs separately.

Ya, let's tackle this in a separate series.  I'll circle back to this after rc1
(I'm OOO next week).

Santosh, in the next version of the vNMI series, can you drop any patches that
aren't strictly necessary to enable vNMI?
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 37af0338da7c32..aad3145b2f62fe 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -412,24 +412,17 @@  void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
  */
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 {
-	u32 mask;
+	u32 mask = 0;
 	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
 	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
 
-	/* Only a few fields of int_ctl are written by the processor.  */
-	mask = V_IRQ_MASK | V_TPR_MASK;
-	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
-	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
-		/*
-		 * In order to request an interrupt window, L0 is usurping
-		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
-		 * even if it was clear in L1's VMCB.  Restoring it would be
-		 * wrong.  However, in this case V_IRQ will remain true until
-		 * interrupt_window_interception calls svm_clear_vintr and
-		 * restores int_ctl.  We can just leave it aside.
-		 */
-		mask &= ~V_IRQ_MASK;
-	}
+	/*
+	 * Only a few fields of int_ctl are written by the processor.
+	 * Copy back only the bits that are passed through to the L2.
+	 */
+
+	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
+		mask = V_IRQ_MASK | V_TPR_MASK;
 
 	if (nested_vgif_enabled(svm))
 		mask |= V_GIF_MASK;