[07/12] KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable

Message ID 20210520230339.267445-8-jmattson@google.com (mailing list archive)
State New, archived
Series KVM: nVMX: Fix vmcs02 PID use-after-free issue

Commit Message

Jim Mattson May 20, 2021, 11:03 p.m. UTC
Don't allow posted interrupts to modify a stale posted interrupt
descriptor (including the initial value of 0).

Empirical tests on real hardware reveal that a posted interrupt
descriptor referencing an unbacked address has PCI bus error semantics
(reads as all 1's; writes are ignored). However, kvm can't distinguish
unbacked addresses from device-backed (MMIO) addresses, so it should
really ask userspace for an MMIO completion. That's overly
complicated, so just punt with KVM_INTERNAL_ERROR.

Don't return the error until the posted interrupt descriptor is
actually accessed. We don't want to break the existing kvm-unit-tests
that assume they can launch an L2 VM with a posted interrupt
descriptor that references MMIO space in L1.

Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
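
A minimal sketch of the deferral described in the commit message, written as standalone C rather than kernel code. Every name here (struct nested_model, model_get_pages(), model_complete_posted_interrupt(), the pi_result values) is a hypothetical stand-in, not a KVM identifier; only the ordering of the checks is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real KVM types. */
struct pi_desc_model {
	bool on;			/* models the descriptor's ON bit */
};

struct nested_model {
	struct pi_desc_model *pi_desc;	/* NULL if the vmcs12 PID was not mappable */
	bool pi_pending;		/* a posted interrupt notification arrived */
};

enum pi_result { PI_NOTHING_TO_DO, PI_DELIVERED, PI_INTERNAL_ERROR };

/* Entry time: record whether the descriptor could be mapped; never fail. */
static void model_get_pages(struct nested_model *n, struct pi_desc_model *mapped)
{
	n->pi_desc = mapped;		/* NULL means "not mappable" */
}

/* Completion time: only now does an unmappable descriptor become an error. */
static enum pi_result model_complete_posted_interrupt(struct nested_model *n)
{
	if (!n->pi_pending)
		return PI_NOTHING_TO_DO;

	if (!n->pi_desc)
		return PI_INTERNAL_ERROR;

	n->pi_pending = false;

	if (!n->pi_desc->on)
		return PI_NOTHING_TO_DO;

	n->pi_desc->on = false;
	return PI_DELIVERED;
}

/* Usage: launching with an unmappable PID is fine until it is accessed. */
static void model_usage(void)
{
	struct nested_model n = { 0 };

	model_get_pages(&n, NULL);
	assert(model_complete_posted_interrupt(&n) == PI_NOTHING_TO_DO);

	n.pi_pending = true;
	assert(model_complete_posted_interrupt(&n) == PI_INTERNAL_ERROR);
}
```

In this model, as in the patch, the pending flag is checked before the descriptor pointer, so a guest that never triggers posted interrupt processing never sees the error.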

Comments

Sean Christopherson May 24, 2021, 11:21 p.m. UTC | #1
On Thu, May 20, 2021, Jim Mattson wrote:
> Don't allow posted interrupts to modify a stale posted interrupt
> descriptor (including the initial value of 0).
> 
> Empirical tests on real hardware reveal that a posted interrupt
> descriptor referencing an unbacked address has PCI bus error semantics
> (reads as all 1's; writes are ignored). However, kvm can't distinguish
> unbacked addresses from device-backed (MMIO) addresses, so it should
> really ask userspace for an MMIO completion. That's overly
> complicated, so just punt with KVM_INTERNAL_ERROR.
> 
> Don't return the error until the posted interrupt descriptor is
> actually accessed. We don't want to break the existing kvm-unit-tests
> that assume they can launch an L2 VM with a posted interrupt
> descriptor that references MMIO space in L1.
> 
> Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 706c31821362..defd42201bb4 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  				offset_in_page(vmcs12->posted_intr_desc_addr));
>  			vmcs_write64(POSTED_INTR_DESC_ADDR,
>  				     pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> +		} else {
> +			/*
> +			 * Defer the KVM_INTERNAL_ERROR exit until
> +			 * someone tries to trigger posted interrupt
> +			 * processing on this vCPU, to avoid breaking
> +			 * existing kvm-unit-tests.

Run the lines out to 80 chars.  Also, can we change the comment to tie it to
CPU behavior in some way?  A few years down the road, "existing kvm-unit-tests"
may not have any relevant meaning, and it's not like kvm-unit-tests is bug-free
either.  E.g. something like

			/*
			 * Defer the KVM_INTERNAL_ERROR exit until posted
			 * interrupt processing actually occurs on this vCPU.
			 * Until that happens, the descriptor is not accessed,
			 * and userspace can technically rely on that behavior.
			 */ 

> +			 */
> +			vmx->nested.pi_desc = NULL;
> +			pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
>  		}
>  	}
>  	if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
> @@ -3689,10 +3698,14 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
>  	void *vapic_page;
>  	u16 status;
>  
> -	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
> +	if (!vmx->nested.pi_pending)
>  		return 0;
>  
> +	if (!vmx->nested.pi_desc)
> +		goto mmio_needed;
> +
>  	vmx->nested.pi_pending = false;
> +
>  	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
>  		return 0;
>  
> -- 
> 2.31.1.818.g46aad6cb9e-goog
>
Jim Mattson May 24, 2021, 11:27 p.m. UTC | #2
On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, May 20, 2021, Jim Mattson wrote:
> > +                     /*
> > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > +                      * someone tries to trigger posted interrupt
> > +                      * processing on this vCPU, to avoid breaking
> > +                      * existing kvm-unit-tests.
>
> Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> CPU behavior in some way?  A few years down the road, "existing kvm-unit-tests"
> may not have any relevant meaning, and it's not like kvm-unit-tests is bug-free
> either.  E.g. something like
>
>                         /*
>                          * Defer the KVM_INTERNAL_ERROR exit until posted
>                          * interrupt processing actually occurs on this vCPU.
>                          * Until that happens, the descriptor is not accessed,
>                          * and userspace can technically rely on that behavior.
>                          */
Okay...except for the fact that kvm will rather gratuitously process
posted interrupts in situations where hardware won't. That makes it
difficult to tie this to hardware behavior.
Sean Christopherson May 24, 2021, 11:45 p.m. UTC | #3
On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, May 20, 2021, Jim Mattson wrote:
> > > +                     /*
> > > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > > +                      * someone tries to trigger posted interrupt
> > > +                      * processing on this vCPU, to avoid breaking
> > > +                      * existing kvm-unit-tests.
> >
> > Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> > CPU behavior in some way?  A few years down the road, "existing kvm-unit-tests"
> > may not have any relevant meaning, and it's not like kvm-unit-tests is bug-free
> > either.  E.g. something like
> >
> >                         /*
> >                          * Defer the KVM_INTERNAL_ERROR exit until posted
> >                          * interrupt processing actually occurs on this vCPU.
> >                          * Until that happens, the descriptor is not accessed,
> >                          * and userspace can technically rely on that behavior.
> >                          */
> Okay...except for the fact that kvm will rather gratuitously process
> posted interrupts in situations where hardware won't. That makes it
> difficult to tie this to hardware behavior.

Hrm, true, but we can at least say that KVM won't bail if there's zero chance of posted
interrupts being processed.  I hope?
Jim Mattson May 25, 2021, 12:03 a.m. UTC | #4
On Mon, May 24, 2021 at 4:45 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 24, 2021, Jim Mattson wrote:
> > On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, May 20, 2021, Jim Mattson wrote:
> > > > +                     /*
> > > > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > > > +                      * someone tries to trigger posted interrupt
> > > > +                      * processing on this vCPU, to avoid breaking
> > > > +                      * existing kvm-unit-tests.
> > >
> > > Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> > > CPU behavior in some way?  A few years down the road, "existing kvm-unit-tests"
> > > may not have any relevant meaning, and it's not like kvm-unit-tests is bug-free
> > > either.  E.g. something like
> > >
> > >                         /*
> > >                          * Defer the KVM_INTERNAL_ERROR exit until posted
> > >                          * interrupt processing actually occurs on this vCPU.
> > >                          * Until that happens, the descriptor is not accessed,
> > >                          * and userspace can technically rely on that behavior.
> > >                          */
> > Okay...except for the fact that kvm will rather gratuitously process
> > posted interrupts in situations where hardware won't. That makes it
> > difficult to tie this to hardware behavior.
>
> Hrm, true, but we can at least say that KVM won't bail if there's zero chance of posted
> interrupts being processed.  I hope?
Zero chance in KVM, or zero chance on hardware?

For instance, set TPR high enough to block the posted interrupt vector
from being delivered, and there is zero chance of posted interrupts
being processed by hardware. However, if another L1 vCPU sends that
vector by IPI, there is a 100% chance that KVM will bail, because it
ignores TPR for processing posted interrupts.
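
The mismatch in the example above can be sketched as a toy model in standalone C. This is a simplification under stated assumptions: it compares only the priority class (upper four bits) of the notification vector against the TPR, ignores in-service interrupts, and the function names are hypothetical rather than taken from KVM or from the hardware specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector or TPR value: its upper four bits. */
static unsigned int prio_class(uint8_t v)
{
	return v >> 4;
}

/*
 * Toy hardware model (assumption: PPR == TPR, nothing in service).
 * The notification vector is recognized only if its priority class
 * is strictly higher than the TPR's.
 */
static bool hw_model_processes_pi(uint8_t notification_vector, uint8_t tpr)
{
	return prio_class(notification_vector) > prio_class(tpr);
}

/* Toy model of the KVM behavior described above: TPR is not consulted. */
static bool kvm_model_processes_pi(uint8_t notification_vector, uint8_t tpr)
{
	(void)notification_vector;
	(void)tpr;
	return true;
}

/* Usage: a high TPR blocks the hardware model but not the KVM model. */
static void tpr_model_usage(void)
{
	const uint8_t vector = 0xf2;	/* arbitrary example vector */
	const uint8_t high_tpr = 0xf0;	/* same priority class as the vector */

	assert(!hw_model_processes_pi(vector, high_tpr));
	assert(kvm_model_processes_pi(vector, high_tpr));
}
```

With the TPR raised to the vector's own priority class, the hardware model never touches the descriptor, while the KVM model still does, which is the case where the deferred error would fire.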
Sean Christopherson May 25, 2021, 12:11 a.m. UTC | #5
On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 4:45 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, May 24, 2021, Jim Mattson wrote:
> > > On Mon, May 24, 2021 at 4:22 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, May 20, 2021, Jim Mattson wrote:
> > > > > +                     /*
> > > > > +                      * Defer the KVM_INTERNAL_ERROR exit until
> > > > > +                      * someone tries to trigger posted interrupt
> > > > > +                      * processing on this vCPU, to avoid breaking
> > > > > +                      * existing kvm-unit-tests.
> > > >
> > > > Run the lines out to 80 chars.  Also, can we change the comment to tie it to
> > > > CPU behavior in some way?  A few years down the road, "existing kvm-unit-tests"
> > > > may not have any relevant meaning, and it's not like kvm-unit-tests is bug-free
> > > > either.  E.g. something like
> > > >
> > > >                         /*
> > > >                          * Defer the KVM_INTERNAL_ERROR exit until posted
> > > >                          * interrupt processing actually occurs on this vCPU.
> > > >                          * Until that happens, the descriptor is not accessed,
> > > >                          * and userspace can technically rely on that behavior.
> > > >                          */
> > > Okay...except for the fact that kvm will rather gratuitously process
> > > posted interrupts in situations where hardware won't. That makes it
> > > difficult to tie this to hardware behavior.
> >
> > Hrm, true, but we can at least say that KVM won't bail if there's zero chance of posted
> > interrupts being processed.  I hope?
> Zero chance in KVM, or zero chance on hardware?

I was hoping hardware...

> For instance, set TPR high enough to block the posted interrupt vector
> from being delivered, and there is zero chance of posted interrupts
> being processed by hardware. However, if another L1 vCPU sends that
> vector by IPI, there is a 100% chance that KVM will bail, because it
> ignores TPR for processing posted interrupts.

Can we instead word it along the lines of:

  Defer the KVM_INTERNAL_EXIT until KVM actually attempts to consume the posted
  interrupt descriptor on behalf of the vCPU.  Note, KVM may process posted
  interrupts when it architecturally should not.  Bugs aside, userspace can at
  least rely on KVM to not process posted interrupts if there is no (posted?)
  interrupt activity whatsoever.
Jim Mattson May 25, 2021, 12:15 a.m. UTC | #6
On Mon, May 24, 2021 at 5:11 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Can we instead word it along the lines of:
>
>   Defer the KVM_INTERNAL_EXIT until KVM actually attempts to consume the posted
>   interrupt descriptor on behalf of the vCPU.  Note, KVM may process posted
>   interrupts when it architecturally should not.  Bugs aside, userspace can at
>   least rely on KVM to not process posted interrupts if there is no (posted?)
>   interrupt activity whatsoever.

How about:

Defer the KVM_INTERNAL_EXIT until KVM tries to access the contents of
the VMCS12 posted interrupt descriptor. (Note that KVM may do this
when it should not, per the architectural specification.)
Sean Christopherson May 25, 2021, 12:57 a.m. UTC | #7
On Mon, May 24, 2021, Jim Mattson wrote:
> On Mon, May 24, 2021 at 5:11 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Can we instead word it along the lines of:
> >
> >   Defer the KVM_INTERNAL_EXIT until KVM actually attempts to consume the posted
> >   interrupt descriptor on behalf of the vCPU.  Note, KVM may process posted
> >   interrupts when it architecturally should not.  Bugs aside, userspace can at
> >   least rely on KVM to not process posted interrupts if there is no (posted?)
> >   interrupt activity whatsoever.
> 
> How about:
> 
> Defer the KVM_INTERNAL_EXIT until KVM tries to access the contents of
> the VMCS12 posted interrupt descriptor. (Note that KVM may do this
> when it should not, per the architectural specification.)

Works for me!

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 706c31821362..defd42201bb4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3175,6 +3175,15 @@  static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 				offset_in_page(vmcs12->posted_intr_desc_addr));
 			vmcs_write64(POSTED_INTR_DESC_ADDR,
 				     pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
+		} else {
+			/*
+			 * Defer the KVM_INTERNAL_ERROR exit until
+			 * someone tries to trigger posted interrupt
+			 * processing on this vCPU, to avoid breaking
+			 * existing kvm-unit-tests.
+			 */
+			vmx->nested.pi_desc = NULL;
+			pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
 		}
 	}
 	if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
@@ -3689,10 +3698,14 @@  static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	void *vapic_page;
 	u16 status;
 
-	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
+	if (!vmx->nested.pi_pending)
 		return 0;
 
+	if (!vmx->nested.pi_desc)
+		goto mmio_needed;
+
 	vmx->nested.pi_pending = false;
+
 	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
 		return 0;