KVM: x86/pmu: SRCU protect the PMU event filter in the fast path

Message ID 20230623123522.4185651-2-aaronlewis@google.com (mailing list archive)
State New, archived
Series KVM: x86/pmu: SRCU protect the PMU event filter in the fast path

Commit Message

Aaron Lewis June 23, 2023, 12:35 p.m. UTC
When running KVM's fast path it is possible to get into a situation
where the PMU event filter is dereferenced without grabbing KVM's SRCU
read lock.

The following callstack demonstrates how that is possible.

Call Trace:
  dump_stack+0x85/0xdf
  lockdep_rcu_suspicious+0x109/0x120
  pmc_event_is_allowed+0x165/0x170
  kvm_pmu_trigger_event+0xa5/0x190
  handle_fastpath_set_msr_irqoff+0xca/0x1e0
  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
  vcpu_enter_guest+0x2108/0x2580

Fix that by explicitly grabbing the read lock before dereferencing the
PMU event filter.

Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 arch/x86/kvm/pmu.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Sean Christopherson June 23, 2023, 3:43 p.m. UTC | #1
On Fri, Jun 23, 2023, Aaron Lewis wrote:
> When running KVM's fast path it is possible to get into a situation
> where the PMU event filter is dereferenced without grabbing KVM's SRCU
> read lock.
> 
> The following callstack demonstrates how that is possible.
> 
> Call Trace:
>   dump_stack+0x85/0xdf
>   lockdep_rcu_suspicious+0x109/0x120
>   pmc_event_is_allowed+0x165/0x170
>   kvm_pmu_trigger_event+0xa5/0x190
>   handle_fastpath_set_msr_irqoff+0xca/0x1e0
>   svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
>   vcpu_enter_guest+0x2108/0x2580
> 
> Fix that by explicitly grabbing the read lock before dereferencing the
> PMU event filter.

Actually, on second thought, I think it would be better to acquire kvm->srcu in
handle_fastpath_set_msr_irqoff().  This is the second time that invoking
kvm_skip_emulated_instruction() resulted in an SRCU violation, and it probably
won't be the last since one of the benefits of using SRCU instead of per-asset
locks to protect things like memslots and filters is that low(ish) level helpers
don't need to worry about acquiring locks.

The 2x LOCK ADD from smp_mb() is unfortunate, but IMO it's worth eating that cost
to avoid having to play whack-a-mole in the future.  And as a (very small) bonus,
commit 5c30e8101e8d can be reverted.

--
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 23 Jun 2023 08:19:51 -0700
Subject: [PATCH] KVM: x86: Acquire SRCU read lock when handling fastpath MSR
 writes

Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in
the VM-Exit fastpath handler, as several of the common helpers used during
emulation expect the caller to provide SRCU protection.  E.g. if the guest
is counting instructions retired, KVM will query the PMU event filter when
stepping over the WRMSR.

  dump_stack+0x85/0xdf
  lockdep_rcu_suspicious+0x109/0x120
  pmc_event_is_allowed+0x165/0x170
  kvm_pmu_trigger_event+0xa5/0x190
  handle_fastpath_set_msr_irqoff+0xca/0x1e0
  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
  vcpu_enter_guest+0x2108/0x2580

Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid").  Providing
protection for the entirety of WRMSR emulation will allow reverting the
aforementioned commit, and will avoid having to play whack-a-mole when new
uses of SRCU-protected structures are inevitably added in common emulation
helpers.

Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 439312e04384..5f220c04624e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2172,6 +2172,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	u64 data;
 	fastpath_t ret = EXIT_FASTPATH_NONE;
 
+	kvm_vcpu_srcu_read_lock(vcpu);
+
 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
 		data = kvm_read_edx_eax(vcpu);
@@ -2194,6 +2196,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	if (ret != EXIT_FASTPATH_NONE)
 		trace_kvm_msr_write(msr, data);
 
+	kvm_vcpu_srcu_read_unlock(vcpu);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);

base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
Aaron Lewis June 26, 2023, 4:37 p.m. UTC | #2
>
> Actually, on second thought, I think it would be better to acquire kvm->srcu in
> handle_fastpath_set_msr_irqoff().  This is the second time that invoking
> kvm_skip_emulated_instruction() resulted in an SRCU violation, and it probably
> won't be the last since one of the benefits of using SRCU instead of per-asset
> locks to protect things like memslots and filters is that low(ish) level helpers
> don't need to worry about acquiring locks.

Yeah, I like this approach better.

>
> Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
> isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
> SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid").  Providing
> protection for the entirety of WRMSR emulation will allow reverting the
> aforementioned commit, and will avoid having to play whack-a-mole when new
> uses of SRCU-protected structures are inevitably added in common emulation
> helpers.
>
> Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
> Reported-by: Aaron Lewis <aaronlewis@google.com>

Could we also add "Reported-by: gthelen@google.com"?

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 439312e04384..5f220c04624e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2172,6 +2172,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
>         u64 data;
>         fastpath_t ret = EXIT_FASTPATH_NONE;
>
> +       kvm_vcpu_srcu_read_lock(vcpu);
> +
>         switch (msr) {
>         case APIC_BASE_MSR + (APIC_ICR >> 4):
>                 data = kvm_read_edx_eax(vcpu);
> @@ -2194,6 +2196,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
>         if (ret != EXIT_FASTPATH_NONE)
>                 trace_kvm_msr_write(msr, data);
>
> +       kvm_vcpu_srcu_read_unlock(vcpu);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);
>
> base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
> --
>

As a separate issue, shouldn't we restrict the MSR filter from being
able to intercept MSRs handled by the fast path?  I see that we do
that for the APIC MSRs, but if MSR_IA32_TSC_DEADLINE is handled by the
fast path, I don't see a way for userspace to override that behavior.
So maybe it shouldn't?  E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 439312e04384..dd0a314da0a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1787,7 +1787,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
        u32 i;

        /* x2APIC MSRs do not support filtering. */
-       if (index >= 0x800 && index <= 0x8ff)
+       if ((index >= 0x800 && index <= 0x8ff) || index == MSR_IA32_TSC_DEADLINE)
                return true;

        idx = srcu_read_lock(&kvm->srcu);
Sean Christopherson June 26, 2023, 5:34 p.m. UTC | #3
On Mon, Jun 26, 2023, Aaron Lewis wrote:
> As a separate issue, shouldn't we restrict the MSR filter from being
> able to intercept MSRs handled by the fast path?  I see that we do
> that for the APIC MSRs, but if MSR_IA32_TSC_DEADLINE is handled by the
> fast path, I don't see a way for userspace to override that behavior.
> So maybe it shouldn't?  E.g.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 439312e04384..dd0a314da0a3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1787,7 +1787,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>         u32 i;
> 
>         /* x2APIC MSRs do not support filtering. */
> -       if (index >= 0x800 && index <= 0x8ff)
> +       if ((index >= 0x800 && index <= 0x8ff) || index == MSR_IA32_TSC_DEADLINE)
>                 return true;
> 
>         idx = srcu_read_lock(&kvm->srcu);

Yeah, I saw that flaw too :-/  I'm not entirely sure what to do about MSRs that
can be handled in the fastpath.

On one hand, intercepting those MSRs probably doesn't make much sense.  On the
other hand, the MSR filter needs to be uABI, i.e. we can't make the statement
"MSRs handled in KVM's fastpath can't be filtered", because either every new
fastpath MSR will potentially break userspace, or KVM will be severely limited
with respect to what can be handled in the fastpath.

From an ABI perspective, the easiest thing is to fix the bug and enforce any
filter that affects MSR_IA32_TSC_DEADLINE.  If we ignore performance, the fix is
trivial.  E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f220c04624e..3ef903bb78ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2174,6 +2174,9 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 
        kvm_vcpu_srcu_read_lock(vcpu);
 
+       if (!kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+               goto out;
+
        switch (msr) {
        case APIC_BASE_MSR + (APIC_ICR >> 4):
                data = kvm_read_edx_eax(vcpu);
@@ -2196,6 +2199,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
        if (ret != EXIT_FASTPATH_NONE)
                trace_kvm_msr_write(msr, data);
 
+out:
        kvm_vcpu_srcu_read_unlock(vcpu);
 
        return ret;

But I don't love the idea of searching through the filters for an MSR that is
pretty much guaranteed to be allowed.  Since x2APIC MSRs can't be filtered, we
could add a per-vCPU flag to track if writes to TSC_DEADLINE are allowed, i.e.
if TSC_DEADLINE can be handled in the fastpath.
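
A rough sketch of that idea, purely for illustration (the tsc_deadline_fastpath_allowed
field and the recalc helper below are made-up names, they don't exist in KVM today):

	/* Recompute whenever userspace installs a new filter via KVM_X86_SET_MSR_FILTER. */
	static void kvm_vcpu_recalc_tsc_deadline_fastpath(struct kvm_vcpu *vcpu)
	{
		vcpu->arch.tsc_deadline_fastpath_allowed =
			kvm_msr_allowed(vcpu, MSR_IA32_TSC_DEADLINE, KVM_MSR_FILTER_WRITE);
	}

	/* In handle_fastpath_set_msr_irqoff(): */
	case MSR_IA32_TSC_DEADLINE:
		/*
		 * If the write is filtered, leave ret as EXIT_FASTPATH_NONE so
		 * the normal WRMSR exit path runs and enforces the filter.
		 */
		if (!vcpu->arch.tsc_deadline_fastpath_allowed)
			break;
		/* ... existing fastpath TSC_DEADLINE handling ... */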

However, at some point Intel and/or AMD will (hopefully) add support for full
virtualization of TSC_DEADLINE, and then TSC_DEADLINE will be in the same boat as
the x2APIC MSRs, i.e. allowing userspace to filter TSC_DEADLINE when it's fully
virtualized would be nonsensical.  And depending on how hardware behaves, i.e. how
a virtual TSC_DEADLINE interacts with the MSR bitmaps, *enforcing* userspace's
filtering might require a small amount of additional complexity.

And any MSR that is performance sensitive enough to be handled in the fastpath is
probably worth virtualizing in hardware, i.e. we'll end up revisiting this topic
every time we add an MSR to the fastpath :-(

I'm struggling to come up with an idea that won't create an ABI nightmare, won't
be subject to the whims of AMD and Intel, and won't saddle KVM with complexity to
support behavior that in all likelihood no one wants.

I'm leaning toward enforcing the filter for TSC_DEADLINE, and crossing my fingers
that neither AMD nor Intel implements TSC_DEADLINE virtualization in such a way
that it changes the behavior of WRMSR interception.

Patch

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index bf653df86112..2b2247f74ab7 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -381,18 +381,29 @@  static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 {
 	struct kvm_x86_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
+	bool allowed;
+	int idx;
 
 	if (!static_call(kvm_x86_pmu_hw_event_available)(pmc))
 		return false;
 
+	idx = srcu_read_lock(&kvm->srcu);
+
 	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
-	if (!filter)
-		return true;
+	if (!filter) {
+		allowed = true;
+		goto out;
+	}
 
 	if (pmc_is_gp(pmc))
-		return is_gp_event_allowed(filter, pmc->eventsel);
+		allowed = is_gp_event_allowed(filter, pmc->eventsel);
+	else
+		allowed = is_fixed_event_allowed(filter, pmc->idx);
+
+out:
+	srcu_read_unlock(&kvm->srcu, idx);
 
-	return is_fixed_event_allowed(filter, pmc->idx);
+	return allowed;
 }
 
 static bool pmc_event_is_allowed(struct kvm_pmc *pmc)