diff mbox series

KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs

Message ID 20220226234131.2167175-1-jmattson@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs | expand

Commit Message

Jim Mattson Feb. 26, 2022, 11:41 p.m. UTC
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.

When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.

This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.

For example,

root@Ubuntu1804:~# perf stat -e r26 -a sleep 1

 Performance counter stats for 'system wide':

                 0      r26

       1.001070977 seconds time elapsed

Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90

Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: Lotus Fenn <lotusf@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/pmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

David Dunn Feb. 27, 2022, 4:54 a.m. UTC | #1
Reviewed-by: David Dunn <daviddunn@google.com>

On Sat, Feb 26, 2022 at 3:41 PM Jim Mattson <jmattson@google.com> wrote:

> +               data &= ~pmu->reserved_bits;
> +               if (data != pmc->eventsel)
>                         reprogram_gp_counter(pmc, data);
> +               return 0;
Like Xu Feb. 28, 2022, 8:26 a.m. UTC | #2
On 27/2/2022 7:41 am, Jim Mattson wrote:
> AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
> reserved bits are cleared, and some are not. Specifically, on
> Zen3/Milan, bits 19 and 42 are not cleared.

Curiously, is there any additional documentation on what bits 19 and 42 are for?
And we only need this part of logic specifically for at least (guest cpu model) 
Zen3.

> 
> When emulating such a WRMSR, KVM should not synthesize a #GP, > regardless of which bits are set. However, undocumented bits should

If KVM chooses to emulate different #GP behavior on AMD and Intel for
"reserved bits without qualification"[0], there should be more code for almost
all MSRs to be checked one by one.

[0] "If a field is marked reserved without qualification, software must not
change the state of that field; it must reload that field with the same value
returned from a prior read."

> not be passed through to the hardware MSR. So, rather than checking
> for reserved bits and synthesizing a #GP, just clear the reserved
> bits.

wrmsr -a 0xc0010200 0xfffffcf000280000
rdmsr -a 0xc0010200 | sort | uniq
# 0x40000080000 (expected)

According to the test, there will be memory bits somewhere on the host
recording the bit status of bits 19 and 42.

Shouldn't KVM emulate this bit-memory behavior as well ?

> 
> This may seem pedantic, but since KVM currently does not support the
> "Host/Guest Only" bits (41:40), it is necessary to clear these bits

I would have thought you had code to emulate the "Host/Guest Only"
bits for nested SVM PMU to fix this issue fundamentally.

> rather than synthesizing #GP, because some popular guests (e.g Linux)
> will set the "Host Only" bit even on CPUs that don't support
> EFER.SVME, and they don't expect a #GP.

IMO, this fix is just a reprieve.

The logic of special handling of #GP only for AMD PMU MSR's
"reserved without qualification" bits is asymmetric in the KVM/svm
context and will confuse users even more.

> 
> For example,
> 
> root@Ubuntu1804:~# perf stat -e r26 -a sleep 1
> 
>   Performance counter stats for 'system wide':
> 
>                   0      r26
> 
>         1.001070977 seconds time elapsed
> 
> Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
> Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
> Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90
> 
> Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
> Reported-by: Lotus Fenn <lotusf@google.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>   arch/x86/kvm/svm/pmu.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index d4de52409335..886e8ac5cfaa 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	/* MSR_EVNTSELn */
>   	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
>   	if (pmc) {
> -		if (data == pmc->eventsel)
> -			return 0;
> -		if (!(data & pmu->reserved_bits)) {
> +		data &= ~pmu->reserved_bits;
> +		if (data != pmc->eventsel)
>   			reprogram_gp_counter(pmc, data);
> -			return 0;
> -		}
> +		return 0;
>   	}
>   
>   	return 1;
Jim Mattson Feb. 28, 2022, 8:07 p.m. UTC | #3
On Mon, Feb 28, 2022 at 12:27 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 27/2/2022 7:41 am, Jim Mattson wrote:
> > AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
> > reserved bits are cleared, and some are not. Specifically, on
> > Zen3/Milan, bits 19 and 42 are not cleared.
>
> Curiously, is there any additional documentation on what bits 19 and 42 are for?
> And we only need this part of logic specifically for at least (guest cpu model)
> Zen3.

With the help of an older revision of the APM I found at
https://www.ii.uib.no/~osvik/x86-64/24593.pdf, we can see that bit 19,
on AMD as well as Intel, is the deprecated "Pin Control" bit. I
believe bit 42 is new on Zen3/Milan, but aside from being useful for
fixing erratum #1292, I don't have any idea what it does. Note that
bits 40 and 41 were reserved bits before SVM was introduced, and
should be treated as such for VMs that do not support SVM. Hence, the
motivation for this change is still, as previously mentioned, the
egregious behavior of the Intel perf subsystem with respect to the
Host-Only bit. This is necessary for all AMD vCPUs that do not support
SVM, regardless of model.

> >
> > When emulating such a WRMSR, KVM should not synthesize a #GP, > regardless of which bits are set. However, undocumented bits should
>
> If KVM chooses to emulate different #GP behavior on AMD and Intel for
> "reserved bits without qualification"[0], there should be more code for almost
> all MSRs to be checked one by one.

I think you are manufacturing a problem that doesn't exist.

> [0] "If a field is marked reserved without qualification, software must not
> change the state of that field; it must reload that field with the same value
> returned from a prior read."

Unfortunately, some software (e.g. Linux perf) ignores this
restriction. If, in spite of its misbehavior, the software works fine
on bare metal, we should do whatever is necessary to make it work in a
VM as well.

> > not be passed through to the hardware MSR. So, rather than checking
> > for reserved bits and synthesizing a #GP, just clear the reserved
> > bits.
>
> wrmsr -a 0xc0010200 0xfffffcf000280000
> rdmsr -a 0xc0010200 | sort | uniq
> # 0x40000080000 (expected)
>
> According to the test, there will be memory bits somewhere on the host
> recording the bit status of bits 19 and 42.
>
> Shouldn't KVM emulate this bit-memory behavior as well ?

I'm happy to revert your change that added bit 19 to the reserved
bits. I can remove bit 42 as well, but I don't see the need. Bit 42,
unlike bit 19, has never been documented.

> >
> > This may seem pedantic, but since KVM currently does not support the
> > "Host/Guest Only" bits (41:40), it is necessary to clear these bits
>
> I would have thought you had code to emulate the "Host/Guest Only"
> bits for nested SVM PMU to fix this issue fundamentally.

GCP doesn't support nested SVM at all, so we have no such code.
Regardless, as you can see from the old APM referenced above, these
bits were reserved on AMD CPUs that don't support SVM. They should
also be reserved on virtual CPUs that don't support SVM. That much, at
least, KVM gets right today.

> > rather than synthesizing #GP, because some popular guests (e.g Linux)
> > will set the "Host Only" bit even on CPUs that don't support
> > EFER.SVME, and they don't expect a #GP.
>
> IMO, this fix is just a reprieve.
>
> The logic of special handling of #GP only for AMD PMU MSR's
> "reserved without qualification" bits is asymmetric in the KVM/svm
> context and will confuse users even more.

I'm happy to entertain alternative suggestions.
Like Xu March 15, 2022, 2:33 p.m. UTC | #4
Reviewed-by: Like Xu <likexu@tencent.com>

We may need this for legacy guests w/o df51fe7ea1c1, please note the
proposed changes to the (AMD architecturally defined)  #GP behavior
for AMD reserved bits without qualification.

On 27/2/2022 12:54 pm, David Dunn wrote:
> Reviewed-by: David Dunn <daviddunn@google.com>
> 
> On Sat, Feb 26, 2022 at 3:41 PM Jim Mattson <jmattson@google.com> wrote:
> 
>> +               data &= ~pmu->reserved_bits;
>> +               if (data != pmc->eventsel)
>>                          reprogram_gp_counter(pmc, data);
>> +               return 0;
>
Paolo Bonzini March 15, 2022, 9:15 p.m. UTC | #5
On 2/27/22 00:41, Jim Mattson wrote:
> AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
> reserved bits are cleared, and some are not. Specifically, on
> Zen3/Milan, bits 19 and 42 are not cleared.
> 
> When emulating such a WRMSR, KVM should not synthesize a #GP,
> regardless of which bits are set. However, undocumented bits should
> not be passed through to the hardware MSR. So, rather than checking
> for reserved bits and synthesizing a #GP, just clear the reserved
> bits.
> 
> This may seem pedantic, but since KVM currently does not support the
> "Host/Guest Only" bits (41:40), it is necessary to clear these bits
> rather than synthesizing #GP, because some popular guests (e.g Linux)
> will set the "Host Only" bit even on CPUs that don't support
> EFER.SVME, and they don't expect a #GP.
> 
> For example,
> 
> root@Ubuntu1804:~# perf stat -e r26 -a sleep 1
> 
>   Performance counter stats for 'system wide':
> 
>                   0      r26
> 
>         1.001070977 seconds time elapsed
> 
> Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
> Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
> Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90
> 
> Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
> Reported-by: Lotus Fenn <lotusf@google.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>   arch/x86/kvm/svm/pmu.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index d4de52409335..886e8ac5cfaa 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	/* MSR_EVNTSELn */
>   	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
>   	if (pmc) {
> -		if (data == pmc->eventsel)
> -			return 0;
> -		if (!(data & pmu->reserved_bits)) {
> +		data &= ~pmu->reserved_bits;
> +		if (data != pmc->eventsel)
>   			reprogram_gp_counter(pmc, data);
> -			return 0;
> -		}
> +		return 0;
>   	}
>   
>   	return 1;

Queued, thanks.

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index d4de52409335..886e8ac5cfaa 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -262,12 +262,10 @@  static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	/* MSR_EVNTSELn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
 	if (pmc) {
-		if (data == pmc->eventsel)
-			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		data &= ~pmu->reserved_bits;
+		if (data != pmc->eventsel)
 			reprogram_gp_counter(pmc, data);
-			return 0;
-		}
+		return 0;
 	}
 
 	return 1;