Message ID | 20220226234131.2167175-1-jmattson@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs | expand |
Reviewed-by: David Dunn <daviddunn@google.com> On Sat, Feb 26, 2022 at 3:41 PM Jim Mattson <jmattson@google.com> wrote: > + data &= ~pmu->reserved_bits; > + if (data != pmc->eventsel) > reprogram_gp_counter(pmc, data); > + return 0;
On 27/2/2022 7:41 am, Jim Mattson wrote: > AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some > reserved bits are cleared, and some are not. Specifically, on > Zen3/Milan, bits 19 and 42 are not cleared. Curiously, is there any additional documentation on what bits 19 and 42 are for? And we only need this part of logic specifically for at least (guest cpu model) Zen3. > > When emulating such a WRMSR, KVM should not synthesize a #GP, > regardless of which bits are set. However, undocumented bits should If KVM chooses to emulate different #GP behavior on AMD and Intel for "reserved bits without qualification"[0], there should be more code for almost all MSRs to be checked one by one. [0] "If a field is marked reserved without qualification, software must not change the state of that field; it must reload that field with the same value returned from a prior read." > not be passed through to the hardware MSR. So, rather than checking > for reserved bits and synthesizing a #GP, just clear the reserved > bits. wrmsr -a 0xc0010200 0xfffffcf000280000 rdmsr -a 0xc0010200 | sort | uniq # 0x40000080000 (expected) According to the test, there will be memory bits somewhere on the host recording the bit status of bits 19 and 42. Shouldn't KVM emulate this bit-memory behavior as well ? > > This may seem pedantic, but since KVM currently does not support the > "Host/Guest Only" bits (41:40), it is necessary to clear these bits I would have thought you had code to emulate the "Host/Guest Only" bits for nested SVM PMU to fix this issue fundamentally. > rather than synthesizing #GP, because some popular guests (e.g Linux) > will set the "Host Only" bit even on CPUs that don't support > EFER.SVME, and they don't expect a #GP. IMO, this fix is just a reprieve. The logic of special handling of #GP only for AMD PMU MSR's "reserved without qualification" bits is asymmetric in the KVM/svm context and will confuse users even more. > > For example, > > root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 > > Performance counter stats for 'system wide': > > 0 r26 > > 1.001070977 seconds time elapsed > > Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) > Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: > Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 > > Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") > Reported-by: Lotus Fenn <lotusf@google.com> > Signed-off-by: Jim Mattson <jmattson@google.com> > --- > arch/x86/kvm/svm/pmu.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c > index d4de52409335..886e8ac5cfaa 100644 > --- a/arch/x86/kvm/svm/pmu.c > +++ b/arch/x86/kvm/svm/pmu.c > @@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > /* MSR_EVNTSELn */ > pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); > if (pmc) { > - if (data == pmc->eventsel) > - return 0; > - if (!(data & pmu->reserved_bits)) { > + data &= ~pmu->reserved_bits; > + if (data != pmc->eventsel) > reprogram_gp_counter(pmc, data); > - return 0; > - } > + return 0; > } > > return 1;
On Mon, Feb 28, 2022 at 12:27 AM Like Xu <like.xu.linux@gmail.com> wrote: > > On 27/2/2022 7:41 am, Jim Mattson wrote: > > AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some > > reserved bits are cleared, and some are not. Specifically, on > > Zen3/Milan, bits 19 and 42 are not cleared. > > Curiously, is there any additional documentation on what bits 19 and 42 are for? > And we only need this part of logic specifically for at least (guest cpu model) > Zen3. With the help of an older revision of the APM I found at https://www.ii.uib.no/~osvik/x86-64/24593.pdf, we can see that bit 19, on AMD as well as Intel, is the deprecated "Pin Control" bit. I believe bit 42 is new on Zen3/Milan, but aside from being useful for fixing erratum #1292, I don't have any idea what it does. Note that bits 40 and 41 were reserved bits before SVM was introduced, and should be treated as such for VMs that do not support SVM. Hence, the motivation for this change is still, as previously mentioned, the egregious behavior of the Intel perf subsystem with respect to the Host-Only bit. This is necessary for all AMD vCPUs that do not support SVM, regardless of model. > > > > When emulating such a WRMSR, KVM should not synthesize a #GP, > regardless of which bits are set. However, undocumented bits should > > If KVM chooses to emulate different #GP behavior on AMD and Intel for > "reserved bits without qualification"[0], there should be more code for almost > all MSRs to be checked one by one. I think you are manufacturing a problem that doesn't exist. > [0] "If a field is marked reserved without qualification, software must not > change the state of that field; it must reload that field with the same value > returned from a prior read." Unfortunately, some software (e.g. Linux perf) ignores this restriction. If, in spite of its misbehavior, the software works fine on bare metal, we should do whatever is necessary to make it work in a VM as well. > > not be passed through to the hardware MSR. So, rather than checking > > for reserved bits and synthesizing a #GP, just clear the reserved > > bits. > > wrmsr -a 0xc0010200 0xfffffcf000280000 > rdmsr -a 0xc0010200 | sort | uniq > # 0x40000080000 (expected) > > According to the test, there will be memory bits somewhere on the host > recording the bit status of bits 19 and 42. > > Shouldn't KVM emulate this bit-memory behavior as well ? I'm happy to revert your change that added bit 19 to the reserved bits. I can remove bit 42 as well, but I don't see the need. Bit 42, unlike bit 19, has never been documented. > > > > This may seem pedantic, but since KVM currently does not support the > > "Host/Guest Only" bits (41:40), it is necessary to clear these bits > > I would have thought you had code to emulate the "Host/Guest Only" > bits for nested SVM PMU to fix this issue fundamentally. GCP doesn't support nested SVM at all, so we have no such code. Regardless, as you can see from the old APM referenced above, these bits were reserved on AMD CPUs that don't support SVM. They should also be reserved on virtual CPUs that don't support SVM. That much, at least, KVM gets right today. > > rather than synthesizing #GP, because some popular guests (e.g Linux) > > will set the "Host Only" bit even on CPUs that don't support > > EFER.SVME, and they don't expect a #GP. > > IMO, this fix is just a reprieve. > > The logic of special handling of #GP only for AMD PMU MSR's > "reserved without qualification" bits is asymmetric in the KVM/svm > context and will confuse users even more. I'm happy to entertain alternative suggestions.
Reviewed-by: Like Xu <likexu@tencent.com> We may need this for legacy guests w/o df51fe7ea1c1, please note the proposed changes to the (AMD architecturally defined) #GP behavior for AMD reserved bits without qualification. On 27/2/2022 12:54 pm, David Dunn wrote: > Reviewed-by: David Dunn <daviddunn@google.com> > > On Sat, Feb 26, 2022 at 3:41 PM Jim Mattson <jmattson@google.com> wrote: > >> + data &= ~pmu->reserved_bits; >> + if (data != pmc->eventsel) >> reprogram_gp_counter(pmc, data); >> + return 0; >
On 2/27/22 00:41, Jim Mattson wrote: > AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some > reserved bits are cleared, and some are not. Specifically, on > Zen3/Milan, bits 19 and 42 are not cleared. > > When emulating such a WRMSR, KVM should not synthesize a #GP, > regardless of which bits are set. However, undocumented bits should > not be passed through to the hardware MSR. So, rather than checking > for reserved bits and synthesizing a #GP, just clear the reserved > bits. > > This may seem pedantic, but since KVM currently does not support the > "Host/Guest Only" bits (41:40), it is necessary to clear these bits > rather than synthesizing #GP, because some popular guests (e.g Linux) > will set the "Host Only" bit even on CPUs that don't support > EFER.SVME, and they don't expect a #GP. > > For example, > > root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 > > Performance counter stats for 'system wide': > > 0 r26 > > 1.001070977 seconds time elapsed > > Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) > Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: > Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 > > Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") > Reported-by: Lotus Fenn <lotusf@google.com> > Signed-off-by: Jim Mattson <jmattson@google.com> > --- > arch/x86/kvm/svm/pmu.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c > index d4de52409335..886e8ac5cfaa 100644 > --- a/arch/x86/kvm/svm/pmu.c > +++ b/arch/x86/kvm/svm/pmu.c > @@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > /* MSR_EVNTSELn */ > pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); > if (pmc) { > - if (data == pmc->eventsel) > - return 0; > - if (!(data & pmu->reserved_bits)) { > + data &= ~pmu->reserved_bits; > + if (data != pmc->eventsel) > reprogram_gp_counter(pmc, data); > - return 0; > - } > + return 0; > } > > return 1; Queued, thanks. Paolo
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index d4de52409335..886e8ac5cfaa 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) /* MSR_EVNTSELn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); if (pmc) { - if (data == pmc->eventsel) - return 0; - if (!(data & pmu->reserved_bits)) { + data &= ~pmu->reserved_bits; + if (data != pmc->eventsel) reprogram_gp_counter(pmc, data); - return 0; - } + return 0; } return 1;
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared. When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits. This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP. For example, root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 Performance counter stats for 'system wide': 0 r26 1.001070977 seconds time elapsed Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn <lotusf@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> --- arch/x86/kvm/svm/pmu.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)