Message ID | 20220302111334.12689-13-likexu@tencent.com (mailing list archive)
---|---
State | New, archived
Series | KVM: x86/pmu: Get rid of PERF_TYPE_HARDWAR and other minor fixes
On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> The AMD Family 19h Models 00h-0Fh Processors may experience sampling
> inaccuracies that cause the following performance counters to overcount
> retire-based events. To count the non-FP affected PMC events correctly,
> a patched guest with a target vCPU model would:
>
> - Use Core::X86::Msr::PERF_CTL2 to count the events, and
> - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>
> To support this use case in AMD guests, KVM should stop reserving
> bit 43, but only for counter #2. Treatment of other cases remains
> unchanged.
>
> The AMD hardware team clarified that the conditions under which the
> overcounting can happen are quite rare. This change may make those
> PMU driver developers who have read erratum #1292 less disappointed.
>
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>

This seems unnecessarily convoluted. As I've said previously, KVM
should never synthesize a #GP for any value written to a PerfEvtSeln
MSR when emulating an AMD CPU.
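For concreteness, the guest-side workaround quoted above amounts to something like the following (a minimal sketch in kernel C, not code from the thread; the MSR index 0xc0010204 for Core::X86::Msr::PERF_CTL2 and the helper name are assumptions of this illustration, while the bit positions come from the commit message):

```c
#include <linux/bits.h>
#include <asm/msr.h>

/* Core::X86::Msr::PERF_CTL2; index assumed from the usual AMD PERF_CTLn layout. */
#define AMD_PERF_CTL2	0xc0010204

/* Program an affected retire-based event per the erratum #1292 workaround. */
static void guest_count_retire_event(u64 evtsel)
{
	evtsel |= BIT_ULL(43);		/* PERF_CTL2[43] = 1b */
	evtsel &= ~BIT_ULL(20);		/* PERF_CTL2[20] = 0b */
	wrmsrl(AMD_PERF_CTL2, evtsel);	/* use counter #2 to count the event */
}
```

With KVM's current reserved-bit check, the wrmsrl() above is exactly the write that draws a synthesized #GP in a guest.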
On 3/3/2022 1:52 am, Jim Mattson wrote:
> On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> The AMD Family 19h Models 00h-0Fh Processors may experience sampling
>> inaccuracies that cause the following performance counters to overcount
>> retire-based events. To count the non-FP affected PMC events correctly,
>> a patched guest with a target vCPU model would:
>>
>> - Use Core::X86::Msr::PERF_CTL2 to count the events, and
>> - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>> - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>
>> To support this use case in AMD guests, KVM should stop reserving
>> bit 43, but only for counter #2. Treatment of other cases remains
>> unchanged.
>>
>> The AMD hardware team clarified that the conditions under which the
>> overcounting can happen are quite rare. This change may make those
>> PMU driver developers who have read erratum #1292 less disappointed.
>>
>> Reported-by: Jim Mattson <jmattson@google.com>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>
> This seems unnecessarily convoluted. As I've said previously, KVM
> should never synthesize a #GP for any value written to a PerfEvtSeln
> MSR when emulating an AMD CPU.

IMO, we should "never synthesize a #GP" for all AMD MSRs, not just the
AMD PMU MSRs, or else keep the status quo.

I agree with you on this AMD #GP transition, but we need at least one
kernel cycle to make such a radical change, we don't know Paolo's
attitude, and moreover, we haven't received a tidal wave of user
complaints.
On Fri, Mar 4, 2022 at 1:47 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 3/3/2022 1:52 am, Jim Mattson wrote:
> > On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> From: Like Xu <likexu@tencent.com>
> >>
> >> The AMD Family 19h Models 00h-0Fh Processors may experience sampling
> >> inaccuracies that cause the following performance counters to overcount
> >> retire-based events. To count the non-FP affected PMC events correctly,
> >> a patched guest with a target vCPU model would:
> >>
> >> - Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >> - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >> - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>
> >> To support this use case in AMD guests, KVM should stop reserving
> >> bit 43, but only for counter #2. Treatment of other cases remains
> >> unchanged.
> >>
> >> The AMD hardware team clarified that the conditions under which the
> >> overcounting can happen are quite rare. This change may make those
> >> PMU driver developers who have read erratum #1292 less disappointed.
> >>
> >> Reported-by: Jim Mattson <jmattson@google.com>
> >> Signed-off-by: Like Xu <likexu@tencent.com>
> >
> > This seems unnecessarily convoluted. As I've said previously, KVM
> > should never synthesize a #GP for any value written to a PerfEvtSeln
> > MSR when emulating an AMD CPU.
>
> IMO, we should "never synthesize a #GP" for all AMD MSRs, not just the
> AMD PMU MSRs, or else keep the status quo.

Then, why are you proposing this change? :-)

We should continue to synthesize a #GP for an attempt to set "must be
zero" bits or for rule violations, like "address must be canonical."
However, we have absolutely no business making up our own hardware
specification. This is a bug, and it should be fixed, like any other
bug.

> I agree with you on this AMD #GP transition, but we need at least one
> kernel cycle to make such a radical change, we don't know Paolo's
> attitude, and moreover, we haven't received a tidal wave of user
> complaints.

Again, if this is your stance, why are you proposing this change? :-)
If you wait until you have a tidal wave of user complaints, you have
waited too long. It's much better to be proactive than reactive.
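To make Jim's distinction concrete, here is a hedged sketch (hypothetical helpers, not actual KVM code) of a write that legitimately synthesizes a #GP because the architecture demands a canonical address, in contrast to the PerfEvtSeln case:

```c
#include <linux/types.h>

/* 48-bit canonical check: bits 63:48 must sign-extend bit 47. */
static inline bool is_canonical_48(u64 addr)
{
	return (u64)(((s64)addr << 16) >> 16) == addr;
}

/*
 * Rule violation: the architecture requires MSR_LSTAR to hold a
 * canonical address, so synthesizing a #GP here matches real hardware.
 */
static int emulate_wrmsr_lstar(u64 *lstar, u64 data)
{
	if (!is_canonical_48(data))
		return 1;	/* synthesize #GP */
	*lstar = data;
	return 0;
}
```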
On 5/3/2022 3:06 am, Jim Mattson wrote:
> We should continue to synthesize a #GP for an attempt to set "must be
> zero" bits or for rule violations, like "address must be canonical."

Actually, I do stand in the same position as you.

> However, we have absolutely no business making up our own hardware
> specification. This is a bug, and it should be fixed, like any other
> bug.

Current virtual hardware interfaces do not strictly comply with vendor
specifications: they may not match the specification in the first step
of enablement, and some of them may have to be compromised later due to
various complexities.

The behavior of synthesizing a #GP when a guest sets AMD's "reserved
without qualification" bits is clearly a legacy technical decision (not
sure if it was intentional). We may need a larger independent patch set
to apply this one-time surgery, which of course includes this PMU issue.

What do you think?
On Tue, Mar 8, 2022 at 3:25 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 5/3/2022 3:06 am, Jim Mattson wrote:
> > We should continue to synthesize a #GP for an attempt to set "must be
> > zero" bits or for rule violations, like "address must be canonical."
>
> Actually, I do stand in the same position as you.
>
> > However, we have absolutely no business making up our own hardware
> > specification. This is a bug, and it should be fixed, like any other
> > bug.
>
> Current virtual hardware interfaces do not strictly comply with vendor
> specifications: they may not match the specification in the first step
> of enablement, and some of them may have to be compromised later due to
> various complexities.
>
> The behavior of synthesizing a #GP when a guest sets AMD's "reserved
> without qualification" bits is clearly a legacy technical decision (not
> sure if it was intentional). We may need a larger independent patch set
> to apply this one-time surgery, which of course includes this PMU issue.
>
> What do you think?

The PMU issue needs to be fixed ASAP, since a Linux guest will set the
"host-only" bit on a CPU that doesn't support it, and Linux expects the
bit to be ignored and the remainder of the PerfEvtSeln to be written.
Currently, KVM synthesizes a #GP and the PerfEvtSeln is not written.

I don't believe it is necessary to fix all related issues at one time.
Incremental fixes should be fine.
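As a sketch of the behavior Jim is asking for (illustrative only; the function name is hypothetical, though struct kvm_pmc and reprogram_counter() appear in the diff below): the write to a PerfEvtSeln is always accepted, so a guest that sets the host-only bit on a CPU without it still gets the rest of the event selector programmed:

```c
/*
 * Never synthesize a #GP for a PerfEvtSeln write on an emulated AMD CPU:
 * accept the value and let counter reprogramming ignore whatever the
 * emulated PMU cannot support (e.g. the host-only bit).
 */
static int amd_pmu_set_evtsel_sketch(struct kvm_pmc *pmc, u64 data)
{
	if (data == pmc->eventsel)
		return 0;

	pmc->eventsel = data;	/* no reserved-bit #GP */
	reprogram_counter(pmc);
	return 0;
}
```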
```diff
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 41c9b9e2aec2..05b4e4f2bb66 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,20 @@
 #include "pmu.h"
 #include "svm.h"
 
+/*
+ * As a workaround of "Retire Based Events May Overcount" for erratum 1292,
+ * some patched guests may set PERF_CTL2[43] to 1b and PERF_CTL2[20] to 0b
+ * to count the non-FP affected PMC events correctly.
+ *
+ * Note, tests show that the counter difference before and after using the
+ * workaround is not significant. Host will be scheduling CTR2 indiscriminately.
+ */
+static inline bool vcpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+	return guest_cpuid_family(vcpu) == 0x19 &&
+	       guest_cpuid_model(vcpu) < 0x10;
+}
+
 enum pmu_type {
 	PMU_TYPE_COUNTER = 0,
 	PMU_TYPE_EVNTSEL,
@@ -224,6 +238,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	struct kvm_pmc *pmc;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u64 reserved_bits;
 
 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -236,7 +251,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if (pmc) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		reserved_bits = pmu->reserved_bits;
+		if (pmc->idx == 2 && vcpu_overcount_retire_events(vcpu))
+			reserved_bits &= ~BIT_ULL(43);
+		if (!(data & reserved_bits)) {
 			pmc->eventsel = data;
 			reprogram_counter(pmc);
 			return 0;
```