Message ID | 20230131085031.88939-1-likexu@tencent.com (mailing list archive)
State      | New, archived
Series     | [v2] KVM: x86/pmu: Disable all vPMU features support on Intel hybrid CPUs
On Tue, Jan 31, 2023 at 04:50:31PM +0800, Like Xu wrote: > From: Like Xu <likexu@tencent.com> > > Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until > KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or > gains a mechanism to let userspace opt-in to the dangers of exposing a > hybrid vPMU to KVM guests. > > Virtualizing a hybrid PMU, or at least part of a hybrid PMU, is possible, > but it requires userspace to pin vCPUs to pCPUs to prevent migrating a > vCPU between a big core and a little core, requires the VMM to accurately > enumerate the topology to the guest (if exposing a hybrid CPU to the > guest), and also requires the VMM to accurately enumerate the vPMU > capabilities to the guest. > > The last point is especially problematic, as KVM doesn't control which > pCPU it runs on when enumerating KVM's vPMU capabilities to userspace. > For now, simply disable vPMU support on hybrid CPUs to avoid inducing > seemingly random #GPs in guests. > > Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Like Xu <likexu@tencent.com> This seems reasonable; Paolo, will you take this through the KVM tree? Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > v1: https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com/ > arch/x86/kvm/pmu.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > index 79988dafb15b..6a3995657e1e 100644 > --- a/arch/x86/kvm/pmu.h > +++ b/arch/x86/kvm/pmu.h > @@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) > > /* > * For Intel, only support guest architectural pmu > - * on a host with architectural pmu. > + * on a non-hybrid host with architectural pmu. > */ > - if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp) > + if (!kvm_pmu_cap.num_counters_gp || > + (is_intel && (!kvm_pmu_cap.version || > + boot_cpu_has(X86_FEATURE_HYBRID_CPU)))) > enable_pmu = false; > > if (!enable_pmu) { > -- > 2.39.1 >
On Tue, Jan 31, 2023, Like Xu wrote: > From: Like Xu <likexu@tencent.com> > > Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until > KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or > gains a mechanism to let userspace opt-in to the dangers of exposing a > hybrid vPMU to KVM guests. > > Virtualizing a hybrid PMU, or at least part of a hybrid PMU, is possible, > but it requires userspace to pin vCPUs to pCPUs to prevent migrating a > vCPU between a big core and a little core, requires the VMM to accurately > enumerate the topology to the guest (if exposing a hybrid CPU to the > guest), and also requires the VMM to accurately enumerate the vPMU > capabilities to the guest. > > The last point is especially problematic, as KVM doesn't control which > pCPU it runs on when enumerating KVM's vPMU capabilities to userspace. > For now, simply disable vPMU support on hybrid CPUs to avoid inducing > seemingly random #GPs in guests. > > Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Like Xu <likexu@tencent.com> > --- > v1: https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com/ > arch/x86/kvm/pmu.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > index 79988dafb15b..6a3995657e1e 100644 > --- a/arch/x86/kvm/pmu.h > +++ b/arch/x86/kvm/pmu.h > @@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) > > /* > * For Intel, only support guest architectural pmu > - * on a host with architectural pmu. > + * on a non-hybrid host with architectural pmu. > */ > - if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp) > + if (!kvm_pmu_cap.num_counters_gp || > + (is_intel && (!kvm_pmu_cap.version || > + boot_cpu_has(X86_FEATURE_HYBRID_CPU)))) Why do this here instead of in perf_get_x86_pmu_capability()[*]? The issue isn't restricted to Intel CPUs, it just so happens that Intel is the only x86 vendor that has shipped hybrid CPUs/PMUs. Similarly, it's entirely possible to create a hybrid CPU with a fully homogeneous PMU. IMO KVM should rely on the PMU's is_hybrid() and not the generic X86_FEATURE_HYBRID_CPU flag. [*] https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com
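For readers without the v1 link handy: the alternative Sean points at is to have perf itself refuse to enumerate PMU capabilities on hybrid parts, keyed off perf's internal is_hybrid() helper rather than the CPUID-derived X86_FEATURE_HYBRID_CPU flag. A minimal sketch of that idea (not the literal v1 patch; the unchanged tail of the function is elided):

void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
{
	/*
	 * Sketch only: on hybrid parts the global x86_pmu numbers are not
	 * accurate for any single core type, so report no PMU at all.
	 */
	if (!x86_pmu_initialized() || is_hybrid()) {
		memset(cap, 0, sizeof(*cap));
		return;
	}

	cap->version		= x86_pmu.version;
	cap->num_counters_gp	= x86_pmu.num_counters;
	cap->num_counters_fixed	= x86_pmu.num_counters_fixed;
	cap->bit_width_gp	= x86_pmu.cntval_bits;
	/* ... remaining fields filled in as before ... */
}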
On 1/2/2023 12:02 am, Sean Christopherson wrote: > On Tue, Jan 31, 2023, Like Xu wrote: >> From: Like Xu <likexu@tencent.com> >> >> Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until >> KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or >> gains a mechanism to let userspace opt-in to the dangers of exposing a >> hybrid vPMU to KVM guests. >> >> Virtualizing a hybrid PMU, or at least part of a hybrid PMU, is possible, >> but it requires userspace to pin vCPUs to pCPUs to prevent migrating a >> vCPU between a big core and a little core, requires the VMM to accurately >> enumerate the topology to the guest (if exposing a hybrid CPU to the >> guest), and also requires the VMM to accurately enumerate the vPMU >> capabilities to the guest. >> >> The last point is especially problematic, as KVM doesn't control which >> pCPU it runs on when enumerating KVM's vPMU capabilities to userspace. >> For now, simply disable vPMU support on hybrid CPUs to avoid inducing >> seemingly random #GPs in guests. >> >> Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> >> Cc: Peter Zijlstra <peterz@infradead.org> >> Suggested-by: Sean Christopherson <seanjc@google.com> >> Signed-off-by: Like Xu <likexu@tencent.com> >> --- >> v1: https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com/ >> arch/x86/kvm/pmu.h | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h >> index 79988dafb15b..6a3995657e1e 100644 >> --- a/arch/x86/kvm/pmu.h >> +++ b/arch/x86/kvm/pmu.h >> @@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) >> >> /* >> * For Intel, only support guest architectural pmu >> - * on a host with architectural pmu. >> + * on a non-hybrid host with architectural pmu. >> */ >> - if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp) >> + if (!kvm_pmu_cap.num_counters_gp || >> + (is_intel && (!kvm_pmu_cap.version || >> + boot_cpu_has(X86_FEATURE_HYBRID_CPU)))) > > Why do this here instead of in perf_get_x86_pmu_capability()[*]? The issue isn't > restricted to Intel CPUs, it just so happens that Intel is the only x86 vendor > that has shipped hybrid CPUs/PMUs. Similarly, it's entirely possible to create a > hybrid CPU with a fully homogeneous PMU. IMO KVM should rely on the PMU's is_hybrid() > and not the generic X86_FEATURE_HYBRID_CPU flag. > > [*] https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com As of today, other x86 vendors do not have hybrid core products in their road maps. Before implementing the virtual hybrid vCPU model, there is no practical value in talking about homogeneous PMU on hybrid vCPU at the present stage. The perf interface only provides host PMU capabilities and the logic for choosing to disable (or enable) vPMU based on perf input should be left in the KVM part so that subsequent development work can add most code to the just KVM, which is very helpful for downstream users to upgrade loadable KVM module rather than the entire core kernel. My experience interacting with the perf subsystem has taught me that perf change required from KVM should be made as small as possible. I assume that Peterz's timely "Acked-by" also implies his preference. Thanks, Like Xu
On Thu, Feb 02, 2023, Like Xu wrote: > On 1/2/2023 12:02 am, Sean Christopherson wrote: > > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > > > index 79988dafb15b..6a3995657e1e 100644 > > > --- a/arch/x86/kvm/pmu.h > > > +++ b/arch/x86/kvm/pmu.h > > > @@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) > > > /* > > > * For Intel, only support guest architectural pmu > > > - * on a host with architectural pmu. > > > + * on a non-hybrid host with architectural pmu. > > > */ > > > - if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp) > > > + if (!kvm_pmu_cap.num_counters_gp || > > > + (is_intel && (!kvm_pmu_cap.version || > > > + boot_cpu_has(X86_FEATURE_HYBRID_CPU)))) > > > > Why do this here instead of in perf_get_x86_pmu_capability()[*]? The issue isn't > > restricted to Intel CPUs, it just so happens that Intel is the only x86 vendor > > that has shipped hybrid CPUs/PMUs. Similarly, it's entirely possible to create a > > hybrid CPU with a fully homogeneous PMU. IMO KVM should rely on the PMU's is_hybrid() > > and not the generic X86_FEATURE_HYBRID_CPU flag. > > > > [*] https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com > > As of today, other x86 vendors do not have hybrid core products in their > road maps. Before implementing the virtual hybrid vCPU model, there is > no practical value in talking about homogeneous PMU on hybrid vCPU > at the present stage. Why not? I assume Intel put a fair bit of effort into ensuring feature parity between P and E cores. Other than time, money, and effort, I don't see any reason why Intel can't do the same for the PMU. > The perf interface only provides host PMU capabilities and the logic for > choosing to disable (or enable) vPMU based on perf input should be left > in the KVM part so that subsequent development work can add most code > to the just KVM, which is very helpful for downstream users to upgrade > loadable KVM module rather than the entire core kernel. > > My experience interacting with the perf subsystem has taught me that > perf change required from KVM should be made as small as possible. I don't disagree, but I don't think that's relevant in this case. Perf doesn't provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is somehow able to get away with enumerating a very stripped down vPMU, additional modifications to perf_get_x86_pmu_capability() will be required. What I care more about though is this ugliness in perf_get_x86_pmu_capability(): /* * KVM doesn't support the hybrid PMU yet. * Return the common value in global x86_pmu, * which available for all cores. */ cap->num_counters_gp = x86_pmu.num_counters; I really don't want to leave that comment lying around as it's flat out wrong in that it obviously doesn't address the other differences beyond the number of counters. And since there are dependencies on perf, my preference is to disable PMU enumeration in perf specifically so that whoever takes on vPMU enabling is forced to consider the perf side of things, and get buy in from the perf folks.
On 3/2/2023 2:06 am, Sean Christopherson wrote:
> On Thu, Feb 02, 2023, Like Xu wrote:
>> On 1/2/2023 12:02 am, Sean Christopherson wrote:
>>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>>>> index 79988dafb15b..6a3995657e1e 100644
>>>> --- a/arch/x86/kvm/pmu.h
>>>> +++ b/arch/x86/kvm/pmu.h
>>>> @@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>>>  	/*
>>>>  	 * For Intel, only support guest architectural pmu
>>>> -	 * on a host with architectural pmu.
>>>> +	 * on a non-hybrid host with architectural pmu.
>>>>  	 */
>>>> -	if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
>>>> +	if (!kvm_pmu_cap.num_counters_gp ||
>>>> +	    (is_intel && (!kvm_pmu_cap.version ||
>>>> +			  boot_cpu_has(X86_FEATURE_HYBRID_CPU))))
>>>
>>> Why do this here instead of in perf_get_x86_pmu_capability()[*]? The issue isn't
>>> restricted to Intel CPUs, it just so happens that Intel is the only x86 vendor
>>> that has shipped hybrid CPUs/PMUs. Similarly, it's entirely possible to create a
>>> hybrid CPU with a fully homogeneous PMU. IMO KVM should rely on the PMU's is_hybrid()
>>> and not the generic X86_FEATURE_HYBRID_CPU flag.
>>>
>>> [*] https://lore.kernel.org/all/20230120004051.2043777-1-seanjc@google.com
>>
>> As of today, other x86 vendors do not have hybrid core products in their
>> road maps. Before implementing the virtual hybrid vCPU model, there is
>> no practical value in talking about homogeneous PMU on hybrid vCPU
>> at the present stage.
>
> Why not? I assume Intel put a fair bit of effort into ensuring feature parity
> between P and E cores. Other than time, money, and effort, I don't see any
> reason why Intel can't do the same for the PMU.

I asked the same question when I last had access to hybrid-core hardware, and
was told that it wouldn't happen for PMU capabilities, since different PMU
events on different CPU types imply micro-architectural differences between
big and little cores, and even harmonizing the event encodings is difficult
to achieve in the short term.

>
>> The perf interface only provides host PMU capabilities and the logic for
>> choosing to disable (or enable) vPMU based on perf input should be left
>> in the KVM part so that subsequent development work can add most code
>> to the just KVM, which is very helpful for downstream users to upgrade
>> loadable KVM module rather than the entire core kernel.
>>
>> My experience interacting with the perf subsystem has taught me that
>> perf change required from KVM should be made as small as possible.
>
> I don't disagree, but I don't think that's relevant in this case. Perf doesn't
> provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is
> somehow able to get away with enumerating a very stripped down vPMU, additional
> modifications to perf_get_x86_pmu_capability() will be required.
>
> What I care more about though is this ugliness in perf_get_x86_pmu_capability():
>
> 	/*
> 	 * KVM doesn't support the hybrid PMU yet.
> 	 * Return the common value in global x86_pmu,
> 	 * which available for all cores.

I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt)
to continue to work on any type of pCPU until you decide to disable them completely.

Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM,
it may be technically ebpf helpers. The diff on comments from v1 can be applied to
this version (restrict KVM semantics), and it makes the status quo clearer
to KVM users.

> 	 */
> 	cap->num_counters_gp = x86_pmu.num_counters;
>
> I really don't want to leave that comment lying around as it's flat out wrong in
> that it obviously doesn't address the other differences beyond the number of
> counters. And since there are dependencies on perf, my preference is to disable
> PMU enumeration in perf specifically so that whoever takes on vPMU enabling is
> forced to consider the perf side of things, and get buy in from the perf folks.

The perf_get_x86_pmu_capability() obviously needs to be revamped,
but until real effective KVM enabling work arrives, any inconsequential intrusion
into perf/core code will only lead to trivial system maintenance.
On Fri, Feb 03, 2023, Like Xu wrote: > On 3/2/2023 2:06 am, Sean Christopherson wrote: > > On Thu, Feb 02, 2023, Like Xu wrote: > > > On 1/2/2023 12:02 am, Sean Christopherson wrote: > > > The perf interface only provides host PMU capabilities and the logic for > > > choosing to disable (or enable) vPMU based on perf input should be left > > > in the KVM part so that subsequent development work can add most code > > > to the just KVM, which is very helpful for downstream users to upgrade > > > loadable KVM module rather than the entire core kernel. > > > > > > My experience interacting with the perf subsystem has taught me that > > > perf change required from KVM should be made as small as possible. > > > > I don't disagree, but I don't think that's relevant in this case. Perf doesn't > > provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is > > somehow able to get away with enumerating a very stripped down vPMU, additional > > modifications to perf_get_x86_pmu_capability() will be required. > > > > What I care more about though is this ugliness in perf_get_x86_pmu_capability(): > > > > /* > > * KVM doesn't support the hybrid PMU yet. > > * Return the common value in global x86_pmu, > > * which available for all cores. > > I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt) > to continue to work on any type of pCPU until you decide to disable them completely. Didn't follow this. > Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM, > it may be technically ebpf helpers. The diff on comments from v1 can be applied to > this version (restrict KVM semantics), and it makes the status quo clearer > to KVM users. In that case, eBPF is just as hosed, no? And given that the only people that have touched perf_get_x86_pmu_capability() in its 11+ years of existence are all KVM people, I have a hard time believing there is meaningful use outside of KVM. > > */ > > cap->num_counters_gp = x86_pmu.num_counters; > > > > I really don't want to leave that comment lying around as it's flat out wrong in > > that it obviously doesn't address the other differences beyond the number of > > counters. And since there are dependencies on perf, my preference is to disable > > PMU enumeration in perf specifically so that whoever takes on vPMU enabling is > > forced to consider the perf side of things, and get buy in from the perf folks. > > The perf_get_x86_pmu_capability() obviously needs to be revamped, > but until real effective KVM enabling work arrives, any inconsequential intrusion > into perf/core code will only lead to trivial system maintenance. Trivial doesn't mean useless or unnecessary though. IMO, there's value in capturing, in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs. That said, poking around perf, checking is_hybrid() is wrong. This quirk suggests that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to be cleared, and (b) the base PMU will reflect the P-core PMU. I.e. someone can enable vPMU by disabling E-cores. /* * Quirk: For some Alder Lake machine, when all E-cores are disabled in * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However, * the X86_FEATURE_HYBRID_CPU is still set. The above codes will * mistakenly add extra counters for P-cores. Correct the number of * counters here. 
	 */
	if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) {
		pmu->num_counters = x86_pmu.num_counters;
		pmu->num_counters_fixed = x86_pmu.num_counters_fixed;
	}

Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario
during boot and manually clear X86_FEATURE_HYBRID_CPU.

I'm also ok explicitly disabling support in KVM, but since we need to update
perf as well (that KVM comment needs to go), I don't see any reason not to also
update perf_get_x86_pmu_capability().

How about this? Maybe split over two patches to separate the KVM and perf changes?

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 85a63a41c471..d096b04bf80e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2974,17 +2974,19 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
 
 void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
-	if (!x86_pmu_initialized()) {
+	/* This API doesn't currently support enumerating hybrid PMUs. */
+	if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) ||
+	    !x86_pmu_initialized()) {
 		memset(cap, 0, sizeof(*cap));
 		return;
 	}
 
+	/*
+	 * Note, hybrid CPU models get tracked as having hybrid PMUs even when
+	 * all E-cores are disabled via BIOS. When E-cores are disabled, the
+	 * base PMU holds the correct number of counters for P-cores.
+	 */
 	cap->version		= x86_pmu.version;
-	/*
-	 * KVM doesn't support the hybrid PMU yet.
-	 * Return the common value in global x86_pmu,
-	 * which available for all cores.
-	 */
 	cap->num_counters_gp	= x86_pmu.num_counters;
 	cap->num_counters_fixed	= x86_pmu.num_counters_fixed;
 	cap->bit_width_gp	= x86_pmu.cntval_bits;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cdb91009701d..933165663703 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void)
 {
 	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 
-	perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
-	/*
-	 * For Intel, only support guest architectural pmu
-	 * on a host with architectural pmu.
-	 */
-	if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+	/*
+	 * Hybrid PMUs don't play nice with virtualization unless userspace
+	 * pins vCPUs _and_ can enumerate accurate informations to the guest.
+	 * Disable vPMU support for hybrid PMUs until KVM gains a way to let
+	 * userspace opt into the dangers of hybrid vPMUs.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
 		enable_pmu = false;
 
+	if (enable_pmu) {
+		perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+		/*
+		 * For Intel, only support guest architectural pmu
+		 * on a host with architectural pmu.
+		 */
+		if ((is_intel && !kvm_pmu_cap.version) ||
+		    !kvm_pmu_cap.num_counters_gp)
+			enable_pmu = false;
+	}
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;
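For reference, the "Quirk: For some Alder Lake machine..." block quoted near the top of this message appears to come from perf's Alder Lake setup code (intel_pmu_init() in arch/x86/events/intel/core.c), where pmu points at the big-core x86_hybrid_pmu; it is perf code, not KVM code.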
On 4/2/2023 1:28 am, Sean Christopherson wrote: > On Fri, Feb 03, 2023, Like Xu wrote: >> On 3/2/2023 2:06 am, Sean Christopherson wrote: >>> On Thu, Feb 02, 2023, Like Xu wrote: >>>> On 1/2/2023 12:02 am, Sean Christopherson wrote: >>>> The perf interface only provides host PMU capabilities and the logic for >>>> choosing to disable (or enable) vPMU based on perf input should be left >>>> in the KVM part so that subsequent development work can add most code >>>> to the just KVM, which is very helpful for downstream users to upgrade >>>> loadable KVM module rather than the entire core kernel. >>>> >>>> My experience interacting with the perf subsystem has taught me that >>>> perf change required from KVM should be made as small as possible. >>> >>> I don't disagree, but I don't think that's relevant in this case. Perf doesn't >>> provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is >>> somehow able to get away with enumerating a very stripped down vPMU, additional >>> modifications to perf_get_x86_pmu_capability() will be required. >>> >>> What I care more about though is this ugliness in perf_get_x86_pmu_capability(): >>> >>> /* >>> * KVM doesn't support the hybrid PMU yet. >>> * Return the common value in global x86_pmu, >>> * which available for all cores. >> >> I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt) >> to continue to work on any type of pCPU until you decide to disable them completely. > > Didn't follow this. My expectation is that, if a guest doesn't enable "PEBS, LBR and intel_pt", and only has the most basic pmu conters (its number is the lesser number of big and small cores supported), with some pmu_event_fileter allow list mechanism, vPMU works regardless of the vcpu model and does not require cpu pined. Any complaints from users on this usages ? > >> Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM, >> it may be technically ebpf helpers. The diff on comments from v1 can be applied to >> this version (restrict KVM semantics), and it makes the status quo clearer >> to KVM users. > > In that case, eBPF is just as hosed, no? And given that the only people that have > touched perf_get_x86_pmu_capability() in its 11+ years of existence are all KVM > people, I have a hard time believing there is meaningful use outside of KVM. Some radical bpf programs will access the pmu directly, although this is not uncommon in upstream. KVM colleagues shouldn't need to care about them, but at least don't mislead them. > >>> */ >>> cap->num_counters_gp = x86_pmu.num_counters; >>> >>> I really don't want to leave that comment lying around as it's flat out wrong in >>> that it obviously doesn't address the other differences beyond the number of >>> counters. And since there are dependencies on perf, my preference is to disable >>> PMU enumeration in perf specifically so that whoever takes on vPMU enabling is >>> forced to consider the perf side of things, and get buy in from the perf folks. >> >> The perf_get_x86_pmu_capability() obviously needs to be revamped, >> but until real effective KVM enabling work arrives, any inconsequential intrusion >> into perf/core code will only lead to trivial system maintenance. > > Trivial doesn't mean useless or unnecessary though. IMO, there's value in capturing, > in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs. > > That said, poking around perf, checking is_hybrid() is wrong. 
This quirk suggests > that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to > be cleared, and (b) the base PMU will reflect the P-core PMU. I.e. someone can > enable vPMU by disabling E-cores. > > /* > * Quirk: For some Alder Lake machine, when all E-cores are disabled in > * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However, > * the X86_FEATURE_HYBRID_CPU is still set. The above codes will Sigh. Then what if E-cores are manually offline via "/.../cpu$/online" and then init kvm module ? I suggest leaving these open issues to that enabling guy (or maybe it's still me). > * mistakenly add extra counters for P-cores. Correct the number of > * counters here. > */ > if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) { > pmu->num_counters = x86_pmu.num_counters; > pmu->num_counters_fixed = x86_pmu.num_counters_fixed; > } > > Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario > during boot and manually clear X86_FEATURE_HYBRID_CPU. Maybe they did it on purpose. > > I'm also ok explicitly disabling support in KVM, but since we need to update > perf as well (that KVM comment needs to go), I don't see any reason not to also > update perf_get_x86_pmu_capability(). > > How about this? Maybe split over two patches to separate the KVM and perf changes? OK, applying your diff below or mine V2 as a KVM move is both fine to me. Just thanks. > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > index 85a63a41c471..d096b04bf80e 100644 > --- a/arch/x86/events/core.c > +++ b/arch/x86/events/core.c > @@ -2974,17 +2974,19 @@ unsigned long perf_misc_flags(struct pt_regs *regs) > > void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap) > { > - if (!x86_pmu_initialized()) { > + /* This API doesn't currently support enumerating hybrid PMUs. */ > + if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) || > + !x86_pmu_initialized()) { > memset(cap, 0, sizeof(*cap)); > return; > } > > + /* > + * Note, hybrid CPU models get tracked as having hybrid PMUs even when > + * all E-cores are disabled via BIOS. When E-cores are disabled, the > + * base PMU holds the correct number of counters for P-cores. > + */ > cap->version = x86_pmu.version; > - /* > - * KVM doesn't support the hybrid PMU yet. > - * Return the common value in global x86_pmu, > - * which available for all cores. > - */ > cap->num_counters_gp = x86_pmu.num_counters; > cap->num_counters_fixed = x86_pmu.num_counters_fixed; > cap->bit_width_gp = x86_pmu.cntval_bits; > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > index cdb91009701d..933165663703 100644 > --- a/arch/x86/kvm/pmu.h > +++ b/arch/x86/kvm/pmu.h > @@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void) > { > bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; > > - perf_get_x86_pmu_capability(&kvm_pmu_cap); > - > - /* > - * For Intel, only support guest architectural pmu > - * on a host with architectural pmu. > - */ > - if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp) > + /* > + * Hybrid PMUs don't play nice with virtualization unless userspace > + * pins vCPUs _and_ can enumerate accurate informations to the guest. > + * Disable vPMU support for hybrid PMUs until KVM gains a way to let > + * userspace opt into the dangers of hybrid vPMUs. 
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
>  		enable_pmu = false;
>  
> +	if (enable_pmu) {
> +		perf_get_x86_pmu_capability(&kvm_pmu_cap);
> +
> +		/*
> +		 * For Intel, only support guest architectural pmu
> +		 * on a host with architectural pmu.
> +		 */
> +		if ((is_intel && !kvm_pmu_cap.version) ||
> +		    !kvm_pmu_cap.num_counters_gp)
> +			enable_pmu = false;
> +	}
> +
>  	if (!enable_pmu) {
>  		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>  		return;
>
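For context on the pmu_event_filter allow-list mechanism mentioned above: it is exposed to userspace as the KVM_SET_PMU_EVENT_FILTER VM ioctl. A minimal userspace sketch, assuming the uapi struct layout of this era; the chosen event and the vm_fd handling are illustrative only:

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Sketch: install an allow-list containing a single event (event select
 * 0xc0, unit mask 0x00, i.e. architectural "instructions retired" on
 * Intel). vm_fd is an existing KVM VM file descriptor; error handling is
 * mostly omitted. fixed_counter_bitmap is left at 0 here; see the KVM API
 * documentation for how it interacts with the chosen action.
 */
static int set_instructions_only_filter(int vm_fd)
{
	struct kvm_pmu_event_filter *filter;
	int ret;

	filter = calloc(1, sizeof(*filter) + sizeof(__u64));
	if (!filter)
		return -1;

	filter->action = KVM_PMU_EVENT_ALLOW;
	filter->nevents = 1;
	filter->events[0] = 0xc0;	/* eventsel 0xc0, umask 0x00 */

	ret = ioctl(vm_fd, KVM_SET_PMU_EVENT_FILTER, filter);
	free(filter);
	return ret;
}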
On Mon, Feb 06, 2023, Like Xu wrote: > On 4/2/2023 1:28 am, Sean Christopherson wrote: > > On Fri, Feb 03, 2023, Like Xu wrote: > > > On 3/2/2023 2:06 am, Sean Christopherson wrote: > > > > On Thu, Feb 02, 2023, Like Xu wrote: > > > > > On 1/2/2023 12:02 am, Sean Christopherson wrote: > > > > > The perf interface only provides host PMU capabilities and the logic for > > > > > choosing to disable (or enable) vPMU based on perf input should be left > > > > > in the KVM part so that subsequent development work can add most code > > > > > to the just KVM, which is very helpful for downstream users to upgrade > > > > > loadable KVM module rather than the entire core kernel. > > > > > > > > > > My experience interacting with the perf subsystem has taught me that > > > > > perf change required from KVM should be made as small as possible. > > > > > > > > I don't disagree, but I don't think that's relevant in this case. Perf doesn't > > > > provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is > > > > somehow able to get away with enumerating a very stripped down vPMU, additional > > > > modifications to perf_get_x86_pmu_capability() will be required. > > > > > > > > What I care more about though is this ugliness in perf_get_x86_pmu_capability(): > > > > > > > > /* > > > > * KVM doesn't support the hybrid PMU yet. > > > > * Return the common value in global x86_pmu, > > > > * which available for all cores. > > > > > > I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt) > > > to continue to work on any type of pCPU until you decide to disable them completely. > > > > Didn't follow this. > > My expectation is that, if a guest doesn't enable "PEBS, LBR and intel_pt", > and only has the most basic pmu conters (its number is the lesser number > of big and small cores supported), with some pmu_event_fileter allow list > mechanism, vPMU works regardless of the vcpu model and does not > require cpu pined. Any complaints from users on this usages ? No? But I highly doubt the average user is even aware of KVM_SET_PMU_EVENT_FILTER, let alone knows how to use it. E.g. AFAICT QEMU doesn't support the ioctl(). And for people that do use event filters, I doubt they're running on hybrid CPUs. > > > > I really don't want to leave that comment lying around as it's flat out wrong in > > > > that it obviously doesn't address the other differences beyond the number of > > > > counters. And since there are dependencies on perf, my preference is to disable > > > > PMU enumeration in perf specifically so that whoever takes on vPMU enabling is > > > > forced to consider the perf side of things, and get buy in from the perf folks. > > > > > > The perf_get_x86_pmu_capability() obviously needs to be revamped, > > > but until real effective KVM enabling work arrives, any inconsequential intrusion > > > into perf/core code will only lead to trivial system maintenance. > > > > Trivial doesn't mean useless or unnecessary though. IMO, there's value in capturing, > > in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs. > > > > That said, poking around perf, checking is_hybrid() is wrong. This quirk suggests > > that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to > > be cleared, and (b) the base PMU will reflect the P-core PMU. I.e. someone can > > enable vPMU by disabling E-cores. 
> > > > /* > > * Quirk: For some Alder Lake machine, when all E-cores are disabled in > > * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However, > > * the X86_FEATURE_HYBRID_CPU is still set. The above codes will > > Sigh. Then what if E-cores are manually offline via "/.../cpu$/online" and > then init kvm module ? KVM has to be paranoid and assume those CPUs could be onlined in the future. > I suggest leaving these open issues to that enabling guy (or maybe it's still me). > > > * mistakenly add extra counters for P-cores. Correct the number of > > * counters here. > > */ > > if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) { > > pmu->num_counters = x86_pmu.num_counters; > > pmu->num_counters_fixed = x86_pmu.num_counters_fixed; > > } > > > > Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario > > during boot and manually clear X86_FEATURE_HYBRID_CPU. > > Maybe they did it on purpose. That seems unlikely. My interpretation of the comment, specifically the "Quirk" and "some ... machine" parts, is that the intended behavior is to clear the HYBRID bit, but _some_ platforms fail to do so. I don't think Intel's intent matters though. If the kernel benefits from clearing HYBRID and there are no downsides, then it should be cleared regardless of what Intel intended ucode to do.
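A rough sketch of the "detect the scenario during boot and manually clear X86_FEATURE_HYBRID_CPU" idea discussed above; only boot_cpu_has() and setup_clear_cpu_cap() are real kernel interfaces, and the detection helper is a made-up placeholder:

/*
 * Sketch only: force-clear the hybrid feature bit when BIOS has disabled
 * all E-cores, so that later hybrid checks (perf, KVM, ...) don't fire.
 * all_ecores_disabled_in_bios() is a hypothetical placeholder for the
 * actual detection logic.
 */
static void __init fixup_hybrid_cpu_feature(void)
{
	if (!boot_cpu_has(X86_FEATURE_HYBRID_CPU))
		return;

	if (all_ecores_disabled_in_bios())
		setup_clear_cpu_cap(X86_FEATURE_HYBRID_CPU);
}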
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 79988dafb15b..6a3995657e1e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -166,9 +166,11 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 
 	/*
 	 * For Intel, only support guest architectural pmu
-	 * on a host with architectural pmu.
+	 * on a non-hybrid host with architectural pmu.
 	 */
-	if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+	if (!kvm_pmu_cap.num_counters_gp ||
+	    (is_intel && (!kvm_pmu_cap.version ||
+			  boot_cpu_has(X86_FEATURE_HYBRID_CPU))))
 		enable_pmu = false;
 
 	if (!enable_pmu) {