Message ID | 20240301075007.644152-1-sandipan.das@amd.com (mailing list archive)
State      | New, archived
Series     | KVM: x86/svm/pmu: Set PerfMonV2 global control bits correctly
On 1/3/2024 3:50 pm, Sandipan Das wrote:
> With PerfMonV2, a performance monitoring counter will start operating
> only when both the PERF_CTLx enable bit as well as the corresponding
> PerfCntrGlobalCtl enable bit are set.
>
> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
> for a guest but the guest kernel does not support PerfMonV2 (such as
> kernels older than v5.19), the guest counters do not count since the
> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
> writes to it.

If the vcpu has the PerfMonV2 feature, it should not work the way legacy
PMU does. Users need to use the new driver to operate the new hardware,
don't they? One practical approach is that the hypervisor should not set
the PerfMonV2 bit for this unpatched 'v5.19' guest.

> This is not observed on bare-metal as the default value of the
> PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
> counters) and the counters can still be operated by using the enable
> bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
> compatibility with older kernels.
>
> Before:
>
> $ perf stat -e cycles:u true
>
> Performance counter stats for 'true':
>
>                  0      cycles:u
>
>        0.001074773 seconds time elapsed
>
>        0.001169000 seconds user
>        0.000000000 seconds sys
>
> After:
>
> $ perf stat -e cycles:u true
>
> Performance counter stats for 'true':
>
>            227,850      cycles:u
>
>        0.037770758 seconds time elapsed
>
>        0.000000000 seconds user
>        0.037886000 seconds sys
>
> Reported-by: Babu Moger <babu.moger@amd.com>
> Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  arch/x86/kvm/svm/pmu.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index b6a7ad4d6914..14709c564d6a 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>  	if (pmu->version > 1) {
>  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>  		pmu->global_status_mask = pmu->global_ctrl_mask;
> +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
>  	}
>
>  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
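[Editor's note: the bit arithmetic in the patch above can be sketched as a
standalone userspace C program (not kernel code; the function names are
illustrative, not KVM's):]

```c
#include <stdint.h>

/*
 * Standalone sketch of the patch's arithmetic: the reserved-bit mask
 * covers everything above the general-purpose counter enable bits, and
 * the proposed RESET value of PerfCntrGlobalCtl is the complement of
 * that mask, i.e. all GP counter enable bits set.
 */
static uint64_t global_ctrl_mask(unsigned int nr_gp_counters)
{
	return ~((1ULL << nr_gp_counters) - 1);
}

static uint64_t reset_global_ctrl(unsigned int nr_gp_counters)
{
	return ~global_ctrl_mask(nr_gp_counters);
}
```

With six counters, reset_global_ctrl(6) yields 0x3f, matching the bare-metal
reset value quoted in the commit message.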
On 3/1/2024 2:07 PM, Like Xu wrote:
> On 1/3/2024 3:50 pm, Sandipan Das wrote:
>> With PerfMonV2, a performance monitoring counter will start operating
>> only when both the PERF_CTLx enable bit as well as the corresponding
>> PerfCntrGlobalCtl enable bit are set.
>>
>> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
>> for a guest but the guest kernel does not support PerfMonV2 (such as
>> kernels older than v5.19), the guest counters do not count since the
>> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
>> writes to it.
>
> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
> PMU does. Users need to use the new driver to operate the new hardware,
> don't they ? One practical approach is that the hypervisor should not set
> the PerfMonV2 bit for this unpatched 'v5.19' guest.
>

My understanding is that the legacy method of managing the counters should
still work because the enable bits in PerfCntrGlobalCtl are expected to be
set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
is set to 0x3f after a system reset. That way, the guest kernel can use either
the new or legacy method.

>>
>> This is not observed on bare-metal as the default value of the
>> PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
>> counters) and the counters can still be operated by using the enable
>> bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
>> compatibility with older kernels.
>>
>> Before:
>>
>> $ perf stat -e cycles:u true
>>
>> Performance counter stats for 'true':
>>
>>                  0      cycles:u
>>
>>        0.001074773 seconds time elapsed
>>
>>        0.001169000 seconds user
>>        0.000000000 seconds sys
>>
>> After:
>>
>> $ perf stat -e cycles:u true
>>
>> Performance counter stats for 'true':
>>
>>            227,850      cycles:u
>>
>>        0.037770758 seconds time elapsed
>>
>>        0.000000000 seconds user
>>        0.037886000 seconds sys
>>
>> Reported-by: Babu Moger <babu.moger@amd.com>
>> Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>> ---
>>  arch/x86/kvm/svm/pmu.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>> index b6a7ad4d6914..14709c564d6a 100644
>> --- a/arch/x86/kvm/svm/pmu.c
>> +++ b/arch/x86/kvm/svm/pmu.c
>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>  	if (pmu->version > 1) {
>>  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>  		pmu->global_status_mask = pmu->global_ctrl_mask;
>> +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
>>  	}
>>
>>  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
On 3/1/2024 5:00 PM, Sandipan Das wrote:
> On 3/1/2024 2:07 PM, Like Xu wrote:
>> On 1/3/2024 3:50 pm, Sandipan Das wrote:
>>> With PerfMonV2, a performance monitoring counter will start operating
>>> only when both the PERF_CTLx enable bit as well as the corresponding
>>> PerfCntrGlobalCtl enable bit are set.
>>>
>>> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
>>> for a guest but the guest kernel does not support PerfMonV2 (such as
>>> kernels older than v5.19), the guest counters do not count since the
>>> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
>>> writes to it.
>> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
>> PMU does. Users need to use the new driver to operate the new hardware,
>> don't they ? One practical approach is that the hypervisor should not set
>> the PerfMonV2 bit for this unpatched 'v5.19' guest.
>>
> My understanding is that the legacy method of managing the counters should
> still work because the enable bits in PerfCntrGlobalCtl are expected to be
> set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
> is set to 0x3f after a system reset. That way, the guest kernel can use either

If so, please add the PPR description here as comments.

> the new or legacy method.
>
>>> This is not observed on bare-metal as the default value of the
>>> PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
>>> counters) and the counters can still be operated by using the enable
>>> bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
>>> compatibility with older kernels.
>>>
>>> Before:
>>>
>>> $ perf stat -e cycles:u true
>>>
>>> Performance counter stats for 'true':
>>>
>>>                  0      cycles:u
>>>
>>>        0.001074773 seconds time elapsed
>>>
>>>        0.001169000 seconds user
>>>        0.000000000 seconds sys
>>>
>>> After:
>>>
>>> $ perf stat -e cycles:u true
>>>
>>> Performance counter stats for 'true':
>>>
>>>            227,850      cycles:u
>>>
>>>        0.037770758 seconds time elapsed
>>>
>>>        0.000000000 seconds user
>>>        0.037886000 seconds sys
>>>
>>> Reported-by: Babu Moger <babu.moger@amd.com>
>>> Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
>>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>>> ---
>>>  arch/x86/kvm/svm/pmu.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>>> index b6a7ad4d6914..14709c564d6a 100644
>>> --- a/arch/x86/kvm/svm/pmu.c
>>> +++ b/arch/x86/kvm/svm/pmu.c
>>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>>  	if (pmu->version > 1) {
>>>  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>>  		pmu->global_status_mask = pmu->global_ctrl_mask;
>>> +		pmu->global_ctrl = ~pmu->global_ctrl_mask;

It seems easier to understand to calculate global_ctrl first and then
derive the global_ctrl_mask (negative logic).

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index e886300f0f97..7ac9b080aba6 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 					 kvm_pmu_cap.num_counters_gp);
 
 	if (pmu->version > 1) {
-		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+		pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
+		pmu->global_ctrl_mask = ~pmu->global_ctrl;
 		pmu->global_status_mask = pmu->global_ctrl_mask;
 	}

>>>  	}
>>>
>>>  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
>
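[Editor's note: whichever ordering is picked, the two computations are
equivalent; a quick standalone check in illustrative userspace C, not
kernel code:]

```c
#include <stdint.h>

/* Sandipan's ordering: compute the reserved-bit mask first, then take
 * its complement as the RESET value of global_ctrl. */
static uint64_t ctrl_from_mask(unsigned int n)
{
	uint64_t mask = ~((1ULL << n) - 1);
	return ~mask;
}

/* Like's ordering: compute global_ctrl directly, then the mask would be
 * derived as its complement (negative logic). */
static uint64_t ctrl_direct(unsigned int n)
{
	return (1ULL << n) - 1;
}
```

For any counter count below 64 the two agree bit for bit; the difference is
purely which value reads as primary.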
On Mon, Mar 04, 2024, Dapeng Mi wrote:
> 
> On 3/1/2024 5:00 PM, Sandipan Das wrote:
> > On 3/1/2024 2:07 PM, Like Xu wrote:
> > > On 1/3/2024 3:50 pm, Sandipan Das wrote:
> > > > With PerfMonV2, a performance monitoring counter will start operating
> > > > only when both the PERF_CTLx enable bit as well as the corresponding
> > > > PerfCntrGlobalCtl enable bit are set.
> > > >
> > > > When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
> > > > for a guest but the guest kernel does not support PerfMonV2 (such as
> > > > kernels older than v5.19), the guest counters do not count since the
> > > > PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
> > > > writes to it.
> > > If the vcpu has the PerfMonV2 feature, it should not work the way legacy
> > > PMU does. Users need to use the new driver to operate the new hardware,
> > > don't they ? One practical approach is that the hypervisor should not set
> > > the PerfMonV2 bit for this unpatched 'v5.19' guest.
> > >
> > My understanding is that the legacy method of managing the counters should
> > still work because the enable bits in PerfCntrGlobalCtl are expected to be
> > set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
> > is set to 0x3f after a system reset. That way, the guest kernel can use either
> 
> If so, please add the PPR description here as comments.

Or even better, make that architectural behavior that's documented in the APM.

> > > > ---
> > > >  arch/x86/kvm/svm/pmu.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > > index b6a7ad4d6914..14709c564d6a 100644
> > > > --- a/arch/x86/kvm/svm/pmu.c
> > > > +++ b/arch/x86/kvm/svm/pmu.c
> > > > @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> > > >  	if (pmu->version > 1) {
> > > >  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
> > > >  		pmu->global_status_mask = pmu->global_ctrl_mask;
> > > > +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
> 
> It seems to be more easily understand to calculate global_ctrl firstly and
> then derive the globol_ctrl_mask (negative logic).

Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
something like global_ctrl_rsvd_bits), the computation of the reserved bits should
come from the capabilities of the PMU, not from the RESET value.

On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
if AMD ever adds bits in PerfCntrGlobalCtl that aren't tied to general purpose
counters. But, that's a future theoretical problem, so I'm inclined to vote for
Sandipan's approach.

> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index e886300f0f97..7ac9b080aba6 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>  					 kvm_pmu_cap.num_counters_gp);
> 
>  	if (pmu->version > 1) {
> -		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
> +		pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
> +		pmu->global_ctrl_mask = ~pmu->global_ctrl;
>  		pmu->global_status_mask = pmu->global_ctrl_mask;
>  	}
> 
> > > >  	}
> > > >  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
> > 
On Fri, Mar 01, 2024, Like Xu wrote:
> On 1/3/2024 3:50 pm, Sandipan Das wrote:
> > With PerfMonV2, a performance monitoring counter will start operating
> > only when both the PERF_CTLx enable bit as well as the corresponding
> > PerfCntrGlobalCtl enable bit are set.
> >
> > When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
> > for a guest but the guest kernel does not support PerfMonV2 (such as
> > kernels older than v5.19), the guest counters do not count since the
> > PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
> > writes to it.
> 
> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
> PMU does. Users need to use the new driver to operate the new hardware,
> don't they ? One practical approach is that the hypervisor should not set
> the PerfMonV2 bit for this unpatched 'v5.19' guest.
> 

How could the hypervisor know the guest 'Linux version'? KVM should not even
assume it is Linux. So that means, if the 'guest driver' does not support
PerfMonV2, then the guest should just continue to use its legacy code.
Otherwise, the guest is considered broken.

Thanks.
-Mingwei

> >
> > This is not observed on bare-metal as the default value of the
> > PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
> > counters) and the counters can still be operated by using the enable
> > bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
> > compatibility with older kernels.
> >
> > Before:
> >
> > $ perf stat -e cycles:u true
> >
> > Performance counter stats for 'true':
> >
> >                  0      cycles:u
> >
> >        0.001074773 seconds time elapsed
> >
> >        0.001169000 seconds user
> >        0.000000000 seconds sys
> >
> > After:
> >
> > $ perf stat -e cycles:u true
> >
> > Performance counter stats for 'true':
> >
> >            227,850      cycles:u
> >
> >        0.037770758 seconds time elapsed
> >
> >        0.000000000 seconds user
> >        0.037886000 seconds sys
> >
> > Reported-by: Babu Moger <babu.moger@amd.com>
> > Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
> > Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> > ---
> >  arch/x86/kvm/svm/pmu.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > index b6a7ad4d6914..14709c564d6a 100644
> > --- a/arch/x86/kvm/svm/pmu.c
> > +++ b/arch/x86/kvm/svm/pmu.c
> > @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> >  	if (pmu->version > 1) {
> >  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
> >  		pmu->global_status_mask = pmu->global_ctrl_mask;
> > +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
> >  	}
> >  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
On Mon, Mar 04, 2024, Sean Christopherson wrote:
> On Mon, Mar 04, 2024, Dapeng Mi wrote:
> > 
> > On 3/1/2024 5:00 PM, Sandipan Das wrote:
> > > On 3/1/2024 2:07 PM, Like Xu wrote:
> > > > On 1/3/2024 3:50 pm, Sandipan Das wrote:
> > > > > With PerfMonV2, a performance monitoring counter will start operating
> > > > > only when both the PERF_CTLx enable bit as well as the corresponding
> > > > > PerfCntrGlobalCtl enable bit are set.
> > > > >
> > > > > When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
> > > > > for a guest but the guest kernel does not support PerfMonV2 (such as
> > > > > kernels older than v5.19), the guest counters do not count since the
> > > > > PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
> > > > > writes to it.
> > > > If the vcpu has the PerfMonV2 feature, it should not work the way legacy
> > > > PMU does. Users need to use the new driver to operate the new hardware,
> > > > don't they ? One practical approach is that the hypervisor should not set
> > > > the PerfMonV2 bit for this unpatched 'v5.19' guest.
> > > >
> > > My understanding is that the legacy method of managing the counters should
> > > still work because the enable bits in PerfCntrGlobalCtl are expected to be
> > > set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
> > > is set to 0x3f after a system reset. That way, the guest kernel can use either
> > 
> > If so, please add the PPR description here as comments.
> 
> Or even better, make that architectural behavior that's documented in the APM.

> > > > > ---
> > > > >  arch/x86/kvm/svm/pmu.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > > > index b6a7ad4d6914..14709c564d6a 100644
> > > > > --- a/arch/x86/kvm/svm/pmu.c
> > > > > +++ b/arch/x86/kvm/svm/pmu.c
> > > > > @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> > > > >  	if (pmu->version > 1) {
> > > > >  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
> > > > >  		pmu->global_status_mask = pmu->global_ctrl_mask;
> > > > > +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
> > 
> > It seems to be more easily understand to calculate global_ctrl firstly and
> > then derive the globol_ctrl_mask (negative logic).
> 
> Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
> something like global_ctrl_rsvd_bits), the computation of the reserved bits should
> come from the capabilities of the PMU, not from the RESET value.
> 

+1

> On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
> if AMD ever adds bits in PerfCntGlobalCtl that aren't tied to general purpose
> counters. But, that's a future theoretical problem, so I'm inclined to vote for
> Sandipan's approach.
> 

right. I am ok with either approach.

Thanks.
-Mingwei

> > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > index e886300f0f97..7ac9b080aba6 100644
> > --- a/arch/x86/kvm/svm/pmu.c
> > +++ b/arch/x86/kvm/svm/pmu.c
> > @@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> >  					 kvm_pmu_cap.num_counters_gp);
> > 
> >  	if (pmu->version > 1) {
> > -		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
> > +		pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
> > +		pmu->global_ctrl_mask = ~pmu->global_ctrl;
> >  		pmu->global_status_mask = pmu->global_ctrl_mask;
> >  	}
> > 
> > > > >  	}
> > > > >  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
> > > 
On 5/3/2024 3:46 am, Sean Christopherson wrote:
> On Mon, Mar 04, 2024, Dapeng Mi wrote:
>>
>> On 3/1/2024 5:00 PM, Sandipan Das wrote:
>>> On 3/1/2024 2:07 PM, Like Xu wrote:
>>>> On 1/3/2024 3:50 pm, Sandipan Das wrote:
>>>>> With PerfMonV2, a performance monitoring counter will start operating
>>>>> only when both the PERF_CTLx enable bit as well as the corresponding
>>>>> PerfCntrGlobalCtl enable bit are set.
>>>>>
>>>>> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
>>>>> for a guest but the guest kernel does not support PerfMonV2 (such as
>>>>> kernels older than v5.19), the guest counters do not count since the
>>>>> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
>>>>> writes to it.
>>>> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
>>>> PMU does. Users need to use the new driver to operate the new hardware,
>>>> don't they ? One practical approach is that the hypervisor should not set
>>>> the PerfMonV2 bit for this unpatched 'v5.19' guest.
>>>>
>>> My understanding is that the legacy method of managing the counters should
>>> still work because the enable bits in PerfCntrGlobalCtl are expected to be
>>> set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
>>> is set to 0x3f after a system reset. That way, the guest kernel can use either
>>
>> If so, please add the PPR description here as comments.
> 
> Or even better, make that architectural behavior that's documented in the APM.

On the AMD side, we can't even reason that "PerfMonV3" will be compatible
with "PerfMonV2" w/o APM clarification, which is a concern for both
driver/virt impl.

> 
>>>>> ---
>>>>>  arch/x86/kvm/svm/pmu.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>>>>> index b6a7ad4d6914..14709c564d6a 100644
>>>>> --- a/arch/x86/kvm/svm/pmu.c
>>>>> +++ b/arch/x86/kvm/svm/pmu.c
>>>>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>>>>  	if (pmu->version > 1) {
>>>>>  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>>>>  		pmu->global_status_mask = pmu->global_ctrl_mask;
>>>>> +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
>>
>> It seems to be more easily understand to calculate global_ctrl firstly and
>> then derive the globol_ctrl_mask (negative logic).
> 
> Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
> something like global_ctrl_rsvd_bits), the computation of the reserved bits should
> come from the capabilities of the PMU, not from the RESET value.
> 
> On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
> if AMD ever adds bits in PerfCntGlobalCtl that aren't tied to general purpose
> counters. But, that's a future theoretical problem, so I'm inclined to vote for
> Sandipan's approach.

I suspect that Intel hardware also has this behaviour [*] although guest
kernels using Intel pmu version 1 are pretty much non-existent.

[*] Table 10-1. IA-32 and Intel® 64 Processor States Following Power-up,
Reset, or INIT (Contd.)

We need to update the selftest to guard this.

> 
>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>> index e886300f0f97..7ac9b080aba6 100644
>> --- a/arch/x86/kvm/svm/pmu.c
>> +++ b/arch/x86/kvm/svm/pmu.c
>> @@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>  					 kvm_pmu_cap.num_counters_gp);
>>
>>  	if (pmu->version > 1) {
>> -		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>> +		pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
>> +		pmu->global_ctrl_mask = ~pmu->global_ctrl;
>>  		pmu->global_status_mask = pmu->global_ctrl_mask;
>>  	}
>>
>>>>>  	}
>>>>>  	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
>>>
> 
On Tue, Mar 05, 2024, Like Xu wrote:
> On 5/3/2024 3:46 am, Sean Christopherson wrote:
> > > > > > ---
> > > > > >  arch/x86/kvm/svm/pmu.c | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > > > > index b6a7ad4d6914..14709c564d6a 100644
> > > > > > --- a/arch/x86/kvm/svm/pmu.c
> > > > > > +++ b/arch/x86/kvm/svm/pmu.c
> > > > > > @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> > > > > >  	if (pmu->version > 1) {
> > > > > >  		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
> > > > > >  		pmu->global_status_mask = pmu->global_ctrl_mask;
> > > > > > +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
> > >
> > > It seems to be more easily understand to calculate global_ctrl firstly and
> > > then derive the globol_ctrl_mask (negative logic).
> >
> > Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
> > something like global_ctrl_rsvd_bits), the computation of the reserved bits should
> > come from the capabilities of the PMU, not from the RESET value.
> >
> > On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
> > if AMD ever adds bits in PerfCntGlobalCtl that aren't tied to general purpose
> > counters. But, that's a future theoretical problem, so I'm inclined to vote for
> > Sandipan's approach.
> 
> I suspect that Intel hardware also has this behaviour [*] although guest
> kernels using Intel pmu version 1 are pretty much non-existent.
> 
> [*] Table 10-1. IA-32 and Intel® 64 Processor States Following Power-up,
> Reset, or INIT (Contd.)

Aha! Nice. To save people lookups, the table says:

  IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.

and

  Where "n" is the number of general-purpose counters available in the processor.

Which means that (a) KVM can handle this in common code and (b) we can dodge the
whole reserved bits chicken-and-egg problem since global_ctrl *can't* be derived
from global_ctrl_mask.

This? (compile tested only)

---
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 5 Mar 2024 09:02:26 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
 PERF_GLOBAL_CTRL at "RESET"

Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:

  IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.

and

  Where "n" is the number of general-purpose counters available in the
  processor.

This is a long-standing bug that was recently exposed when KVM added
supported for AMD's PerfMonV2, i.e. when KVM started exposing a vPMU with
PERF_GLOBAL_CTRL to guest software that only knew how to program v1 PMUs
(that don't support PERF_GLOBAL_CTRL). Failure to emulate the post-RESET
behavior results in such guests unknowingly leaving all general purpose
counters globally disabled (the entire reason the post-RESET value sets
the GP counter enable bits is to maintain backwards compatibility).

The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly
anyone is running guest software that isn't aware of PERF_GLOBAL_CTRL on
Intel PMUs.

Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.

Reported-by: Babu Moger <babu.moger@amd.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 87cc6c8809ad..f61ce26aeb90 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
  */
 void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 {
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
 	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
 		return;
 
@@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 	 */
 	kvm_pmu_reset(vcpu);
 
-	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
+	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
 	static_call(kvm_x86_pmu_refresh)(vcpu);
+
+	/*
+	 * At RESET, both Intel and AMD CPUs set all enable bits for general
+	 * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+	 * was written for v1 PMUs don't unknowingly leave GP counters disabled
+	 * in the global controls).  Emulate that behavior when refreshing the
+	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+	 */
+	if (kvm_pmu_has_perf_global_ctrl(pmu))
+		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
 }
 
 void kvm_pmu_init(struct kvm_vcpu *vcpu)

base-commit: 1d7ae977d219e68698fdb9bed1049dc561038aa1
-- 
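[Editor's note: the patch above relies on the kernel's GENMASK_ULL(h, l)
helper; as a sanity check, a userspace approximation (assuming
0 <= l <= h <= 63) behaves exactly as the quoted SDM text requires:]

```c
#include <stdint.h>

/* Userspace approximation of the kernel's GENMASK_ULL(h, l): a 64-bit
 * mask with bits h..l set. GENMASK_ULL(n - 1, 0) is then exactly the
 * "sets bits n-1:0 and clears the upper bits" RESET value from the SDM. */
static uint64_t genmask_ull(unsigned int h, unsigned int l)
{
	return ((~0ULL) >> (63 - h)) & ((~0ULL) << l);
}
```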
On Tue, Mar 05, 2024, Sean Christopherson wrote:
> On Tue, Mar 05, 2024, Like Xu wrote:
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 87cc6c8809ad..f61ce26aeb90 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
>   */
>  void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
>  	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
>  		return;
> 
> @@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>  	 */
>  	kvm_pmu_reset(vcpu);
> 
> -	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> +	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>  	static_call(kvm_x86_pmu_refresh)(vcpu);
> +
> +	/*
> +	 * At RESET, both Intel and AMD CPUs set all enable bits for general
> +	 * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
> +	 * was written for v1 PMUs don't unknowingly leave GP counters disabled
> +	 * in the global controls).  Emulate that behavior when refreshing the
> +	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
> +	 */
> +	if (kvm_pmu_has_perf_global_ctrl(pmu))
> +		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
>  }

Doh, this is based on kvm/kvm-uapi, I'll rebase to kvm-x86/next before posting.

I'll also update the changelog to call out that KVM has always clobbered
global_ctrl during PMU refresh, i.e. there is no danger of breaking existing
setups by clobbering a value set by userspace, e.g. during live migration.

Lastly, I'll also update the changelog to call out that KVM *did* actually set
the general purpose counter enable bits in global_ctrl at "RESET" until v6.0,
and that KVM intentionally removed that behavior because of what appears to be
an Intel SDM bug.

Of course, in typical KVM fashion, that old code was also broken in its own way
(the history of this code is a comedy of errors).

Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right, but for some reason only
set the bits if the guest PMU was advertised as v1:

        if (pmu->version == 1) {
                pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
                return;
        }

Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on
reset") then tried to remedy that goof, but botched things and also enabled the
fixed counters:

        pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
                (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
        pmu->global_ctrl_mask = ~pmu->global_ctrl;

Which was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't
overwrite the pmu->global_ctrl when refreshing") incorrectly removed
*everything*. Very ironically, that commit came from Like.

    Author: Like Xu <likexu@tencent.com>
    Date:   Tue May 10 12:44:07 2022 +0800

    KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing

    Assigning a value to pmu->global_ctrl just to set the value of
    pmu->global_ctrl_mask is more readable but does not conform to the
    specification. The value is reset to zero on Power up and Reset but
    stays unchanged on INIT, like most other MSRs.

But wait, it gets even better. Like wasn't making up that behavior, Intel's SDM
circa December 2022 states that "Global Perf Counter Controls" is '0' at
Power-Up and RESET. But then the March 2023 SDM rolls out and says

    IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.

So presumably someone at Intel noticed that what their CPUs do and what the
documentation says didn't match.
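[Editor's note: to make the history concrete, the three RESET-value
computations described above can be compared side by side in illustrative
userspace C; X86_PMC_IDX_FIXED is 32 in the kernel:]

```c
#include <stdint.h>

#define X86_PMC_IDX_FIXED 32

/* f5132b01386b: enable bits for the GP counters only (and only applied
 * when the guest PMU was advertised as v1). */
static uint64_t reset_gp_only(unsigned int gp)
{
	return (1ULL << gp) - 1;
}

/* f19a0c2c2e6a: GP counter enable bits plus, erroneously, the fixed
 * counter enable bits starting at bit 32. */
static uint64_t reset_gp_and_fixed(unsigned int gp, unsigned int fixed)
{
	return ((1ULL << gp) - 1) |
	       (((1ULL << fixed) - 1) << X86_PMC_IDX_FIXED);
}

/* c49467a45fe0: no enable bits set at all, leaving v1-only guest
 * software with globally disabled counters. */
static uint64_t reset_none(void)
{
	return 0;
}
```

For example, with 8 GP and 3 fixed counters, reset_gp_only() gives 0xff while
reset_gp_and_fixed() gives 0x7000000ff, i.e. the fixed-counter bits were the
botch.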
On Tue, Mar 05, 2024, Sean Christopherson wrote:
> On Tue, Mar 05, 2024, Sean Christopherson wrote:
> > On Tue, Mar 05, 2024, Like Xu wrote:
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index 87cc6c8809ad..f61ce26aeb90 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
> >   */
> >  void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
> >  {
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +
> >  	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
> >  		return;
> >
> > @@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
> >  	 */
> >  	kvm_pmu_reset(vcpu);
> >
> > -	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> > +	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> >  	static_call(kvm_x86_pmu_refresh)(vcpu);
> > +
> > +	/*
> > +	 * At RESET, both Intel and AMD CPUs set all enable bits for general
> > +	 * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
> > +	 * was written for v1 PMUs don't unknowingly leave GP counters disabled
> > +	 * in the global controls). Emulate that behavior when refreshing the
> > +	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
> > +	 */
> > +	if (kvm_pmu_has_perf_global_ctrl(pmu))
> > +		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
> > }
>
> Doh, this is based on kvm/kvm-uapi, I'll rebase to kvm-x86/next before posting.
>
> I'll also update the changelog to call out that KVM has always clobbered global_ctrl
> during PMU refresh, i.e. there is no danger of breaking existing setups by
> clobbering a value set by userspace, e.g. during live migration.
>
> Lastly, I'll also update the changelog to call out that KVM *did* actually set
> the general purpose counter enable bits in global_ctrl at "RESET" until v6.0,
> and that KVM intentionally removed that behavior because of what appears to be
> an Intel SDM bug.
>
> Of course, in typical KVM fashion, that old code was also broken in its own way
> (the history of this code is a comedy of errors). Initial vPMU support in commit
> f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost*
> got it right, but for some reason only set the bits if the guest PMU was
> advertised as v1:
>
> 	if (pmu->version == 1) {
> 		pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
> 		return;
> 	}
>
> Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on
> reset") then tried to remedy that goof, but botched things and also enabled the
> fixed counters:
>
> 	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
> 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
> 	pmu->global_ctrl_mask = ~pmu->global_ctrl;
>
> Which was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't overwrite
> the pmu->global_ctrl when refreshing") incorrectly removed *everything*. Very
> ironically, that commit came from Like.
>
>     Author: Like Xu <likexu@tencent.com>
>     Date:   Tue May 10 12:44:07 2022 +0800
>
>         KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing
>
>         Assigning a value to pmu->global_ctrl just to set the value of
>         pmu->global_ctrl_mask is more readable but does not conform to the
>         specification. The value is reset to zero on Power up and Reset but
>         stays unchanged on INIT, like most other MSRs.
>
> But wait, it gets even better. Like wasn't making up that behavior, Intel's SDM
> circa December 2022 states that "Global Perf Counter Controls" is '0' at Power-Up
> and RESET. But then the March 2023 SDM rolls out and says
>
>     IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
>
> So presumably someone at Intel noticed that what their CPUs do and what the
> documentation says didn't match.

Sean, can you update your commit message with the table name of the Intel SDM and
the version of the Intel SDM (2023 version). It was quite hard to find where
exactly the SDM mentioned this, since I was using the 2022 version.

Thanks.
-Mingwei

> *sigh*
On 3/6/2024 2:04 AM, Sean Christopherson wrote:
> On Tue, Mar 05, 2024, Sean Christopherson wrote:
>> On Tue, Mar 05, 2024, Like Xu wrote:
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 87cc6c8809ad..f61ce26aeb90 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
>>   */
>>  void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>>  {
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +
>>  	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
>>  		return;
>>
>> @@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>>  	 */
>>  	kvm_pmu_reset(vcpu);
>>
>> -	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>> +	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>>  	static_call(kvm_x86_pmu_refresh)(vcpu);
>> +
>> +	/*
>> +	 * At RESET, both Intel and AMD CPUs set all enable bits for general
>> +	 * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
>> +	 * was written for v1 PMUs don't unknowingly leave GP counters disabled
>> +	 * in the global controls). Emulate that behavior when refreshing the
>> +	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
>> +	 */
>> +	if (kvm_pmu_has_perf_global_ctrl(pmu))
>> +		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
>> }
> Doh, this is based on kvm/kvm-uapi, I'll rebase to kvm-x86/next before posting.
>
> I'll also update the changelog to call out that KVM has always clobbered global_ctrl
> during PMU refresh, i.e. there is no danger of breaking existing setups by
> clobbering a value set by userspace, e.g. during live migration.
>
> Lastly, I'll also update the changelog to call out that KVM *did* actually set
> the general purpose counter enable bits in global_ctrl at "RESET" until v6.0,
> and that KVM intentionally removed that behavior because of what appears to be
> an Intel SDM bug.
>
> Of course, in typical KVM fashion, that old code was also broken in its own way
> (the history of this code is a comedy of errors). Initial vPMU support in commit
> f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost*
> got it right, but for some reason only set the bits if the guest PMU was
> advertised as v1:
>
> 	if (pmu->version == 1) {
> 		pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
> 		return;
> 	}
>
> Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on
> reset") then tried to remedy that goof, but botched things and also enabled the
> fixed counters:
>
> 	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
> 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
> 	pmu->global_ctrl_mask = ~pmu->global_ctrl;
>
> Which was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't overwrite
> the pmu->global_ctrl when refreshing") incorrectly removed *everything*. Very
> ironically, that commit came from Like.
>
>     Author: Like Xu <likexu@tencent.com>
>     Date:   Tue May 10 12:44:07 2022 +0800
>
>         KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing
>
>         Assigning a value to pmu->global_ctrl just to set the value of
>         pmu->global_ctrl_mask is more readable but does not conform to the
>         specification. The value is reset to zero on Power up and Reset but
>         stays unchanged on INIT, like most other MSRs.
>
> But wait, it gets even better. Like wasn't making up that behavior, Intel's SDM
> circa December 2022 states that "Global Perf Counter Controls" is '0' at Power-Up
> and RESET. But then the March 2023 SDM rolls out and says
>
>     IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
>
> So presumably someone at Intel noticed that what their CPUs do and what the
> documentation says didn't match.

It's a really long story... thanks for figuring it out.

>
> *sigh*
>
On 3/5/2024 3:46 AM, Sean Christopherson wrote:
> On Mon, Mar 04, 2024, Dapeng Mi wrote:
>> On 3/1/2024 5:00 PM, Sandipan Das wrote:
>>> On 3/1/2024 2:07 PM, Like Xu wrote:
>>>> On 1/3/2024 3:50 pm, Sandipan Das wrote:
>>>>> With PerfMonV2, a performance monitoring counter will start operating
>>>>> only when both the PERF_CTLx enable bit as well as the corresponding
>>>>> PerfCntrGlobalCtl enable bit are set.
>>>>>
>>>>> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
>>>>> for a guest but the guest kernel does not support PerfMonV2 (such as
>>>>> kernels older than v5.19), the guest counters do not count since the
>>>>> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
>>>>> writes to it.
>>>> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
>>>> PMU does. Users need to use the new driver to operate the new hardware,
>>>> don't they ? One practical approach is that the hypervisor should not set
>>>> the PerfMonV2 bit for this unpatched 'v5.19' guest.
>>>>
>>> My understanding is that the legacy method of managing the counters should
>>> still work because the enable bits in PerfCntrGlobalCtl are expected to be
>>> set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
>>> is set to 0x3f after a system reset. That way, the guest kernel can use either
>> If so, please add the PPR description here as comments.
> Or even better, make that architectural behavior that's documented in the APM.
>
>>>>> ---
>>>>>   arch/x86/kvm/svm/pmu.c | 1 +
>>>>>   1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>>>>> index b6a7ad4d6914..14709c564d6a 100644
>>>>> --- a/arch/x86/kvm/svm/pmu.c
>>>>> +++ b/arch/x86/kvm/svm/pmu.c
>>>>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>>>>        if (pmu->version > 1) {
>>>>>            pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>>>>            pmu->global_status_mask = pmu->global_ctrl_mask;
>>>>> +          pmu->global_ctrl = ~pmu->global_ctrl_mask;
>> It seems to be more easily understand to calculate global_ctrl firstly and
>> then derive the globol_ctrl_mask (negative logic).
> Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
> something like global_ctrl_rsvd_bits), the computation of the reserved bits should

Yeah, same feeling here. global_ctrl_mask is ambiguous: it is hard to infer its
real meaning from the name alone without reading all the code.
global_ctrl_rsvd_bits looks like a better name. There are several other
variables with similar "xxx_mask" names. Sean, do you think it's a good time to
rename all of these variables? Thanks.

> come from the capabilities of the PMU, not from the RESET value.
>
> On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
> if AMD ever adds bits in PerfCntGlobalCtl that aren't tied to general purpose
> counters. But, that's a future theoretical problem, so I'm inclined to vote for
> Sandipan's approach.
>
>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>> index e886300f0f97..7ac9b080aba6 100644
>> --- a/arch/x86/kvm/svm/pmu.c
>> +++ b/arch/x86/kvm/svm/pmu.c
>> @@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>                                   kvm_pmu_cap.num_counters_gp);
>>
>>   if (pmu->version > 1) {
>> -        pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>> +        pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
>> +        pmu->global_ctrl_mask = ~pmu->global_ctrl;
>>          pmu->global_status_mask = pmu->global_ctrl_mask;
>>   }
>>
>>>>> }
>>>>> pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
On 6/3/2024 10:32 am, Mi, Dapeng wrote:
>
> On 3/6/2024 2:04 AM, Sean Christopherson wrote:
>> On Tue, Mar 05, 2024, Sean Christopherson wrote:
>>> On Tue, Mar 05, 2024, Like Xu wrote:
>>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>>> index 87cc6c8809ad..f61ce26aeb90 100644
>>> --- a/arch/x86/kvm/pmu.c
>>> +++ b/arch/x86/kvm/pmu.c
>>> @@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
>>>    */
>>>   void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>>>   {
>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>>> +
>>>   	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
>>>   		return;
>>> @@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>>>   	 */
>>>   	kvm_pmu_reset(vcpu);
>>> -	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>>> +	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>>>   	static_call(kvm_x86_pmu_refresh)(vcpu);
>>> +
>>> +	/*
>>> +	 * At RESET, both Intel and AMD CPUs set all enable bits for general
>>> +	 * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
>>> +	 * was written for v1 PMUs don't unknowingly leave GP counters disabled
>>> +	 * in the global controls). Emulate that behavior when refreshing the
>>> +	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
>>> +	 */
>>> +	if (kvm_pmu_has_perf_global_ctrl(pmu))
>>> +		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
>>> }
>> Doh, this is based on kvm/kvm-uapi, I'll rebase to kvm-x86/next before posting.
>>
>> I'll also update the changelog to call out that KVM has always clobbered
>> global_ctrl
>> during PMU refresh, i.e. there is no danger of breaking existing setups by
>> clobbering a value set by userspace, e.g. during live migration.
>>
>> Lastly, I'll also update the changelog to call out that KVM *did* actually set
>> the general purpose counter enable bits in global_ctrl at "RESET" until v6.0,
>> and that KVM intentionally removed that behavior because of what appears to be
>> an Intel SDM bug.
>>
>> Of course, in typical KVM fashion, that old code was also broken in its own way
>> (the history of this code is a comedy of errors). Initial vPMU support in commit
>> f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost*
>> got it right, but for some reason only set the bits if the guest PMU was
>> advertised as v1:
>>
>> 	if (pmu->version == 1) {
>> 		pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
>> 		return;
>> 	}
>>
>> Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on
>> reset") then tried to remedy that goof, but botched things and also enabled the
>> fixed counters:
>>
>> 	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
>> 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
>> 	pmu->global_ctrl_mask = ~pmu->global_ctrl;
>>
>> Which was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't
>> overwrite
>> the pmu->global_ctrl when refreshing") incorrectly removed *everything*. Very
>> ironically, that commit came from Like.
>>
>>     Author: Like Xu <likexu@tencent.com>
>>     Date:   Tue May 10 12:44:07 2022 +0800
>>
>>         KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing
>>
>>         Assigning a value to pmu->global_ctrl just to set the value of
>>         pmu->global_ctrl_mask is more readable but does not conform to the
>>         specification. The value is reset to zero on Power up and Reset but
>>         stays unchanged on INIT, like most other MSRs.
>>
>> But wait, it gets even better. Like wasn't making up that behavior, Intel's SDM
>> circa December 2022 states that "Global Perf Counter Controls" is '0' at Power-Up
>> and RESET. But then the March 2023 SDM rolls out and says
>>
>>     IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
>>
>> So presumably someone at Intel noticed that what their CPUs do and what the
>> documentation says didn't match.

This reminds me quite a bit of the past, where this kind of thing happened
occasionally (e.g. some pmu events don't count as expected, and ucode updates
often sneak in to change hardware behaviour). Sometimes we have to rely on
hardware behaviour, sometimes we have to trust the documentation specification,
my experience has been to find a balance that is more favourable to the
end-user and to force the hardware vendors to make more careful documentation
and implementation. :D

>
> It's a really long story... thanks for figuring it out.
>
>>
>> *sigh*
>>
>
On 3/5/2024 10:52 PM, Sean Christopherson wrote:
> On Tue, Mar 05, 2024, Like Xu wrote:
>> On 5/3/2024 3:46 am, Sean Christopherson wrote:
>>>>>>> ---
>>>>>>>    arch/x86/kvm/svm/pmu.c | 1 +
>>>>>>>    1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>>>>>>> index b6a7ad4d6914..14709c564d6a 100644
>>>>>>> --- a/arch/x86/kvm/svm/pmu.c
>>>>>>> +++ b/arch/x86/kvm/svm/pmu.c
>>>>>>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>>>>>>        if (pmu->version > 1) {
>>>>>>>            pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>>>>>>            pmu->global_status_mask = pmu->global_ctrl_mask;
>>>>>>> +          pmu->global_ctrl = ~pmu->global_ctrl_mask;
>>>>
>>>> It seems to be more easily understand to calculate global_ctrl firstly and
>>>> then derive the globol_ctrl_mask (negative logic).
>>>
>>> Hrm, I'm torn. On one hand, awful name aside (global_ctrl_mask should really be
>>> something like global_ctrl_rsvd_bits), the computation of the reserved bits should
>>> come from the capabilities of the PMU, not from the RESET value.
>>>
>>> On the other hand, setting _all_ non-reserved bits will likely do the wrong thing
>>> if AMD ever adds bits in PerfCntGlobalCtl that aren't tied to general purpose
>>> counters. But, that's a future theoretical problem, so I'm inclined to vote for
>>> Sandipan's approach.
>>
>> I suspect that Intel hardware also has this behaviour [*] although guest
>> kernels using Intel pmu version 1 are pretty much non-existent.
>>
>> [*] Table 10-1. IA-32 and Intel® 64 Processor States Following Power-up,
>> Reset, or INIT (Contd.)
>
> Aha!  Nice.  To save people lookups, the table says:
>
>   IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
>
> and
>
>   Where "n" is the number of general-purpose counters available in the processor.
>
> Which means that (a) KVM can handle this in common code and (b) we can dodge the
> whole reserved bits chicken-and-egg problem since global_ctrl *can't* be derived
> from global_ctrl_mask.
>
> This? (compile tested only)
>
> ---
> From: Sean Christopherson <seanjc@google.com>
> Date: Tue, 5 Mar 2024 09:02:26 -0800
> Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
>  PERF_GLOBAL_CTRL at "RESET"
>
> Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
> when refreshing the PMU to emulate the MSR's architecturally defined
> post-RESET behavior. Per Intel's SDM:
>
>   IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
>
> and
>
>   Where "n" is the number of general-purpose counters available in the processor.
>
> This is a long-standing bug that was recently exposed when KVM added
> support for AMD's PerfMonV2, i.e. when KVM started exposing a vPMU with
> PERF_GLOBAL_CTRL to guest software that only knew how to program v1 PMUs
> (that don't support PERF_GLOBAL_CTRL). Failure to emulate the post-RESET
> behavior results in such guests unknowingly leaving all general purpose
> counters globally disabled (the entire reason the post-RESET value sets
> the GP counter enable bits is to maintain backwards compatibility).
>
> The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
> supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
> is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
> PMUs.
>
> Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
> "pure" RESET flow. But it can only be called prior to the first KVM_RUN,
> i.e. the guest will only ever observe the final value.
>
> Reported-by: Babu Moger <babu.moger@amd.com>
> Cc: Like Xu <like.xu.linux@gmail.com>
> Cc: Mingwei Zhang <mizhang@google.com>
> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Cc: Sandipan Das <sandipan.das@amd.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/pmu.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 87cc6c8809ad..f61ce26aeb90 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -741,6 +741,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
>   */
>  void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
>  	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
>  		return;
>
> @@ -750,8 +752,18 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>  	 */
>  	kvm_pmu_reset(vcpu);
>
> -	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> +	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>  	static_call(kvm_x86_pmu_refresh)(vcpu);
> +
> +	/*
> +	 * At RESET, both Intel and AMD CPUs set all enable bits for general
> +	 * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
> +	 * was written for v1 PMUs don't unknowingly leave GP counters disabled
> +	 * in the global controls). Emulate that behavior when refreshing the
> +	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
> +	 */
> +	if (kvm_pmu_has_perf_global_ctrl(pmu))
> +		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
> }
>
> void kvm_pmu_init(struct kvm_vcpu *vcpu)
>
> base-commit: 1d7ae977d219e68698fdb9bed1049dc561038aa1

Thanks. This looks good.

Tested-by: Sandipan Das <sandipan.das@amd.com>
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b6a7ad4d6914..14709c564d6a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	if (pmu->version > 1) {
 		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
 		pmu->global_status_mask = pmu->global_ctrl_mask;
+		pmu->global_ctrl = ~pmu->global_ctrl_mask;
 	}
 
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
With PerfMonV2, a performance monitoring counter will start operating
only when both the PERF_CTLx enable bit as well as the corresponding
PerfCntrGlobalCtl enable bit are set.

When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
for a guest but the guest kernel does not support PerfMonV2 (such as
kernels older than v5.19), the guest counters do not count since the
PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
writes to it.

This is not observed on bare-metal as the default value of the
PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
counters) and the counters can still be operated by using the enable
bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
compatibility with older kernels.

Before:

  $ perf stat -e cycles:u true

   Performance counter stats for 'true':

                   0      cycles:u

         0.001074773 seconds time elapsed

         0.001169000 seconds user
         0.000000000 seconds sys

After:

  $ perf stat -e cycles:u true

   Performance counter stats for 'true':

             227,850      cycles:u

         0.037770758 seconds time elapsed

         0.000000000 seconds user
         0.037886000 seconds sys

Reported-by: Babu Moger <babu.moger@amd.com>
Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/kvm/svm/pmu.c | 1 +
 1 file changed, 1 insertion(+)