
[RESEND,v12,00/17] KVM: x86/pmu: Add basic support to enable guest PEBS via DS

Message ID 20220411101946.20262-1-likexu@tencent.com (mailing list archive)
Series KVM: x86/pmu: Add basic support to enable guest PEBS via DS

Message

Like Xu April 11, 2022, 10:19 a.m. UTC
Hi,

Because it gives more accurate profiling results, this feature still has loyal
followers, so here is another rebased version. PeterZ acked the V9
patchset [0] and Paolo asked for a new version [1], so please check the
changelog and feel free to review, test and comment.

[0] https://lore.kernel.org/kvm/YQF7lwM6qzYso0Gg@hirez.programming.kicks-ass.net/
[1] https://lore.kernel.org/kvm/95bf3dca-c6d1-02c8-40b6-8bb29a3a7a36@redhat.com/

---

The guest Precise Event Based Sampling (PEBS) feature can provide the
architectural state of the instruction executed after the guest instruction
that exactly caused the event. It needs a new hardware facility that is only
available on Intel Ice Lake Server (ICX) platforms. This patch set enables
the basic PEBS feature for KVM guests on ICX.

We can use the PEBS feature on a Linux guest just as on native:

   # echo 0 > /proc/sys/kernel/watchdog (on the host)
   # perf record -e instructions:ppp ./br_instr a
   # perf record -c 100000 -e instructions:pp ./br_instr a
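The `:ppp` and `:pp` suffixes in these commands are perf event modifiers that raise `perf_event_attr.precise_ip`, the field that the "Adjust precise_ip to emulate Ice Lake guest PDIR counter" patch works with. The hypothetical helper `precise_level()` below only shows how the number of `p` modifiers maps to precise_ip levels, with the level semantics taken from the perf_event_open(2) man page. It is not perf's parser: perf rejects more than three `p`s, while this sketch simply caps the level.

```c
#include <string.h>

/*
 * Hypothetical helper (not perf's parser): count the 'p' modifiers after
 * the last ':' in an event string and return the precise_ip level they
 * request. perf_event_attr.precise_ip: 0 = arbitrary skid,
 * 1 = constant skid, 2 = requested zero skid, 3 = must have zero skid.
 */
static int precise_level(const char *event)
{
    const char *colon = strrchr(event, ':');
    int level = 0;

    if (!colon)
        return 0;                       /* no modifiers at all */
    for (const char *c = colon + 1; *c != '\0'; c++) {
        if (*c == 'p' && level < 3)     /* cap at the highest defined level */
            level++;
    }
    return level;
}
```

With this helper, `instructions:ppp` maps to level 3 and `instructions:pp` maps to level 2. Other modifiers, such as `u`, are ignored.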

To emulate the guest PEBS facility for the above perf usages,
we need to implement two code paths:

1) Fast path

This is when the host-assigned physical PMC has the same index as the
virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
This path covers the most common use cases.

2) Slow path

This is when the host-assigned physical PMC has a different index from the
virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0). In this case,
the PEBS facility writes physical counter indexes into the PEBS records, so KVM
needs to rewrite the records to change the applicable counter indexes to the
virtual PMC indexes. KVM also needs to move the counter reset values in the DS
data structure to the offsets that correspond to the physical counter indexes.
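The difference between the two paths can be sketched as follows. The `virt_to_phys[]` mapping array, the function names, and the treatment of the "applicable counters" field as a plain bitmask are illustrative assumptions, not KVM's data structures. The second function shows only the index translation that the slow path would need. This series leaves that path disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: virt_to_phys[v] is the host PMC index backing
 * virtual PMC v (or -1 if none), for n virtual counters.
 */
static bool pebs_fast_path_ok(const int *virt_to_phys, int n,
                              uint64_t guest_pebs_enable)
{
    for (int v = 0; v < n; v++) {
        if (!(guest_pebs_enable & (1ULL << v)))
            continue;                 /* PEBS not enabled on this counter */
        if (virt_to_phys[v] != v)
            return false;             /* cross-mapped: slow path needed */
    }
    return true;
}

/*
 * Slow-path step: hardware records physical counter bits, so translate a
 * physical "applicable counters" bitmask into virtual counter bits.
 */
static uint64_t remap_applicable_counters(uint64_t phys_mask,
                                          const int *virt_to_phys, int n)
{
    uint64_t virt_mask = 0;

    for (int v = 0; v < n; v++) {
        int p = virt_to_phys[v];

        if (p >= 0 && (phys_mask & (1ULL << p)))
            virt_mask |= 1ULL << v;
    }
    return virt_mask;
}
```

Judging by the patch titles, when the fast-path condition does not hold, this series falls back to temporarily disabling guest PEBS rather than rewriting records.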

The previous version [3] enabled both the fast path and the slow path, which
seemed a bit too complex for a first step. In this patch set, we start with
the fast path to get basic guest PEBS enabled, while keeping the slow path
disabled. A more focused discussion of the slow path [4] is deferred to a
follow-up patch set.

Compared with the follow-up versions planned for later steps, this patch set
lacks two capabilities: support for having PEBS enabled on both the host and the
guest at the same time, and emulation of guest PEBS when the counter is
cross-mapped (neither of these is a typical scenario).

With this basic support, the guest can retrieve correct PEBS information from
its own PEBS records on Ice Lake servers. We expect it to keep working when the
guest is migrated to another Ice Lake server, and we do not expect any
regression in host perf.

Here are the PEBS test results from the guest and the host for the same workload:

perf report on guest:
# Samples: 2K of event 'instructions:ppp'
# Event count (approx.): 1473377250
# Overhead  Command   Shared Object      Symbol
   57.74%  br_instr  br_instr           [.] lfsr_cond
   41.40%  br_instr  br_instr           [.] cmp_end
    0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire

perf report on host:
# Samples: 2K of event 'instructions:ppp'
# Event count (approx.): 1462721386
# Overhead  Command   Shared Object     Symbol
   57.90%  br_instr  br_instr          [.] lfsr_cond
   41.95%  br_instr  br_instr          [.] cmp_end
    0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire

Conclusion: the profiling results on the guest are similar to those on the host.
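To put "similar" into numbers using only the figures above: the guest's approximate event count is about 0.73% higher than the host's. The overheads of lfsr_cond and cmp_end differ by 0.16 and 0.55 percentage points. The two illustrative helpers below reproduce that arithmetic and are not part of perf.

```c
#include <stdint.h>

/* Relative difference of an event count, in percent of the reference. */
static double rel_diff_pct(uint64_t value, uint64_t reference)
{
    return 100.0 * ((double)value - (double)reference) / (double)reference;
}

/* Difference between two overhead shares, in percentage points. */
static double overhead_delta_pts(double guest_pct, double host_pct)
{
    return guest_pct - host_pct;
}
```

For example, `rel_diff_pct(1473377250, 1462721386)` yields about 0.7285. Both reports are based on only about 2K samples, so small differences are expected.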

The minimum guest kernel version may be v5.4, or a backported kernel that
supports Ice Lake Server PEBS.

Please check more details in each commit and feel free to comment.

Previous:
https://lore.kernel.org/kvm/20220304090427.90888-1-likexu@tencent.com/

[3] https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
[4] https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/

V12->V12 RESEND:
- Rebased on the top of kvm/queue; (Paolo)
- Add PEBS MSRs to msrs_to_save_all[];
- Stay up to date on https://github.com/lkml-likexu/linux/tree/kvm-queue-pebs;

V11->V12:
- Apply the new perf interface from tip/perf/core and fix the merge conflict;
- Rename "x86_pmu.pebs_vmx" to "x86_pmu.pebs_ept"; (Sean)
- Rebased on the top of kvm/queue (b13a3befc815); (Paolo)

V10->V11:
- Merge perf_guest_info_callbacks static_call to the tip/perf/core;
- Keep using perf_guest_cbs in the kvm/queue context before the merge window;
- Fix MSR_IA32_MISC_ENABLE_EMON bit (Liu XiangDong);
- Rebase "Reprogram PEBS event to emulate guest PEBS counter" patch;

V9->V10:
- improve readability in core.c (Peter Z)
- reuse guest_pebs_idxs (Liu XiangDong)

V8 -> V9 Changelog:
- fix a bracket error in xen_guest_state()

V7 -> V8 Changelog:
- fix coding style: add {} for single statements spanning multiple lines (Peter Z)
- fix coding style in xen_guest_state() (Boris Ostrovsky)
- s/pmu/kvm_pmu/ in intel_guest_get_msrs() (Peter Z)
- put lower cost branch in the first place for x86_pmu_handle_guest_pebs() (Peter Z)

V6 -> V7 Changelog:
- Fix conditions order and call x86_pmu_handle_guest_pebs() unconditionally; (PeterZ)
- Add a new patch to make all that perf_guest_cbs stuff suck less; (PeterZ)
- Document that the IA32_MISC_ENABLE[7] behavior matches bare metal; (Sean & Venkatesh)
- Update commit message for fixed counter mask refactoring; (PeterZ)
- Clarifying comments about {.host and .guest} for intel_guest_get_msrs(); (PeterZ)
- Add pebs_capable to store valid PEBS_COUNTER_MASK value; (PeterZ)
- Add more comments for perf's precise_ip field; (Andi & PeterZ)
- Refactor perf_overflow_handler_t and make it more legible; (PeterZ)
- Use "(unsigned long)cpuc->ds" instead of __this_cpu_read(cpu_hw_events.ds); (PeterZ)
- Keep using "(struct kvm_pmu *)data" to follow K&R; (Andi)

Like Xu (16):
  perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
  perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
  KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
  KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
  KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
  KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
  KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
  KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
  KVM: x86/cpuid: Refactor host/guest CPU model consistency check
  KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64

Peter Zijlstra (Intel) (1):
  x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value

 arch/x86/events/core.c            |   5 +-
 arch/x86/events/intel/core.c      | 157 +++++++++++++++++++++++++-----
 arch/x86/events/perf_event.h      |   6 +-
 arch/x86/include/asm/kvm_host.h   |  16 +++
 arch/x86/include/asm/msr-index.h  |   6 ++
 arch/x86/include/asm/perf_event.h |   5 +-
 arch/x86/kvm/cpuid.c              |  27 ++---
 arch/x86/kvm/cpuid.h              |   5 +
 arch/x86/kvm/pmu.c                |  52 +++++++---
 arch/x86/kvm/pmu.h                |  38 ++++++++
 arch/x86/kvm/vmx/capabilities.h   |  28 +++---
 arch/x86/kvm/vmx/pmu_intel.c      | 118 ++++++++++++++++++----
 arch/x86/kvm/vmx/vmx.c            |  24 ++++-
 arch/x86/kvm/vmx/vmx.h            |   2 +-
 arch/x86/kvm/x86.c                |  31 ++++--
 15 files changed, 412 insertions(+), 108 deletions(-)

Comments

Paolo Bonzini May 10, 2022, 4:55 p.m. UTC | #1
Queued, thanks, but only because I have not done my job very well
in handling this patch series (and LBR too) and I feel bad about
it.  Sending such a large patch series with no kvm-unit-tests should
not happen, and I'd be grateful if you wrote testcases after the fact.

Paolo
Vitaly Kuznetsov May 19, 2022, 12:14 p.m. UTC | #2
Like Xu <like.xu.linux@gmail.com> writes:

...

Hi, the following commit

>   KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS

(currently in kvm/queue) breaks a number of selftests, e.g.:

# ./tools/testing/selftests/kvm/x86_64/state_test 
==== Test Assertion Failure ====
  lib/x86_64/processor.c:1207: r == nmsrs
  pid=6702 tid=6702 errno=7 - Argument list too long
     1	0x000000000040da11: vcpu_save_state at processor.c:1207 (discriminator 4)
     2	0x00000000004024e5: main at state_test.c:209 (discriminator 6)
     3	0x00007f9f48c2d55f: ?? ??:0
     4	0x00007f9f48c2d60b: ?? ??:0
     5	0x00000000004026d4: _start at ??:?
  Unexpected result from KVM_GET_MSRS, r: 29 (failed MSR was 0x3f1)

I don't think any of these failing tests care about MSR_IA32_PEBS_ENABLE
in particular, they're just trying to do KVM_GET_MSRS/KVM_SET_MSRS.
Like Xu May 19, 2022, 1:31 p.m. UTC | #3
On 19/5/2022 8:14 pm, Vitaly Kuznetsov wrote:
> Like Xu <like.xu.linux@gmail.com> writes:
> 
> ...
> 
> Hi, the following commit
> 
>>    KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
> 
> (currently in kvm/queue) breaks a number of selftests, e.g.:

Indeed, e.g.:

x86_64/hyperv_clock
x86_64/max_vcpuid_cap_test
x86_64/mmu_role_test

> 
> # ./tools/testing/selftests/kvm/x86_64/state_test

This test passes again after the top commit a3808d884612 ("KVM: x86/pmu:
Expose CPUIDs feature bits PDCM, DS, DTES64"), which hints at the root cause.

Anyway, thanks for this git-bisect report.

> ==== Test Assertion Failure ====
>    lib/x86_64/processor.c:1207: r == nmsrs
>    pid=6702 tid=6702 errno=7 - Argument list too long
>       1	0x000000000040da11: vcpu_save_state at processor.c:1207 (discriminator 4)
>       2	0x00000000004024e5: main at state_test.c:209 (discriminator 6)
>       3	0x00007f9f48c2d55f: ?? ??:0
>       4	0x00007f9f48c2d60b: ?? ??:0
>       5	0x00000000004026d4: _start at ??:?
>    Unexpected result from KVM_GET_MSRS, r: 29 (failed MSR was 0x3f1)
> 
> I don't think any of these failing tests care about MSR_IA32_PEBS_ENABLE
> in particular, they're just trying to do KVM_GET_MSRS/KVM_SET_MSRS.
>
Like Xu May 19, 2022, 1:50 p.m. UTC | #4
On 19/5/2022 9:31 pm, Like Xu wrote:
> ==== Test Assertion Failure ====
>     lib/x86_64/processor.c:1207: r == nmsrs
>     pid=6702 tid=6702 errno=7 - Argument list too long
>        1    0x000000000040da11: vcpu_save_state at processor.c:1207 (discriminator 4)
>        2    0x00000000004024e5: main at state_test.c:209 (discriminator 6)
>        3    0x00007f9f48c2d55f: ?? ??:0
>        4    0x00007f9f48c2d60b: ?? ??:0
>        5    0x00000000004026d4: _start at ??:?
>     Unexpected result from KVM_GET_MSRS, r: 29 (failed MSR was 0x3f1)
> 
> I don't think any of these failing tests care about MSR_IA32_PEBS_ENABLE
> in particular, they're just trying to do KVM_GET_MSRS/KVM_SET_MSRS.

One of the lessons I learned here is that the members of msrs_to_save_all[]
are part of the KVM ABI. Feature-related MSRs should not be added until the
last step of exposing the feature to KVM guests. In this case,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA and MSR_PEBS_DATA_CFG should be added
to msrs_to_save_all[] together with exposing the CPUID bits.

Awaiting a ruling from the maintainer on how to handle this git-bisect breakage.
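One way to follow that lesson is to gate the feature MSRs on the feature itself when building the advertised MSR list. This is similar in spirit to how KVM already skips entries of msrs_to_save_all[] that the host cannot support when it builds the list it reports. The sketch below is hypothetical: `struct saved_msr_caps`, `msr_should_be_saved()` and `filter_saved_msrs()` are made-up names, and only the MSR numbers come from msr-index.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MSR numbers as in the kernel's msr-index.h. */
#define MSR_IA32_PEBS_ENABLE 0x3f1
#define MSR_PEBS_DATA_CFG    0x3f2
#define MSR_IA32_DS_AREA     0x600

/* Hypothetical capability flag: has the PEBS CPUID exposure landed? */
struct saved_msr_caps {
    bool pebs_exposed;
};

static bool msr_should_be_saved(uint32_t msr, const struct saved_msr_caps *caps)
{
    switch (msr) {
    case MSR_IA32_PEBS_ENABLE:
    case MSR_IA32_DS_AREA:
    case MSR_PEBS_DATA_CFG:
        return caps->pebs_exposed;  /* part of the ABI only with the feature */
    default:
        return true;
    }
}

/* Copy the MSRs that should be advertised from all[] into out[]. */
static size_t filter_saved_msrs(const uint32_t *all, size_t n, uint32_t *out,
                                const struct saved_msr_caps *caps)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (msr_should_be_saved(all[i], caps))
            out[kept++] = all[i];
    }
    return kept;
}
```

Tying the three PEBS MSRs to a single flag keeps them out of the ABI until the commit that exposes the CPUID bits.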
Vitaly Kuznetsov May 19, 2022, 2:46 p.m. UTC | #5
Like Xu <like.xu.linux@gmail.com> writes:

> On 19/5/2022 9:31 pm, Like Xu wrote:
>> ==== Test Assertion Failure ====
>>     lib/x86_64/processor.c:1207: r == nmsrs
>>     pid=6702 tid=6702 errno=7 - Argument list too long
>>        1    0x000000000040da11: vcpu_save_state at processor.c:1207 (discriminator 4)
>>        2    0x00000000004024e5: main at state_test.c:209 (discriminator 6)
>>        3    0x00007f9f48c2d55f: ?? ??:0
>>        4    0x00007f9f48c2d60b: ?? ??:0
>>        5    0x00000000004026d4: _start at ??:?
>>     Unexpected result from KVM_GET_MSRS, r: 29 (failed MSR was 0x3f1)
>> 
>> I don't think any of these failing tests care about MSR_IA32_PEBS_ENABLE
>> in particular, they're just trying to do KVM_GET_MSRS/KVM_SET_MSRS.
>
> One of the lessons I learned here is that the members of msrs_to_save_all[]
> are part of the KVM ABI. Feature-related MSRs should not be added until the
> last step of exposing the feature to KVM guests. In this case,
> MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA and MSR_PEBS_DATA_CFG should be added
> to msrs_to_save_all[] together with exposing the CPUID bits.

AFAIR the basic rule here is that whatever gets returned with
KVM_GET_MSR_INDEX_LIST can be passed to KVM_GET_MSRS and read
successfully by the host (not necessarily by the guest) so my guess is
that MSR_IA32_PEBS_ENABLE is now returned in KVM_GET_MSR_INDEX_LIST but
can't be read with KVM_GET_MSRS. Later, the expectation is that what was
returned by KVM_GET_MSRS can be set successfully with KVM_SET_MSRS.
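This rule can be checked from user space with the documented KVM ioctls. The sketch below is a minimal, hypothetical checker: `check_msr_contract()` and `first_failed_index()` are made-up names, and this is not the selftests' code. Unlike vcpu_save_state(), it does not configure guest CPUID first, so its results on a given host may differ from the selftests.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* KVM_GET_MSRS/KVM_SET_MSRS return how many entries they processed and
 * stop at the first failure, so a short count r means entries[r] failed. */
static int first_failed_index(uint32_t nmsrs, int r)
{
    return (r >= 0 && (uint32_t)r < nmsrs) ? r : -1;
}

/* Returns -1 if KVM is unusable or an ioctl errors out, 0 if every listed
 * MSR was read and written back, and 1 (with *bad_msr set) otherwise. */
static int check_msr_contract(uint32_t *bad_msr)
{
    struct kvm_msr_list probe = { .nmsrs = 0 };
    struct kvm_msr_list *list = NULL;
    struct kvm_msrs *msrs = NULL;
    int kvm, vm = -1, vcpu = -1, r, idx, ret = -1;

    kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0)
        return -1;
    /* With nmsrs = 0 the ioctl fails with E2BIG and reports the count. */
    if (ioctl(kvm, KVM_GET_MSR_INDEX_LIST, &probe) == 0 || errno != E2BIG)
        goto out;
    list = calloc(1, sizeof(*list) + probe.nmsrs * sizeof(list->indices[0]));
    msrs = calloc(1, sizeof(*msrs) + probe.nmsrs * sizeof(msrs->entries[0]));
    if (!list || !msrs)
        goto out;
    list->nmsrs = probe.nmsrs;
    if (ioctl(kvm, KVM_GET_MSR_INDEX_LIST, list) < 0)
        goto out;

    vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0)
        goto out;
    vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    if (vcpu < 0)
        goto out;

    msrs->nmsrs = list->nmsrs;
    for (uint32_t i = 0; i < list->nmsrs; i++)
        msrs->entries[i].index = list->indices[i];

    /* Rule 1: everything in the index list must be readable by the host. */
    r = ioctl(vcpu, KVM_GET_MSRS, msrs);
    if (r < 0)
        goto out;
    idx = first_failed_index(msrs->nmsrs, r);
    if (idx < 0) {
        /* Rule 2: whatever was read must be accepted by KVM_SET_MSRS. */
        r = ioctl(vcpu, KVM_SET_MSRS, msrs);
        if (r < 0)
            goto out;
        idx = first_failed_index(msrs->nmsrs, r);
    }
    if (idx >= 0) {
        *bad_msr = msrs->entries[idx].index;
        ret = 1;
    } else {
        ret = 0;
    }
out:
    if (vcpu >= 0)
        close(vcpu);
    if (vm >= 0)
        close(vm);
    close(kvm);
    free(list);
    free(msrs);
    return ret;
}
```

In the failure above, KVM_GET_MSRS returned 29, so the failing entry was entries[29], which is 0x3f1 (MSR_IA32_PEBS_ENABLE).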
Like Xu May 25, 2022, 7:56 a.m. UTC | #6
On 19/5/2022 10:46 pm, Vitaly Kuznetsov wrote:
> Like Xu <like.xu.linux@gmail.com> writes:
> 
>> On 19/5/2022 9:31 pm, Like Xu wrote:
>>> ==== Test Assertion Failure ====
>>>      lib/x86_64/processor.c:1207: r == nmsrs
>>>      pid=6702 tid=6702 errno=7 - Argument list too long
>>>         1    0x000000000040da11: vcpu_save_state at processor.c:1207 (discriminator 4)
>>>         2    0x00000000004024e5: main at state_test.c:209 (discriminator 6)
>>>         3    0x00007f9f48c2d55f: ?? ??:0
>>>         4    0x00007f9f48c2d60b: ?? ??:0
>>>         5    0x00000000004026d4: _start at ??:?
>>>      Unexpected result from KVM_GET_MSRS, r: 29 (failed MSR was 0x3f1)
>>>
>>> I don't think any of these failing tests care about MSR_IA32_PEBS_ENABLE
>>> in particular, they're just trying to do KVM_GET_MSRS/KVM_SET_MSRS.
>>
>> One of the lessons I learned here is that the members of msrs_to_save_all[]
>> are part of the KVM ABI. Feature-related MSRs should not be added until the
>> last step of exposing the feature to KVM guests. In this case,
>> MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA and MSR_PEBS_DATA_CFG should be added
>> to msrs_to_save_all[] together with exposing the CPUID bits.
> 
> AFAIR the basic rule here is that whatever gets returned with
> KVM_GET_MSR_INDEX_LIST can be passed to KVM_GET_MSRS and read
> successfully by the host (not necessarily by the guest) so my guess is
> that MSR_IA32_PEBS_ENABLE is now returned in KVM_GET_MSR_INDEX_LIST but
> can't be read with KVM_GET_MSRS. Later, the expectation is that what was
> returned by KVM_GET_MSRS can be set successfully with KVM_SET_MSRS.
> 

Thanks for the clarification.

Some KVM x86 selftests have been failing due to this issue even after the last commit.

I blame myself for not passing msr_info->host_initiated to intel_is_valid_msr().
Meanwhile, I have been considering whether kvm_pmu_is_valid_msr() should check
only the MSR address range, with this kind of sanity check applied in
pmu_set/get_msr() instead.

Vitaly and Paolo, any preference on how to move forward?

Thanks,
Like Xu
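The second option raised above could look roughly like the following sketch. In this option, the validity check covers only the MSR address range, and the feature check moves into the get/set handlers. Everything here is hypothetical, including `struct vpmu_sketch` and the choice to let host-initiated accesses read 0 and write back 0 when PEBS is not exposed. That choice simply follows Vitaly's rule that anything listed must be readable by the host and writable with the value read. Return values follow the KVM convention of returning 1 for a failed access, which a guest sees as #GP.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_PEBS_ENABLE 0x3f1      /* as in msr-index.h */

/* Hypothetical per-vCPU state for this sketch only. */
struct vpmu_sketch {
    bool pebs_exposed;                  /* guest CPUID advertises PEBS */
    uint64_t pebs_enable;               /* current IA32_PEBS_ENABLE value */
};

/* "Is valid" only checks the address range... */
static bool pmu_msr_in_range(uint32_t msr)
{
    return msr == MSR_IA32_PEBS_ENABLE; /* one MSR keeps the sketch short */
}

/* ...and the feature check lives in the accessors.
 * Returns 0 on success, 1 on failure. */
static int pmu_get_msr(const struct vpmu_sketch *pmu, uint32_t msr,
                       bool host_initiated, uint64_t *data)
{
    if (!pmu_msr_in_range(msr))
        return 1;
    if (!pmu->pebs_exposed) {
        if (!host_initiated)
            return 1;                   /* guest sees an absent feature */
        *data = 0;                      /* host save/restore still works */
        return 0;
    }
    *data = pmu->pebs_enable;
    return 0;
}

static int pmu_set_msr(struct vpmu_sketch *pmu, uint32_t msr,
                       bool host_initiated, uint64_t data)
{
    if (!pmu_msr_in_range(msr))
        return 1;
    if (!pmu->pebs_exposed)             /* accept back only what get returned */
        return (host_initiated && data == 0) ? 0 : 1;
    pmu->pebs_enable = data;
    return 0;
}
```

With this split, KVM_GET_MSR_INDEX_LIST can list the MSR unconditionally while guest-visible behavior still depends on the exposed feature.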
Paolo Bonzini May 25, 2022, 8:14 a.m. UTC | #7
On 5/25/22 09:56, Like Xu wrote:
> Thanks for the clarification.
> 
> Some KVM x86 selftests have been failing due to this issue even after the last commit.
> 
> I blame myself for not passing msr_info->host_initiated to intel_is_valid_msr().
> Meanwhile, I have been considering whether kvm_pmu_is_valid_msr() should check
> only the MSR address range, with this kind of sanity check applied in
> pmu_set/get_msr() instead.
> 
> Vitaly and Paolo, any preference on how to move forward?

I'm not sure what I did wrong to not see the failure, so I'll fix it myself.

But from now on, I'll have a hard rule of no new processor features 
enabled without KVM unit tests or selftests.  In fact, it would be nice 
if you wrote some for PEBS.

Paolo
Like Xu May 25, 2022, 8:32 a.m. UTC | #8
On 25/5/2022 4:14 pm, Paolo Bonzini wrote:
> On 5/25/22 09:56, Like Xu wrote:
>> Thanks for the clarification.
>>
>> Some KVM x86 selftests have been failing due to this issue even after the last commit.
>>
>> I blame myself for not passing msr_info->host_initiated to intel_is_valid_msr().
>> Meanwhile, I have been considering whether kvm_pmu_is_valid_msr() should check
>> only the MSR address range, with this kind of sanity check applied in
>> pmu_set/get_msr() instead.
>>
>> Vitaly and Paolo, any preference on how to move forward?
> 
> I'm not sure what I did wrong to not see the failure, so I'll fix it myself.

For more info, tests such as x86_64/state_test fail on some Skylake hosts due to
this issue.

> 
> But from now on, I'll have a hard rule of no new processor features enabled 
> without KVM unit tests or selftests.  In fact, it would be nice if you wrote 
> some for PEBS.

Great, my team (or at least I) is committed to contributing more tests for vPMU
features.

We may also add this rule to the process document,
Documentation/virt/kvm/review-checklist.rst.

> 
> Paolo
>
Maxim Levitsky May 25, 2022, 2:12 p.m. UTC | #9
On Wed, 2022-05-25 at 16:32 +0800, Like Xu wrote:
> On 25/5/2022 4:14 pm, Paolo Bonzini wrote:
> > On 5/25/22 09:56, Like Xu wrote:
> > > Thanks for the clarification.
> > > 
> > > Some KVM x86 selftests have been failing due to this issue even after the last commit.
> > > 
> > > I blame myself for not passing msr_info->host_initiated to intel_is_valid_msr().
> > > Meanwhile, I have been considering whether kvm_pmu_is_valid_msr() should check
> > > only the MSR address range, with this kind of sanity check applied in
> > > pmu_set/get_msr() instead.
> > > 
> > > Vitaly and Paolo, any preference on how to move forward?
> > 
> > I'm not sure what I did wrong to not see the failure, so I'll fix it myself.
> 
> For more info, tests such as x86_64/state_test fail on some Skylake hosts due to
> this issue.
> 
> > But from now on, I'll have a hard rule of no new processor features enabled 
> > without KVM unit tests or selftests.  In fact, it would be nice if you wrote 
> > some for PEBS.
> 
> Great, my team (or at least I) is committed to contributing more tests for vPMU
> features.
> 
> We may also add this rule to the process document,
> Documentation/virt/kvm/review-checklist.rst.
> 
> > Paolo
> > 

FYI, this patch series also breaks the 'msr' test in kvm-unit-tests.
(kvm/queue of today, and master of the kvm-unit-tests repo)

The test tries to set MSR_IA32_MISC_ENABLE to 0x400c51889 and gets a #GP.

Commenting out the following check gets rid of the #GP, but the test still fails with an unexpected result:

		if (!msr_info->host_initiated &&
		    ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
			return 1;




It is very possible that the test is broken, I'll check this later.

Best regards,
	Maxim Levitsky
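For reference, 0x400c51889 has bit 12 (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL) set. Suppose the vCPU's current IA32_MISC_ENABLE has that bit clear; this is an assumption about the test setup. In that case, the guest write flips the bit and the quoted check refuses it. The standalone model below reproduces only that condition, and its function name, `misc_enable_pebs_check()`, is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of IA32_MISC_ENABLE, as in the kernel's msr-index.h. */
#define MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL (1ULL << 12)

/* Mirrors the quoted condition: a guest write that flips PEBS_UNAVAIL
 * relative to the current value is refused (returning 1 means #GP). */
static int misc_enable_pebs_check(uint64_t old_val, uint64_t data,
                                  bool host_initiated)
{
    if (!host_initiated &&
        ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
        return 1;
    return 0;
}
```

The XOR isolates bits that differ between the old and new values, so writing the same PEBS_UNAVAIL state back is accepted, and host-initiated writes are never refused by this check.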
Paolo Bonzini May 25, 2022, 2:13 p.m. UTC | #10
On 5/25/22 16:12, Maxim Levitsky wrote:
> FYI, this patch series also breaks the 'msr' test in kvm-unit-tests.
> (kvm/queue of today, and master of the kvm-unit-tests repo)
> 
> The test tries to set MSR_IA32_MISC_ENABLE to 0x400c51889 and gets a #GP.
> 
> Commenting out the following check gets rid of the #GP, but the test still fails with an unexpected result:
> 
> 		if (!msr_info->host_initiated &&
> 		    ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
> 			return 1;
> 
> 
> 
> 
> It is very possible that the test is broken, I'll check this later.

Yes, for that I've sent a patch already:

https://lore.kernel.org/kvm/20220520183207.7952-1-pbonzini@redhat.com/

Paolo
Maxim Levitsky May 25, 2022, 2:14 p.m. UTC | #11
On Wed, 2022-05-25 at 16:13 +0200, Paolo Bonzini wrote:
> On 5/25/22 16:12, Maxim Levitsky wrote:
> > FYI, this patch series also breaks the 'msr' test in kvm-unit-tests.
> > (kvm/queue of today, and master of the kvm-unit-tests repo)
> > 
> > The test tries to set MSR_IA32_MISC_ENABLE to 0x400c51889 and gets a #GP.
> > 
> > Commenting out the following check gets rid of the #GP, but the test still fails with an unexpected result:
> > 
> > 		if (!msr_info->host_initiated &&
> > 		    ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
> > 			return 1;
> > 
> > 
> > 
> > 
> > It is very possible that the test is broken, I'll check this later.
> 
> Yes, for that I've sent a patch already:
> 
> https://lore.kernel.org/kvm/20220520183207.7952-1-pbonzini@redhat.com/
> 
> Paolo
> 

Thank you very much!


Best regards,
	Maxim Levitsky