Message ID | 20210622094306.8336-1-lingshan.zhu@intel.com (mailing list archive) |
---|---|
Series | KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS |
On 2021/6/22 17:42, Zhu Lingshan wrote:
> The guest Precise Event Based Sampling (PEBS) feature can provide an
> architectural state of the instruction executed after the guest
> instruction that exactly caused the event. It needs new hardware
> facility only available on Intel Ice Lake Server platforms. This
> patch set enables the basic PEBS feature for KVM guests on ICX.
>
> We can use PEBS feature on the Linux guest like native:
>
> # echo 0 > /proc/sys/kernel/watchdog (on the host)

Only on the host?
I cannot use PEBS unless I run "echo 0 > /proc/sys/kernel/watchdog"
both on the host and in the guest on ICX.

> # perf record -e instructions:ppp ./br_instr a
> # perf record -c 100000 -e instructions:pp ./br_instr a
>
> To emulate guest PEBS facility for the above perf usages, we need to
> implement 2 code paths:
>
> 1) Fast path
>
> This is when the host assigned physical PMC has an identical index as
> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
> This path is used in most common use cases.
>
> 2) Slow path
>
> This is when the host assigned physical PMC has a different index
> from the virtual PMC (e.g. using physical PMC1 to emulate virtual
> PMC0) In this case, KVM needs to rewrite the PEBS records to change
> the applicable counter indexes to the virtual PMC indexes, which
> would otherwise contain the physical counter index written by PEBS
> facility, and switch the counter reset values to the offset
> corresponding to the physical counter indexes in the DS data structure.
>
> The previous version [0] enables both fast path and slow path, which
> seems a bit more complex as the first step. In this patchset, we want
> to start with the fast path to get the basic guest PEBS enabled while
> keeping the slow path disabled. More focused discussion on the slow
> path [1] is planned to be put to another patchset in the next step.
>
> Compared to later versions in subsequent steps, the functionality to
> support host-guest PEBS both enabled and the functionality to emulate
> guest PEBS when the counter is cross-mapped are missing in this patch
> set (neither of these are typical scenarios).
>
> With the basic support, the guest can retrieve the correct PEBS
> information from its own PEBS records on the Ice Lake servers.
> And we expect it should work when migrating to another Ice Lake and
> no regression about host perf is expected.
>
> Here are the results of pebs test from guest/host for same workload:
>
> perf report on guest:
> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1473377250
> # Overhead  Command   Shared Object      Symbol
>   57.74%    br_instr  br_instr           [.] lfsr_cond
>   41.40%    br_instr  br_instr           [.] cmp_end
>    0.21%    br_instr  [kernel.kallsyms]  [k] __lock_acquire
>
> perf report on host:
> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1462721386
> # Overhead  Command   Shared Object      Symbol
>   57.90%    br_instr  br_instr           [.] lfsr_cond
>   41.95%    br_instr  br_instr           [.] cmp_end
>    0.05%    br_instr  [kernel.vmlinux]   [k] lock_acquire
>
> Conclusion: the profiling results on the guest are similar to that on
> the host.
>
> A minimum guest kernel version may be v5.4 or a backport version
> support Icelake server PEBS.
>
> Please check more details in each commit and feel free to comment.
>
> Previous:
> https://lore.kernel.org/kvm/20210511024214.280733-1-like.xu@linux.intel.com/
>
> [0] https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
> [1] https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/
>
> V6 -> V7 Changelog:
> - Fix conditions order and call x86_pmu_handle_guest_pebs() unconditionally; (PeterZ)
> - Add a new patch to make all that perf_guest_cbs stuff suck less; (PeterZ)
> - Document IA32_MISC_ENABLE[7] that that behavior matches bare metal; (Sean & Venkatesh)
> - Update commit message for fixed counter mask refactoring; (PeterZ)
> - Clarifying comments about {.host and .guest} for intel_guest_get_msrs(); (PeterZ)
> - Add pebs_capable to store valid PEBS_COUNTER_MASK value; (PeterZ)
> - Add more comments for perf's precise_ip field; (Andi & PeterZ)
> - Refactor perf_overflow_handler_t and make it more legible; (PeterZ)
> - Use "(unsigned long)cpuc->ds" instead of __this_cpu_read(cpu_hw_events.ds); (PeterZ)
> - Keep using "(struct kvm_pmu *)data" to follow K&R; (Andi)
>
> Like Xu (17):
>   perf/core: Use static_call to optimize perf_guest_info_callbacks
>   perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>   perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>   perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
>   KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
>   KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>   KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>   KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>   KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
>   KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>   KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
>   KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
>   KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>   KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
>   KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
>   KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>   KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
>
> Peter Zijlstra (Intel) (1):
>   x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value
>
>  arch/arm/kernel/perf_callchain.c   |  16 +--
>  arch/arm64/kernel/perf_callchain.c |  29 +++--
>  arch/arm64/kvm/perf.c              |  22 ++--
>  arch/csky/kernel/perf_callchain.c  |   4 +-
>  arch/nds32/kernel/perf_event_cpu.c |  16 +--
>  arch/riscv/kernel/perf_callchain.c |   4 +-
>  arch/x86/events/core.c             |  43 ++++++--
>  arch/x86/events/intel/core.c       | 165 +++++++++++++++++++++++------
>  arch/x86/events/perf_event.h       |   6 +-
>  arch/x86/include/asm/kvm_host.h    |  18 +++-
>  arch/x86/include/asm/msr-index.h   |   6 ++
>  arch/x86/include/asm/perf_event.h  |   5 +-
>  arch/x86/kvm/cpuid.c               |  24 ++---
>  arch/x86/kvm/cpuid.h               |   5 +
>  arch/x86/kvm/pmu.c                 |  60 ++++++++---
>  arch/x86/kvm/pmu.h                 |  38 +++++++
>  arch/x86/kvm/vmx/capabilities.h    |  26 +++--
>  arch/x86/kvm/vmx/pmu_intel.c       | 115 ++++++++++++++++----
>  arch/x86/kvm/vmx/vmx.c             |  24 ++++-
>  arch/x86/kvm/vmx/vmx.h             |   2 +-
>  arch/x86/kvm/x86.c                 |  51 +++++----
>  arch/x86/xen/pmu.c                 |  33 +++---
>  include/linux/perf_event.h         |  12 ++-
>  kernel/events/core.c               |   9 ++
>  24 files changed, 544 insertions(+), 189 deletions(-)
>
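
The fast/slow path distinction in the quoted cover letter comes down to whether the host-assigned physical counter index equals the guest's virtual counter index. The following is only a toy Python model of that idea, not kernel code: `PebsRecord`, `is_cross_mapped` and `rewrite_record` are hypothetical names, and representing a record as a single "applicable counters" bitmask is an assumption made for illustration. The patch set discussed here enables only the fast path; the rewrite step is what the cover letter describes for the deferred slow path.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PebsRecord:
    """Toy stand-in for a PEBS record; only the counter bitmask is modelled."""
    applicable_counters: int  # bit i set => counter i triggered this record


def is_cross_mapped(virtual_idx: int, physical_idx: int) -> bool:
    """The slow path is needed when the host picked a different counter index."""
    return virtual_idx != physical_idx


def rewrite_record(rec: PebsRecord, virtual_idx: int, physical_idx: int) -> PebsRecord:
    """Move the physical counter's bit to the index the guest expects."""
    if not is_cross_mapped(virtual_idx, physical_idx):
        return rec  # fast path: the record already carries the right index
    mask = rec.applicable_counters
    if mask & (1 << physical_idx):
        mask &= ~(1 << physical_idx)  # clear the physical counter's bit
        mask |= 1 << virtual_idx      # set the virtual counter's bit
    return replace(rec, applicable_counters=mask)
```

In this model a record written for physical PMC1 while the guest programmed virtual PMC0 has bit 1 cleared and bit 0 set, whereas a 1:1 mapping returns the record untouched.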
On 6/25/2021 5:40 PM, Liuxiangdong wrote:
>
>
> On 2021/6/22 17:42, Zhu Lingshan wrote:
>> The guest Precise Event Based Sampling (PEBS) feature can provide an
>> architectural state of the instruction executed after the guest
>> instruction that exactly caused the event. It needs new hardware
>> facility only available on Intel Ice Lake Server platforms. This
>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>
>> We can use PEBS feature on the Linux guest like native:
>>
>> # echo 0 > /proc/sys/kernel/watchdog (on the host)
>
> Only on the host?
> I cannot use pebs unless try with "echo 0 > /proc/sys/kernel/watchdog"
> both on the host and guest on ICX.

Hi Xiangdong,

I guess you may have run into the "cross-map" case (the slow path below),
so I think you can disable the watchdog on both the host and the guest
to make PEBS work.

Thanks

>
>> # perf record -e instructions:ppp ./br_instr a
>> # perf record -c 100000 -e instructions:pp ./br_instr a
>>
>> To emulate guest PEBS facility for the above perf usages, we need to
>> implement 2 code paths:
>>
>> 1) Fast path
>>
>> This is when the host assigned physical PMC has an identical index as
>> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>> This path is used in most common use cases.
>>
>> 2) Slow path
>>
>> This is when the host assigned physical PMC has a different index
>> from the virtual PMC (e.g. using physical PMC1 to emulate virtual
>> PMC0) In this case, KVM needs to rewrite the PEBS records to change
>> the applicable counter indexes to the virtual PMC indexes, which
>> would otherwise contain the physical counter index written by PEBS
>> facility, and switch the counter reset values to the offset
>> corresponding to the physical counter indexes in the DS data structure.
>>
>> The previous version [0] enables both fast path and slow path, which
>> seems a bit more complex as the first step.
On Friday, June 25, 2021 5:46 PM, Zhu, Lingshan wrote:
> > Only on the host?
> > I cannot use pebs unless try with "echo 0 > /proc/sys/kernel/watchdog"
> > both on the host and guest on ICX.
> Hi Xiangdong
>
> I guess you may run into the "cross-map" case(slow path below), so I think you
> can disable them both in host and guest to make PEBS work.
>

Hi Lingshan, could we also reproduce this issue?

If the guest's watchdog takes away the virtual fixed counter, this will
schedule the guest PEBS to use virtual PMC0. With the fast path (1:1
mapping), I think physical PMC0 is likely to be available for the guest
PEBS emulation if no other host perf events are running.

Best,
Wei
On 6/28/2021 3:49 PM, Wang, Wei W wrote:
> On Friday, June 25, 2021 5:46 PM, Zhu, Lingshan wrote:
>>> Only on the host?
>>> I cannot use pebs unless try with "echo 0 > /proc/sys/kernel/watchdog"
>>> both on the host and guest on ICX.
>> Hi Xiangdong
>>
>> I guess you may run into the "cross-map" case(slow path below), so I think you
>> can disable them both in host and guest to make PEBS work.
>>
> Hi Lingshan, could we also reproduce this issue?
>
> If the guest's watchdog takes away the virtual fixed counter, this will
> schedule the guest PEBS to use virtual PMC0. With the fast path (1:1
> mapping), I think physical PMC0 is likely to be available for the guest
> PEBS emulation if no other host perf events are running.

I think it is possible; even a virtual counter needs a perf event
scheduled on the host. This depends on the guest and host workloads.

Thanks,
Zhu Lingshan

> Best,
> Wei
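
The scenario discussed in the two messages above can be pictured with a small counter-availability model. This is a simplified sketch in Python under stated assumptions: `pick_virtual_counter` and `fast_path_possible` are hypothetical helpers, the counter count of 4 is arbitrary, and the real perf scheduler also honours per-event counter constraints that are ignored here.

```python
from typing import Optional, Set


def pick_virtual_counter(num_counters: int, busy_virtual: Set[int]) -> Optional[int]:
    """Return the lowest free virtual general-purpose counter, or None."""
    for idx in range(num_counters):
        if idx not in busy_virtual:
            return idx
    return None


def fast_path_possible(virtual_idx: int, busy_physical: Set[int]) -> bool:
    """A 1:1 mapping only works if the same-numbered physical counter is free."""
    return virtual_idx not in busy_physical


# Guest PEBS lands on virtual PMC0 when no guest general-purpose counter is busy.
vpmc = pick_virtual_counter(4, busy_virtual=set())

# Idle host: physical PMC0 is free, so the fast path applies.
idle_host_ok = fast_path_possible(vpmc, busy_physical=set())

# Host event already on PMC0: the mapping would be cross-mapped (slow path).
busy_host_ok = fast_path_possible(vpmc, busy_physical={0})
```

Under these assumptions the outcome depends entirely on which physical counters the host has in use, which matches the remark that it depends on the guest and host workloads.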
On Tue, Jun 22, 2021 at 05:42:48PM +0800, Zhu Lingshan wrote:
> Like Xu (17):
>   perf/core: Use static_call to optimize perf_guest_info_callbacks
>   perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>   perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>   perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
>   KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
>   KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>   KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>   KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>   KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
>   KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>   KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
>   KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
>   KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>   KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
>   KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
>   KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>   KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64

Overall pretty decent, I sent a couple of nits in reply to the
individual patches.
Hi, Lingshan.

We can use basic PEBS for KVM guests on ICX with this patch set. Will we
consider supporting "perf mem" for KVM guests?

AFAIK, the load latency facility requires a processor that supports PEBS.
Besides, it needs the MSR_PEBS_LD_LAT_THRESHOLD MSR (3F6H) to specify the
desired latency threshold. How about passing this MSR through to the guest?

Thanks!
Xiangdong Liu
On 12/7/2021 9:37 am, Liuxiangdong wrote:
> Hi, Lingshan.
>
> We can use basic pebs for KVM Guest on ICX by this patches set. Will we
> consider supporting "perf mem" for KVM Guest?
>

I suggest we enable more advanced PEBS features after the basic
support hits the mainline.

> AFAIK, the load latency facility requires processor supporting PEBS.
> Besides, it needs MSR_PEBS_LD_LAT_THRESHOLD
> msr (3F6H) to specify the desired latency threshold. How about
> passthrough this msr to Guest?
>
> Thanks!
> Xiangdong Liu
>
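
For anyone who wants to check whether the MSR mentioned above is readable inside a guest, one possible approach is the Linux `msr` driver, which exposes `/dev/cpu/<N>/msr` and returns 8 bytes when read at an offset equal to the MSR address. The sketch below is in Python and is only an illustration: `read_msr` and `decode_msr` are hypothetical helper names, the read needs root and a loaded `msr` module, and whether the read succeeds in a guest depends on what KVM emulates or passes through, which this thread leaves open.

```python
import os
import struct

MSR_PEBS_LD_LAT_THRESHOLD = 0x3F6  # address quoted in the thread (3F6H)


def decode_msr(raw: bytes) -> int:
    """Interpret the 8 bytes returned by the msr device as a 64-bit value."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes from the msr device")
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu: int, msr: int) -> int:
    """Read one MSR on the given CPU; raises OSError if the access is refused."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return decode_msr(os.pread(fd, 8, msr))
    finally:
        os.close(fd)
```

Calling `read_msr(0, MSR_PEBS_LD_LAT_THRESHOLD)` as root would then either return the current register value or raise `OSError` when the access is not permitted or not supported in that environment.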