[V8,00/18] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS

Message ID 20210716085325.10300-1-lingshan.zhu@intel.com (mailing list archive)

Message

Zhu, Lingshan July 16, 2021, 8:53 a.m. UTC
The guest Precise Event Based Sampling (PEBS) feature can provide the
architectural state of the instruction executed after the guest instruction
that exactly caused the event. It needs a new hardware facility that is only
available on Intel Ice Lake Server (ICX) platforms. This patch set enables the
basic PEBS feature for KVM guests on ICX.

We can use the PEBS feature in a Linux guest just as on bare metal:

   # echo 0 > /proc/sys/kernel/watchdog (on the host)
   # perf record -e instructions:ppp ./br_instr a
   # perf record -c 100000 -e instructions:pp ./br_instr a

To emulate the guest PEBS facility for the perf usages above,
we need to implement two code paths:

1) Fast path

This is when the physical PMC assigned by the host has the same index as the
virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
This path covers the most common use cases.

2) Slow path

This is when the physical PMC assigned by the host has a different index from
the virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0). In this case,
KVM needs to rewrite the PEBS records so that the applicable counter indexes,
which the PEBS facility writes as physical counter indexes, become the virtual
PMC indexes, and to switch the counter reset values to the offsets that
correspond to the physical counter indexes in the DS data structure.
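
The index translation described for the slow path can be illustrated with a
small sketch. This is not code from the patch set: the helper names and the
`phys_to_virt` table are hypothetical, and the real PEBS record layout and the
DS reset-value handling are not modeled. It only shows the idea of turning a
bitmask of physical counter indexes into the indexes the guest expects.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when a guest (virtual) PMC is backed by a
 * physical PMC with a different index, i.e. the slow-path case.
 */
static int pebs_is_cross_mapped(unsigned int virt_idx, unsigned int phys_idx)
{
	return virt_idx != phys_idx;
}

/*
 * Hypothetical helper: translate a bitmask of physical counter indexes into
 * a bitmask of virtual counter indexes.
 *
 * phys_to_virt[i] is the virtual index emulated by physical counter i, or a
 * negative value when physical counter i does not back any guest counter.
 * Bits for counters that do not belong to the guest are dropped.
 */
static uint64_t pebs_remap_counters(uint64_t phys_mask,
				    const int *phys_to_virt,
				    unsigned int nr_counters)
{
	uint64_t virt_mask = 0;
	unsigned int i;

	for (i = 0; i < nr_counters && i < 64; i++) {
		int v = phys_to_virt[i];

		if (!(phys_mask & (1ULL << i)))
			continue;	/* counter i did not overflow */
		if (v < 0 || v >= 64)
			continue;	/* not mapped to a guest counter */
		virt_mask |= 1ULL << v;
	}
	return virt_mask;
}
```

With physical PMC1 emulating virtual PMC0, a record that reports physical
counter 1 would be rewritten to report counter 0. On the fast path the table
is the identity mapping and no rewrite is needed.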

The previous version [0] enabled both the fast path and the slow path, which
seems a bit too complex for a first step. In this patch set, we want to start
with the fast path to get basic guest PEBS enabled while keeping the slow path
disabled. A more focused discussion of the slow path [1] is planned for a
separate patch set as the next step.

Compared to the later versions planned for subsequent steps, this patch set
lacks two pieces of functionality: support for PEBS enabled on both host and
guest at the same time, and emulation of guest PEBS when the counter is
cross-mapped (neither of these is a typical scenario).

With the basic support, the guest can retrieve the correct PEBS information from
its own PEBS records on Ice Lake servers. We expect it to keep working after
migration to another Ice Lake server, and we expect no regression in host perf.

Here are the results of the PEBS test on the guest and the host for the same workload:

perf report on guest:
# Samples: 2K of event 'instructions:ppp'
# Event count (approx.): 1473377250
# Overhead  Command   Shared Object      Symbol
   57.74%  br_instr  br_instr           [.] lfsr_cond
   41.40%  br_instr  br_instr           [.] cmp_end
    0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire

perf report on host:
# Samples: 2K of event 'instructions:ppp'
# Event count (approx.): 1462721386
# Overhead  Command   Shared Object     Symbol
   57.90%  br_instr  br_instr          [.] lfsr_cond
   41.95%  br_instr  br_instr          [.] cmp_end
    0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire

Conclusion: the profiling results on the guest are similar to those on the host.

The minimum guest kernel version may be v5.4, or a backported version that
supports Ice Lake server PEBS.

Please check more details in each commit and feel free to comment.

Previous:
https://lore.kernel.org/kvm/20210622094306.8336-1-lingshan.zhu@intel.com/

[0]
https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
[1]
https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/

V7 -> V8 Changelog:
- fix coding style, add {} for single statement of multiple lines (Peter Z)
- fix coding style in xen_guest_state() (Boris Ostrovsky)
- s/pmu/kvm_pmu/ in intel_guest_get_msrs() (Peter Z)
- put lower cost branch in the first place for x86_pmu_handle_guest_pebs() (Peter Z)

V6 -> V7 Changelog:
- Fix conditions order and call x86_pmu_handle_guest_pebs() unconditionally; (PeterZ)
- Add a new patch to make all that perf_guest_cbs stuff suck less; (PeterZ)
- Document that the IA32_MISC_ENABLE[7] behavior matches bare metal; (Sean & Venkatesh)
- Update commit message for fixed counter mask refactoring; (PeterZ)
- Clarifying comments about {.host and .guest} for intel_guest_get_msrs(); (PeterZ)
- Add pebs_capable to store valid PEBS_COUNTER_MASK value; (PeterZ)
- Add more comments for perf's precise_ip field; (Andi & PeterZ)
- Refactor perf_overflow_handler_t and make it more legible; (PeterZ)
- Use "(unsigned long)cpuc->ds" instead of __this_cpu_read(cpu_hw_events.ds); (PeterZ)
- Keep using "(struct kvm_pmu *)data" to follow K&R; (Andi)

Like Xu (17):
  perf/core: Use static_call to optimize perf_guest_info_callbacks
  perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
  perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
  KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
  KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
  KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
  KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
  KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
  KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
  KVM: x86/cpuid: Refactor host/guest CPU model consistency check
  KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64

Peter Zijlstra (Intel) (1):
  x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value

 arch/arm/kernel/perf_callchain.c   |  16 +--
 arch/arm64/kernel/perf_callchain.c |  29 +++--
 arch/arm64/kvm/perf.c              |  22 ++--
 arch/csky/kernel/perf_callchain.c  |   4 +-
 arch/nds32/kernel/perf_event_cpu.c |  16 +--
 arch/riscv/kernel/perf_callchain.c |   4 +-
 arch/x86/events/core.c             |  44 ++++++--
 arch/x86/events/intel/core.c       | 165 +++++++++++++++++++++++------
 arch/x86/events/perf_event.h       |   6 +-
 arch/x86/include/asm/kvm_host.h    |  18 +++-
 arch/x86/include/asm/msr-index.h   |   6 ++
 arch/x86/include/asm/perf_event.h  |   5 +-
 arch/x86/kvm/cpuid.c               |  24 ++---
 arch/x86/kvm/cpuid.h               |   5 +
 arch/x86/kvm/pmu.c                 |  60 ++++++++---
 arch/x86/kvm/pmu.h                 |  38 +++++++
 arch/x86/kvm/vmx/capabilities.h    |  26 +++--
 arch/x86/kvm/vmx/pmu_intel.c       | 116 ++++++++++++++++----
 arch/x86/kvm/vmx/vmx.c             |  24 ++++-
 arch/x86/kvm/vmx/vmx.h             |   2 +-
 arch/x86/kvm/x86.c                 |  51 +++++----
 arch/x86/xen/pmu.c                 |  32 +++---
 include/linux/perf_event.h         |  12 ++-
 kernel/events/core.c               |   9 ++
 24 files changed, 545 insertions(+), 189 deletions(-)

Comments

Jim Mattson July 16, 2021, 5:02 p.m. UTC | #1
On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan <lingshan.zhu@intel.com> wrote:
>
> The guest Precise Event Based Sampling (PEBS) feature can provide an
> architectural state of the instruction executed after the guest instruction
> that exactly caused the event. It needs new hardware facility only available
> on Intel Ice Lake Server platforms. This patch set enables the basic PEBS
> feature for KVM guests on ICX.
>
> We can use PEBS feature on the Linux guest like native:
>
>    # echo 0 > /proc/sys/kernel/watchdog (on the host)
>    # perf record -e instructions:ppp ./br_instr a
>    # perf record -c 100000 -e instructions:pp ./br_instr a
>
> To emulate guest PEBS facility for the above perf usages,
> we need to implement 2 code paths:
>
> 1) Fast path
>
> This is when the host assigned physical PMC has an identical index as the
> virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
> This path is used in most common use cases.
>
> 2) Slow path
>
> This is when the host assigned physical PMC has a different index from the
> virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0) In this case,
> KVM needs to rewrite the PEBS records to change the applicable counter indexes
> to the virtual PMC indexes, which would otherwise contain the physical counter
> index written by PEBS facility, and switch the counter reset values to the
> offset corresponding to the physical counter indexes in the DS data structure.
>
> The previous version [0] enables both fast path and slow path, which seems
> a bit more complex as the first step. In this patchset, we want to start with
> the fast path to get the basic guest PEBS enabled while keeping the slow path
> disabled. More focused discussion on the slow path [1] is planned to be put to
> another patchset in the next step.
>
> Compared to later versions in subsequent steps, the functionality to support
> host-guest PEBS both enabled and the functionality to emulate guest PEBS when
> the counter is cross-mapped are missing in this patch set
> (neither of these are typical scenarios).

I'm not sure exactly what scenarios you're ruling out here. In our
environment, we always have to be able to support host-level
profiling, whether or not the guest is using the PMU (for PEBS or
anything else). Hence, for our *basic* vPMU offering, we only expose
two general purpose counters to the guest, so that we can keep two
general purpose counters for the host. In this scenario, I would
expect cross-mapped counters to be common. Are we going to be able to
use this implementation?
Liang, Kan July 16, 2021, 7 p.m. UTC | #2
On 7/16/2021 1:02 PM, Jim Mattson wrote:
> On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan <lingshan.zhu@intel.com> wrote:
>>
>> The guest Precise Event Based Sampling (PEBS) feature can provide an
>> architectural state of the instruction executed after the guest instruction
>> that exactly caused the event. It needs new hardware facility only available
>> on Intel Ice Lake Server platforms. This patch set enables the basic PEBS
>> feature for KVM guests on ICX.
>>
>> We can use PEBS feature on the Linux guest like native:
>>
>>     # echo 0 > /proc/sys/kernel/watchdog (on the host)
>>     # perf record -e instructions:ppp ./br_instr a
>>     # perf record -c 100000 -e instructions:pp ./br_instr a
>>
>> To emulate guest PEBS facility for the above perf usages,
>> we need to implement 2 code paths:
>>
>> 1) Fast path
>>
>> This is when the host assigned physical PMC has an identical index as the
>> virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>> This path is used in most common use cases.
>>
>> 2) Slow path
>>
>> This is when the host assigned physical PMC has a different index from the
>> virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0) In this case,
>> KVM needs to rewrite the PEBS records to change the applicable counter indexes
>> to the virtual PMC indexes, which would otherwise contain the physical counter
>> index written by PEBS facility, and switch the counter reset values to the
>> offset corresponding to the physical counter indexes in the DS data structure.
>>
>> The previous version [0] enables both fast path and slow path, which seems
>> a bit more complex as the first step. In this patchset, we want to start with
>> the fast path to get the basic guest PEBS enabled while keeping the slow path
>> disabled. More focused discussion on the slow path [1] is planned to be put to
>> another patchset in the next step.
>>
>> Compared to later versions in subsequent steps, the functionality to support
>> host-guest PEBS both enabled and the functionality to emulate guest PEBS when
>> the counter is cross-mapped are missing in this patch set
>> (neither of these are typical scenarios).
> 
> I'm not sure exactly what scenarios you're ruling out here. In our
> environment, we always have to be able to support host-level
> profiling, whether or not the guest is using the PMU (for PEBS or
> anything else). Hence, for our *basic* vPMU offering, we only expose
> two general purpose counters to the guest, so that we can keep two
> general purpose counters for the host. In this scenario, I would
> expect cross-mapped counters to be common. Are we going to be able to
> use this implementation?
> 

Let's say we have 4 GP counters in HW.
Do you mean that the host owns 2 GP counters (counter 0 & 1) and the
guest owns the other 2 GP counters (counter 2 & 3) in your environment?
We did a similar implementation in V1, but the proposal was rejected.
https://lore.kernel.org/kvm/20200306135317.GD12561@hirez.programming.kicks-ass.net/

For the current proposal, both guest and host can see all 4 GP counters.
The counters are shared.
The guest cannot know the availability of the counters. It may request
a counter (e.g., counter 0) that is already in use by the host. The host
may then provide another counter (e.g., counter 1) to the guest. This is the
case described in the slow path. For this case, we have to modify the
guest PEBS record, because the counter index in the PEBS record is 1,
while the guest perf driver expects 0.

If counter 0 is available, the guest can use counter 0. That's the fast
path. I think the fast path should be more common even when both host and
guest are profiling, because, except for some specific events, we may
move the host event to counters that are not required by the guest if
we have enough resources.
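
The shared-counter behavior described above can be sketched as a simple
allocation rule. This is only an illustration under stated assumptions: the
function name and the `in_use` bitmask are hypothetical, and the real perf
scheduler works with event constraints rather than a first-free scan.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: choose a physical counter for the guest counter
 * `want`, given a bitmask `in_use` of physical counters already taken
 * (for example by host events).
 *
 * Returns `want` when it is free (fast path, identical index), another
 * free index when it is not (cross-mapped, slow path), or -1 when no
 * counter is free.
 */
static int pick_physical_counter(uint64_t in_use, unsigned int nr_counters,
				 unsigned int want)
{
	unsigned int i;

	if (nr_counters > 64)
		nr_counters = 64;	/* the bitmask holds at most 64 counters */

	if (want < nr_counters && !(in_use & (1ULL << want)))
		return (int)want;	/* fast path: same index */

	for (i = 0; i < nr_counters; i++) {
		if (!(in_use & (1ULL << i)))
			return (int)i;	/* cross-mapped: different index */
	}
	return -1;			/* nothing free */
}
```

With 4 counters and counter 0 held by the host, a guest request for counter 0
yields counter 1, which is exactly the case where the PEBS record index would
have to be rewritten. A result of -1 corresponds to the "not enough resources"
situation.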

Thanks,
Kan
Jim Mattson July 16, 2021, 9:07 p.m. UTC | #3
On Fri, Jul 16, 2021 at 12:00 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 7/16/2021 1:02 PM, Jim Mattson wrote:
> > On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan <lingshan.zhu@intel.com> wrote:
> >>
> >> The guest Precise Event Based Sampling (PEBS) feature can provide an
> >> architectural state of the instruction executed after the guest instruction
> >> that exactly caused the event. It needs new hardware facility only available
> >> on Intel Ice Lake Server platforms. This patch set enables the basic PEBS
> >> feature for KVM guests on ICX.
> >>
> >> We can use PEBS feature on the Linux guest like native:
> >>
> >>     # echo 0 > /proc/sys/kernel/watchdog (on the host)
> >>     # perf record -e instructions:ppp ./br_instr a
> >>     # perf record -c 100000 -e instructions:pp ./br_instr a
> >>
> >> To emulate guest PEBS facility for the above perf usages,
> >> we need to implement 2 code paths:
> >>
> >> 1) Fast path
> >>
> >> This is when the host assigned physical PMC has an identical index as the
> >> virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
> >> This path is used in most common use cases.
> >>
> >> 2) Slow path
> >>
> >> This is when the host assigned physical PMC has a different index from the
> >> virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0) In this case,
> >> KVM needs to rewrite the PEBS records to change the applicable counter indexes
> >> to the virtual PMC indexes, which would otherwise contain the physical counter
> >> index written by PEBS facility, and switch the counter reset values to the
> >> offset corresponding to the physical counter indexes in the DS data structure.
> >>
> >> The previous version [0] enables both fast path and slow path, which seems
> >> a bit more complex as the first step. In this patchset, we want to start with
> >> the fast path to get the basic guest PEBS enabled while keeping the slow path
> >> disabled. More focused discussion on the slow path [1] is planned to be put to
> >> another patchset in the next step.
> >>
> >> Compared to later versions in subsequent steps, the functionality to support
> >> host-guest PEBS both enabled and the functionality to emulate guest PEBS when
> >> the counter is cross-mapped are missing in this patch set
> >> (neither of these are typical scenarios).
> >
> > I'm not sure exactly what scenarios you're ruling out here. In our
> > environment, we always have to be able to support host-level
> > profiling, whether or not the guest is using the PMU (for PEBS or
> > anything else). Hence, for our *basic* vPMU offering, we only expose
> > two general purpose counters to the guest, so that we can keep two
> > general purpose counters for the host. In this scenario, I would
> > expect cross-mapped counters to be common. Are we going to be able to
> > use this implementation?
> >
>
> Let's say we have 4 GP counters in HW.
> Do you mean that the host owns 2 GP counters (counter 0 & 1) and the
> guest own the other 2 GP counters (counter 2 & 3) in your envirinment?
> We did a similar implementation in V1, but the proposal has been denied.
> https://lore.kernel.org/kvm/20200306135317.GD12561@hirez.programming.kicks-ass.net/

It's the other way around. AFAIK, there is no architectural way to
specify that only counters 2 and 3 are available, so we have to give
the guest counters 0 and 1.

> For the current proposal, both guest and host can see all 4 GP counters.
> The counters are shared.

I don't understand how that can work. If the host programs two
counters, how can you give the guest four counters?

> The guest cannot know the availability of the counters. It may requires
> a counter (e.g., counter 0) which may has been used by the host. Host
> may provides another counter (e.g., counter 1) to the guest. This is the
> case described in the slow path. For this case, we have to modify the
> guest PEBS record. Because the counter index in the PEBS record is 1,
> while the guest perf driver expects 0.

If we reserve counters 0 and 1 for the guest, this is not a problem
(assuming we tell the guest it only has two counters). If we don't
statically partition the counters, I don't see how you can ensure that
the guest behaves as architected. For example, what do you do when the
guest programs four counters and the host programs two?

> If counter 0 is available, guests can use counter 0. That's the fast
> path. I think the fast path should be more common even both host and
> guest are profiling. Because except for some specific events, we may
> move the host event to the counters which are not required by guest if
> we have enough resources.

And if you don't have enough resources?
Liang, Kan July 19, 2021, 12:41 a.m. UTC | #4
On 7/16/2021 5:07 PM, Jim Mattson wrote:
> On Fri, Jul 16, 2021 at 12:00 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 7/16/2021 1:02 PM, Jim Mattson wrote:
>>> On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan <lingshan.zhu@intel.com> wrote:
>>>>
>>>> The guest Precise Event Based Sampling (PEBS) feature can provide an
>>>> architectural state of the instruction executed after the guest instruction
>>>> that exactly caused the event. It needs new hardware facility only available
>>>> on Intel Ice Lake Server platforms. This patch set enables the basic PEBS
>>>> feature for KVM guests on ICX.
>>>>
>>>> We can use PEBS feature on the Linux guest like native:
>>>>
>>>>      # echo 0 > /proc/sys/kernel/watchdog (on the host)
>>>>      # perf record -e instructions:ppp ./br_instr a
>>>>      # perf record -c 100000 -e instructions:pp ./br_instr a
>>>>
>>>> To emulate guest PEBS facility for the above perf usages,
>>>> we need to implement 2 code paths:
>>>>
>>>> 1) Fast path
>>>>
>>>> This is when the host assigned physical PMC has an identical index as the
>>>> virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>>>> This path is used in most common use cases.
>>>>
>>>> 2) Slow path
>>>>
>>>> This is when the host assigned physical PMC has a different index from the
>>>> virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0) In this case,
>>>> KVM needs to rewrite the PEBS records to change the applicable counter indexes
>>>> to the virtual PMC indexes, which would otherwise contain the physical counter
>>>> index written by PEBS facility, and switch the counter reset values to the
>>>> offset corresponding to the physical counter indexes in the DS data structure.
>>>>
>>>> The previous version [0] enables both fast path and slow path, which seems
>>>> a bit more complex as the first step. In this patchset, we want to start with
>>>> the fast path to get the basic guest PEBS enabled while keeping the slow path
>>>> disabled. More focused discussion on the slow path [1] is planned to be put to
>>>> another patchset in the next step.
>>>>
>>>> Compared to later versions in subsequent steps, the functionality to support
>>>> host-guest PEBS both enabled and the functionality to emulate guest PEBS when
>>>> the counter is cross-mapped are missing in this patch set
>>>> (neither of these are typical scenarios).
>>>
>>> I'm not sure exactly what scenarios you're ruling out here. In our
>>> environment, we always have to be able to support host-level
>>> profiling, whether or not the guest is using the PMU (for PEBS or
>>> anything else). Hence, for our *basic* vPMU offering, we only expose
>>> two general purpose counters to the guest, so that we can keep two
>>> general purpose counters for the host. In this scenario, I would
>>> expect cross-mapped counters to be common. Are we going to be able to
>>> use this implementation?
>>>
>>
>> Let's say we have 4 GP counters in HW.
>> Do you mean that the host owns 2 GP counters (counter 0 & 1) and the
>> guest own the other 2 GP counters (counter 2 & 3) in your envirinment?
>> We did a similar implementation in V1, but the proposal has been denied.
>> https://lore.kernel.org/kvm/20200306135317.GD12561@hirez.programming.kicks-ass.net/
> 
> It's the other way around. AFAIK, there is no architectural way to
> specify that only counters 2 and 3 are available, so we have to give
> the guest counters 0 and 1.

How about the host? Can the host see all 4 counters?

> 
>> For the current proposal, both guest and host can see all 4 GP counters.
>> The counters are shared.
> 
> I don't understand how that can work. If the host programs two
> counters, how can you give the guest four counters?
> 
>> The guest cannot know the availability of the counters. It may requires
>> a counter (e.g., counter 0) which may has been used by the host. Host
>> may provides another counter (e.g., counter 1) to the guest. This is the
>> case described in the slow path. For this case, we have to modify the
>> guest PEBS record. Because the counter index in the PEBS record is 1,
>> while the guest perf driver expects 0.
> 
> If we reserve counters 0 and 1 for the guest, this is not a problem
> (assuming we tell the guest it only has two counters). If we don't
> statically partition the counters, I don't see how you can ensure that
> the guest behaves as architected. For example, what do you do when the
> guest programs four counters and the host programs two?

Ideally, we should do multiplexing if the guest requires four and the
host requires two. But I doubt this patch set implements
multiplexing, because multiplexing should be part of the slow path,
which will be supported in the next step.

Could you please share more details regarding your environment?
How do you handle the case where the guest programs two counters and the host
programs four counters?

> 
>> If counter 0 is available, guests can use counter 0. That's the fast
>> path. I think the fast path should be more common even both host and
>> guest are profiling. Because except for some specific events, we may
>> move the host event to the counters which are not required by guest if
>> we have enough resources.
> 
> And if you don't have enough resources? 

As I understand it, multiplexing should be the only choice if we don't
have enough resources.
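
For context on what multiplexing means for counts: when perf time-shares a
counter among events, tools usually extrapolate the raw count using the time
the event was enabled versus the time it was actually running on a counter.
The sketch below shows that scaling only. The function name is hypothetical,
and nothing here is specific to this patch set or to how guest PEBS would be
multiplexed.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: estimate the full count of a multiplexed event.
 *
 * raw          - count observed while the event owned a counter
 * time_enabled - total time the event was enabled
 * time_running - time the event actually ran on a counter
 *
 * Uses double arithmetic for simplicity, so very large counts are only
 * approximate. Returns 0 when the event never ran.
 */
static uint64_t scale_multiplexed_count(uint64_t raw, uint64_t time_enabled,
					uint64_t time_running)
{
	if (time_running == 0)
		return 0;
	return (uint64_t)((double)raw * (double)time_enabled /
			  (double)time_running + 0.5);
}
```

An event that ran for half of its enabled time and counted 1000 would be
reported as roughly 2000. For sampling events such as PEBS, the time spent off
the counter also means samples are simply not taken during that window.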

Thanks,
Kan
Like Xu July 21, 2021, 12:10 p.m. UTC | #5
On 19/7/2021 8:41 am, Liang, Kan wrote:
> 
> 
> On 7/16/2021 5:07 PM, Jim Mattson wrote:
>> On Fri, Jul 16, 2021 at 12:00 PM Liang, Kan 
>> <kan.liang@linux.intel.com> wrote:
>>>
>>>
>>>
>>> On 7/16/2021 1:02 PM, Jim Mattson wrote:
>>>> On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan 
>>>> <lingshan.zhu@intel.com> wrote:
>>>>>
>>>>> The guest Precise Event Based Sampling (PEBS) feature can provide an
>>>>> architectural state of the instruction executed after the guest 
>>>>> instruction
>>>>> that exactly caused the event. It needs new hardware facility only 
>>>>> available
>>>>> on Intel Ice Lake Server platforms. This patch set enables the 
>>>>> basic PEBS
>>>>> feature for KVM guests on ICX.
>>>>>
>>>>> We can use PEBS feature on the Linux guest like native:
>>>>>
>>>>>      # echo 0 > /proc/sys/kernel/watchdog (on the host)
>>>>>      # perf record -e instructions:ppp ./br_instr a
>>>>>      # perf record -c 100000 -e instructions:pp ./br_instr a
>>>>>
>>>>> To emulate guest PEBS facility for the above perf usages,
>>>>> we need to implement 2 code paths:
>>>>>
>>>>> 1) Fast path
>>>>>
>>>>> This is when the host assigned physical PMC has an identical index 
>>>>> as the
>>>>> virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>>>>> This path is used in most common use cases.
>>>>>
>>>>> 2) Slow path
>>>>>
>>>>> This is when the host assigned physical PMC has a different index 
>>>>> from the
>>>>> virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0) In 
>>>>> this case,
>>>>> KVM needs to rewrite the PEBS records to change the applicable 
>>>>> counter indexes
>>>>> to the virtual PMC indexes, which would otherwise contain the 
>>>>> physical counter
>>>>> index written by PEBS facility, and switch the counter reset values 
>>>>> to the
>>>>> offset corresponding to the physical counter indexes in the DS data 
>>>>> structure.
>>>>>
>>>>> The previous version [0] enables both fast path and slow path, 
>>>>> which seems
>>>>> a bit more complex as the first step. In this patchset, we want to 
>>>>> start with
>>>>> the fast path to get the basic guest PEBS enabled while keeping the 
>>>>> slow path
>>>>> disabled. More focused discussion on the slow path [1] is planned 
>>>>> to be put to
>>>>> another patchset in the next step.
>>>>>
>>>>> Compared to later versions in subsequent steps, the functionality 
>>>>> to support
>>>>> host-guest PEBS both enabled and the functionality to emulate guest 
>>>>> PEBS when
>>>>> the counter is cross-mapped are missing in this patch set
>>>>> (neither of these are typical scenarios).
>>>>
>>>> I'm not sure exactly what scenarios you're ruling out here. In our
>>>> environment, we always have to be able to support host-level
>>>> profiling, whether or not the guest is using the PMU (for PEBS or
>>>> anything else). Hence, for our *basic* vPMU offering, we only expose
>>>> two general purpose counters to the guest, so that we can keep two
>>>> general purpose counters for the host. In this scenario, I would
>>>> expect cross-mapped counters to be common. Are we going to be able to
>>>> use this implementation?
>>>>
>>>
>>> Let's say we have 4 GP counters in HW.
>>> Do you mean that the host owns 2 GP counters (counter 0 & 1) and the
>>> guest own the other 2 GP counters (counter 2 & 3) in your envirinment?
>>> We did a similar implementation in V1, but the proposal has been denied.
>>> https://lore.kernel.org/kvm/20200306135317.GD12561@hirez.programming.kicks-ass.net/ 
>>>
>>
>> It's the other way around. AFAIK, there is no architectural way to
>> specify that only counters 2 and 3 are available, so we have to give
>> the guest counters 0 and 1.
> 
> How about the host? Can the host see all 4 counters?
> 
>>
>>> For the current proposal, both guest and host can see all 4 GP counters.
>>> The counters are shared.
>>
>> I don't understand how that can work. If the host programs two
>> counters, how can you give the guest four counters?
>>
>>> The guest cannot know the availability of the counters. It may requires
>>> a counter (e.g., counter 0) which may has been used by the host. Host
>>> may provides another counter (e.g., counter 1) to the guest. This is the
>>> case described in the slow path. For this case, we have to modify the
>>> guest PEBS record. Because the counter index in the PEBS record is 1,
>>> while the guest perf driver expects 0.
>>
>> If we reserve counters 0 and 1 for the guest, this is not a problem
>> (assuming we tell the guest it only has two counters). If we don't
>> statically partition the counters, I don't see how you can ensure that
>> the guest behaves as architected. For example, what do you do when the
>> guest programs four counters and the host programs two?
> 
> Ideally, we should do multiplexing if the guest requires four and the 
> host requires two. But I doubt this patch set implements the 
> multiplexing, because the multiplexing should be part of the slow path, 
> which will be supported in the next step.
> 
> Could you please share more details regarding your environment?

Jim, would you mind sharing more details about the statically
partitioned hardware counters in your virtualization scenario?

It may be useful for subsequent designs of advanced PEBS features.
Otherwise we will follow the sharing rules defined by the perf subsystem.

> How do you handle the case that guest programs two counters and the host 
> programs four counters?
> 
>>
>>> If counter 0 is available, guests can use counter 0. That's the fast
>>> path. I think the fast path should be more common even both host and
>>> guest are profiling. Because except for some specific events, we may
>>> move the host event to the counters which are not required by guest if
>>> we have enough resources.
>>
>> And if you don't have enough resources? 
> 
> As my understanding, multiplexing should be the only choice if we don't 
> have enough resources.
> 
> Thanks,
> Kan
Liuxiangdong July 22, 2021, 12:53 p.m. UTC | #6
Hi Like and Lingshan,

We can use PEBS on Ice Lake with "perf record -e $event:pp", but
how can we get all the supported $event values for Ice Lake?
It seems that every hardware event, software event, and kernel
PMU event listed by "perf list" accepts ":pp" without error.


By querying the event list for Ice Lake (https://perfmon-events.intel.com/),
we can use "perf record -e cpu/event=0xXX,umask=0xXX/pp"
to enable sampling. Some events have "PEBS: [PreciseEventingIP]" in the
"Additional Info" column. Are they the only supported precise events?
Do the events that have "PEBS:[NonPreciseEventingIP]" in the last column
support PEBS?


Thanks,
Xiangdong Liu
Liang, Kan July 22, 2021, 1:08 p.m. UTC | #7
On 7/22/2021 8:53 AM, Liuxiangdong wrote:
> Hi,like and lingshan.
> 
> We can use pebs on the Icelake by using "perf record -e $event:pp", but 
> how can we get all the supported $event for the Icelake?
> Because it seems like that all the hardware event/software event/kernel 
> pmu event listed by "perf list" can use ":pp" without error.
> 
> 
> By quering events list for Icelake("https://perfmon-events.intel.com/), 
> we can use "perf record -e cpu/event=0xXX,unask=0xXX/pp"
> to enable sampling. There are some events with "PEBS: 
> [PreciseEventingIP]" in "Additional Info" column. Are they the only 
> supported
> precise events? Do those events which have "PEBS:[NonPreciseEventingIP]" 
> in last column support PEBS?
>

Starting with Ice Lake, the extended PEBS feature is supported, which
extends PEBS to all counters and all performance monitoring events
(both precise and non-precise). You can sample any event with PEBS.
For details, please refer to section 18.9.1, Extended PEBS, in the latest SDM
vol3.
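
As a side note on the "cpu/event=0xXX,umask=0xXX/pp" syntax discussed above,
the sketch below shows how the two fields combine into a raw x86 event
encoding (event select in bits 0-7, unit mask in bits 8-15) and how the number
of "p" modifiers corresponds to a precise level from 0 to 3. The function
names are hypothetical and this is not perf's actual parser.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine an event select code and a unit mask into
 * a raw config value (event in bits 0-7, umask in bits 8-15).
 */
static uint64_t raw_event_config(uint8_t event, uint8_t umask)
{
	return ((uint64_t)umask << 8) | event;
}

/*
 * Hypothetical helper: derive the precise level from an event modifier
 * string such as "pp" or "ppp" by counting the 'p' characters. The level
 * is capped at 3, the largest value a 2-bit field can hold.
 */
static unsigned int precise_level(const char *modifiers)
{
	unsigned int level = 0;

	for (; *modifiers; modifiers++) {
		if (*modifiers == 'p' && level < 3)
			level++;
	}
	return level;
}
```

For example, event=0xd0 with umask=0x81 gives a raw config of 0x81d0, and the
":pp" modifier used in the commands above corresponds to level 2.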

Thanks,
Kan