[v12,00/11] Guest Last Branch Recording Enabling

Message ID 20200613080958.132489-1-like.xu@linux.intel.com (mailing list archive)

Message

Like Xu June 13, 2020, 8:09 a.m. UTC
Hi all,

Please help review this new version for the kernel 5.9 release.

Now, you may apply the last two qemu-devel patches to the upstream
qemu and try the guest LBR feature with '-cpu host' command line.

v11->v12 Changelog:
- apply "Signed-off-by" from PeterZ and his code for the perf subsystem;
- add validity checks before exposing LBR via MSR_IA32_PERF_CAPABILITIES;
- refactor MSR_IA32_DEBUGCTLMSR emulation with validity check;
- reorder "perf_event_attr" fields according to how they're declared;
- replace event_is_oncpu() with "event->state" check;
- make LBR emulation specific to vmx rather than x86 generic;
- move pass-through LBR code to vmx.c instead of pmu_intel.c;
- add vmx_lbr_en/disable_passthrough layer to make code readable;
- rewrite pmu availability check with vmx_passthrough_lbr_msrs();

You may check more details in each commit.

Previous:
https://lore.kernel.org/kvm/20200514083054.62538-1-like.xu@linux.intel.com/

---

Last Branch Recording (LBR) is a performance monitoring unit (PMU)
feature on Intel processors that records a running trace of the most
recent branches taken by the processor in the LBR stack. This patch
series enables this feature for KVM guests.

Userspace can configure whether LBR is enabled for each guest via the
MSR_IA32_PERF_CAPABILITIES MSR. As a first step, a guest can only
enable the LBR feature if its CPU model is the same as the host's,
since LBR is still a model-specific feature.
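
The restriction above can be pictured as a check on the LBR format field.
The sketch below is a toy model, not the code from this series: it assumes
the LBR format lives in bits 5:0 of IA32_PERF_CAPABILITIES, and the function
name and the rule "zero means not exposed, otherwise it must equal the host
format" are illustrative assumptions.

```python
# Toy model only; not the KVM implementation from this series.

# Assumption: bits 5:0 of IA32_PERF_CAPABILITIES hold the LBR format.
LBR_FMT_MASK = 0x3F


def guest_lbr_fmt_is_valid(guest_perf_cap: int, host_perf_cap: int) -> bool:
    """Hypothetical validity check for a userspace-supplied guest value.

    Accept the value if the guest LBR format is zero (LBR not exposed)
    or identical to the host's LBR format.
    """
    guest_fmt = guest_perf_cap & LBR_FMT_MASK
    host_fmt = host_perf_cap & LBR_FMT_MASK
    return guest_fmt == 0 or guest_fmt == host_fmt
```

For example, with a host format of 0x5, a guest value carrying format 0x5 or
0x0 would be accepted by this model, while format 0x4 would be rejected.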

If it is enabled for the guest, the guest LBR driver accesses the
LBR MSRs (including IA32_DEBUGCTLMSR and the record MSRs) as the host does.
The first guest access to the LBR-related MSRs is always intercepted.
The KVM trap handler creates a special LBR event (called the guest LBR event),
which enables the callstack mode and has no hardware counter assigned.
The host perf subsystem enables and schedules this event as usual.

The guest's first access to an LBR register gets trapped to KVM, which
creates a guest LBR perf event. It's a regular LBR perf event which gets
the LBR facility assigned from the perf subsystem. Once that succeeds,
the LBR stack MSRs are passed through to the guest for efficient access.
However, if another host LBR event comes in and takes over the LBR
facility, the LBR MSRs will be made interceptible again, and the guest's
subsequent accesses to the LBR MSRs will be trapped and will not be meaningful.
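
The trap / pass-through / re-intercept flow described above can be modeled as
a small state machine. This is only a toy sketch: the class, method names and
return strings are hypothetical and do not correspond to functions in the
patches.

```python
from enum import Enum, auto


class LbrMsrMode(Enum):
    INTERCEPTED = auto()   # guest accesses to LBR MSRs trap to KVM
    PASSTHROUGH = auto()   # guest accesses reach the hardware directly


class GuestLbrModel:
    """Toy model of the guest LBR event life cycle (names are hypothetical)."""

    def __init__(self) -> None:
        self.mode = LbrMsrMode.INTERCEPTED
        self.has_guest_lbr_event = False

    def guest_access(self, host_owns_lbr: bool) -> str:
        """Model one guest access to an LBR MSR."""
        if self.mode is LbrMsrMode.PASSTHROUGH:
            return "direct"
        # The access trapped: create the guest LBR event on first use.
        self.has_guest_lbr_event = True
        if host_owns_lbr:
            # Another host LBR event holds the facility; stay intercepted.
            return "trapped"
        # The perf subsystem assigned the LBR facility to the guest event.
        self.mode = LbrMsrMode.PASSTHROUGH
        return "trapped, then passed through"

    def host_takes_over(self) -> None:
        """A host LBR event takes over the LBR facility."""
        self.mode = LbrMsrMode.INTERCEPTED
```

In this model, a first access with `host_owns_lbr=False` switches the MSRs to
pass-through, and a later `host_takes_over()` makes them intercepted again.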

Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) on
every VMX transition would add excessive overhead to the frequent VMX
transitions, the guest LBR event instead saves/restores the LBR stack MSRs
(including the LBR_SELECT MSR) during context switches, using the native
LBR event callstack mechanism.
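
To give a rough sense of the numbers, the count below assumes a 32-entry LBR
stack in which each entry has FROM, TO and INFO MSRs, plus one TOS MSR and
one LBR_SELECT MSR. This layout is an assumption for illustration; models
with fewer entries or without INFO MSRs give smaller counts.

```python
def lbr_msr_count(entries: int = 32, has_info: bool = True) -> int:
    """Count the MSRs a full LBR save or restore would touch (toy estimate)."""
    per_entry = 3 if has_info else 2   # FROM + TO (+ INFO)
    extra = 2                          # TOS + LBR_SELECT
    return entries * per_entry + extra


print(lbr_msr_count())           # 98
print(lbr_msr_count(16, False))  # 34
```

Doing that many MSR reads and writes on every VM entry and exit is what the
series avoids by deferring to the perf context-switch path.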

If the guest no longer accesses the LBR-related MSRs within a scheduling
time slice and the LBR enable bit is unset, the vPMU releases its guest
LBR event just as it releases a normal event of an unused vPMC, and the
pass-through state of the LBR stack MSRs is canceled.
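
The release condition above can be written as a simple predicate. The sketch
below is illustrative only: the function name and arguments are hypothetical,
and it assumes the LBR enable bit is bit 0 of IA32_DEBUGCTL.

```python
# Assumption: bit 0 of IA32_DEBUGCTL is the LBR enable bit.
DEBUGCTLMSR_LBR = 1 << 0


def should_release_guest_lbr_event(accessed_in_slice: bool,
                                   guest_debugctl: int) -> bool:
    """Hypothetical lazy-release check, evaluated once per time slice.

    Release only if the guest did not touch any LBR-related MSR during
    the slice and has LBR disabled in its DEBUGCTL value.
    """
    lbr_enabled = bool(guest_debugctl & DEBUGCTLMSR_LBR)
    return (not accessed_in_slice) and (not lbr_enabled)
```

Both conditions must hold: a guest that leaves LBR enabled keeps its event
even if it does not read the records during a given time slice.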

---

LBR testcase:
echo 1 > /proc/sys/kernel/watchdog
echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
./perf record -b ./br_instr a

- Perf report on the host:
Samples: 72K of event 'cycles', Event count (approx.): 72512
Overhead  Command   Source Shared Object           Source Symbol                           Target Symbol                           Basic Block Cycles
  12.12%  br_instr  br_instr                       [.] cmp_end                             [.] lfsr_cond                           1
  11.05%  br_instr  br_instr                       [.] lfsr_cond                           [.] cmp_end                             5
   8.81%  br_instr  br_instr                       [.] lfsr_cond                           [.] cmp_end                             4
   5.04%  br_instr  br_instr                       [.] cmp_end                             [.] lfsr_cond                           20
   4.92%  br_instr  br_instr                       [.] lfsr_cond                           [.] cmp_end                             6
   4.88%  br_instr  br_instr                       [.] cmp_end                             [.] lfsr_cond                           6
   4.58%  br_instr  br_instr                       [.] cmp_end                             [.] lfsr_cond                           5

- Perf report on the guest:
Samples: 92K of event 'cycles', Event count (approx.): 92544
Overhead  Command   Source Shared Object  Source Symbol                                   Target Symbol                                   Basic Block Cycles
  12.03%  br_instr  br_instr              [.] cmp_end                                     [.] lfsr_cond                                   1
  11.09%  br_instr  br_instr              [.] lfsr_cond                                   [.] cmp_end                                     5
   8.57%  br_instr  br_instr              [.] lfsr_cond                                   [.] cmp_end                                     4
   5.08%  br_instr  br_instr              [.] lfsr_cond                                   [.] cmp_end                                     6
   5.06%  br_instr  br_instr              [.] cmp_end                                     [.] lfsr_cond                                   20
   4.87%  br_instr  br_instr              [.] cmp_end                                     [.] lfsr_cond                                   6
   4.70%  br_instr  br_instr              [.] cmp_end                                     [.] lfsr_cond                                   5

Conclusion: the profiling results on the guest are similar to those on the host.
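
As a quick sanity check of that conclusion, the snippet below compares the
overhead percentages of the seven rows shown in the two reports, keyed by
(source symbol, target symbol, basic block cycles). It only covers the rows
quoted above, not the full perf output.

```python
# Overhead (%) per (source, target, basic block cycles), copied from the
# host and guest reports above.
host = {
    ("cmp_end", "lfsr_cond", 1): 12.12,
    ("lfsr_cond", "cmp_end", 5): 11.05,
    ("lfsr_cond", "cmp_end", 4): 8.81,
    ("cmp_end", "lfsr_cond", 20): 5.04,
    ("lfsr_cond", "cmp_end", 6): 4.92,
    ("cmp_end", "lfsr_cond", 6): 4.88,
    ("cmp_end", "lfsr_cond", 5): 4.58,
}
guest = {
    ("cmp_end", "lfsr_cond", 1): 12.03,
    ("lfsr_cond", "cmp_end", 5): 11.09,
    ("lfsr_cond", "cmp_end", 4): 8.57,
    ("lfsr_cond", "cmp_end", 6): 5.08,
    ("cmp_end", "lfsr_cond", 20): 5.06,
    ("cmp_end", "lfsr_cond", 6): 4.87,
    ("cmp_end", "lfsr_cond", 5): 4.70,
}

# Absolute difference in percentage points for each branch pair.
diffs = {key: round(abs(host[key] - guest[key]), 2) for key in host}
max_diff = max(diffs.values())
print(max_diff)  # 0.24
```

For the rows shown, host and guest overheads differ by at most 0.24
percentage points.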

Like Xu (10):
  perf/x86/core: Refactor hw->idx checks and cleanup
  perf/x86/lbr: Add interface to get LBR information
  perf/x86: Add constraint to create guest LBR event without hw counter
  perf/x86: Keep LBR records unchanged in host context for guest usage
  KVM: vmx/pmu: Expose LBR to guest via MSR_IA32_PERF_CAPABILITIES
  KVM: vmx/pmu: Unmask LBR fields in the MSR_IA32_DEBUGCTLMSR emualtion
  KVM: vmx/pmu: Pass-through LBR msrs when guest LBR event is scheduled
  KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
  KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
  KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

Wei Wang (1):
  perf/x86: Fix variable types for LBR registers

Qemu-devel:
  target/i386: define a MSR based feature word - FEAT_PERF_CAPABILITIES
  target/i386: add -cpu,lbr=true support to enable guest LBR

 arch/x86/events/core.c            |  26 +--
 arch/x86/events/intel/core.c      | 109 ++++++++-----
 arch/x86/events/intel/lbr.c       |  51 +++++-
 arch/x86/events/perf_event.h      |   8 +-
 arch/x86/include/asm/perf_event.h |  34 +++-
 arch/x86/kvm/pmu.c                |  12 +-
 arch/x86/kvm/pmu.h                |   5 +
 arch/x86/kvm/vmx/capabilities.h   |  23 ++-
 arch/x86/kvm/vmx/pmu_intel.c      | 253 +++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c            |  86 +++++++++-
 arch/x86/kvm/vmx/vmx.h            |  17 ++
 arch/x86/kvm/x86.c                |  13 --
 12 files changed, 559 insertions(+), 78 deletions(-)

Comments

Like Xu June 23, 2020, 1:13 p.m. UTC | #1
On 2020/6/13 16:09, Like Xu wrote:
> Hi all,
> 
> Please help review this new version for the kernel 5.9 release.
> 
> Now, you may apply the last two qemu-devel patches to the upstream
> qemu and try the guest LBR feature with '-cpu host' command line.
> 
> v11->v12 Changelog:
> - apply "Signed-off-by" from PeterZ and his code for the perf subsystem;
> - add validity checks before exposing LBR via MSR_IA32_PERF_CAPABILITIES;
> - refactor MSR_IA32_DEBUGCTLMSR emulation with validity check;
> - reorder "perf_event_attr" fields according to how they're declared;
> - replace event_is_oncpu() with "event->state" check;
> - make LBR emulation specific to vmx rather than x86 generic;
> - move pass-through LBR code to vmx.c instead of pmu_intel.c;
> - add vmx_lbr_en/disable_passthrough layer to make code readable;
> - rewrite pmu availability check with vmx_passthrough_lbr_msrs();
> 
> You may check more details in each commit.
> 
> Previous:
> https://lore.kernel.org/kvm/20200514083054.62538-1-like.xu@linux.intel.com/
> 
> ---
...
> 
> Wei Wang (1):
>   perf/x86: Fix variable types for LBR registers
> Like Xu (10):
>    perf/x86/core: Refactor hw->idx checks and cleanup
>    perf/x86/lbr: Add interface to get LBR information
>    perf/x86: Add constraint to create guest LBR event without hw counter
>    perf/x86: Keep LBR records unchanged in host context for guest usage

Hi Peter,
Would you like to add "Acked-by" to the first three perf patches?

>    KVM: vmx/pmu: Expose LBR to guest via MSR_IA32_PERF_CAPABILITIES
>    KVM: vmx/pmu: Unmask LBR fields in the MSR_IA32_DEBUGCTLMSR emualtion
>    KVM: vmx/pmu: Pass-through LBR msrs when guest LBR event is scheduled
>    KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
>    KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
>    KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
> 

Hi Paolo,
Would you like to take a moment to review the KVM part of this feature?

Thanks,
Like Xu

> 
> Qemu-devel:
>    target/i386: add -cpu,lbr=true support to enable guest LBR
> 
>   arch/x86/events/core.c            |  26 +--
>   arch/x86/events/intel/core.c      | 109 ++++++++-----
>   arch/x86/events/intel/lbr.c       |  51 +++++-
>   arch/x86/events/perf_event.h      |   8 +-
>   arch/x86/include/asm/perf_event.h |  34 +++-
>   arch/x86/kvm/pmu.c                |  12 +-
>   arch/x86/kvm/pmu.h                |   5 +
>   arch/x86/kvm/vmx/capabilities.h   |  23 ++-
>   arch/x86/kvm/vmx/pmu_intel.c      | 253 +++++++++++++++++++++++++++++-
>   arch/x86/kvm/vmx/vmx.c            |  86 +++++++++-
>   arch/x86/kvm/vmx/vmx.h            |  17 ++
>   arch/x86/kvm/x86.c                |  13 --
>   12 files changed, 559 insertions(+), 78 deletions(-)
>
Like Xu July 1, 2020, 2:38 a.m. UTC | #2
Friendly ping.

If there is room for improvement, please let me know.

On 2020/6/23 21:13, Like Xu wrote:
> On 2020/6/13 16:09, Like Xu wrote:
>> Hi all,
>>
>> Please help review this new version for the kernel 5.9 release.
>>
>> Now, you may apply the last two qemu-devel patches to the upstream
>> qemu and try the guest LBR feature with '-cpu host' command line.
>>
>> v11->v12 Changelog:
>> - apply "Signed-off-by" from PeterZ and his code for the perf subsystem;
>> - add validity checks before exposing LBR via MSR_IA32_PERF_CAPABILITIES;
>> - refactor MSR_IA32_DEBUGCTLMSR emulation with validity check;
>> - reorder "perf_event_attr" fields according to how they're declared;
>> - replace event_is_oncpu() with "event->state" check;
>> - make LBR emulation specific to vmx rather than x86 generic;
>> - move pass-through LBR code to vmx.c instead of pmu_intel.c;
>> - add vmx_lbr_en/disable_passthrough layer to make code readable;
>> - rewrite pmu availability check with vmx_passthrough_lbr_msrs();
>>
>> You may check more details in each commit.
>>
>> Previous:
>> https://lore.kernel.org/kvm/20200514083054.62538-1-like.xu@linux.intel.com/
>>
>> ---
> ...
>>
>> Wei Wang (1):
>>   perf/x86: Fix variable types for LBR registers
>> Like Xu (10):
>>    perf/x86/core: Refactor hw->idx checks and cleanup
>>    perf/x86/lbr: Add interface to get LBR information
>>    perf/x86: Add constraint to create guest LBR event without hw counter
>>    perf/x86: Keep LBR records unchanged in host context for guest usage
> 
> Hi Peter,
> Would you like to add "Acked-by" to the first three perf patches?
> 
>>    KVM: vmx/pmu: Expose LBR to guest via MSR_IA32_PERF_CAPABILITIES
>>    KVM: vmx/pmu: Unmask LBR fields in the MSR_IA32_DEBUGCTLMSR emualtion
>>    KVM: vmx/pmu: Pass-through LBR msrs when guest LBR event is scheduled
>>    KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
>>    KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
>>    KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
>>
> 
> Hi Paolo,
> Would you like to take a moment to review the KVM part of this feature?
> 
> Thanks,
> Like Xu
> 
>>
>> Qemu-devel:
>>    target/i386: add -cpu,lbr=true support to enable guest LBR
>>
>>   arch/x86/events/core.c            |  26 +--
>>   arch/x86/events/intel/core.c      | 109 ++++++++-----
>>   arch/x86/events/intel/lbr.c       |  51 +++++-
>>   arch/x86/events/perf_event.h      |   8 +-
>>   arch/x86/include/asm/perf_event.h |  34 +++-
>>   arch/x86/kvm/pmu.c                |  12 +-
>>   arch/x86/kvm/pmu.h                |   5 +
>>   arch/x86/kvm/vmx/capabilities.h   |  23 ++-
>>   arch/x86/kvm/vmx/pmu_intel.c      | 253 +++++++++++++++++++++++++++++-
>>   arch/x86/kvm/vmx/vmx.c            |  86 +++++++++-
>>   arch/x86/kvm/vmx/vmx.h            |  17 ++
>>   arch/x86/kvm/x86.c                |  13 --
>>   12 files changed, 559 insertions(+), 78 deletions(-)
>>
>
Peter Zijlstra July 2, 2020, 7:40 a.m. UTC | #3
On Sat, Jun 13, 2020 at 04:09:45PM +0800, Like Xu wrote:
> Like Xu (10):
>   perf/x86/core: Refactor hw->idx checks and cleanup
>   perf/x86/lbr: Add interface to get LBR information
>   perf/x86: Add constraint to create guest LBR event without hw counter
>   perf/x86: Keep LBR records unchanged in host context for guest usage

> Wei Wang (1):
>   perf/x86: Fix variable types for LBR registers

>  arch/x86/events/core.c            |  26 +--
>  arch/x86/events/intel/core.c      | 109 ++++++++-----
>  arch/x86/events/intel/lbr.c       |  51 +++++-
>  arch/x86/events/perf_event.h      |   8 +-
>  arch/x86/include/asm/perf_event.h |  34 +++-

These look good to me; but at the same time Kan is sending me
Architectural LBR patches.

Kan, if I take these perf patches and stick them in a tip/perf/vlbr
topic branch, can you rebase the arch lbr stuff on top, or is there
anything in the arch-lbr series that badly conflicts with this work?

Paolo, would that topic branch work for you too, to then stick these
patches on top?

>   KVM: vmx/pmu: Expose LBR to guest via MSR_IA32_PERF_CAPABILITIES
>   KVM: vmx/pmu: Unmask LBR fields in the MSR_IA32_DEBUGCTLMSR emualtion
>   KVM: vmx/pmu: Pass-through LBR msrs when guest LBR event is scheduled
>   KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
>   KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
>   KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

>  arch/x86/kvm/pmu.c                |  12 +-
>  arch/x86/kvm/pmu.h                |   5 +
>  arch/x86/kvm/vmx/capabilities.h   |  23 ++-
>  arch/x86/kvm/vmx/pmu_intel.c      | 253 +++++++++++++++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.c            |  86 +++++++++-
>  arch/x86/kvm/vmx/vmx.h            |  17 ++
>  arch/x86/kvm/x86.c                |  13 --

>  12 files changed, 559 insertions(+), 78 deletions(-)
Liang, Kan July 2, 2020, 1:11 p.m. UTC | #4
On 7/2/2020 3:40 AM, Peter Zijlstra wrote:
> On Sat, Jun 13, 2020 at 04:09:45PM +0800, Like Xu wrote:
>> Like Xu (10):
>>    perf/x86/core: Refactor hw->idx checks and cleanup
>>    perf/x86/lbr: Add interface to get LBR information
>>    perf/x86: Add constraint to create guest LBR event without hw counter
>>    perf/x86: Keep LBR records unchanged in host context for guest usage
> 
>> Wei Wang (1):
>>    perf/x86: Fix variable types for LBR registers
> 
>>   arch/x86/events/core.c            |  26 +--
>>   arch/x86/events/intel/core.c      | 109 ++++++++-----
>>   arch/x86/events/intel/lbr.c       |  51 +++++-
>>   arch/x86/events/perf_event.h      |   8 +-
>>   arch/x86/include/asm/perf_event.h |  34 +++-
> 
> These look good to me; but at the same time Kan is sending me
> Architectural LBR patches.
> 
> Kan, if I take these perf patches and stick them in a tip/perf/vlbr
> topic branch, can you rebase the arch lbr stuff on top, or is there
> anything in the arch-lbr series that badly conflicts with this work?
> 

Yes, I can rebase the arch lbr patches on top of them.
Please push the tip/perf/vlbr branch, so I can pull and rebase my patches.

Thanks,
Kan
Peter Zijlstra July 2, 2020, 1:58 p.m. UTC | #5
On Thu, Jul 02, 2020 at 09:11:06AM -0400, Liang, Kan wrote:
> On 7/2/2020 3:40 AM, Peter Zijlstra wrote:
> > On Sat, Jun 13, 2020 at 04:09:45PM +0800, Like Xu wrote:
> > > Like Xu (10):
> > >    perf/x86/core: Refactor hw->idx checks and cleanup
> > >    perf/x86/lbr: Add interface to get LBR information
> > >    perf/x86: Add constraint to create guest LBR event without hw counter
> > >    perf/x86: Keep LBR records unchanged in host context for guest usage
> > 
> > > Wei Wang (1):
> > >    perf/x86: Fix variable types for LBR registers
> > 
> > >   arch/x86/events/core.c            |  26 +--
> > >   arch/x86/events/intel/core.c      | 109 ++++++++-----
> > >   arch/x86/events/intel/lbr.c       |  51 +++++-
> > >   arch/x86/events/perf_event.h      |   8 +-
> > >   arch/x86/include/asm/perf_event.h |  34 +++-
> > 
> > These look good to me; but at the same time Kan is sending me
> > Architectural LBR patches.
> > 
> > Kan, if I take these perf patches and stick them in a tip/perf/vlbr
> > topic branch, can you rebase the arch lbr stuff on top, or is there
> > anything in the arch-lbr series that badly conflicts with this work?
> > 
> 
> Yes, I can rebase the arch lbr patches on top of them.
> Please push the tip/perf/vlbr branch, so I can pull and rebase my patches.

For now I have:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/vlbr

Once the 0day robot comes back all-green, I'll push it out to
tip/perf/vlbr and merge it into tip/perf/core.

Thanks!
Peter Zijlstra July 3, 2020, 7:56 a.m. UTC | #6
On Thu, Jul 02, 2020 at 03:58:42PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 02, 2020 at 09:11:06AM -0400, Liang, Kan wrote:
> > On 7/2/2020 3:40 AM, Peter Zijlstra wrote:
> > > On Sat, Jun 13, 2020 at 04:09:45PM +0800, Like Xu wrote:
> > > > Like Xu (10):
> > > >    perf/x86/core: Refactor hw->idx checks and cleanup
> > > >    perf/x86/lbr: Add interface to get LBR information
> > > >    perf/x86: Add constraint to create guest LBR event without hw counter
> > > >    perf/x86: Keep LBR records unchanged in host context for guest usage
> > > 
> > > > Wei Wang (1):
> > > >    perf/x86: Fix variable types for LBR registers
> > > 
> > > >   arch/x86/events/core.c            |  26 +--
> > > >   arch/x86/events/intel/core.c      | 109 ++++++++-----
> > > >   arch/x86/events/intel/lbr.c       |  51 +++++-
> > > >   arch/x86/events/perf_event.h      |   8 +-
> > > >   arch/x86/include/asm/perf_event.h |  34 +++-
> > > 
> > > These look good to me; but at the same time Kan is sending me
> > > Architectural LBR patches.
> > > 
> > > Kan, if I take these perf patches and stick them in a tip/perf/vlbr
> > > topic branch, can you rebase the arch lbr stuff on top, or is there
> > > anything in the arch-lbr series that badly conflicts with this work?
> > > 
> > 
> > Yes, I can rebase the arch lbr patches on top of them.
> > Please push the tip/perf/vlbr branch, so I can pull and rebase my patches.
> 
> For now I have:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/vlbr
> 
> Once the 0day robot comes back all-green, I'll push it out to
> tip/perf/vlbr and merge it into tip/perf/core.

tip/perf/vlbr now exists, thanks!
Xu, Like July 3, 2020, 8:04 a.m. UTC | #7
On 2020/7/3 15:56, Peter Zijlstra wrote:
> On Thu, Jul 02, 2020 at 03:58:42PM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 02, 2020 at 09:11:06AM -0400, Liang, Kan wrote:
>>> On 7/2/2020 3:40 AM, Peter Zijlstra wrote:
>>>> On Sat, Jun 13, 2020 at 04:09:45PM +0800, Like Xu wrote:
>>>>> Like Xu (10):
>>>>>     perf/x86/core: Refactor hw->idx checks and cleanup
>>>>>     perf/x86/lbr: Add interface to get LBR information
>>>>>     perf/x86: Add constraint to create guest LBR event without hw counter
>>>>>     perf/x86: Keep LBR records unchanged in host context for guest usage
>>>>> Wei Wang (1):
>>>>>     perf/x86: Fix variable types for LBR registers
>>>>>    arch/x86/events/core.c            |  26 +--
>>>>>    arch/x86/events/intel/core.c      | 109 ++++++++-----
>>>>>    arch/x86/events/intel/lbr.c       |  51 +++++-
>>>>>    arch/x86/events/perf_event.h      |   8 +-
>>>>>    arch/x86/include/asm/perf_event.h |  34 +++-
>>>> These look good to me; but at the same time Kan is sending me
>>>> Architectural LBR patches.
>>>>
>>>> Kan, if I take these perf patches and stick them in a tip/perf/vlbr
>>>> topic branch, can you rebase the arch lbr stuff on top, or is there
>>>> anything in the arch-lbr series that badly conflicts with this work?
>>>>
>>> Yes, I can rebase the arch lbr patches on top of them.
>>> Please push the tip/perf/vlbr branch, so I can pull and rebase my patches.
>> For now I have:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/vlbr
>>
>> Once the 0day robot comes back all-green, I'll push it out to
>> tip/perf/vlbr and merge it into tip/perf/core.
> tip/perf/vlbr now exists, thanks!
Hi Peter,

Thanks for your patience and professional support on this feature!

Thanks,
Like Xu