mbox series

[00/13] Implement support for IBS virtualization

Message ID 20230904095347.14994-1-manali.shukla@amd.com (mailing list archive)
Headers show
Series Implement support for IBS virtualization | expand

Message

Manali Shukla Sept. 4, 2023, 9:53 a.m. UTC
Add support for IBS virtualization (VIBS). VIBS feature allows the
guest to collect IBS samples without exiting the guest.  There are
2 parts to it [1].
- Virtualizing the IBS register state.
- Ensuring the IBS interrupt is handled in the guest without exiting
  the hypervisor.

VIBS requires the use of AVIC or NMI virtualization for delivery of
a virtualized interrupt from IBS hardware in the guest [1].

While IBS collects data for IBS fetch/op block for the sampled
interval, VIBS hardware signals a VNMI, but the source of VNMI is
different in both AVIC enabled/disabled case.
- When AVIC is disabled, virtual NMI is HW accelerated.
- When AVIC is enabled, virtual NMI is accelerated via AVIC using
  Extended LVT.

The local interrupts are extended to include more LVT registers, to
allow additional interrupt sources, like instruction based sampling
etc. [3].

Note that, since IBS registers are swap type C [2], the hypervisor is
responsible for saving and restoring of IBS host state. Hypervisor
does so only when IBS is active on the host to avoid unnecessary
rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
state and enter the guest. After a guest exit, the hypervisor needs to
restore host IBS state and re-enable IBS.

[1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

[2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
     of VMCB, Table B-3 Swap Types.

[3]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 16.4.5
     Extended Interrupts.

Testing done:
- Following tests were executed on guest
  sudo perf record -e ibs_op// -c 100000 -a
  sudo perf record -e ibs_op// -c 100000 -C 10
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a --raw-samples
  sudo perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -- ls
  sudo ./tools/perf/perf record -e ibs_op// -e ibs_fetch// -a --raw-samples -c 100000
  sudo perf report
  sudo perf script
  sudo perf report -D | grep -P "LdOp 1.*StOp 0" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | wc -l
  sudo perf report -D | grep -B1 -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | grep -P "DataSrc ([02-9]|1[0-2])=" | wc -l

- Following Nested guests combinations were tested manually
  ----------------------------
  | Collected IBS Samples in |
  ----------------------------
  |   L0   |   L1   |   L2   |
  ----------------------------
  |   Y    |   Y    |   Y    |
  |   Y    |   Y    |   N    |
  |   Y    |   N    |   Y    |
  |   Y    |   N    |   N    |
  |   N    |   Y    |   Y    |
  |   N    |   Y    |   N    |
  |   N    |   N    |   Y    |
  ----------------------------

Qemu changes can be found at below location:
https://github.com/Kullu14/qemu/tree/qemu_vibs_branch

Qemu commandline to enable IBS virtualization: qemu-system-x86_64
-enable-kvm -cpu EPYC-Genoa,+svm,+ibs,+extapic,+extlvt,+vibs \ ..

base-commit: 3b2ac85b3d2954b583dc2039825ad76eda8516a9 (kvm_x86 misc)

On top of base commit,
https://lore.kernel.org/all/20230717041903.85480-1-manali.shukla@amd.com
patch is applied, then VIBS patch series is applied 

Manali Shukla (8):
  KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for
    extapic
  KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  KVM: x86: Extend CPUID range to include new leaf
  perf/x86/amd: Add framework to save/restore host IBS state
  x86/cpufeatures: Add CPUID feature bit for VIBS in SEV-ES guest
  KVM: SVM: Add support for IBS virtualization for SEV-ES guests
  KVM: SVM: Enable IBS virtualization on non SEV-ES and SEV-ES guests
  KVM: x86: nSVM: Implement support for nested IBS virtualization

Santosh Shukla (5):
  x86/cpufeatures: Add CPUID feature bit for Extended LVT
  KVM: x86: Add emulation support for Extented LVT registers
  x86/cpufeatures: Add CPUID feature bit for virtualized IBS
  KVM: SVM: Extend VMCB area for virtualized IBS registers
  KVM: SVM: add support for IBS virtualization for non SEV-ES guests

 Documentation/virt/kvm/api.rst     |  23 ++++
 arch/x86/events/amd/Makefile       |   2 +-
 arch/x86/events/amd/ibs.c          |  23 ++++
 arch/x86/events/amd/vibs.c         | 101 ++++++++++++++
 arch/x86/include/asm/apicdef.h     |  14 ++
 arch/x86/include/asm/cpufeatures.h |   3 +
 arch/x86/include/asm/perf_event.h  |  27 ++++
 arch/x86/include/asm/svm.h         |  34 ++++-
 arch/x86/include/uapi/asm/kvm.h    |   5 +
 arch/x86/kvm/cpuid.c               |  11 ++
 arch/x86/kvm/governed_features.h   |   1 +
 arch/x86/kvm/lapic.c               |  78 ++++++++++-
 arch/x86/kvm/lapic.h               |   7 +-
 arch/x86/kvm/reverse_cpuid.h       |  15 +++
 arch/x86/kvm/svm/avic.c            |   4 +
 arch/x86/kvm/svm/nested.c          |  23 ++++
 arch/x86/kvm/svm/sev.c             |  10 ++
 arch/x86/kvm/svm/svm.c             | 207 +++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h             |   5 +-
 arch/x86/kvm/x86.c                 |  24 ++--
 include/uapi/linux/kvm.h           |  10 ++
 21 files changed, 603 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/events/amd/vibs.c

Comments

Peter Zijlstra Sept. 5, 2023, 3:47 p.m. UTC | #1
On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:

> Note that, since IBS registers are swap type C [2], the hypervisor is
> responsible for saving and restoring of IBS host state. Hypervisor
> does so only when IBS is active on the host to avoid unnecessary
> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
> state and enter the guest. After a guest exit, the hypervisor needs to
> restore host IBS state and re-enable IBS.

Why do you think it is OK for a guest to disable the host IBS when
entering a guest? Perhaps the host was wanting to profile the guest.

Only when perf_event_attr::exclude_guest is set is this allowed,
otherwise you have to respect the host running IBS and you're not
allowed to touch it.

Host trumps guest etc..
Manali Shukla Sept. 6, 2023, 3:38 p.m. UTC | #2
Hi Peter,

Thank you for looking into this.

On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
> On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
> 
>> Note that, since IBS registers are swap type C [2], the hypervisor is
>> responsible for saving and restoring of IBS host state. Hypervisor
>> does so only when IBS is active on the host to avoid unnecessary
>> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
>> state and enter the guest. After a guest exit, the hypervisor needs to
>> restore host IBS state and re-enable IBS.
> 
> Why do you think it is OK for a guest to disable the host IBS when
> entering a guest? Perhaps the host was wanting to profile the guest.
> 

1. Since IBS registers are of swap type C [1], only guest state is saved
and restored by the hardware. Host state needs to be saved and restored by
hypervisor. In order to save IBS registers correctly, IBS needs to be
disabled before saving the IBS registers.

2. As per APM [2],
"When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
guests.

3. VIBS is not enabled by default. It can be enabled by an explicit
qemu command line option "-cpu +ibs". Guest should be invoked without
this option when host wants to profile the guest.

[1] https://bugzilla.kernel.org/attachment.cgi?id=304653
    AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B. Layout
    of VMCB,
    Table B-2. VMCB Layout, State Save Area 
    Table B-4. VMSA Layout, State Save Area for SEV-ES
    
[2] https://bugzilla.kernel.org/attachment.cgi?id=304653
    AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38,
    Instruction-Based Sampling Virtualization


> Only when perf_event_attr::exclude_guest is set is this allowed,
> otherwise you have to respect the host running IBS and you're not
> allowed to touch it.
> 
> Host trumps guest etc..


- Manali
Peter Zijlstra Sept. 6, 2023, 7:56 p.m. UTC | #3
On Wed, Sep 06, 2023 at 09:08:25PM +0530, Manali Shukla wrote:
> Hi Peter,
> 
> Thank you for looking into this.
> 
> On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
> > On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
> > 
> >> Note that, since IBS registers are swap type C [2], the hypervisor is
> >> responsible for saving and restoring of IBS host state. Hypervisor
> >> does so only when IBS is active on the host to avoid unnecessary
> >> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
> >> state and enter the guest. After a guest exit, the hypervisor needs to
> >> restore host IBS state and re-enable IBS.
> > 
> > Why do you think it is OK for a guest to disable the host IBS when
> > entering a guest? Perhaps the host was wanting to profile the guest.
> > 
> 
> 1. Since IBS registers are of swap type C [1], only guest state is saved
> and restored by the hardware. Host state needs to be saved and restored by
> hypervisor. In order to save IBS registers correctly, IBS needs to be
> disabled before saving the IBS registers.
> 
> 2. As per APM [2],
> "When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
> IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
> these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
> This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
> guests.

I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
the above saying that a host can never IBS profile a guest?

Does the current IBS thing assert perf_event_attr::exclude_guest is set?

I can't quickly find anything :-(
Manali Shukla Sept. 7, 2023, 3:49 p.m. UTC | #4
Hi Peter,

On 9/7/2023 1:26 AM, Peter Zijlstra wrote:
> On Wed, Sep 06, 2023 at 09:08:25PM +0530, Manali Shukla wrote:
>> Hi Peter,
>>
>> Thank you for looking into this.
>>
>> On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
>>> On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
>>>
>>>> Note that, since IBS registers are swap type C [2], the hypervisor is
>>>> responsible for saving and restoring of IBS host state. Hypervisor
>>>> does so only when IBS is active on the host to avoid unnecessary
>>>> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
>>>> state and enter the guest. After a guest exit, the hypervisor needs to
>>>> restore host IBS state and re-enable IBS.
>>>
>>> Why do you think it is OK for a guest to disable the host IBS when
>>> entering a guest? Perhaps the host was wanting to profile the guest.
>>>
>>
>> 1. Since IBS registers are of swap type C [1], only guest state is saved
>> and restored by the hardware. Host state needs to be saved and restored by
>> hypervisor. In order to save IBS registers correctly, IBS needs to be
>> disabled before saving the IBS registers.
>>
>> 2. As per APM [2],
>> "When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
>> IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
>> these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
>> This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
>> guests.
> 
> I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
> the above saying that a host can never IBS profile a guest?

Host can profile a guest with IBS if VIBS is disabled for the guest. This is
the default behavior. Host can not profile guest if VIBS is enabled for guest.

> 
> Does the current IBS thing assert perf_event_attr::exclude_guest is set?

Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
perf_event_open() fails if exclude_guest is set for an IBS event.

> 
> I can't quickly find anything :-(

Thank you,
Manali
Peter Zijlstra Sept. 8, 2023, 1:31 p.m. UTC | #5
On Thu, Sep 07, 2023 at 09:19:51PM +0530, Manali Shukla wrote:

> > I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
> > the above saying that a host can never IBS profile a guest?
> 
> Host can profile a guest with IBS if VIBS is disabled for the guest. This is
> the default behavior. Host can not profile guest if VIBS is enabled for guest.
> 
> > 
> > Does the current IBS thing assert perf_event_attr::exclude_guest is set?
> 
> Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
> perf_event_open() fails if exclude_guest is set for an IBS event.

Then you must not allow VIBS if a host cpu-wide IBS counter exists.

Also, VIBS reads like it can be (ab)used as a filter.
Manali Shukla Sept. 11, 2023, 12:32 p.m. UTC | #6
On 9/8/2023 7:01 PM, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 09:19:51PM +0530, Manali Shukla wrote:
> 
>>> I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
>>> the above saying that a host can never IBS profile a guest?
>>
>> Host can profile a guest with IBS if VIBS is disabled for the guest. This is
>> the default behavior. Host can not profile guest if VIBS is enabled for guest.
>>
>>>
>>> Does the current IBS thing assert perf_event_attr::exclude_guest is set?
>>
>> Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
>> perf_event_open() fails if exclude_guest is set for an IBS event.
> 
> Then you must not allow VIBS if a host cpu-wide IBS counter exists.
> 
> Also, VIBS reads like it can be (ab)used as a filter.

I think I get your point: If host IBS with exclude_guest=0 doesn't capture
guest samples because of VIBS, it is an unintended behavior.

But if a guest cannot use IBS because a host is using it, that is also
unacceptable behavior.

Let me think over it and come back.

- Manali
Manali Shukla Sept. 28, 2023, 11:18 a.m. UTC | #7
On 9/11/2023 6:02 PM, Manali Shukla wrote:
> On 9/8/2023 7:01 PM, Peter Zijlstra wrote:
>> On Thu, Sep 07, 2023 at 09:19:51PM +0530, Manali Shukla wrote:
>>
>>>> I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
>>>> the above saying that a host can never IBS profile a guest?
>>>
>>> Host can profile a guest with IBS if VIBS is disabled for the guest. This is
>>> the default behavior. Host can not profile guest if VIBS is enabled for guest.
>>>
>>>>
>>>> Does the current IBS thing assert perf_event_attr::exclude_guest is set?
>>>
>>> Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
>>> perf_event_open() fails if exclude_guest is set for an IBS event.
>>
>> Then you must not allow VIBS if a host cpu-wide IBS counter exists.
>>
>> Also, VIBS reads like it can be (ab)used as a filter.
> 
> I think I get your point: If host IBS with exclude_guest=0 doesn't capture
> guest samples because of VIBS, it is an unintended behavior.
> 
> But if a guest cannot use IBS because a host is using it, that is also
> unacceptable behavior.
> 
> Let me think over it and come back.
> 
> - Manali

Hi Peter,

Apologies for the delay in response. It took me a while to think about
various possible solutions and their feasibility.

Problem with current design is, exclude_guest setting of the host IBS event is
not honored. Essentially, guest samples become invisible to the host IBS even
when exclude_guest=0, when VIBS is enabled on the guest.

Solution 1:
Enforce exclude_guest=1 for host IBS when hw supports VIBS, i.e.

        if (cpuid[VIBS])
                enforce exclude_guest=1
        else
                enforce exclude_guest=0

Disable/enable host IBS at VM Entry/VM Exit if cpuid[VIBS] is set and an active
IBS event exists on that cpu.

The major downside of this approach is, it will break all currently working
scripts which are using perf_event_open() to start an ibs event, since new
kernel will suddenly start failing for IBS events with exclude_guest=0. The
other issue is that the host with cpuid[VIBS] set, would not allow profiling any
guest from the host.

Solution 1.1:
This is an extension to Solution 1. Instead of keying off based on just
cpuid[VIBS] bit, introduce a kvm-amd module parameter and use both:

         if (cpuid[VIBS] && kvm-amd.ko loaded with vibs=1)
                enforce exclude_guest=1
         else
                enforce exclude_guest=0

KVM AMD vibs module parameter determines whether guest will be able to use VIBS
or not.  The kvm-amd.ko should be loaded with vibs=0, if a host wants to profile
guest and the kvm-amd.ko should be loaded with vibs=1, if a guest wants to use
VIBS. However, both are mutually exclusive.

The issue of digressing from current exclude_guest behavior remains with this
solution. Other issues are,
1) If the host IBS is active, with vibs=0, reloading of the kvm-amd.ko with
   vibs=1 will fail until IBS is running on the host.
2) If the host IBS is active, with vibs=1, reloading of the kvm-amd.ko with
   vibs=0 will fail until IBS is running on the host.

Solution 2:
Dynamically disable/enable VIBS per vCPU basis, i.e. when a host IBS is active,
guest will not be able to use VIBS _for that vCPU_. If the host is not using
IBS, VIBS will be enabled at VM Entry.

Although this solution does not digress from existing exclude_guest behavior, it
has its own limitations:
1) VIBS inside the guest is unreliable because host IBS can dynamically change
   VIBS behavior.
2) It works only for SVM and SEV guests, but not for SEV-ES and SEV-SNP guests
   since there is no way to disable VIBS dynamically.

From all the above solutions, we would be more inclined on implementing
solution 1.1.

-Manali