
[v3,0/4] KVM: arm64: Improve PMU support on heterogeneous systems

Message ID 20211213152309.158462-1-alexandru.elisei@arm.com (mailing list archive)

Message

Alexandru Elisei Dec. 13, 2021, 3:23 p.m. UTC
(CC'ing Peter Maydell in case this might be of interest to qemu)

The series can be found on a branch at [1], and the kvmtool support at [2].
The kvmtool patches are also on the mailing list [3] and haven't changed
since v1.

Detailed explanation of the issue and symptoms that the patches attempt to
correct can be found in the cover letter for v1 [4].

A summary of the problem is that on heterogeneous systems KVM will always
use the same PMU for creating the VCPU events for *all* VCPUs regardless of
the physical CPU on which the VCPU is running, leading to events suddenly
stopping and resuming in the guest as the VCPU thread gets migrated across
different CPUs.

This series proposes to fix this behaviour by allowing the user to specify
which physical PMU is used when creating the VCPU events needed for guest
PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
physical CPU which is not part of the supported CPUs for the specified PMU. The
restriction is that all VCPUs must use the same PMU to avoid emulating an
asymmetric platform.
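
As a rough illustration, setting the PMU from userspace with the new
KVM_ARM_VCPU_PMU_V3_SET_PMU attribute could look something like the sketch
below. The struct layout is mirrored locally so the snippet is
self-contained, the attribute number is an assumption taken from this series
rather than from released headers, and the PMU id would come from perf's
sysfs "type" file:

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of struct kvm_device_attr from linux/kvm.h, so the
 * sketch compiles without kernel headers installed.
 */
struct kvm_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;		/* userspace pointer to the payload */
};

#define KVM_ARM_VCPU_PMU_V3_CTRL	0	/* vcpu attribute group */
#define KVM_ARM_VCPU_PMU_V3_SET_PMU	3	/* assumed value from this series */

/*
 * Prepare the attribute that asks KVM to back this VCPU's emulated PMU
 * with the physical PMU identified by *pmu_id (the perf "type" id, read
 * from /sys/bus/event_source/devices/<pmu>/type). A VMM would then pass
 * the result to ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &da) on every VCPU,
 * since the series requires all VCPUs to use the same PMU.
 */
static void build_set_pmu_attr(struct kvm_device_attr *da, const int *pmu_id)
{
	memset(da, 0, sizeof(*da));
	da->group = KVM_ARM_VCPU_PMU_V3_CTRL;
	da->attr = KVM_ARM_VCPU_PMU_V3_SET_PMU;
	da->addr = (uint64_t)(uintptr_t)pmu_id;
}
```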

The default behaviour stays the same - without userspace setting the PMU,
events will stop counting if the VCPU is scheduled on the wrong CPU.

Tested with a hacked version of kvmtool that does the PMU initialization
from the VCPU thread as opposed to from the main thread. Tested on
rockpro64 by testing what happens when all VCPUs have the same PMU, when one
random VCPU has a different PMU than the other VCPUs, and when one random
VCPU has no PMU set (each test was run 1,000 times on the little cores
and 1,000 times on the big cores).

Also tested on an Altra, covering the cases where all VCPUs have the same
PMU, where no VCPU has a PMU set, and where one random VCPU has no PMU set;
the VM had 64 threads in each of the tests and each test was run 10,000 times.
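
For reference, a VMM that wants to keep VCPU threads on the chosen PMU's
CPUs (and so avoid the refused-run case entirely) could parse the PMU's
advertised cpumask and pin its threads itself. A hedged sketch, assuming the
usual /sys/bus/event_source/devices/<pmu>/cpus list format ("0-3" or
"0,2,4-5"):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>

/*
 * Parse a sysfs cpu list such as "0-3" or "0,2,4-5" (the format of
 * /sys/bus/event_source/devices/<pmu>/cpus) into a cpu_set_t.
 * Returns 0 on success, -1 on malformed input.
 */
static int cpulist_parse(const char *s, cpu_set_t *set)
{
	CPU_ZERO(set);
	while (*s && *s != '\n') {
		char *end;
		long a = strtol(s, &end, 10), b;

		if (end == s)
			return -1;
		b = a;
		if (*end == '-') {
			s = end + 1;
			b = strtol(s, &end, 10);
			if (end == s || b < a)
				return -1;
		}
		while (a <= b)
			CPU_SET((int)a++, set);
		s = (*end == ',') ? end + 1 : end;
	}
	return 0;
}

/*
 * Each VCPU thread would then pin itself before entering KVM_RUN, e.g.
 * with sched_setaffinity(0, sizeof(*set), set), so the scheduler never
 * migrates it outside the PMU's supported CPUs.
 */
```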

Changes since v2 [5]:

- Rebased on top of v5.16-rc5
- Check that all VCPUs have the same PMU set (or none at all).
- Use the VCPU's PMUVer value when calculating the event mask, if a PMU is
  set for that VCPU.
- Clear the unsupported CPU flag in vcpu_put().
- Move the handling of the unsupported CPU flag to kvm_vcpu_exit_request().
- Free the cpumask of supported CPUs if kvm_arch_vcpu_create() fails.
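
The PMUVer-dependent event mask mentioned above can be sketched as follows
(a simplification of the shape of kvm_pmu_event_mask() in pmu-emul.c; the
PMUVer encodings come from ID_AA64DFR0_EL1 and the exact logic in the series
may differ):

```c
#include <stdint.h>

/* ID_AA64DFR0_EL1.PMUVer encodings (simplified) */
#define PMUVER_8_0	0x1
#define PMUVER_8_1	0x4

/*
 * ARMv8.0 PMUs have 10-bit event numbers, while ARMv8.1 and later widen
 * them to 16 bits. The mask used to validate guest event numbers must
 * therefore follow the PMU actually backing the VCPU, not whichever PMU
 * happened to be probed first.
 */
static uint64_t event_mask_for_pmuver(unsigned int pmuver)
{
	return pmuver >= PMUVER_8_1 ? 0xffffULL : 0x3ffULL;
}
```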

Changes since v1 [4]:

- Rebased on top of v5.16-rc4
- Implemented review comments: protect iterating through the list of PMUs
  with a mutex, documentation changes, initialize vcpu->arch.supported_cpus
  to cpu_possible_mask, changed vcpu->arch.cpu_not_supported to a VCPU
  flag, set exit reason to KVM_EXIT_FAIL_ENTRY and populate fail_entry when
  the VCPU is run on a CPU not in the PMU's supported cpumask. Many thanks
  for the review!

[1] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/pmu-big-little-fix-v3
[2] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pmu-big-little-fix-v1
[3] https://www.spinics.net/lists/arm-kernel/msg933584.html
[4] https://www.spinics.net/lists/arm-kernel/msg933579.html
[5] https://www.spinics.net/lists/kvm-arm/msg50944.html


Alexandru Elisei (4):
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical
    CPU

 Documentation/virt/kvm/devices/vcpu.rst |  34 ++++++-
 arch/arm64/include/asm/kvm_host.h       |  12 +++
 arch/arm64/include/uapi/asm/kvm.h       |   4 +
 arch/arm64/kvm/arm.c                    |  29 +++++-
 arch/arm64/kvm/pmu-emul.c               | 114 ++++++++++++++++++++----
 include/kvm/arm_pmu.h                   |   9 +-
 include/linux/perf_event.h              |   2 +-
 tools/arch/arm64/include/uapi/asm/kvm.h |   1 +
 8 files changed, 180 insertions(+), 25 deletions(-)

Comments

Marc Zyngier Dec. 30, 2021, 8:01 p.m. UTC | #1
Alex,

On Mon, 13 Dec 2021 15:23:05 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> (CC'ing Peter Maydell in case this might be of interest to qemu)
> 
> The series can be found on a branch at [1], and the kvmtool support at [2].
> The kvmtool patches are also on the mailing list [3] and haven't changed
> since v1.
> 
> Detailed explanation of the issue and symptoms that the patches attempt to
> correct can be found in the cover letter for v1 [4].
> 
> A summary of the problem is that on heterogeneous systems KVM will always
> use the same PMU for creating the VCPU events for *all* VCPUs regardless of
> the physical CPU on which the VCPU is running, leading to events suddenly
> stopping and resuming in the guest as the VCPU thread gets migrated across
> different CPUs.
> 
> This series proposes to fix this behaviour by allowing the user to specify
> which physical PMU is used when creating the VCPU events needed for guest
> PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
> physical CPU which is not part of the supported CPUs for the specified PMU. The
> restriction is that all VCPUs must use the same PMU to avoid emulating an
> asymmetric platform.
> 
> The default behaviour stays the same - without userspace setting the PMU,
> events will stop counting if the VCPU is scheduled on the wrong CPU.
> 
> Tested with a hacked version of kvmtool that does the PMU initialization
> from the VCPU thread as opposed to from the main thread. Tested on
> rockpro64 by testing what happens when all VCPUs having the same PMU, one
> random VCPU having a different PMU than the other VCPUs and one random VCPU
> not having the PMU set (each test was run 1,000 times on the little cores
> and 1,000 times on the big cores).
> 
> Also tested on an Altra by testing all VCPUs having the same PMU, all VCPUs
> not having a PMU set, and one random VCPU not having the PMU set; the VM
> had 64 threads in each of the tests and each test was run 10,000 times.

Came back to this series, and found more problems. On top of the
remarks I had earlier (the per-CPU data structures that really should
be per VM, the disappearing attribute size), what happens when event
filters are already registered and you then set a specific PMU?

I took the matter in my own hands (the joy of being in quarantine) and
wrote whatever fixes I thought were necessary[1].

Please have a look.

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu-bl
Alexandru Elisei Jan. 6, 2022, 12:07 p.m. UTC | #2
Hi Marc,

On Thu, Dec 30, 2021 at 08:01:10PM +0000, Marc Zyngier wrote:
> Alex,
> 
> On Mon, 13 Dec 2021 15:23:05 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > (CC'ing Peter Maydell in case this might be of interest to qemu)
> > 
> > The series can be found on a branch at [1], and the kvmtool support at [2].
> > The kvmtool patches are also on the mailing list [3] and haven't changed
> > since v1.
> > 
> > Detailed explanation of the issue and symptoms that the patches attempt to
> > correct can be found in the cover letter for v1 [4].
> > 
> > A summary of the problem is that on heterogeneous systems KVM will always
> > use the same PMU for creating the VCPU events for *all* VCPUs regardless of
> > the physical CPU on which the VCPU is running, leading to events suddenly
> > stopping and resuming in the guest as the VCPU thread gets migrated across
> > different CPUs.
> > 
> > This series proposes to fix this behaviour by allowing the user to specify
> > which physical PMU is used when creating the VCPU events needed for guest
> > PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
> > physical CPU which is not part of the supported CPUs for the specified PMU. The
> > restriction is that all VCPUs must use the same PMU to avoid emulating an
> > asymmetric platform.
> > 
> > The default behaviour stays the same - without userspace setting the PMU,
> > events will stop counting if the VCPU is scheduled on the wrong CPU.
> > 
> > Tested with a hacked version of kvmtool that does the PMU initialization
> > from the VCPU thread as opposed to from the main thread. Tested on
> > rockpro64 by testing what happens when all VCPUs having the same PMU, one
> > random VCPU having a different PMU than the other VCPUs and one random VCPU
> > not having the PMU set (each test was run 1,000 times on the little cores
> > and 1,000 times on the big cores).
> > 
> > Also tested on an Altra by testing all VCPUs having the same PMU, all VCPUs
> > not having a PMU set, and one random VCPU not having the PMU set; the VM
> > had 64 threads in each of the tests and each test was run 10,000 times.
> 
> Came back to this series, and found more problems. On top of the
> remarks I had earlier (the per-CPU data structures that really should
> be per VM, the disappearing attribute size), what happens when event
> filters are already registered and you then set a specific PMU?

This is a good point. When I looked at how the PMU event filter works, I
saw that KVM doesn't attempt to check that the events are actually
implemented on the PMU, but somehow skipped over the fact that the PMU
affects the total number of events available.

Thanks,
Alex

> 
> I took the matter in my own hands (the joy of being in quarantine) and
> wrote whatever fixes I thought were necessary[1].
> 
> Please have a look.
> 
> 	M.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu-bl
> 
> -- 
> Without deviation from the norm, progress is not possible.
Marc Zyngier Jan. 6, 2022, 6:21 p.m. UTC | #3
On Thu, 06 Jan 2022 12:07:38 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On Thu, Dec 30, 2021 at 08:01:10PM +0000, Marc Zyngier wrote:
> > Alex,
> > 
> > On Mon, 13 Dec 2021 15:23:05 +0000,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > 
> > > (CC'ing Peter Maydell in case this might be of interest to qemu)
> > > 
> > > The series can be found on a branch at [1], and the kvmtool support at [2].
> > > The kvmtool patches are also on the mailing list [3] and haven't changed
> > > since v1.
> > > 
> > > Detailed explanation of the issue and symptoms that the patches attempt to
> > > correct can be found in the cover letter for v1 [4].
> > > 
> > > A summary of the problem is that on heterogeneous systems KVM will always
> > > use the same PMU for creating the VCPU events for *all* VCPUs regardless of
> > > the physical CPU on which the VCPU is running, leading to events suddenly
> > > stopping and resuming in the guest as the VCPU thread gets migrated across
> > > different CPUs.
> > > 
> > > This series proposes to fix this behaviour by allowing the user to specify
> > > which physical PMU is used when creating the VCPU events needed for guest
> > > PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
> > > physical CPU which is not part of the supported CPUs for the specified PMU. The
> > > restriction is that all VCPUs must use the same PMU to avoid emulating an
> > > asymmetric platform.
> > > 
> > > The default behaviour stays the same - without userspace setting the PMU,
> > > events will stop counting if the VCPU is scheduled on the wrong CPU.
> > > 
> > > Tested with a hacked version of kvmtool that does the PMU initialization
> > > from the VCPU thread as opposed to from the main thread. Tested on
> > > rockpro64 by testing what happens when all VCPUs having the same PMU, one
> > > random VCPU having a different PMU than the other VCPUs and one random VCPU
> > > not having the PMU set (each test was run 1,000 times on the little cores
> > > and 1,000 times on the big cores).
> > > 
> > > Also tested on an Altra by testing all VCPUs having the same PMU, all VCPUs
> > > not having a PMU set, and one random VCPU not having the PMU set; the VM
> > > had 64 threads in each of the tests and each test was run 10,000 times.
> > 
> > Came back to this series, and found more problems. On top of the
> > remarks I had earlier (the per-CPU data structures that really should
> > be per VM, the disappearing attribute size), what happens when event
> > filters are already registered and you then set a specific PMU?
> 
> This is a good point. When I looked at how the PMU event filter works, I
> saw that KVM doesn't attempt to check that the events are actually
> implemented on the PMU, but somehow skipped over the fact that the PMU
> affects the total number of events available.

That, but also the meaning of the events. Switching PMU after
programmed event filters is really odd, as you don't know what you are
filtering anymore (unless you stick to purely architected events).

Thanks,

	M.