[v2,0/4] KVM: arm64: Improve PMU support on heterogeneous systems

Message ID 20211206170223.309789-1-alexandru.elisei@arm.com

Message

Alexandru Elisei Dec. 6, 2021, 5:02 p.m. UTC
(CC'ing Peter Maydell in case this might be of interest to qemu)

The series can be found on a branch at [1], and the kvmtool support at [2].
The kvmtool patches are also on the mailing list [3] and haven't changed
since v1.

Detailed explanation of the issue and symptoms that the patches attempt to
correct can be found in the cover letter for v1 [4].

A brief summary of the problem is that on heterogeneous systems KVM will
always use the same PMU for creating the VCPU events for *all* VCPUs
regardless of the physical CPU on which the VCPU is running, leading to
events suddenly stopping and resuming in the guest as the VCPU thread gets
migrated across different CPUs.

This series proposes to fix this behaviour by allowing the user to specify
which physical PMU is used when creating the VCPU events needed for guest
PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
physical CPU which is not part of the supported CPUs for the specified PMU.

The default behaviour stays the same - without userspace setting the PMU,
events will stop counting if the VCPU is scheduled on the wrong CPU.
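
For illustration, userspace would use the new attribute roughly as below (a
minimal sketch modelled on the kvmtool patches at [2]; the PMU id is the
value of the PMU's sysfs "type" file, error handling is elided, and the
function name is made up for the example):

	/* Tell KVM which physical PMU backs this VCPU's guest PMU. pmu_id is
	 * read from /sys/bus/event_source/devices/<pmu>/type. */
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int vcpu_set_pmu(int vcpu_fd, int pmu_id)
	{
		struct kvm_device_attr attr = {
			.group	= KVM_ARM_VCPU_PMU_V3_CTRL,
			.attr	= KVM_ARM_VCPU_PMU_V3_SET_PMU,
			.addr	= (__u64)(unsigned long)&pmu_id,
		};

		return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
	}

This reuses the same kvm_device_attr plumbing as the existing
KVM_ARM_VCPU_PMU_V3_IRQ and KVM_ARM_VCPU_PMU_V3_INIT attributes.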

Changes since v1:

- Rebased on top of v5.16-rc4

- Implemented review comments: protect iterating through the list of PMUs
  with a mutex, documentation changes, initialize vcpu->arch.supported_cpus
  to cpu_possible_mask, changed vcpu->arch.cpu_not_supported to a VCPU
  flag, set exit reason to KVM_EXIT_FAIL_ENTRY and populate fail_entry when
  the VCPU is run on a CPU not in the PMU's supported cpumask (a sketch of
  handling this exit is included below). Many thanks for the review!
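
For reference, a VMM could consume the new exit roughly as below (a sketch
only; the fail_entry field names are from the generic struct kvm_run layout,
and which of them this series populates should be checked against the last
patch):

	#include <stdio.h>
	#include <linux/kvm.h>

	/* Inspect the mmap'ed struct kvm_run after KVM_RUN returns. */
	static void handle_fail_entry(struct kvm_run *run)
	{
		if (run->exit_reason != KVM_EXIT_FAIL_ENTRY)
			return;
		/* The VCPU thread ran on a CPU outside the PMU's cpumask. */
		fprintf(stderr, "entry failed, reason=0x%llx, cpu=%u\n",
			(unsigned long long)run->fail_entry.hardware_entry_failure_reason,
			run->fail_entry.cpu);
		/* Re-pin the VCPU thread to the PMU's CPUs, or treat as fatal. */
	}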

[1] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/pmu-big-little-fix-v2
[2] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pmu-big-little-fix-v1
[3] https://www.spinics.net/lists/arm-kernel/msg933584.html
[4] https://www.spinics.net/lists/arm-kernel/msg933579.html

Alexandru Elisei (4):
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical
    CPU

 Documentation/virt/kvm/devices/vcpu.rst | 29 +++++++++++
 arch/arm64/include/asm/kvm_host.h       | 12 +++++
 arch/arm64/include/uapi/asm/kvm.h       |  4 ++
 arch/arm64/kvm/arm.c                    | 19 ++++++++
 arch/arm64/kvm/pmu-emul.c               | 64 +++++++++++++++++++++++--
 include/kvm/arm_pmu.h                   |  6 +++
 include/linux/perf_event.h              |  2 +-
 tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
 8 files changed, 132 insertions(+), 5 deletions(-)

Comments

Reiji Watanabe Dec. 8, 2021, 2:36 a.m. UTC | #1
Hi Alex,

On Mon, Dec 6, 2021 at 9:02 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> [...]
>
> This series proposes to fix this behaviour by allowing the user to specify
> which physical PMU is used when creating the VCPU events needed for guest
> PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
> physical CPU which is not part of the supported CPUs for the specified PMU.

Just to confirm, this series provides an API for userspace to request
KVM to detect a wrong affinity setting due to a userspace bug so that
userspace can get an error at KVM_RUN instead of leading to events
suddenly stopping, correct?


> The default behaviour stays the same - without userspace setting the PMU,
> events will stop counting if the VCPU is scheduled on the wrong CPU.

Can't we fix the default behavior (in addition to the current fix)?
(Do we need to maintain the default behavior??)
IMHO I feel it is better to prevent userspace from configuring a PMU
for guests on such heterogeneous systems rather than leading to
events suddenly stopping even as the default behavior.

Thanks,
Reiji


Marc Zyngier Dec. 8, 2021, 8:05 a.m. UTC | #2
Reiji,

On 2021-12-08 02:36, Reiji Watanabe wrote:
> Hi Alex,
> 
> On Mon, Dec 6, 2021 at 9:02 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
>> 
>> [...]
>> 
>> This series proposes to fix this behaviour by allowing the user to
>> specify which physical PMU is used when creating the VCPU events needed
>> for guest PMU emulation. When the PMU is set, KVM will refuse to run the
>> VCPU on a physical CPU which is not part of the supported CPUs for the
>> specified PMU.
> 
> Just to confirm, this series provides an API for userspace to request
> KVM to detect a wrong affinity setting due to a userspace bug so that
> userspace can get an error at KVM_RUN instead of leading to events
> suddenly stopping, correct?

More than that, it allows userspace to select which PMU will be used
for their guest. The affinity setting is a byproduct of the PMU's own
affinity.

> 
>> The default behaviour stays the same - without userspace setting the
>> PMU, events will stop counting if the VCPU is scheduled on the wrong CPU.
> 
> Can't we fix the default behavior (in addition to the current fix)?
> (Do we need to maintain the default behavior??)

Of course we do. This is a behaviour that has been exposed to userspace
for years, and *we don't break userspace*.

> IMHO I feel it is better to prevent userspace from configuring a PMU
> for guests on such heterogeneous systems rather than leading to
> events suddenly stopping even as the default behavior.

People running KVM on asymmetric systems *strongly* disagree with you.

         M.
Reiji Watanabe Dec. 13, 2021, 6:36 a.m. UTC | #3
On Wed, Dec 8, 2021 at 12:05 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Reiji,
>
> On 2021-12-08 02:36, Reiji Watanabe wrote:
> > [...]
> >
> > Just to confirm, this series provides an API for userspace to request
> > KVM to detect a wrong affinity setting due to a userspace bug so that
> > userspace can get an error at KVM_RUN instead of leading to events
> > suddenly stopping, correct?
>
> More than that, it allows userspace to select which PMU will be used
> for their guest. The affinity setting is a byproduct of the PMU's own
> affinity.

Thank you for the clarification.
(I overlooked the change in kvm_pmu_create_perf_event()...)


> >
> >> The default behaviour stays the same - without userspace setting the
> >> PMU, events will stop counting if the VCPU is scheduled on the wrong CPU.
> >
> > > Can't we fix the default behavior (in addition to the current fix)?
> > > (Do we need to maintain the default behavior??)
>
> Of course we do. This is a behaviour that has been exposed to userspace
> for years, and *we don't break userspace*.

I'm wondering if it might be better to have kvm_pmu_create_perf_event()
set attr.type to pmu_id based on the current (physical) CPU by default
on such heterogeneous systems (even if userspace doesn't explicitly
specify pmu_id with the new API).  Then, by setting the CPU affinity,
the PMU in that environment can behave predictably even with existing
userspace (or maybe this won't be helpful at all?).
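
Concretely, something like the following in kvm_pmu_create_perf_event()
(illustrative only; find_armpmu_for_cpu() is a hypothetical helper walking
the probed-PMU list, not something in this series):

	/* Default attr.type to the PMU driving the CPU this VCPU thread is
	 * currently running on (ignoring preemption for brevity). */
	struct arm_pmu *pmu = vcpu->arch.pmu.arm_pmu;

	if (!pmu)
		pmu = find_armpmu_for_cpu(smp_processor_id());

	attr.type = pmu ? pmu->pmu.type : PERF_TYPE_RAW;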

Thanks,
Reiji
Alexandru Elisei Dec. 13, 2021, 11:14 a.m. UTC | #4
Hi Reiji,

On Sun, Dec 12, 2021 at 10:36:52PM -0800, Reiji Watanabe wrote:
> On Wed, Dec 8, 2021 at 12:05 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > [...]
> >
> > > Can't we fix the default behavior (in addition to the current fix)?
> > > (Do we need to maintain the default behavior??)
> >
> > Of course we do. This is a behaviour that has been exposed to userspace
> > for years, and *we don't break userspace*.
> 
> I'm wondering if it might be better to have kvm_pmu_create_perf_event()
> set attr.type to pmu_id based on the current (physical) CPU by default
> on such heterogeneous systems (even if userspace doesn't explicitly
> specify pmu_id with the new API).  Then, by setting the CPU affinity,
> the PMU in that environment can behave predictably even with existing
> userspace (or maybe this won't be helpful at all?).

I think then you would end up with a possible mismatch between
kvm->arch.pmuver and the version of the PMU that is used for creating the
events.

Also, as VCPUs get migrated from one physical CPU to the other, the
semantics of the microarchitectural events change, even if the event ID is
the same.

Thanks,
Alex

> 
> Thanks,
> Reiji
Reiji Watanabe Dec. 14, 2021, 6:24 a.m. UTC | #5
Hi Alex,

On Mon, Dec 13, 2021 at 3:14 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Reiji,
>
> On Sun, Dec 12, 2021 at 10:36:52PM -0800, Reiji Watanabe wrote:
> > [...]
> > I'm wondering if it might be better to have kvm_pmu_create_perf_event()
> > set attr.type to pmu_id based on the current (physical) CPU by default
> > on such heterogeneous systems (even if userspace doesn't explicitly
> > specify pmu_id with the new API).  Then, by setting the CPU affinity,
> > the PMU in that environment can behave predictably even with existing
> > userspace (or maybe this won't be helpful at all?).
>
> I think then you would end up with a possible mismatch between
> kvm->arch.pmuver and the version of the PMU that is used for creating the
> events.

Yes, but I would think we can have kvm_pmu_create_perf_event()
set vcpu->arch.pmu.arm_pmu based on the current physical CPU
when vcpu->arch.pmu.arm_pmu is null (then, the pmuver is handled
as if KVM_ARM_VCPU_PMU_V3_SET_PMU was done implicitly).
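
Roughly (again purely hypothetical, reusing the imaginary
find_armpmu_for_cpu() helper from earlier):

	/* Latch the current CPU's PMU on first use, as if userspace had
	 * issued KVM_ARM_VCPU_PMU_V3_SET_PMU with its id, so that the
	 * pmuver and the event PMU stay consistent. */
	if (!vcpu->arch.pmu.arm_pmu)
		vcpu->arch.pmu.arm_pmu = find_armpmu_for_cpu(smp_processor_id());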


> Also, as VCPUs get migrated from one physical CPU to the other, the
> semantics of the microarchitectural events change, even if the event ID is
> the same.

Yes, I understand.  As mentioned, this can work only when the
CPU affinity is set for vCPU threads appropriately (which could
be done even without changing userspace).

Thanks,
Reiji
Marc Zyngier Dec. 14, 2021, 11:56 a.m. UTC | #6
On Tue, 14 Dec 2021 06:24:38 +0000,
Reiji Watanabe <reijiw@google.com> wrote:
> 
> Hi Alex,
> 
> On Mon, Dec 13, 2021 at 3:14 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Also, as VCPUs get migrated from one physical CPU to the other, the
> > semantics of the microarchitectural events change, even if the event ID is
> > the same.
> 
> Yes, I understand.  As mentioned, this can work only when the
> CPU affinity is set for vCPU threads appropriately (which could
> be done even without changing userspace).

Implicit bindings to random PMUs based on the scheduling seem a
pretty fragile API to me, and present no real incentive for userspace
to start doing the right thing.

I'd prefer not counting events at all when on the wrong CPU (for some
definition of 'wrong'), rather than accumulating unrelated events.
Both are admittedly wrong, but between two evils, I'd rather stick
with the one I know (and that doesn't require any change)...

Alex's series brings a way to solve this by allowing userspace to pick
a PMU and make sure userspace is aware of the consequences. It puts
userspace in charge, and doesn't leave space for ambiguous behaviours.

I definitely find value in this approach.

	M.
Reiji Watanabe Dec. 15, 2021, 6:47 a.m. UTC | #7
On Tue, Dec 14, 2021 at 3:57 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 14 Dec 2021 06:24:38 +0000,
> Reiji Watanabe <reijiw@google.com> wrote:
> >
> > Hi Alex,
> >
> > On Mon, Dec 13, 2021 at 3:14 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Also, as VCPUs get migrated from one physical CPU to the other, the
> > > semantics of the microarchitectural events change, even if the event ID is
> > > the same.
> >
> > Yes, I understand.  As mentioned, this can work only when the
> > CPU affinity is set for vCPU threads appropriately (, which could
> > be done even without changing userspace).
>
> Implicit bindings to random PMUs based on the scheduling seem a
> pretty fragile API to me,

Yes, I understand that. I was just looking into the possibility
of improving the default behavior in some way rather than keeping
the unpredictable default behavior.

> and present no real incentive for userspace
> to start doing the right thing.

I see... It makes sense.
I didn't think about that aspect.

> I'd prefer not counting events at all when on the wrong CPU (for some
> definition of 'wrong'), rather than accumulating unrelated events.
> Both are admittedly wrong, but between two evils, I'd rather stick
> with the one I know (and that doesn't require any change)...
>
> Alex's series brings a way to solve this by allowing userspace to pick
> a PMU and make sure userspace is aware of the consequences. It puts
> userspace in charge, and doesn't leave space for ambiguous behaviours.
>
> I definitely find value in this approach.

Yes, I agree with that.
It wasn't meant to replace Alex's approach.  It was only about the
default behavior (i.e. when userspace does not specify a PMU id with
the new API).

Anyway, thank you so much for sharing your thoughts on it !

Regards,
Reiji