Message ID: 20211206170223.309789-1-alexandru.elisei@arm.com (mailing list archive)
Series: KVM: arm64: Improve PMU support on heterogeneous systems
Hi Alex,

On Mon, Dec 6, 2021 at 9:02 AM Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> (CC'ing Peter Maydell in case this might be of interest to qemu)
>
> The series can be found on a branch at [1], and the kvmtool support at [2].
> The kvmtool patches are also on the mailing list [3] and haven't changed
> since v1.
>
> Detailed explanation of the issue and symptoms that the patches attempt to
> correct can be found in the cover letter for v1 [4].
>
> A brief summary of the problem is that on heterogeneous systems KVM will
> always use the same PMU for creating the VCPU events for *all* VCPUs,
> regardless of the physical CPU on which the VCPU is running, leading to
> events suddenly stopping and resuming in the guest as the VCPU thread gets
> migrated across different CPUs.
>
> This series proposes to fix this behaviour by allowing the user to specify
> which physical PMU is used when creating the VCPU events needed for guest
> PMU emulation. When the PMU is set, KVM will refuse to run the VCPU on a
> physical CPU which is not part of the supported CPUs for the specified PMU.

Just to confirm, this series provides an API for userspace to request
KVM to detect a wrong affinity setting due to a userspace bug, so that
userspace gets an error at KVM_RUN instead of events suddenly
stopping, correct?

> The default behaviour stays the same - without userspace setting the PMU,
> events will stop counting if the VCPU is scheduled on the wrong CPU.

Can't we fix the default behavior (in addition to the current fix)?
(Do we need to maintain the default behavior?)

IMHO it would be better to prevent userspace from configuring a PMU
for guests on such heterogeneous systems rather than have events
suddenly stop even as the default behavior.

Thanks,
Reiji

> Changes since v1:
>
> - Rebased on top of v5.16-rc4
>
> - Implemented review comments: protect iterating through the list of PMUs
>   with a mutex, documentation changes, initialize vcpu->arch.supported_cpus
>   to cpu_possible_mask, changed vcpu->arch.cpu_not_supported to a VCPU
>   flag, set exit reason to KVM_EXIT_FAIL_ENTRY and populate fail_entry when
>   the VCPU is run on a CPU not in the PMU's supported cpumask. Many thanks
>   for the review!
>
> [1] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/pmu-big-little-fix-v2
> [2] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pmu-big-little-fix-v1
> [3] https://www.spinics.net/lists/arm-kernel/msg933584.html
> [4] https://www.spinics.net/lists/arm-kernel/msg933579.html
>
> Alexandru Elisei (4):
>   perf: Fix wrong name in comment for struct perf_cpu_context
>   KVM: arm64: Keep a list of probed PMUs
>   KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
>   KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
>
>  Documentation/virt/kvm/devices/vcpu.rst | 29 +++++++++++
>  arch/arm64/include/asm/kvm_host.h       | 12 +++++
>  arch/arm64/include/uapi/asm/kvm.h       |  4 ++
>  arch/arm64/kvm/arm.c                    | 19 ++++++++
>  arch/arm64/kvm/pmu-emul.c               | 64 +++++++++++++++++++++++--
>  include/kvm/arm_pmu.h                   |  6 +++
>  include/linux/perf_event.h              |  2 +-
>  tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
>  8 files changed, 132 insertions(+), 5 deletions(-)
>
> --
> 2.34.1
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
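[Editor's note: a minimal sketch of how a VMM could use the proposed attribute, assuming the uapi names added by this series (KVM_ARM_VCPU_PMU_V3_CTRL, KVM_ARM_VCPU_PMU_V3_SET_PMU). The sysfs lookup follows the usual perf convention of reading a PMU's "type" file; the PMU name and error handling are illustrative.]

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Look up a PMU's perf id from sysfs, e.g. pmu_name = "armv8_pmuv3_0". */
static int read_pmu_id(const char *pmu_name)
{
	char path[256];
	int id = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/type", pmu_name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}

/* Tell KVM which physical PMU backs this VCPU's emulated PMU. */
static int vcpu_set_pmu(int vcpu_fd, int pmu_id)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PMU_V3_CTRL,
		.attr  = KVM_ARM_VCPU_PMU_V3_SET_PMU,
		.addr  = (__u64)(unsigned long)&pmu_id,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

The kvmtool support at [2] is the authoritative example of the wiring; the sketch above only shows the shape of the ioctl.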
Reiji,

On 2021-12-08 02:36, Reiji Watanabe wrote:
> Hi Alex,
>
> On Mon, Dec 6, 2021 at 9:02 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
>>
>> [...]
>>
>> This series proposes to fix this behaviour by allowing the user to
>> specify which physical PMU is used when creating the VCPU events
>> needed for guest PMU emulation. When the PMU is set, KVM will refuse
>> to run the VCPU on a physical CPU which is not part of the supported
>> CPUs for the specified PMU.
>
> Just to confirm, this series provides an API for userspace to request
> KVM to detect a wrong affinity setting due to a userspace bug, so that
> userspace gets an error at KVM_RUN instead of events suddenly
> stopping, correct?

More than that, it allows userspace to select which PMU will be used
for their guest. The affinity setting is a byproduct of the PMU's own
affinity.

>> The default behaviour stays the same - without userspace setting the
>> PMU, events will stop counting if the VCPU is scheduled on the wrong
>> CPU.
>
> Can't we fix the default behavior (in addition to the current fix)?
> (Do we need to maintain the default behavior?)

Of course we do. This is a behaviour that has been exposed to userspace
for years, and *we don't break userspace*.

> IMHO it would be better to prevent userspace from configuring a PMU
> for guests on such heterogeneous systems rather than have events
> suddenly stop even as the default behavior.

People running KVM on asymmetric systems *strongly* disagree with you.

        M.
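[Editor's note: the refusal to run surfaces to the VMM as a KVM_RUN exit. Below is a hedged sketch of the run-loop handling, based on the KVM_EXIT_FAIL_ENTRY reporting described in the v2 changelog above; the logging and error handling are illustrative.]

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int run_vcpu(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
			return -1;

		switch (run->exit_reason) {
		case KVM_EXIT_FAIL_ENTRY:
			/*
			 * With a PMU set, KVM refuses to enter the guest
			 * on a CPU outside the PMU's supported cpumask.
			 */
			fprintf(stderr,
				"entry failed on cpu %u, reason %llu\n",
				run->fail_entry.cpu,
				(unsigned long long)run->fail_entry.hardware_entry_failure_reason);
			return -1;
		default:
			/* Handle MMIO and the other exit reasons here. */
			break;
		}
	}
}
```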
On Wed, Dec 8, 2021 at 12:05 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Reiji,
>
> On 2021-12-08 02:36, Reiji Watanabe wrote:
> > [...]
> > Just to confirm, this series provides an API for userspace to request
> > KVM to detect a wrong affinity setting due to a userspace bug, so that
> > userspace gets an error at KVM_RUN instead of events suddenly
> > stopping, correct?
>
> More than that, it allows userspace to select which PMU will be used
> for their guest. The affinity setting is a byproduct of the PMU's own
> affinity.

Thank you for the clarification.
(I overlooked the change in kvm_pmu_create_perf_event()...)

> > Can't we fix the default behavior (in addition to the current fix)?
> > (Do we need to maintain the default behavior?)
>
> Of course we do. This is a behaviour that has been exposed to userspace
> for years, and *we don't break userspace*.

I'm wondering if it might be better to have kvm_pmu_create_perf_event()
set attr.type to the PMU id based on the current (physical) CPU by
default on such heterogeneous systems (even if userspace doesn't
explicitly specify a PMU id with the new API). Then, by setting the CPU
affinity, the PMU in that environment can behave predictably even with
existing userspace (or maybe this won't be helpful at all?).

Thanks,
Reiji
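[Editor's note: a very loose kernel-style sketch of the default Reiji is floating. This is NOT part of the series; kvm_pmu_of_this_cpu() is a hypothetical helper, though the series does keep a list of probed PMUs that such a helper could search by supported-CPUs mask.]

```c
/*
 * Hypothetical default, illustrating the suggestion only: if userspace
 * never issued KVM_ARM_VCPU_PMU_V3_SET_PMU, latch onto whichever PMU
 * drives the physical CPU we happen to be on when the first perf event
 * is created.
 */
static struct arm_pmu *kvm_pmu_of_this_cpu(void);	/* hypothetical */

static void kvm_pmu_pick_default_pmu(struct kvm_vcpu *vcpu)
{
	if (vcpu->arch.pmu.arm_pmu)
		return;	/* userspace already picked a PMU explicitly */

	/* Behave as if SET_PMU had been issued for the current CPU's PMU. */
	vcpu->arch.pmu.arm_pmu = kvm_pmu_of_this_cpu();
}
```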
Hi Reiji,

On Sun, Dec 12, 2021 at 10:36:52PM -0800, Reiji Watanabe wrote:
> On Wed, Dec 8, 2021 at 12:05 AM Marc Zyngier <maz@kernel.org> wrote:
> > [...]
> > Of course we do. This is a behaviour that has been exposed to userspace
> > for years, and *we don't break userspace*.
>
> I'm wondering if it might be better to have kvm_pmu_create_perf_event()
> set attr.type to the PMU id based on the current (physical) CPU by
> default on such heterogeneous systems (even if userspace doesn't
> explicitly specify a PMU id with the new API). Then, by setting the CPU
> affinity, the PMU in that environment can behave predictably even with
> existing userspace (or maybe this won't be helpful at all?).

I think then you would end up with a possible mismatch between
kvm->arch.pmuver and the version of the PMU that is used for creating
the events.

Also, as VCPUs get migrated from one physical CPU to the other, the
semantics of the microarchitectural events change, even if the event ID
is the same.

Thanks,
Alex
Hi Alex,

On Mon, Dec 13, 2021 at 3:14 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Reiji,
>
> On Sun, Dec 12, 2021 at 10:36:52PM -0800, Reiji Watanabe wrote:
> > [...]
> > I'm wondering if it might be better to have kvm_pmu_create_perf_event()
> > set attr.type to the PMU id based on the current (physical) CPU by
> > default on such heterogeneous systems (even if userspace doesn't
> > explicitly specify a PMU id with the new API). Then, by setting the CPU
> > affinity, the PMU in that environment can behave predictably even with
> > existing userspace (or maybe this won't be helpful at all?).
>
> I think then you would end up with a possible mismatch between
> kvm->arch.pmuver and the version of the PMU that is used for creating
> the events.

Yes, but I would think we can have kvm_pmu_create_perf_event() set
vcpu->arch.pmu.arm_pmu based on the current physical CPU when
vcpu->arch.pmu.arm_pmu is NULL (then the pmuver is handled as if
KVM_ARM_VCPU_PMU_V3_SET_PMU was done implicitly).

> Also, as VCPUs get migrated from one physical CPU to the other, the
> semantics of the microarchitectural events change, even if the event ID
> is the same.

Yes, I understand. As mentioned, this can work only when the CPU
affinity is set for vCPU threads appropriately (which could be done
even without changing userspace).

Thanks,
Reiji
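[Editor's note: the affinity setting Reiji refers to can be applied from outside the VMM (e.g. taskset on the VMM's threads, no userspace change needed) or inside it. A minimal in-VMM sketch follows; the big-core CPU numbers 4-7 are a made-up example for a hypothetical big.LITTLE layout.]

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int pin_vcpu_thread(pthread_t thread)
{
	cpu_set_t set;
	int cpu;

	CPU_ZERO(&set);
	for (cpu = 4; cpu <= 7; cpu++)	/* assumed big-core CPUs */
		CPU_SET(cpu, &set);

	/* Keep the VCPU thread on CPUs served by a single PMU. */
	return pthread_setaffinity_np(thread, sizeof(set), &set);
}
```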
On Tue, 14 Dec 2021 06:24:38 +0000,
Reiji Watanabe <reijiw@google.com> wrote:
>
> Hi Alex,
>
> On Mon, Dec 13, 2021 at 3:14 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Also, as VCPUs get migrated from one physical CPU to the other, the
> > semantics of the microarchitectural events change, even if the event
> > ID is the same.
>
> Yes, I understand. As mentioned, this can work only when the CPU
> affinity is set for vCPU threads appropriately (which could be done
> even without changing userspace).

Implicit bindings to random PMUs based on the scheduling seem a pretty
fragile API to me, and present no real incentive for userspace to start
doing the right thing.

I'd prefer not counting events at all when on the wrong CPU (for some
definition of 'wrong'), rather than accumulating unrelated events. Both
are admittedly wrong, but between two evils, I'd rather stick with the
one I know (and that doesn't require any change)...

Alex's series brings a way to solve this by allowing userspace to pick
a PMU and make sure userspace is aware of the consequences. It puts
userspace in charge, and doesn't leave space for ambiguous behaviours.

I definitely find value in this approach.

        M.
On Tue, Dec 14, 2021 at 3:57 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 14 Dec 2021 06:24:38 +0000,
> Reiji Watanabe <reijiw@google.com> wrote:
> > [...]
> > Yes, I understand. As mentioned, this can work only when the CPU
> > affinity is set for vCPU threads appropriately (which could be done
> > even without changing userspace).
>
> Implicit bindings to random PMUs based on the scheduling seem a pretty
> fragile API to me,

Yes, I understand that. I was just looking into the possibility of
improving the default behavior in some way rather than keeping the
unpredictable default behavior.

> and present no real incentive for userspace to start doing the right
> thing.

I see... It makes sense. I didn't think about that aspect.

> I'd prefer not counting events at all when on the wrong CPU (for some
> definition of 'wrong'), rather than accumulating unrelated events.
> Both are admittedly wrong, but between two evils, I'd rather stick
> with the one I know (and that doesn't require any change)...
>
> Alex's series brings a way to solve this by allowing userspace to pick
> a PMU and make sure userspace is aware of the consequences. It puts
> userspace in charge, and doesn't leave space for ambiguous behaviours.
>
> I definitely find value in this approach.

Yes, I agree with that. It wasn't meant to replace Alex's approach;
it was only about the default behavior (i.e. when userspace does not
specify a PMU id with the new API).

Anyway, thank you so much for sharing your thoughts on it!

Regards,
Reiji