Message ID | 20240307005833.827147-1-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86/pmu: Disable support for adaptive PEBS | expand |
On 3/7/2024 8:58 AM, Sean Christopherson wrote: > Drop support for virtualizing adaptive PEBS, as KVM's implementation is > architecturally broken without an obvious/easy path forward, and because > exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak > host kernel addresses to the guest. > > Bug #1 is that KVM doesn't doesn't account for the upper 32 bits of "doesn't doesn't" -> "doesn't"? > IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g > fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() > stores local variables as u8s and truncates the upper bits too, etc. > > Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value > for PEBS events, perf will _always_ generate an adaptive record, even if > the guest requested a basic record. Note, KVM will also enable adaptive > PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the > guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, > i.e. the guest will only ever see Basic records. > > Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper > bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and > intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE > either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what > KVM requests. I would talk with Liang Kan and prepare a patch to clear the "adaptive_pebs" bit when disabling GP & fixed counters. > > Bug #4 is that adaptive PEBS *might* effectively bypass event filters set > by the host, as "Updated Memory Access Info Group" records information > that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. > > Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least > zeros) when entering a vCPU with adaptive PEBS, which allows the guest > to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" > records. > > Disable adaptive PEBS support as an immediate fix due to the severity of > the LBR leak in particular, and because fixing all of the bugs will be > non-trivial, e.g. not suitable for backporting to stable kernels. > > Note! This will break live migration, but trying to make KVM play nice > with live migration would be quite complicated, wouldn't be guaranteed to > work (i.e. KVM might still kill/confuse the guest), and it's not clear > that there are any publicly available VMMs that support adaptive PEBS, > let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't > support PEBS in any capacity. > > Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com > Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com > Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") > Cc: stable@vger.kernel.org > Cc: Like Xu <like.xu.linux@gmail.com> > Cc: Mingwei Zhang <mizhang@google.com> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com> > Cc: Zhang Xiong <xiong.y.zhang@intel.com> > Cc: Lv Zhiyuan <zhiyuan.lv@intel.com> > Cc: Dapeng Mi <dapeng1.mi@intel.com> > Cc: Jim Mattson <jmattson@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/vmx/vmx.c | 24 ++++++++++++++++++++++-- > 1 file changed, 22 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 7a74388f9ecf..641a7d5bf584 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7864,8 +7864,28 @@ static u64 vmx_get_perf_capabilities(void) > > if (vmx_pebs_supported()) { > perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK; > - if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4) > - perf_cap &= ~PERF_CAP_PEBS_BASELINE; > + > + /* > + * Disallow adaptive PEBS as it is functionally broken, can be > + * used by the guest to read *host* LBRs, and can be used to > + * bypass userspace event filters. To correctly and safely > + * support adaptive PEBS, KVM needs to: > + * > + * 1. Account for the ADAPTIVE flag when (re)programming fixed > + * counters. > + * > + * 2. Gain support from perf (or take direct control of counter > + * programming) to support events without adaptive PEBS > + * enabled for the hardware counter. > + * > + * 3. Ensure LBR MSRs cannot hold host data on VM-Entry with > + * adaptive PEBS enabled and MSR_PEBS_DATA_CFG.LBRS=1. > + * > + * 4. Document which PMU events are effectively exposed to the > + * guest via adaptive PEBS, and make adaptive PEBS mutually > + * exclusive with KVM_SET_PMU_EVENT_FILTER if necessary. > + */ > + perf_cap &= ~PERF_CAP_PEBS_BASELINE; > } > > return perf_cap; > > base-commit: 0c64952fec3ea01cb5b09f00134200f3e7ab40d5
On 7/3/2024 8:58 am, Sean Christopherson wrote: > Drop support for virtualizing adaptive PEBS, as KVM's implementation is > architecturally broken without an obvious/easy path forward, and because > exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak > host kernel addresses to the guest. > > Bug #1 is that KVM doesn't doesn't account for the upper 32 bits of > IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g > fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() > stores local variables as u8s and truncates the upper bits too, etc. > > Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value > for PEBS events, perf will _always_ generate an adaptive record, even if > the guest requested a basic record. Note, KVM will also enable adaptive > PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the > guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, > i.e. the guest will only ever see Basic records. > > Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper > bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and > intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE > either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what > KVM requests. The three issues above all point to a fix in one direction: to pass the value of vcpu's EVT_SELx.Adaptive_Record[34] or FCx_Adaptive_Record to the perf/core in a way and let the PEBS assist take effect as expected by vPEBS. One place to address this is in the intel_guest_get_msrs() again: - update vPMC[x].pPMC.fcctl_or_evtsel.ADAPTIVE = vPMC[x].use_adaptive , since guest PEBS is disabled if host PEBS is enabled. > > Bug #4 is that adaptive PEBS *might* effectively bypass event filters set > by the host, as "Updated Memory Access Info Group" records information > that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. This could be seen as a missing feature, that is, whether PMU_EVENT_FILTER can control PEBS events even if they share the same event encoding. Furthermore, if LBR_FMT is cleared only by VMM, could the guest use adaptive pebs to obtain valid guest_lbr records. It's open in the virt context since real hardware doesn't have this issue. > > Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least > zeros) when entering a vCPU with adaptive PEBS, which allows the guest > to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" > records. > > Disable adaptive PEBS support as an immediate fix due to the severity of > the LBR leak in particular, and because fixing all of the bugs will be > non-trivial, e.g. not suitable for backporting to stable kernels. > > Note! This will break live migration, but trying to make KVM play nice > with live migration would be quite complicated, wouldn't be guaranteed to > work (i.e. KVM might still kill/confuse the guest), and it's not clear > that there are any publicly available VMMs that support adaptive PEBS, > let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't > support PEBS in any capacity. > > Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com > Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com > Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") > Cc: stable@vger.kernel.org > Cc: Like Xu <like.xu.linux@gmail.com> > Cc: Mingwei Zhang <mizhang@google.com> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com> > Cc: Zhang Xiong <xiong.y.zhang@intel.com> > Cc: Lv Zhiyuan <zhiyuan.lv@intel.com> > Cc: Dapeng Mi <dapeng1.mi@intel.com> > Cc: Jim Mattson <jmattson@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Like Xu <likexu@tencent.com> > --- > arch/x86/kvm/vmx/vmx.c | 24 ++++++++++++++++++++++-- > 1 file changed, 22 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 7a74388f9ecf..641a7d5bf584 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7864,8 +7864,28 @@ static u64 vmx_get_perf_capabilities(void) > > if (vmx_pebs_supported()) { > perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK; > - if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4) > - perf_cap &= ~PERF_CAP_PEBS_BASELINE; > + > + /* > + * Disallow adaptive PEBS as it is functionally broken, can be > + * used by the guest to read *host* LBRs, and can be used to > + * bypass userspace event filters. To correctly and safely > + * support adaptive PEBS, KVM needs to: > + * > + * 1. Account for the ADAPTIVE flag when (re)programming fixed > + * counters. > + * > + * 2. Gain support from perf (or take direct control of counter > + * programming) to support events without adaptive PEBS > + * enabled for the hardware counter. > + * > + * 3. Ensure LBR MSRs cannot hold host data on VM-Entry with > + * adaptive PEBS enabled and MSR_PEBS_DATA_CFG.LBRS=1. > + * > + * 4. Document which PMU events are effectively exposed to the > + * guest via adaptive PEBS, and make adaptive PEBS mutually > + * exclusive with KVM_SET_PMU_EVENT_FILTER if necessary. > + */ > + perf_cap &= ~PERF_CAP_PEBS_BASELINE; > } > > return perf_cap; > > base-commit: 0c64952fec3ea01cb5b09f00134200f3e7ab40d5
On Wed, 06 Mar 2024 16:58:33 -0800, Sean Christopherson wrote: > Drop support for virtualizing adaptive PEBS, as KVM's implementation is > architecturally broken without an obvious/easy path forward, and because > exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak > host kernel addresses to the guest. > > Bug #1 is that KVM doesn't doesn't account for the upper 32 bits of > IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g > fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() > stores local variables as u8s and truncates the upper bits too, etc. > > [...] Applied to kvm-x86 fixes, thanks! [1/1] KVM: x86/pmu: Disable support for adaptive PEBS https://github.com/kvm-x86/linux/commit/9e985cbf2942 -- https://github.com/kvm-x86/linux/tree/next
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 7a74388f9ecf..641a7d5bf584 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7864,8 +7864,28 @@ static u64 vmx_get_perf_capabilities(void) if (vmx_pebs_supported()) { perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK; - if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4) - perf_cap &= ~PERF_CAP_PEBS_BASELINE; + + /* + * Disallow adaptive PEBS as it is functionally broken, can be + * used by the guest to read *host* LBRs, and can be used to + * bypass userspace event filters. To correctly and safely + * support adaptive PEBS, KVM needs to: + * + * 1. Account for the ADAPTIVE flag when (re)programming fixed + * counters. + * + * 2. Gain support from perf (or take direct control of counter + * programming) to support events without adaptive PEBS + * enabled for the hardware counter. + * + * 3. Ensure LBR MSRs cannot hold host data on VM-Entry with + * adaptive PEBS enabled and MSR_PEBS_DATA_CFG.LBRS=1. + * + * 4. Document which PMU events are effectively exposed to the + * guest via adaptive PEBS, and make adaptive PEBS mutually + * exclusive with KVM_SET_PMU_EVENT_FILTER if necessary. + */ + perf_cap &= ~PERF_CAP_PEBS_BASELINE; } return perf_cap;
Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. Bug #1 is that KVM doesn't doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() stores local variables as u8s and truncates the upper bits too, etc. Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value for PEBS events, perf will _always_ generate an adaptive record, even if the guest requested a basic record. Note, KVM will also enable adaptive PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, i.e. the guest will only ever see Basic records. Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what KVM requests. Bug #4 is that adaptive PEBS *might* effectively bypass event filters set by the host, as "Updated Memory Access Info Group" records information that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least zeros) when entering a vCPU with adaptive PEBS, which allows the guest to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" records. Disable adaptive PEBS support as an immediate fix due to the severity of the LBR leak in particular, and because fixing all of the bugs will be non-trivial, e.g. not suitable for backporting to stable kernels. Note! This will break live migration, but trying to make KVM play nice with live migration would be quite complicated, wouldn't be guaranteed to work (i.e. KVM might still kill/confuse the guest), and it's not clear that there are any publicly available VMMs that support adaptive PEBS, let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't support PEBS in any capacity. Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhang Xiong <xiong.y.zhang@intel.com> Cc: Lv Zhiyuan <zhiyuan.lv@intel.com> Cc: Dapeng Mi <dapeng1.mi@intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/vmx/vmx.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) base-commit: 0c64952fec3ea01cb5b09f00134200f3e7ab40d5