[v3,0/2] Fix for a very old KVM bug in the segment cache

Message ID 20240725175232.337266-1-mlevitsk@redhat.com (mailing list archive)

Message

Maxim Levitsky July 25, 2024, 5:52 p.m. UTC
Hi,

Recently, while trying to understand why the pmu_counters_test
selftest sometimes fails when run nested, I stumbled
upon a very interesting and old bug:

It turns out that KVM caches guest segment state,
but this cache has no protection against concurrent use.

This usually works because the cache is per-vCPU and should
only be accessed by the vCPU thread; however, there is an exception:

If full preemption is enabled in the host kernel,
the vCPU thread can be preempted, for
example during vmx_vcpu_reset.

vmx_vcpu_reset resets the segment cache bitmask and then initializes
the segments in the VMCS. However, if the vCPU is preempted in the
middle of this code, kvm_arch_vcpu_put is called, which
reads SS's AR bytes to determine whether the vCPU is in kernel mode,
and thereby caches the stale value.

Later, vmx_vcpu_reset sets SS's AR field to the correct value
in the VMCS, but the cache still contains the stale value, which
can later leak, for example via KVM_GET_SREGS and such.

In particular, the KVM selftests do KVM_GET_SREGS
and then KVM_SET_SREGS, passing the broken SS AR field back as-is,
which leads to VM entry failure.

This is not a nested-virtualization issue; I was actually able
to reproduce it on bare metal, but due to timing it happens
much more often nested. The only requirement is to have full
preemption enabled in the kernel that runs the selftest.

pmu_counters_test reproduces this issue well because it creates
lots of short-lived VMs, but as noted above, the issue
is not related to the PMU.

To partially fix this issue, call vmx_segment_cache_clear
after we are done with segment register setup in vmx_vcpu_reset.

V2: incorporated Paolo's suggestion of having
    vmx_write_segment_cache_start/end functions  (thanks!)

V3: reverted to a partial fix.

Best regards,
	Maxim Levitsky

Maxim Levitsky (2):
  KVM: nVMX: use vmx_segment_cache_clear
  VMX: reset the segment cache after segment initialization in
    vmx_vcpu_reset

 arch/x86/kvm/vmx/nested.c |  3 ++-
 arch/x86/kvm/vmx/vmx.c    | 10 +++-------
 arch/x86/kvm/vmx/vmx.h    |  5 +++++
 3 files changed, 10 insertions(+), 8 deletions(-)

Comments

Maxim Levitsky Aug. 2, 2024, 3:33 p.m. UTC | #1
On Thu, 2024-07-25 at 13:52 -0400, Maxim Levitsky wrote:
> Hi,
> 
> Recently, while trying to understand why the pmu_counters_test
> selftest sometimes fails when run nested I stumbled
> upon a very interesting and old bug:
> 
> [...]

Any update?

Best regards,
	Maxim Levitsky
Sean Christopherson Aug. 2, 2024, 4:07 p.m. UTC | #2
On Fri, Aug 02, 2024, mlevitsk@redhat.com wrote:
> Any update?

I am planning on jumping into review/merge mode on Monday.
Sean Christopherson Aug. 23, 2024, 11:48 p.m. UTC | #3
On Thu, 25 Jul 2024 13:52:30 -0400, Maxim Levitsky wrote:
> Recently, while trying to understand why the pmu_counters_test
> selftest sometimes fails when run nested I stumbled
> upon a very interesting and old bug:
> 
> It turns out that KVM caches guest segment state,
> but this cache doesn't have any protection against concurrent use.
> 
> [...]

Applied patch 1 to kvm-x86 vmx, thanks!

[1/2] KVM: nVMX: use vmx_segment_cache_clear
      https://github.com/kvm-x86/linux/commit/41ab0d59faa9

--
https://github.com/kvm-x86/linux/tree/next