mbox series

[00/12] KVM: nVMX: Fix vmcs02 PID use-after-free issue

Message ID 20210520230339.267445-1-jmattson@google.com (mailing list archive)
Headers show
Series KVM: nVMX: Fix vmcs02 PID use-after-free issue | expand

Message

Jim Mattson May 20, 2021, 11:03 p.m. UTC
When the VMCS12 posted interrupt descriptor isn't backed by an L1
memslot, kvm will launch vmcs02 with a stale posted interrupt
descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
nested_get_vmcs12_pages()"), kvm would have silently disabled the
VMCS02 "process posted interrupts" VM-execution control. Both
behaviors are wrong, though the use-after-free is more egregious.

Empirical tests on actual hardware reveal that a posted interrupt
descriptor without any backing memory/device has PCI bus error
semantics (reads return all 1's and writes are discarded). However,
kvm can't tell an unbacked address from an MMIO address. Normally, kvm
would ask userspace for an MMIO completion, but that's complicated for
a posted interrupt descriptor access. There are already a number of
cases where kvm bails out to userspace with KVM_INTERNAL_ERROR via
kvm_handle_memory_failure, so that seems like the best route to take.

It would be relatively easy to invoke kvm_handle_memory_failure at
emulated VM-entry, but that approach would break existing
kvm-unit-tests. Moreover, the issue doesn't really come up until the
vCPU--in virtualized VMX non-root operation--received the posted
interrupt notification vector indicated in its VMCS12.

Sadly, it's really hard to arrange for an exit to userspace from
vmx_complete_nested_posted_interrupt, which is where kvm actually
needs to access the unbacked PID. Initially, I added a new kvm request
for a userspace exit on the next guest entry, but Sean hated that
approach. Based on his suggestion, I added the plumbing to get back
out to userspace in the event of an error in
vmx_complete_nested_posted_interrupt. This works in the case of an
unbacked PID, but it doesn't work quite as well in the case of an
unbacked virtual APIC page (another case where kvm was happy to just
silently ignore the problem and attempt to muddle its way through.) In
that case, this series is an incremental improvement, but it's not a
complete fix.

Jim Mattson (12):
  KVM: x86: Remove guest mode check from kvm_check_nested_events
  KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
  KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
  KVM: x86: Add a return code to inject_pending_event
  KVM: x86: Add a return code to kvm_apic_accept_events
  KVM: nVMX: Fail on MMIO completion for nested posted interrupts
  KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't
    mappable
  KVM: selftests: Move APIC definitions into a separate file
  KVM: selftests: Hoist APIC functions out of individual tests
  KVM: selftests: Introduce x2APIC register manipulation functions
  KVM: selftests: Introduce prepare_tpr_shadow
  KVM: selftests: Add a test of an unbacked nested PI descriptor

 arch/x86/kvm/lapic.c                          |  11 +-
 arch/x86/kvm/lapic.h                          |   2 +-
 arch/x86/kvm/vmx/nested.c                     |  31 ++-
 arch/x86/kvm/x86.c                            |  56 ++--
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +-
 .../selftests/kvm/include/x86_64/apic.h       |  91 +++++++
 .../selftests/kvm/include/x86_64/processor.h  |  49 +---
 .../selftests/kvm/include/x86_64/vmx.h        |   6 +
 tools/testing/selftests/kvm/lib/x86_64/apic.c |  45 ++++
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   8 +
 .../testing/selftests/kvm/x86_64/evmcs_test.c |  11 +-
 .../selftests/kvm/x86_64/set_boot_cpu_id.c    |   6 +-
 tools/testing/selftests/kvm/x86_64/smm_test.c |   4 +-
 .../selftests/kvm/x86_64/vmx_pi_mmio_test.c   | 252 ++++++++++++++++++
 .../selftests/kvm/x86_64/xapic_ipi_test.c     |  59 +---
 16 files changed, 488 insertions(+), 147 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/apic.h
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/apic.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c

Comments

Sean Christopherson May 21, 2021, 12:58 a.m. UTC | #1
On Thu, May 20, 2021, Jim Mattson wrote:
> Initially, I added a new kvm request for a userspace exit on the next guest
> entry, but Sean hated that approach.

Hey!  You could at least say _why_ I hated it. :-D

My argument is that the only way to guarantee that vcpu->run->exit_reason isn't
crushed between making the request and servicing the request is by guaranteeing
that KVM makes it back to the run loop immediately after making the request.
Since the only way to accomplish that is by forcing a return up the stack, at
that point we can simply return the "request" directly.
Jim Mattson May 21, 2021, 12:04 p.m. UTC | #2
On Thu, May 20, 2021 at 4:03 PM Jim Mattson <jmattson@google.com> wrote:
>
> When the VMCS12 posted interrupt descriptor isn't backed by an L1
> memslot, kvm will launch vmcs02 with a stale posted interrupt
> descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
> nested_get_vmcs12_pages()"), kvm would have silently disabled the
> VMCS02 "process posted interrupts" VM-execution control. Both
> behaviors are wrong, though the use-after-free is more egregious.

Oops. Prior to the referenced commit, kvm would have forced a vmcs02
VM-entry failure by loading an illegal value into its posted interrupt
descriptor field. Though better than clearing the "process posted
interrupts" VM-execution control, that's still wrong.
Paolo Bonzini May 24, 2021, 3:50 p.m. UTC | #3
On 21/05/21 01:03, Jim Mattson wrote:
> When the VMCS12 posted interrupt descriptor isn't backed by an L1
> memslot, kvm will launch vmcs02 with a stale posted interrupt
> descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
> nested_get_vmcs12_pages()"), kvm would have silently disabled the
> VMCS02 "process posted interrupts" VM-execution control. Both
> behaviors are wrong, though the use-after-free is more egregious.
> 
> Empirical tests on actual hardware reveal that a posted interrupt
> descriptor without any backing memory/device has PCI bus error
> semantics (reads return all 1's and writes are discarded). However,
> kvm can't tell an unbacked address from an MMIO address. Normally, kvm
> would ask userspace for an MMIO completion, but that's complicated for
> a posted interrupt descriptor access. There are already a number of
> cases where kvm bails out to userspace with KVM_INTERNAL_ERROR via
> kvm_handle_memory_failure, so that seems like the best route to take.
> 
> It would be relatively easy to invoke kvm_handle_memory_failure at
> emulated VM-entry, but that approach would break existing
> kvm-unit-tests. Moreover, the issue doesn't really come up until the
> vCPU--in virtualized VMX non-root operation--received the posted
> interrupt notification vector indicated in its VMCS12.
> 
> Sadly, it's really hard to arrange for an exit to userspace from
> vmx_complete_nested_posted_interrupt, which is where kvm actually
> needs to access the unbacked PID. Initially, I added a new kvm request
> for a userspace exit on the next guest entry, but Sean hated that
> approach. Based on his suggestion, I added the plumbing to get back
> out to userspace in the event of an error in
> vmx_complete_nested_posted_interrupt. This works in the case of an
> unbacked PID, but it doesn't work quite as well in the case of an
> unbacked virtual APIC page (another case where kvm was happy to just
> silently ignore the problem and attempt to muddle its way through.) In
> that case, this series is an incremental improvement, but it's not a
> complete fix.
> 
> Jim Mattson (12):
>    KVM: x86: Remove guest mode check from kvm_check_nested_events
>    KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
>    KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
>    KVM: x86: Add a return code to inject_pending_event
>    KVM: x86: Add a return code to kvm_apic_accept_events
>    KVM: nVMX: Fail on MMIO completion for nested posted interrupts
>    KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't
>      mappable
>    KVM: selftests: Move APIC definitions into a separate file
>    KVM: selftests: Hoist APIC functions out of individual tests
>    KVM: selftests: Introduce x2APIC register manipulation functions
>    KVM: selftests: Introduce prepare_tpr_shadow
>    KVM: selftests: Add a test of an unbacked nested PI descriptor

Patch 2 is the only one that seems wrong to me, and I am actually not 
sure why it is part of this series.  It seems to me that it overlaps 
"kvm: x86: move srcu lock out of kvm_vcpu_check_block​", for which both 
Sean and I had some discussions on how to remove the side effects that 
kvm_check_nested_events has on kvm_vcpu_is_runn{able,ing}.

Otherwise looks good, and I've already queued patches 8-10.  Holler if I 
should just apply patches 1 and 3-12.

Paolo
Paolo Bonzini May 24, 2021, 4:46 p.m. UTC | #4
On 21/05/21 01:03, Jim Mattson wrote:
> When the VMCS12 posted interrupt descriptor isn't backed by an L1
> memslot, kvm will launch vmcs02 with a stale posted interrupt
> descriptor. Before commit 6beb7bd52e48 ("kvm: nVMX: Refactor
> nested_get_vmcs12_pages()"), kvm would have silently disabled the
> VMCS02 "process posted interrupts" VM-execution control. Both
> behaviors are wrong, though the use-after-free is more egregious.
> 
> Empirical tests on actual hardware reveal that a posted interrupt
> descriptor without any backing memory/device has PCI bus error
> semantics (reads return all 1's and writes are discarded). However,
> kvm can't tell an unbacked address from an MMIO address. Normally, kvm
> would ask userspace for an MMIO completion, but that's complicated for
> a posted interrupt descriptor access. There are already a number of
> cases where kvm bails out to userspace with KVM_INTERNAL_ERROR via
> kvm_handle_memory_failure, so that seems like the best route to take.
> 
> It would be relatively easy to invoke kvm_handle_memory_failure at
> emulated VM-entry, but that approach would break existing
> kvm-unit-tests. Moreover, the issue doesn't really come up until the
> vCPU--in virtualized VMX non-root operation--received the posted
> interrupt notification vector indicated in its VMCS12.
> 
> Sadly, it's really hard to arrange for an exit to userspace from
> vmx_complete_nested_posted_interrupt, which is where kvm actually
> needs to access the unbacked PID. Initially, I added a new kvm request
> for a userspace exit on the next guest entry, but Sean hated that
> approach. Based on his suggestion, I added the plumbing to get back
> out to userspace in the event of an error in
> vmx_complete_nested_posted_interrupt. This works in the case of an
> unbacked PID, but it doesn't work quite as well in the case of an
> unbacked virtual APIC page (another case where kvm was happy to just
> silently ignore the problem and attempt to muddle its way through.) In
> that case, this series is an incremental improvement, but it's not a
> complete fix.
> 
> Jim Mattson (12):
>    KVM: x86: Remove guest mode check from kvm_check_nested_events
>    KVM: x86: Wake up a vCPU when kvm_check_nested_events fails
>    KVM: nVMX: Add a return code to vmx_complete_nested_posted_interrupt
>    KVM: x86: Add a return code to inject_pending_event
>    KVM: x86: Add a return code to kvm_apic_accept_events
>    KVM: nVMX: Fail on MMIO completion for nested posted interrupts
>    KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't
>      mappable
>    KVM: selftests: Move APIC definitions into a separate file
>    KVM: selftests: Hoist APIC functions out of individual tests
>    KVM: selftests: Introduce x2APIC register manipulation functions
>    KVM: selftests: Introduce prepare_tpr_shadow
>    KVM: selftests: Add a test of an unbacked nested PI descriptor
> 
>   arch/x86/kvm/lapic.c                          |  11 +-
>   arch/x86/kvm/lapic.h                          |   2 +-
>   arch/x86/kvm/vmx/nested.c                     |  31 ++-
>   arch/x86/kvm/x86.c                            |  56 ++--
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   3 +-
>   .../selftests/kvm/include/x86_64/apic.h       |  91 +++++++
>   .../selftests/kvm/include/x86_64/processor.h  |  49 +---
>   .../selftests/kvm/include/x86_64/vmx.h        |   6 +
>   tools/testing/selftests/kvm/lib/x86_64/apic.c |  45 ++++
>   tools/testing/selftests/kvm/lib/x86_64/vmx.c  |   8 +
>   .../testing/selftests/kvm/x86_64/evmcs_test.c |  11 +-
>   .../selftests/kvm/x86_64/set_boot_cpu_id.c    |   6 +-
>   tools/testing/selftests/kvm/x86_64/smm_test.c |   4 +-
>   .../selftests/kvm/x86_64/vmx_pi_mmio_test.c   | 252 ++++++++++++++++++
>   .../selftests/kvm/x86_64/xapic_ipi_test.c     |  59 +---
>   16 files changed, 488 insertions(+), 147 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/include/x86_64/apic.h
>   create mode 100644 tools/testing/selftests/kvm/lib/x86_64/apic.c
>   create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pi_mmio_test.c
> 

Pending further debugging of the selftest I've queued 
1-3-4-5-6-7-8-9-10.  Thanks,

Paolo