[RFC,0/3] KVM: Introduce "VM bugged" concept

Message ID 20200923224530.17735-1-sean.j.christopherson@intel.com (mailing list archive)

Message

Sean Christopherson Sept. 23, 2020, 10:45 p.m. UTC
This series introduces a concept we've discussed a few times in x86 land.
The crux of the problem is that x86 has a few cases where KVM could
theoretically encounter a software or hardware bug deep in a call stack
without any sane way to propagate the error out to userspace.

Another use case would be for scenarios where letting the VM live will
do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
enabling as botching anything related to secure paging all but guarantees
there will be a flood of WARNs and error messages because lower level PTE
operations will fail if an upper level operation failed.

The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
to userspace, and mark the VM as bugged so that no ioctls() can be issued
on the VM or its devices/vCPUs.

RFC as I've done nowhere near enough testing to verify that rejecting the
ioctls(), evicting running vCPUs, etc... works as intended.
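
Roughly, the plumbing looks like the sketch below.  This is a minimal
illustration only; the names used here (vm_bugged, KVM_REQ_VM_BUGGED) and
the -EIO return value are placeholders for the sketch, the real details
are in the patches.

static inline void kvm_vm_bugged(struct kvm *kvm)
{
	kvm->vm_bugged = true;

	/* Kick all vCPUs out of the guest and back out to userspace. */
	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
}

/* WARN once, mark the VM as bugged, and evaluate to the condition. */
#define KVM_BUG_ON(cond, kvm)					\
({								\
	int __ret = (cond);					\
								\
	if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged))		\
		kvm_vm_bugged(kvm);				\
	unlikely(__ret);					\
})

/*
 * And at the top of the VM/vCPU/device ioctl paths, e.g. in
 * kvm_vm_ioctl()/kvm_vcpu_ioctl()/kvm_device_ioctl():
 *
 *	if (kvm->vm_bugged)
 *		return -EIO;
 */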

Sean Christopherson (3):
  KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
    bugged
  KVM: Add infrastructure and macro to mark VM as bugged
  KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
    VM

 arch/x86/kvm/svm/svm.c   |  2 +-
 arch/x86/kvm/vmx/vmx.c   | 23 ++++++++++++--------
 arch/x86/kvm/x86.c       |  4 ++++
 include/linux/kvm_host.h | 45 ++++++++++++++++++++++++++++++++--------
 virt/kvm/kvm_main.c      | 11 +++++-----
 5 files changed, 61 insertions(+), 24 deletions(-)

Comments

Christian Borntraeger Sept. 24, 2020, 6:37 a.m. UTC | #1
On 24.09.20 00:45, Sean Christopherson wrote:
> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.
> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.

I like the idea, especially if we add a common "understanding" in QEMU
across all platforms. That would then even allow us to propagate an error.
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
>     bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
>     VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 ++++++++++++--------
>  arch/x86/kvm/x86.c       |  4 ++++
>  include/linux/kvm_host.h | 45 ++++++++++++++++++++++++++++++++--------
>  virt/kvm/kvm_main.c      | 11 +++++-----
>  5 files changed, 61 insertions(+), 24 deletions(-)
>
Marc Zyngier Sept. 25, 2020, 4:32 p.m. UTC | #2
Hi Sean,

On Wed, 23 Sep 2020 23:45:27 +0100,
Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.
> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.

I quite like the idea. However, I wonder whether preventing the
vcpus from re-entering the guest is enough. When something goes really
wrong, is it safe to allow the userspace process to terminate normally
and free the associated memory? And is it still safe to allow new VMs
to be started?

I can't really imagine a case where such extreme measures would be
necessary on arm64, but I thought I'd ask.

Thanks,

	M.
Sean Christopherson Sept. 25, 2020, 5 p.m. UTC | #3
On Fri, Sep 25, 2020 at 05:32:53PM +0100, Marc Zyngier wrote:
> Hi Sean,
> 
> On Wed, 23 Sep 2020 23:45:27 +0100,
> Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > This series introduces a concept we've discussed a few times in x86 land.
> > The crux of the problem is that x86 has a few cases where KVM could
> > theoretically encounter a software or hardware bug deep in a call stack
> > without any sane way to propagate the error out to userspace.
> > 
> > Another use case would be for scenarios where letting the VM live will
> > do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> > enabling as botching anything related to secure paging all but guarantees
> > there will be a flood of WARNs and error messages because lower level PTE
> > operations will fail if an upper level operation failed.
> > 
> > The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> > to userspace, and mark the VM as bugged so that no ioctls() can be issued
> > on the VM or its devices/vCPUs.
> > 
> > RFC as I've done nowhere near enough testing to verify that rejecting the
> > ioctls(), evicting running vCPUs, etc... works as intended.
> 
> I quite like the idea. However, I wonder whether preventing the
> vcpus from re-entering the guest is enough. When something goes really
> wrong, is it safe to allow the userspace process to terminate normally
> and free the associated memory?

Yes and no.  Yes, there are potential scenarios where freeing memory is unsafe,
e.g. with TDX, improper sanitization of memory can lead to machine checks due
to integrity errors, i.e. freeing memory that wasn't sanitized is not safe.

But, our in-development code intentionally leaks pages that couldn't be
sanitized (with plenty of yelling).  So, "no" in the sense that, IMO, KVM
should be written such that it's sufficiently paranoid when handling "special"
memory (or other state).
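
Something along these lines, as a rough sketch (the helper name and exact
flow here are placeholders, not the actual in-development code):

static void tdx_release_page(struct page *page)
{
	/*
	 * If the page can't be sanitized, yell and leak it on purpose;
	 * freeing unsanitized memory risks integrity machine checks.
	 */
	if (WARN_ON_ONCE(tdx_reclaim_page(page)))
		return;

	__free_page(page);
}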

> And is it still safe to allow new VMs to be started?

Hmm, anything that is truly fatal to the host/KVM should probably use BUG()
or even panic() directly.  E.g. to avoid a userspace bypass by unloading and
reloading KVM when it's built as a module, we'd have to set a flag in the
kernel proper.
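
Purely illustrative (the function and predicates below are placeholders,
not anything from the series), the split would look something like:

static int handle_fatal_error(struct kvm *kvm, bool vm_corrupt,
			      bool host_corrupt)
{
	/* Fatal to this VM only: taint it and reject further ioctls. */
	if (KVM_BUG_ON(vm_corrupt, kvm))
		return -EIO;

	/*
	 * Fatal to the host: a per-VM flag isn't enough, e.g. unloading
	 * and reloading kvm.ko would wipe module state, so this would be
	 * BUG()/panic() territory.
	 */
	BUG_ON(host_corrupt);

	return 0;
}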
Paolo Bonzini Sept. 25, 2020, 9:05 p.m. UTC | #4
On 25/09/20 18:32, Marc Zyngier wrote:
> I quite like the idea. However, I wonder whether preventing the
> vcpus from re-entering the guest is enough. When something goes really
> wrong, is it safe to allow the userspace process to terminate normally
> and free the associated memory? And is it still safe to allow new VMs
> to be started?

For something that bad, where e.g. you can't rule out future memory
corruption via use-after-free bugs or similar, you're probably entering
BUG_ON territory.

Paolo
Cornelia Huck Sept. 29, 2020, 9:27 a.m. UTC | #5
On Wed, 23 Sep 2020 15:45:27 -0700
Sean Christopherson <sean.j.christopherson@intel.com> wrote:

> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.

I think this makes a lot of sense.

Are there other user space interactions where we want to generate an
error for a bugged VM, e.g. via eventfd?

And can we make the 'bugged' information available to user space in a
structured way?

> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
>     bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
>     VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 ++++++++++++--------
>  arch/x86/kvm/x86.c       |  4 ++++
>  include/linux/kvm_host.h | 45 ++++++++++++++++++++++++++++++++--------
>  virt/kvm/kvm_main.c      | 11 +++++-----
>  5 files changed, 61 insertions(+), 24 deletions(-)
>