[RFC,00/14] Support multiple KVM modules on the same host

Message ID 20231107202002.667900-1-aghulati@google.com (mailing list archive)

Message

Anish Ghulati Nov. 7, 2023, 8:19 p.m. UTC
This series is a rough, PoC-quality RFC to allow (un)loading and running
multiple KVM modules simultaneously on a single host, e.g. to deploy
fixes, mitigations, and/or new features without having to drain all VMs 
from the host. Multi-KVM will also allow running the "same" KVM module
with different params, e.g. to run trusted VMs with different mitigations.

The goal of this RFC is to get feedback on the idea itself and the
high-level approach.  In particular, we're looking for input on:

 - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
 - Exposing multiple /dev/kvmX devices via Kconfig
 - The name and prefix of the new base module

Feedback on individual patches is also welcome, but please keep in mind
that this is very much a work in progress.

This builds on Sean's series to hide KVM internals:

    https://lore.kernel.org/lkml/20230916003118.2540661-1-seanjc@google.com

The whole thing can be found at:

    https://github.com/asg-17/linux vac-rfc

The basic gist of the approach is to:

 - Move system-wide virtualization resource management to a new base
   module to avoid collisions between different KVM modules, e.g. VPIDs
   and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
   to be mediated so that things like PMIs get to the right KVM instance.

 - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
   make KVM a black box, so that the layout/size of things like "struct
   kvm_vcpu" isn't exposed to the kernel at-large.

 - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid the complication
   of having to generate unique symbols for every symbol exported by kvm.ko.

 - Add a Kconfig string to allow defining a device and module postfix at
   build time, e.g. to create kvmX.ko and /dev/kvmX.
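
As a rough illustration of the first bullet (a userspace model, not code
from this series), a single allocator owned by the base module could hand
out VPIDs to every KVM instance so that no two instances pick the same
value. All model_* names and the 64-entry limit are hypothetical, and the
locking a real allocator would need is omitted:

```c
/* Userspace model only; not code from this series. */
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_VPIDS 64	/* hypothetical; real VPIDs are 16 bits wide */

/*
 * State owned by the one base module and shared by all KVM instances.
 * Bit 0 starts out set because VPID 0 is reserved for the host.
 */
static uint64_t model_vpid_bitmap = 1;

/* Returns a free VPID, or 0 if none is left. */
int model_allocate_vpid(void)
{
	for (int vpid = 1; vpid < MODEL_NR_VPIDS; vpid++) {
		uint64_t bit = UINT64_C(1) << vpid;

		if (!(model_vpid_bitmap & bit)) {
			model_vpid_bitmap |= bit;
			return vpid;
		}
	}
	return 0;
}

/* Releases a VPID; freeing 0 is a no-op so callers need not special-case it. */
void model_free_vpid(int vpid)
{
	assert(vpid >= 0 && vpid < MODEL_NR_VPIDS);
	if (vpid != 0)
		model_vpid_bitmap &= ~(UINT64_C(1) << vpid);
}
```

Because the bitmap lives in exactly one place, two KVM modules calling
model_allocate_vpid() can never receive the same value, which is the
collision the base module is meant to prevent.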

The proposed name of the new base module is vac.ko, a.k.a.
Virtualization Acceleration Code (Unupgradable Units Module). Childish
humor aside, "vac" is a unique name in the kernel and hopefully in x86
and hardware terminology, e.g. `git grep vac_` yields no hits in the
kernel. It also has the same number of characters as "kvm", so the
namespace can be modified without needing whitespace adjustment if we
want to go that route.

Requirements / Goals / Notes:
 - Fully opt-in and backwards compatible (except for the disappearance
   of kvm_{amd,intel}.ko).

 - User space ultimately controls and is responsible for deployment,
   usage, lifecycles, etc.  Standard module refcounting applies, but
   ensuring that a VM is created with the "right" KVM module is a user
   space problem.

 - No user space *VMM* changes are required, e.g. /dev/kvm can be
   presented to a VMM by symlinking /dev/kvmX.

 - Mutually exclusive with subsystems that have a hard dependency on KVM,
   i.e. KVMGT.

 - x86 only (for the foreseeable future).
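
To make the symlink note above concrete, here is a minimal userspace
sketch, assuming a build-time postfix of "1" (i.e. a device node named
kvm1). The helper name model_link_default_kvm() is hypothetical; only the
POSIX symlink() call is real:

```c
/* Userspace sketch only; not code from this series. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create "<dir>/kvm" as a symlink to "kvm<postfix>" in the same directory,
 * so an unmodified VMM that opens "<dir>/kvm" reaches the chosen module.
 * Returns 0 on success and -1 on failure.
 */
int model_link_default_kvm(const char *dir, const char *postfix)
{
	char target[64];
	char linkpath[4096];

	assert(dir && postfix);
	if (snprintf(target, sizeof(target), "kvm%s", postfix) >= (int)sizeof(target))
		return -1;
	if (snprintf(linkpath, sizeof(linkpath), "%s/kvm", dir) >= (int)sizeof(linkpath))
		return -1;
	/* A relative target keeps the link valid wherever the directory is mounted. */
	return symlink(target, linkpath);
}
```

Calling model_link_default_kvm("/dev", "1") would need root privileges and
a loaded module that provides /dev/kvm1; which module the link points at
stays a user space decision, as noted above.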

Anish Ghulati (13):
  KVM: x86: Move common module params from SVM/VMX to x86
  KVM: x86: Fold x86 vendor modules into the main KVM modules
  KVM: x86: Remove unused exports
  KVM: x86: Create stubs for a new VAC module
  KVM: x86: Refactor hardware enable/disable operations into a new file
  KVM: x86: Move user return msr operations out of KVM
  KVM: SVM: Move shared SVM data structures into VAC
  KVM: VMX: Move shared VMX data structures into VAC
  KVM: VMX: Move VMX enable and disable into VAC
  KVM: SVM: Move SVM enable and disable into VAC
  KVM: x86: Move VMX and SVM support checks into VAC
  KVM: x86: VAC: Move all hardware enable/disable code into VAC
  KVM: VAC: Bring up VAC as a new module

Venkatesh Srinivas (1):
  KVM: x86: Move shared KVM state into VAC

 arch/x86/include/asm/kvm-x86-ops.h |   3 +-
 arch/x86/include/asm/kvm_host.h    |  12 +-
 arch/x86/kernel/nmi.c              |   2 +-
 arch/x86/kvm/Kconfig               |  29 +-
 arch/x86/kvm/Makefile              |  31 ++-
 arch/x86/kvm/cpuid.c               |   8 +-
 arch/x86/kvm/hyperv.c              |   2 -
 arch/x86/kvm/irq.c                 |   3 -
 arch/x86/kvm/irq_comm.c            |   2 -
 arch/x86/kvm/kvm_onhyperv.c        |   3 -
 arch/x86/kvm/lapic.c               |  15 -
 arch/x86/kvm/mmu/mmu.c             |  12 -
 arch/x86/kvm/mmu/spte.c            |   4 -
 arch/x86/kvm/mtrr.c                |   1 -
 arch/x86/kvm/pmu.c                 |   2 -
 arch/x86/kvm/svm/nested.c          |   4 +-
 arch/x86/kvm/svm/sev.c             |   2 +-
 arch/x86/kvm/svm/svm.c             | 224 ++-------------
 arch/x86/kvm/svm/svm.h             |  21 +-
 arch/x86/kvm/svm/svm_data.h        |  23 ++
 arch/x86/kvm/svm/svm_ops.h         |   1 +
 arch/x86/kvm/svm/vac.c             | 172 ++++++++++++
 arch/x86/kvm/svm/vac.h             |  20 ++
 arch/x86/kvm/vac.c                 | 214 +++++++++++++++
 arch/x86/kvm/vac.h                 |  69 +++++
 arch/x86/kvm/vmx/nested.c          |   6 +-
 arch/x86/kvm/vmx/vac.c             | 287 +++++++++++++++++++
 arch/x86/kvm/vmx/vac.h             |  20 ++
 arch/x86/kvm/vmx/vmx.c             | 332 +++-------------------
 arch/x86/kvm/vmx/vmx.h             |   2 -
 arch/x86/kvm/vmx/vmx_ops.h         |   1 +
 arch/x86/kvm/x86.c                 | 423 ++---------------------------
 arch/x86/kvm/x86.h                 |  15 +-
 include/linux/kvm_host.h           |   2 +
 virt/kvm/Makefile.kvm              |  14 +-
 virt/kvm/kvm_main.c                | 210 +-------------
 virt/kvm/vac.c                     | 192 +++++++++++++
 virt/kvm/vac.h                     |  40 +++
 38 files changed, 1212 insertions(+), 1211 deletions(-)
 create mode 100644 arch/x86/kvm/svm/svm_data.h
 create mode 100644 arch/x86/kvm/svm/vac.c
 create mode 100644 arch/x86/kvm/svm/vac.h
 create mode 100644 arch/x86/kvm/vac.c
 create mode 100644 arch/x86/kvm/vac.h
 create mode 100644 arch/x86/kvm/vmx/vac.c
 create mode 100644 arch/x86/kvm/vmx/vac.h
 create mode 100644 virt/kvm/vac.c
 create mode 100644 virt/kvm/vac.h


base-commit: 0b78fc46e5450f08ef92431e569c797a63f31517

Comments

Lai Jiangshan Nov. 17, 2023, 8:53 a.m. UTC | #1
On Wed, Nov 8, 2023 at 4:20 AM Anish Ghulati <aghulati@google.com> wrote:
>
> This series is a rough, PoC-quality RFC to allow (un)loading and running
> multiple KVM modules simultaneously on a single host, e.g. to deploy
> fixes, mitigations, and/or new features without having to drain all VMs
> from the host. Multi-KVM will also allow running the "same" KVM module
> with different params, e.g. to run trusted VMs with different mitigations.
>
> The goal of this RFC is to get feedback on the idea itself and the
> high-level approach.  In particular, we're looking for input on:
>
>  - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
>  - Exposing multiple /dev/kvmX devices via Kconfig
>  - The name and prefix of the new base module
>
> Feedback on individual patches is also welcome, but please keep in mind
> that this is very much a work in-progress

Hello Anish

Little multi-KVM work has shown up on the mailing list, even though many
companies enable multi-KVM internally.

I'm glad that you took a big step toward upstreaming it, and I hope it
can land soon.


>
>  - Move system-wide virtualization resource management to a new base
>    module to avoid collisions between different KVM modules, e.g. VPIDs
>    and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
>    to be mediated so that things like PMIs get to the right KVM instance.

perf_register_guest_info_callbacks() also accesses system-wide resources,
but I don't see its related code, including kvm_guest_cbs, being moved to VAC.

>
>  - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
>    make KVM a black box, so that the layout/size of things like "struct
>    kvm_vcpu" isn't exposed to the kernel at-large.
>
>  - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid complications
>    having to generate unique symbols for every symbol exported by kvm.ko.

The sizes of kvm_intel.ko and kvm_amd.ko are big, and there
is only 1G in the kernel available for modules. So I don't think folding
two vendors' code into kvm.ko is a good idea.

Since the symbols in the new module are invisible outside, I recommend:
new kvm_intel.ko = kvm_intel.ko + kvm.ko
new kvm_amd.ko = kvm_amd.ko + kvm.ko

>
>  - Add a Kconfig string to allow defining a device and module postfix at
>    build time, e.g. to create kvmX.ko and /dev/kvmX.
>
> The proposed name of the new base module is vac.ko, a.k.a.
> Virtualization Acceleration Code (Unupgradable Units Module). Childish
> humor aside, "vac" is a unique name in the kernel and hopefully in x86
> and hardware terminology, e.g. `git grep vac_` yields no hits in the
> kernel. It also has the same number of characters as "kvm", so the
> namespace can be modified without needing whitespace adjustment if we
> want to go that route.

How about the name kvm_base.ko?

And the variable/function names in it can still be kvm_foo (rather than
kvm_base_foo).

Thanks
Lai

Sean Christopherson Nov. 28, 2023, 6:10 p.m. UTC | #2
On Fri, Nov 17, 2023, Lai Jiangshan wrote:
> On Wed, Nov 8, 2023 at 4:20 AM Anish Ghulati <aghulati@google.com> wrote:
> >
> > This series is a rough, PoC-quality RFC to allow (un)loading and running
> > multiple KVM modules simultaneously on a single host, e.g. to deploy
> > fixes, mitigations, and/or new features without having to drain all VMs
> > from the host. Multi-KVM will also allow running the "same" KVM module
> > with different params, e.g. to run trusted VMs with different mitigations.
> >
> > The goal of this RFC is to get feedback on the idea itself and the
> > high-level approach.  In particular, we're looking for input on:
> >
> >  - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
> >  - Exposing multiple /dev/kvmX devices via Kconfig
> >  - The name and prefix of the new base module
> >
> > Feedback on individual patches is also welcome, but please keep in mind
> > that this is very much a work in-progress
> 
> Hello Anish
> 
> Little multi-KVM work has shown up on the mailing list, even though many
> companies enable multi-KVM internally.
> 
> I'm glad that you took a big step toward upstreaming it, and I hope it
> can land soon.
> 
> 
> >
> >  - Move system-wide virtualization resource management to a new base
> >    module to avoid collisions between different KVM modules, e.g. VPIDs
> >    and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
> >    to be mediated so that things like PMIs get to the right KVM instance.
> 
> perf_register_guest_info_callbacks() also accesses system-wide resources,
> but I don't see its related code, including kvm_guest_cbs, being moved to VAC.

Yeah, that's on the TODO list.  IIRC, the plan is to have VAC register a single
callback with perf, and then have VAC deal with invoking the callback(s) for the
correct KVM instance.
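
A userspace model of that idea (not code from this series, and not the
perf API itself) might look like the following: the base module exposes
one entry point and forwards it to whichever instance currently has a
vCPU loaded. All model_* names, the instance limit, and the single
"current instance" variable standing in for per-CPU state are
assumptions:

```c
/* Userspace model only; not code from this series. */
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_INSTANCES 4	/* hypothetical limit */

/* Hypothetical per-instance callback reporting guest state to perf. */
typedef unsigned int (*model_guest_state_fn)(void *data);

struct model_instance {
	model_guest_state_fn guest_state;
	void *data;
};

static struct model_instance model_instances[MODEL_MAX_INSTANCES];
static int model_current = -1;	/* stands in for per-CPU state; -1: no guest */

/* Each KVM instance registers once; returns its slot, or -1 if full. */
int model_register_instance(model_guest_state_fn fn, void *data)
{
	for (int i = 0; i < MODEL_MAX_INSTANCES; i++) {
		if (!model_instances[i].guest_state) {
			model_instances[i].guest_state = fn;
			model_instances[i].data = data;
			return i;
		}
	}
	return -1;
}

/* Called when an instance loads (id >= 0) or unloads (-1) a vCPU. */
void model_set_current(int id)
{
	assert(id >= -1 && id < MODEL_MAX_INSTANCES);
	model_current = id;
}

/* The single callback the base module would register with perf. */
unsigned int model_perf_guest_state(void)
{
	if (model_current < 0 || !model_instances[model_current].guest_state)
		return 0;
	return model_instances[model_current].guest_state(
			model_instances[model_current].data);
}

/* Sample instance callback for the model: returns the value it was given. */
unsigned int model_sample_state(void *data)
{
	return *(unsigned int *)data;
}
```

The point of the sketch is only the shape: perf sees one callback, and
the routing decision lives entirely in the base module.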

> >  - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
> >    make KVM a black box, so that the layout/size of things like "struct
> >    kvm_vcpu" isn't exposed to the kernel at-large.
> >
> >  - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid complications
> >    having to generate unique symbols for every symbol exported by kvm.ko.
> 
> The sizes of kvm_intel.ko and kvm_amd.ko are big, and there
> is only 1G in the kernel available for modules. So I don't think folding
> two vendors' code into kvm.ko is a good idea.
> 
> Since the symbols in the new module are invisible outside, I recommend:
> new kvm_intel.ko = kvm_intel.ko + kvm.ko
> new kvm_amd.ko = kvm_amd.ko + kvm.ko

Yeah, Paolo also suggested this at LPC.

> >  - Add a Kconfig string to allow defining a device and module postfix at
> >    build time, e.g. to create kvmX.ko and /dev/kvmX.
> >
> > The proposed name of the new base module is vac.ko, a.k.a.
> > Virtualization Acceleration Code (Unupgradable Units Module). Childish
> > humor aside, "vac" is a unique name in the kernel and hopefully in x86
> > and hardware terminology, e.g. `git grep vac_` yields no hits in the
> > kernel. It also has the same number of characters as "kvm", so the
> > namespace can be modified without needing whitespace adjustment if we
> > want to go that route.
> 
> How about the name kvm_base.ko?
> 
> And the variable/function names in it can still be kvm_foo (rather than
> kvm_base_foo).

My preference is to have a unique name that allows us to differentiate between
the "base" module/code and KVM code.  Verbal conversations about all of this get
quite confusing because it's not always clear whether "base KVM" refers to what
is currently kvm.ko, or what would become kvm_base.ko/vac.ko.