[RFC,00/37] KVM: Refactor the KVM/x86 TDP MMU into common code

Message ID	20221208193857.4090582-1-dmatlack@google.com (mailing list archive)
Headers	show Return-Path: <linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org> Date: Thu, 8 Dec 2022 11:38:20 -0800 Mime-Version: 1.0 Message-ID: <20221208193857.4090582-1-dmatlack@google.com> Subject: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code From: David Matlack <dmatlack@google.com> To: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>, Alexandru Elisei <alexandru.elisei@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Oliver Upton <oliver.upton@linux.dev>, Huacai Chen <chenhuacai@kernel.org>, Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>, Anup Patel <anup@brainfault.org>, Atish Patra <atishp@atishpatra.org>, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Sean Christopherson <seanjc@google.com>, Andrew Morton <akpm@linux-foundation.org>, David Matlack <dmatlack@google.com>, Anshuman Khandual <anshuman.khandual@arm.com>, Nadav Amit <namit@vmware.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Vlastimil Babka <vbabka@suse.cz>, "Liam R. Howlett" <Liam.Howlett@Oracle.com>, Suren Baghdasaryan <surenb@google.com>, Peter Xu <peterx@redhat.com>, xu xin <cgel.zte@gmail.com>, Arnd Bergmann <arnd@arndb.de>, Yu Zhao <yuzhao@google.com>, Colin Cross <ccross@google.com>, Hugh Dickins <hughd@google.com>, Ben Gardon <bgardon@google.com>, Mingwei Zhang <mizhang@google.com>, Krish Sadhukhan <krish.sadhukhan@oracle.com>, Ricardo Koller <ricarkol@google.com>, Jing Zhang <jingzhangos@google.com>, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org, kvm@vger.kernel.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org> Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org
Series	KVM: Refactor the KVM/x86 TDP MMU into common code \| expand [RFC,00/37] KVM: Refactor the KVM/x86 TDP MMU into common code [RFC,01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role [RFC,02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code [RFC,03/37] KVM: MMU: Move tdp_ptep_t into common code [RFC,04/37] KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page [RFC,05/37] KVM: x86/mmu: Unify TDP MMU and Shadow MMU root refcounts [RFC,06/37] KVM: MMU: Move struct kvm_mmu_page to common code [RFC,07/37] mm: Introduce architecture-neutral PG_LEVEL macros [RFC,08/37] KVM: selftests: Stop assuming stats are contiguous in kvm_binary_stats_test [RFC,09/37] KVM: Move page size stats into common code [RFC,10/37] KVM: MMU: Move struct kvm_page_fault to common code [RFC,11/37] KVM: MMU: Move RET_PF_* into common code [RFC,12/37] KVM: x86/mmu: Use PG_LEVEL_{PTE,PMD,PUD} in the TDP MMU [RFC,13/37] KVM: MMU: Move sptep_to_sp() to common code [RFC,14/37] KVM: MMU: Introduce common macros for TDP page tables [RFC,15/37] KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs [RFC,16/37] KVM: x86/mmu: Abstract away TDP MMU root lookup [RFC,17/37] KVM: Move struct kvm_gfn_range to kvm_types.h [RFC,18/37] KVM: x86/mmu: Add common API for creating TDP PTEs [RFC,19/37] KVM: x86/mmu: Add arch hooks for NX Huge Pages [RFC,20/37] KVM: x86/mmu: Abstract away computing the max mapping level [RFC,21/37] KVM: Introduce CONFIG_HAVE_TDP_MMU [RFC,22/37] KVM: x86: Select HAVE_TDP_MMU if X86_64 [RFC,23/37] KVM: MMU: Move VM-level TDP MMU state to struct kvm [RFC,24/37] KVM: x86/mmu: Move kvm_mmu_hugepage_adjust() up to fault handler [RFC,25/37] KVM: x86/mmu: Pass root role to kvm_tdp_mmu_get_vcpu_root_hpa() [RFC,26/37] KVM: Move page table cache to struct kvm_vcpu [RFC,27/37] KVM: MMU: Move mmu_page_header_cache to common code [RFC,28/37] KVM: MMU: Stub out tracepoints on non-x86 architectures [RFC,29/37] KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range,address}() together [RFC,30/37] KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address() [RFC,31/37] KVM: x86/MMU: Use gfn_t in kvm_flush_remote_tlbs_range() [RFC,32/37] KVM: Allow range-based TLB invalidation from common code [RFC,33/37] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code [RFC,34/37] KVM: MMU: Move the TDP iterator to common code [RFC,35/37] KVM: x86/mmu: Move tdp_mmu_max_gfn_exclusive() to tdp_pgtable.c [RFC,36/37] KVM: x86/mmu: Move is_tdp_mmu_page() to mmu_internal.h [RFC,37/37] KVM: MMU: Move the TDP MMU to common code

David Matlack Dec. 8, 2022, 7:38 p.m. UTC

[ mm folks: You are being cc'd since this series includes a mm patch
  ("mm: Introduce architecture-neutral PG_LEVEL macros"), but general
  feedback is also welcome. I imagine there are a lot of lessons KVM can
  learn from mm about sharing page table code across architectures. ]

Hello,

This series refactors the KVM/x86 "TDP MMU" into common code. This is
the first step toward sharing TDP (aka Stage-2) page table management
code across architectures that support KVM. For more background on this
effort please see my talk from KVM Forum 2022 "Exploring an
architecture-neutral MMU":

  https://youtu.be/IBhW34fCFi0

By the end of this series, 90% of the TDP MMU code is in common directories
(virt/kvm/mmu/ and include/kvm/). The only pieces that remaing in
arch/x86 are code that deals with constructing/inspecting/modifying PTEs
and arch hooks to implement NX Huge Pages (a mitigation for an
Intel-specific vulnerability).

Before:

  180 arch/x86/kvm/mmu/tdp_iter.c
  118 arch/x86/kvm/mmu/tdp_iter.h
 1917 arch/x86/kvm/mmu/tdp_mmu.c
   98 arch/x86/kvm/mmu/tdp_mmu.h
 ----
 2313 total

After:

  178 virt/kvm/mmu/tdp_iter.c
 1867 virt/kvm/mmu/tdp_mmu.c
  117 include/kvm/tdp_iter.h
   78 include/kvm/tdp_mmu.h
   39 include/kvm/tdp_pgtable.h
 ----
  184 arch/x86/kvm/mmu/tdp_pgtable.c
   76 arch/x86/include/asm/kvm/tdp_pgtable.h
 ----
 2539 total

This series is very much an RFC, but it does build (I tested x86_64 and
ARM64) and pass basic testing (KVM selftests and kvm-unit-tests on
x86_64), so it is entirely functional aside from any bugs.

The main areas I would like feedback are:

 - NX Huge Pages support in the TDP MMU requires 5 arch hooks in
   the common code, which IMO makes the NX Huge Pages implementation
   harder to read. The alternative is to move the NX Huge Pages
   implementation into common code, including the fields in struct
   kvm_mmu_page and kvm_page_fault, which would increase memory usage
   a tiny bit (for non-x86 architectures) and pollute the common code
   with an x86-specific security mitigation. Ideas on better ways to
   handle this would be appreciated.

 - struct kvm_mmu_page increased by 64 bytes because the separation of
   arch and common state eliminated the ability to use unions to
   optimize the size of the struct. There's two things we can do to
   reduce the size of the struct back down: (1) dynamically allocated
   root-specific fields only for root page tables and (2) dynamically
   allocate Shadow MMU state in kvm_mmu_page_arch only for Shadow MMU
   pages. This should actually be a net *reduction* the size of
   kvm_mmu_page relative today for most pages, but I have not
   implemented it.

   Note that an alternative approach I implemented avoided this problem
   by creating an entirely separate struct for the common TDP MMU (e.g.
   struct tdp_mmu_page). This however had a lot of downsides that I
   don't think make it a good solution. Notably, it complicated a ton of
   existing code in arch/x86/kvm/mmu/mmu.c (e.g. anything that touches
   vcpu->arch.mmu->root and kvm_recover_nx_huge_pages()) and created a
   new runtime failure mode in to_shadow_page().

 - Naming. This series does not change the names of any existing code.
   So all the KVM/x86 Shadow MMU-style terminology like
   "shadow_page"/"sp"/"spte" persists. Should we keep that style in
   common code or move toward something less shadow-paging-specific?
   e.g. "page_table"/"pt"/"pte". Also do we want to keep "TDP" or switch
   to something more familiar across architectures (e.g. ARM and RISC-V
   both use "Stage-2")?

Additionally, there are some warts to be aware of. For these I think
they can be addressed in future series, since they only really matter
once we are ready to enable the common TDP MMU on a non-x86
architecture.

 - Tracepoints. For now the common MMU continues to use the x86
   tracepoints code and they are just stubbed (no-ops) for other
   architectures.

 - tdp_mmu_max_mapping_level() and tdp_mmu_max_gfn_exclusive() are
   currently arch hooks but they can probably be made common code at
   some point.

Lastly, I still need to verify that there are no negative performance
impacts of the changes in this series. My main concern is the new
tdp_pte_*() functions adds overhead from not being able to be inlined.

This series applies on top of kvm/queue commit 89b239585965 ("KVM:
x86/mmu: Pivot on "TDP MMU enabled" when handling direct page faults"),
since there are several recent series in kvm/queue that affect this
refactor. A revert of 0c2a04128f50 ("KVM: x86: remove unnecessary
exports") is also needed since it breaks the build on x86 (unrelated to
this refactor).

Thanks.

P.S. Looking to the future... This is just the first step toward
building a common TDP MMU for KVM. After this, We are looking at adding
KUnit testing to the common TDP MMU as a way to offset the risk of
sharing more code across architectures, and then targeting RISC-V as the
first non-x86 architecture to use the common TDP MMU. If any RISC-V
developer is interested in working on the port, please reach out.

David Matlack (36):
  KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  KVM: MMU: Move struct kvm_mmu_page_role into common code
  KVM: MMU: Move tdp_ptep_t into common code
  KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page
  KVM: x86/mmu: Unify TDP MMU and Shadow MMU root refcounts
  KVM: MMU: Move struct kvm_mmu_page to common code
  mm: Introduce architecture-neutral PG_LEVEL macros
  KVM: Move page size stats into common code
  KVM: MMU: Move struct kvm_page_fault to common code
  KVM: MMU: Move RET_PF_* into common code
  KVM: x86/mmu: Use PG_LEVEL_{PTE,PMD,PUD} in the TDP MMU
  KVM: MMU: Move sptep_to_sp() to common code
  KVM: MMU: Introduce common macros for TDP page tables
  KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs
  KVM: x86/mmu: Abstract away TDP MMU root lookup
  KVM: Move struct kvm_gfn_range to kvm_types.h
  KVM: x86/mmu: Add common API for creating TDP PTEs
  KVM: x86/mmu: Add arch hooks for NX Huge Pages
  KVM: x86/mmu: Abstract away computing the max mapping level
  KVM: Introduce CONFIG_HAVE_TDP_MMU
  KVM: x86: Select HAVE_TDP_MMU if X86_64
  KVM: MMU: Move VM-level TDP MMU state to struct kvm
  KVM: x86/mmu: Move kvm_mmu_hugepage_adjust() up to fault handler
  KVM: x86/mmu: Pass root role to kvm_tdp_mmu_get_vcpu_root_hpa()
  KVM: Move page table cache to struct kvm_vcpu
  KVM: MMU: Move mmu_page_header_cache to common code
  KVM: MMU: Stub out tracepoints on non-x86 architectures
  KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range,address}()
    together
  KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address()
  KVM: x86/MMU: Use gfn_t in kvm_flush_remote_tlbs_range()
  KVM: Allow range-based TLB invalidation from common code
  KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
  KVM: MMU: Move the TDP iterator to common code
  KVM: x86/mmu: Move tdp_mmu_max_gfn_exclusive() to tdp_pgtable.c
  KVM: x86/mmu: Move is_tdp_mmu_page() to mmu_internal.h
  KVM: MMU: Move the TDP MMU to common code

Jing Zhang (1):
  KVM: selftests: Stop assuming stats are contiguous in
    kvm_binary_stats_test

 MAINTAINERS                                   |   6 +-
 arch/arm64/include/asm/kvm_host.h             |   3 -
 arch/arm64/kvm/arm.c                          |  10 +-
 arch/arm64/kvm/mmu.c                          |   2 +-
 arch/mips/include/asm/kvm_host.h              |   3 -
 arch/mips/kvm/mips.c                          |  10 +-
 arch/mips/kvm/mmu.c                           |   4 +-
 arch/riscv/include/asm/kvm_host.h             |   3 -
 arch/riscv/kvm/mmu.c                          |   8 +-
 arch/riscv/kvm/vcpu.c                         |   4 +-
 arch/x86/include/asm/kvm/mmu_types.h          | 138 ++++++
 arch/x86/include/asm/kvm/tdp_pgtable.h        |  73 ++++
 arch/x86/include/asm/kvm_host.h               | 122 +-----
 arch/x86/include/asm/pgtable_types.h          |  12 +-
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/Makefile                         |   2 +-
 arch/x86/kvm/mmu.h                            |   5 -
 arch/x86/kvm/mmu/mmu.c                        | 409 +++++++++--------
 arch/x86/kvm/mmu/mmu_internal.h               | 221 ++--------
 arch/x86/kvm/mmu/mmutrace.h                   |  20 +-
 arch/x86/kvm/mmu/paging_tmpl.h                |  48 +-
 arch/x86/kvm/mmu/spte.c                       |   7 +-
 arch/x86/kvm/mmu/spte.h                       |  38 +-
 arch/x86/kvm/mmu/tdp_pgtable.c                | 183 ++++++++
 arch/x86/kvm/x86.c                            |  16 +-
 include/kvm/mmu.h                             |  21 +
 include/kvm/mmu_types.h                       | 179 ++++++++
 include/kvm/mmutrace.h                        |  17 +
 {arch/x86/kvm/mmu => include/kvm}/tdp_iter.h  |  19 +-
 {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h   |  17 +-
 include/kvm/tdp_pgtable.h                     |  39 ++
 include/linux/kvm_host.h                      |  36 +-
 include/linux/kvm_types.h                     |  17 +
 include/linux/mm_types.h                      |   9 +
 .../selftests/kvm/kvm_binary_stats_test.c     |  11 +-
 virt/kvm/Kconfig                              |   3 +
 virt/kvm/Makefile.kvm                         |   3 +
 virt/kvm/kvm_main.c                           |  27 +-
 {arch/x86 => virt}/kvm/mmu/tdp_iter.c         |  24 +-
 {arch/x86 => virt}/kvm/mmu/tdp_mmu.c          | 412 ++++++++----------
 40 files changed, 1245 insertions(+), 937 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
 create mode 100644 arch/x86/include/asm/kvm/tdp_pgtable.h
 create mode 100644 arch/x86/kvm/mmu/tdp_pgtable.c
 create mode 100644 include/kvm/mmu.h
 create mode 100644 include/kvm/mmu_types.h
 create mode 100644 include/kvm/mmutrace.h
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_iter.h (90%)
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h (87%)
 create mode 100644 include/kvm/tdp_pgtable.h
 rename {arch/x86 => virt}/kvm/mmu/tdp_iter.c (89%)
 rename {arch/x86 => virt}/kvm/mmu/tdp_mmu.c (84%)


base-commit: 89b2395859651113375101bb07cd6340b1ba3637
prerequisite-patch-id: 19dc9f47392d43a9a86c0e4f3ce1f13b62004eeb

Oliver Upton Dec. 9, 2022, 7:07 p.m. UTC | #1

On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> [ mm folks: You are being cc'd since this series includes a mm patch
>   ("mm: Introduce architecture-neutral PG_LEVEL macros"), but general
>   feedback is also welcome. I imagine there are a lot of lessons KVM can
>   learn from mm about sharing page table code across architectures. ]
> 
> Hello,
> 
> This series refactors the KVM/x86 "TDP MMU" into common code. This is
> the first step toward sharing TDP (aka Stage-2) page table management
> code across architectures that support KVM. For more background on this
> effort please see my talk from KVM Forum 2022 "Exploring an
> architecture-neutral MMU":
> 
>   https://youtu.be/IBhW34fCFi0
> 
> By the end of this series, 90% of the TDP MMU code is in common directories
> (virt/kvm/mmu/ and include/kvm/). The only pieces that remaing in
> arch/x86 are code that deals with constructing/inspecting/modifying PTEs
> and arch hooks to implement NX Huge Pages (a mitigation for an
> Intel-specific vulnerability).
> 
> Before:
> 
>   180 arch/x86/kvm/mmu/tdp_iter.c
>   118 arch/x86/kvm/mmu/tdp_iter.h
>  1917 arch/x86/kvm/mmu/tdp_mmu.c
>    98 arch/x86/kvm/mmu/tdp_mmu.h
>  ----
>  2313 total
> 
> After:
> 
>   178 virt/kvm/mmu/tdp_iter.c
>  1867 virt/kvm/mmu/tdp_mmu.c
>   117 include/kvm/tdp_iter.h
>    78 include/kvm/tdp_mmu.h
>    39 include/kvm/tdp_pgtable.h
>  ----
>   184 arch/x86/kvm/mmu/tdp_pgtable.c
>    76 arch/x86/include/asm/kvm/tdp_pgtable.h
>  ----
>  2539 total
> 
> This series is very much an RFC, but it does build (I tested x86_64 and
> ARM64) and pass basic testing (KVM selftests and kvm-unit-tests on
> x86_64), so it is entirely functional aside from any bugs.
> 
> The main areas I would like feedback are:
> 
>  - NX Huge Pages support in the TDP MMU requires 5 arch hooks in
>    the common code, which IMO makes the NX Huge Pages implementation
>    harder to read. The alternative is to move the NX Huge Pages
>    implementation into common code, including the fields in struct
>    kvm_mmu_page and kvm_page_fault, which would increase memory usage
>    a tiny bit (for non-x86 architectures) and pollute the common code
>    with an x86-specific security mitigation. Ideas on better ways to
>    handle this would be appreciated.
> 
>  - struct kvm_mmu_page increased by 64 bytes because the separation of
>    arch and common state eliminated the ability to use unions to
>    optimize the size of the struct. There's two things we can do to
>    reduce the size of the struct back down: (1) dynamically allocated
>    root-specific fields only for root page tables and (2) dynamically
>    allocate Shadow MMU state in kvm_mmu_page_arch only for Shadow MMU
>    pages. This should actually be a net *reduction* the size of
>    kvm_mmu_page relative today for most pages, but I have not
>    implemented it.
> 
>    Note that an alternative approach I implemented avoided this problem
>    by creating an entirely separate struct for the common TDP MMU (e.g.
>    struct tdp_mmu_page). This however had a lot of downsides that I
>    don't think make it a good solution. Notably, it complicated a ton of
>    existing code in arch/x86/kvm/mmu/mmu.c (e.g. anything that touches
>    vcpu->arch.mmu->root and kvm_recover_nx_huge_pages()) and created a
>    new runtime failure mode in to_shadow_page().
> 
>  - Naming. This series does not change the names of any existing code.
>    So all the KVM/x86 Shadow MMU-style terminology like
>    "shadow_page"/"sp"/"spte" persists. Should we keep that style in
>    common code or move toward something less shadow-paging-specific?
>    e.g. "page_table"/"pt"/"pte".

I would strongly be in favor of discarding the shadow paging residue if
x86 folks are willing to part ways with it :)

>    Also do we want to keep "TDP" or switch
>    to something more familiar across architectures (e.g. ARM and RISC-V
>    both use "Stage-2")?

As it relates to guest memory management I don't see much of an issue
with it, TBH. It is sufficiently arch-generic and gets the point across.

Beyond that I think it really depends on the scope of the common code.

To replace the arm64 table walkers we will need to use it for stage-1
tables. I'm only hand-waving at the cover letter and need to do more
reading, but is it possible to accomplish some division:

 - A set of generic table walkers that implement common operations, like
   map and unmap. Names and types at this layer wouldn't be
   virt-specific.

 - Memory management for KVM guests that uses the table walker library,
   which we can probably still call the TDP MMU.

Certainly this doesn't need to be addressed in the first series, as the x86
surgery is enough on its own. Nonetheless, it is probably worthwhile to
get the conversation started about how this code can actually be used by
the other arches.

--
Thanks,
Oliver

David Matlack Dec. 10, 2022, 1:07 a.m. UTC | #2

On Fri, Dec 9, 2022 at 11:07 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
>
> >    Also do we want to keep "TDP" or switch
> >    to something more familiar across architectures (e.g. ARM and RISC-V
> >    both use "Stage-2")?
>
> As it relates to guest memory management I don't see much of an issue
> with it, TBH. It is sufficiently arch-generic and gets the point across.
>
> Beyond that I think it really depends on the scope of the common code.
>
> To replace the arm64 table walkers we will need to use it for stage-1
> tables.

Speaking of, have ARM folks ever discussed deduplicating the KVM/ARM
stage-1 code with the Linux stage-1 table code (<linux/pgtable.h>),
which is already architecture-neutral? It seems backwards for us to
build out an architecture-neutral stage-1 walker in KVM when one
already exists.

For example, arch/arm64/kvm/mmu.c:get_user_mapping_size() looks like
it could be reimplemented using <linux/pgtable.h>, rather than using
KVM code. In fact that's what we do for walking stage-1 page tables in
KVM/x86. Take a look at
arch/x86/kvm/mmu/mmu.c:host_pfn_mapping_level(). I bet we could move
that somewhere in mm/ so that it could be shared across KVM/x86 and
KVM/ARM.

> I'm only hand-waving at the cover letter and need to do more
> reading, but is it possible to accomplish some division:
>
>  - A set of generic table walkers that implement common operations, like
>    map and unmap. Names and types at this layer wouldn't be
>    virt-specific.
>
>  - Memory management for KVM guests that uses the table walker library,
>    which we can probably still call the TDP MMU.
>
> Certainly this doesn't need to be addressed in the first series, as the x86
> surgery is enough on its own. Nonetheless, it is probably worthwhile to
> get the conversation started about how this code can actually be used by
> the other arches.

Yup, we'll need some sort of split like that in order to integrate
with KVM/ARM, since the hyp can't access struct kvm, work_queues, etc.
in tdp_mmu.c. I don't think we'll need that split for KVM/RISC-V
though. So for the sake of incremental progress I'm not planning on
doing any of that refactoring preemptively. Plus it should be possible
to keep the TDP MMU API constant when the internal implementation
eventually gets split up. i.e. I don't forsee it creating a bunch of
churn down the road.

Paolo Bonzini Dec. 12, 2022, 10:54 p.m. UTC | #3

On 12/9/22 20:07, Oliver Upton wrote:
>>   - Naming. This series does not change the names of any existing code.
>>     So all the KVM/x86 Shadow MMU-style terminology like
>>     "shadow_page"/"sp"/"spte" persists. Should we keep that style in
>>     common code or move toward something less shadow-paging-specific?
>>     e.g. "page_table"/"pt"/"pte".
>
> I would strongly be in favor of discarding the shadow paging residue if
> x86 folks are willing to part ways with it

Sean Christopherson Dec. 12, 2022, 11:26 p.m. UTC | #4

On Mon, Dec 12, 2022, Paolo Bonzini wrote:
> On 12/9/22 20:07, Oliver Upton wrote:
> > >   - Naming. This series does not change the names of any existing code.
> > >     So all the KVM/x86 Shadow MMU-style terminology like
> > >     "shadow_page"/"sp"/"spte" persists. Should we keep that style in
> > >     common code or move toward something less shadow-paging-specific?
> > >     e.g. "page_table"/"pt"/"pte".
> > 
> > I would strongly be in favor of discarding the shadow paging residue if
> > x86 folks are willing to part ways with it

Paolo Bonzini Dec. 12, 2022, 11:43 p.m. UTC | #5

On 12/13/22 00:26, Sean Christopherson wrote:
>>> I would strongly be in favor of discarding the shadow paging residue if
>>> x86 folks are willing to part ways with it

David Matlack Jan. 19, 2023, 5:14 p.m. UTC | #6

On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> 
> Hello,
> 
> This series refactors the KVM/x86 "TDP MMU" into common code. This is
> the first step toward sharing TDP (aka Stage-2) page table management
> code across architectures that support KVM.

Thank you everyone for the feedback on this RFC. I have a couple of
updates to share and a question at the end.

First, Alexandre Ghiti from Rivos is going to work on the RISC-V port.
I'd like to target RISC-V first, since it has significantly lower risk
and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
[soon] nested virtualization to deal with).

Before I send a v2 I am working on sending several related patches.
These are patches that should have enough justification to be merged
regardless of the fate of the Common MMU. By sending them out
separately, I figure they will be easier to review, allow me to make
incremental progress, and ultimately simplify the v2 of this series.

What I've sent so far:

 - https://lore.kernel.org/kvm/20230117222707.3949974-1-dmatlack@google.com/
 - https://lore.kernel.org/kvm/20230118175300.790835-1-dmatlack@google.com/

What's coming soon:

 - A series to add a common API for range-based TLB flushing (patches
   29-33 from this series, plus another cleanup). This cleanup stands on
   its own, plus Raghavendra from Google has need of this for his ARM
   series to add range-based TLBI support [1]

 - A patch to move sp->tdp_mmu_page into sp->role.tdp_mmu. This was
   suggested by Paolo as an alternative to patch 4, and saves a byte
   from struct kvm_mmu_page.

There will probably be more related cleanups I will send but this is
everything I'm tracking so far. If anyone wants to see a complete v2
sooner, let me know.

Paolo and Sean, what are your thoughts on merging the Common MMU
refactor without RISC-V support? e.g. Should Alexandre and I work on
developing a functional prototype first, or are you open to merging the
refactor and then building RISC-V support on top of that? My preference
is the latter so that there is a more stable base on which to build the
RISC-V support, we can make incremental progress, and keep everyone
upstream more involved in the development.

Thanks.

[2] https://lore.kernel.org/kvm/20230109215347.3119271-4-rananta@google.com/

Paolo Bonzini Jan. 19, 2023, 5:23 p.m. UTC | #7

On Thu, Jan 19, 2023 at 6:14 PM David Matlack <dmatlack@google.com> wrote:
> Paolo and Sean, what are your thoughts on merging the Common MMU
> refactor without RISC-V support?

I have no objection. We know what the long-term plan is, and it's not
so long-term anyway.

Paolo

Marc Zyngier Jan. 19, 2023, 5:24 p.m. UTC | #8

On Thu, 19 Jan 2023 17:14:34 +0000,
David Matlack <dmatlack@google.com> wrote:
> 
> On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> > 
> > Hello,
> > 
> > This series refactors the KVM/x86 "TDP MMU" into common code. This is
> > the first step toward sharing TDP (aka Stage-2) page table management
> > code across architectures that support KVM.
> 
> Thank you everyone for the feedback on this RFC. I have a couple of
> updates to share and a question at the end.
> 
> First, Alexandre Ghiti from Rivos is going to work on the RISC-V port.
> I'd like to target RISC-V first, since it has significantly lower risk
> and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
> [soon] nested virtualization to deal with).

And (joy, happiness), the upcoming 128bit page table support[1].

	M.

[1] https://developer.arm.com/documentation/ddi0601/2022-12/AArch64-Registers/TTBR0-EL1--Translation-Table-Base-Register-0--EL1-?lang=en

David Matlack Jan. 19, 2023, 6:38 p.m. UTC | #9

On Thu, Jan 19, 2023 at 9:24 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 19 Jan 2023 17:14:34 +0000, David Matlack <dmatlack@google.com> wrote:
> > I'd like to target RISC-V first, since it has significantly lower risk
> > and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
> > [soon] nested virtualization to deal with).
>
> And (joy, happiness), the upcoming 128bit page table support[1].

Oh good, I was worried the ARM port was going to be too easy :)

David Matlack Jan. 19, 2023, 7:04 p.m. UTC | #10

On Thu, Jan 19, 2023 at 10:38 AM David Matlack <dmatlack@google.com> wrote:
>
> On Thu, Jan 19, 2023 at 9:24 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 19 Jan 2023 17:14:34 +0000, David Matlack <dmatlack@google.com> wrote:
> > > I'd like to target RISC-V first, since it has significantly lower risk
> > > and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
> > > [soon] nested virtualization to deal with).
> >
> > And (joy, happiness), the upcoming 128bit page table support[1].
>
> Oh good, I was worried the ARM port was going to be too easy :)

But in all seriousness, I'm not too worried about supporting 128-bit
page tables in the Common MMU, assuming it is a compile-time decision.
The way I'm planning to organize the code, architecture-specific code
will own the PTEs, so each architecture can do whatever they want.
There is a hard-coded assumption that PTEs are u64 in the current
code, but we can abstract that behind a typedef for 128-bit support.

We will need to figure out how to deal with concurrency though. Will
128-bit page table support come with 128-bit atomic support (e.g.
compare-exchange)? If so we should be good to go. If not, we'll need
to emulate them with e.g. spinlocks. But either way, figuring this out
is not specific to the Common MMU. Even if ARM kept its own stage-2
MMU we'd have to solve the same problem there.

[RFC,00/37] KVM: Refactor the KVM/x86 TDP MMU into common code

Message

Comments