[v3,00/10] arm64: ptdump: View the second stage page-tables

Message ID 20231115171639.2852644-2-sebastianene@google.com (mailing list archive)

Message

Sebastian Ene Nov. 15, 2023, 5:16 p.m. UTC
Hi,

This can be used as a debugging tool for dumping the second stage
page-tables.

When CONFIG_PTDUMP_STAGE2_DEBUGFS is enabled, ptdump registers a
'/sys/kernel/debug/kvm/<guest_id>/stage2_page_tables' entry with debugfs
upon guest creation. This allows userspace tools (e.g. cat) to dump the
stage-2 pagetables by reading the registered file.

Reading the debugfs file shows the stage-2 memory ranges in the following format:
<IPA range> <size> <descriptor type> <access permissions> <mem_attributes>
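
For tooling that consumes this output, a line in the format above could be
split roughly as sketched below. This is a hypothetical userspace helper and
not part of the series: the struct s2_range, the function parse_s2_line() and
the assumption that the IPA range is printed as "0x<start>-0x<end>" are all
illustrative, since no real output is shown here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one dumped range; field names are illustrative. */
struct s2_range {
	uint64_t start;		/* first IPA of the range */
	uint64_t end;		/* end IPA as printed */
	char size[16];		/* size token as printed, e.g. "2M" */
	char rest[128];		/* descriptor type, permissions, attributes */
};

/*
 * Parse one line of the assumed layout "0x<start>-0x<end> <size> <rest...>".
 * Returns 0 on success, -1 if the line does not match that layout.
 */
static int parse_s2_line(const char *line, struct s2_range *out)
{
	int consumed = 0;
	size_t n;

	assert(line != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* %n records how many characters were consumed before the remainder. */
	if (sscanf(line, "0x%" SCNx64 "-0x%" SCNx64 " %15s %n",
		   &out->start, &out->end, out->size, &consumed) != 3)
		return -1;

	/* Keep the remaining fields verbatim, without the trailing newline. */
	n = strcspn(line + consumed, "\r\n");
	if (n >= sizeof(out->rest))
		n = sizeof(out->rest) - 1;
	memcpy(out->rest, line + consumed, n);

	return out->start <= out->end ? 0 : -1;
}
```

The trailing fields are kept as one string on purpose: their exact spelling
depends on the attribute tables used by the dump, which are not spelled out
here.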

Under the pKVM configuration (kvm-arm.mode=protected), ptdump registers an
entry for the host stage-2 pagetables at the following path:
/sys/kernel/debug/kvm/host_stage2_page_tables

The tool interprets the pKVM ownership annotation stored in the invalid
entries and dumps the ownership information to the console. To be able
to access the host stage-2 page-tables from the kernel, a new hypervisor
call is introduced which allows us to snapshot the page-tables into a
host-provided buffer. The hypervisor call is hidden behind
CONFIG_NVHE_EL2_DEBUG, as it is only meant to be used in a debugging
environment.

Link to the second version:
https://lore.kernel.org/all/20231019144032.2943044-1-sebastianene@google.com/#r

Link to the first version:
https://lore.kernel.org/all/20230927112517.2631674-1-sebastianene@google.com/

Changelog:

  v2 -> v3:
  * register the stage-2 debugfs entry for the host under
    /sys/kernel/debug/kvm/host_stage2_page_tables and under
    /sys/kernel/debug/kvm/<guest_id>/stage2_page_tables for guests.

  * don't use a static array for parsing the attributes description,
    generate it dynamically based on the number of pagetable levels

  * remove the lock that was guarding the seq_file private inode data,
    and keep the data private to the open file session.

  * minor fixes & renaming of CONFIG_NVHE_EL2_PTDUMP_DEBUGFS to
    CONFIG_PTDUMP_STAGE2_DEBUGFS

  v1 -> v2:
  * use the stage-2 pagetable walker for dumping descriptors instead of
    the one provided by ptdump.

  * support dumping guest pagetables under VHE/nVHE non-protected mode

Thanks,

Sebastian Ene (10):
  KVM: arm64: Add snap shooting the host stage-2 pagetables
  arm64: ptdump: Use the mask from the state structure
  arm64: ptdump: Add the walker function to the ptdump info structure
  KVM: arm64: Move pagetable definitions to common header
  arm64: ptdump: Add hooks on debugfs file operations
  arm64: ptdump: Register a debugfs entry for the host stage-2 tables
  arm64: ptdump: Parse the host stage-2 page-tables from the snapshot
  arm64: ptdump: Interpret memory attributes based on runtime
    configuration
  arm64: ptdump: Interpret pKVM ownership annotations
  arm64: ptdump: Add support for guest stage-2 pagetables dumping

 arch/arm64/include/asm/kvm_asm.h              |   1 +
 arch/arm64/include/asm/kvm_pgtable.h          |  85 +++
 arch/arm64/include/asm/ptdump.h               |  27 +
 arch/arm64/kvm/Kconfig                        |  13 +
 arch/arm64/kvm/arm.c                          |   2 +
 arch/arm64/kvm/debug.c                        |   6 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   8 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  20 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 102 ++++
 arch/arm64/kvm/hyp/pgtable.c                  |  98 ++--
 arch/arm64/kvm/mmu.c                          |   2 +
 arch/arm64/mm/ptdump.c                        | 483 +++++++++++++++++-
 arch/arm64/mm/ptdump_debugfs.c                |  64 ++-
 13 files changed, 852 insertions(+), 59 deletions(-)

Comments

Oliver Upton Nov. 22, 2023, 11:18 p.m. UTC | #1
Hi Seb,

On Wed, Nov 15, 2023 at 05:16:30PM +0000, Sebastian Ene wrote:
> Hi,
> 
> This can be used as a debugging tool for dumping the second stage
> page-tables.
> 
> When CONFIG_PTDUMP_STAGE2_DEBUGFS is enabled, ptdump registers a
> '/sys/kernel/debug/kvm/<guest_id>/stage2_page_tables' entry with debugfs
> upon guest creation. This allows userspace tools (e.g. cat) to dump the
> stage-2 pagetables by reading the registered file.
> 
> Reading the debugfs file shows the stage-2 memory ranges in the following format:
> <IPA range> <size> <descriptor type> <access permissions> <mem_attributes>
> 
> Under the pKVM configuration (kvm-arm.mode=protected), ptdump registers an
> entry for the host stage-2 pagetables at the following path:
> /sys/kernel/debug/kvm/host_stage2_page_tables
> 
> The tool interprets the pKVM ownership annotation stored in the invalid
> entries and dumps the ownership information to the console. To be able
> to access the host stage-2 page-tables from the kernel, a new hypervisor
> call is introduced which allows us to snapshot the page-tables into a
> host-provided buffer. The hypervisor call is hidden behind
> CONFIG_NVHE_EL2_DEBUG, as it is only meant to be used in a debugging
> environment.

While I think the value of the feature you're proposing is great, I'm
not a fan of the current shape of this series.

Reusing note_page() for the stage-2 dump is somewhat convenient, but the
series pulls a **massive** amount of KVM details outside of KVM:

 - Open-coding the whole snapshotting interface with EL2 outside of KVM.
   This is a complete non-starter for me; the kernel<->EL2 interface
   needs to be owned by the EL1 portions of KVM.

 - Building page-table walkers using the KVM pgtable library outside of
   KVM.

 - Copying (rather than directly calling) the logic responsible for
   things like FWB and PGD concatenation.

 - Hoisting the definition of _software bits_ outside of KVM. I'm less
   concerned about hardware bits since they have an unambiguous meaning.

I think exporting the necessary stuff from ptdump into KVM will lead to
a much cleaner implementation.
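
As a concrete instance of logic that is easier to call than to copy: stage-2
translation may concatenate several tables at the start level (up to 16), so
a walker needs to know how many PGD pages to cover. The arithmetic can be
sketched in userspace as below, assuming a 4 KiB granule. The helper names
granule_shift() and s2_pgd_pages() are illustrative and are not KVM's own.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumed 4 KiB granule */

/*
 * Number of address bits covered by one entry at the given level.
 * Each level resolves PAGE_SHIFT - 3 bits; level 3 entries map one page.
 * Level -1 is accepted so the span of a whole level-0 table can be computed.
 */
static unsigned int granule_shift(int level)
{
	return (PAGE_SHIFT - 3) * (4 - level) + 3;
}

/*
 * Number of concatenated start-level tables needed to cover an IPA space
 * of ia_bits bits when the walk starts at start_level.
 */
static uint64_t s2_pgd_pages(unsigned int ia_bits, int start_level)
{
	uint64_t max_ipa;

	assert(ia_bits >= 1 && ia_bits <= 52);
	assert(start_level >= 0 && start_level <= 3);

	max_ipa = (UINT64_C(1) << ia_bits) - 1;

	/* One start-level table spans 2^granule_shift(start_level - 1) bytes. */
	return (max_ipa >> granule_shift(start_level - 1)) + 1;
}
```

For example, a 40-bit IPA space walked from level 1 needs two concatenated
tables, because a single level-1 table only spans 39 bits.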

Sebastian Ene Nov. 23, 2023, 9:49 a.m. UTC | #2
On Wed, Nov 22, 2023 at 11:18:45PM +0000, Oliver Upton wrote:

Hi Oliver,

> Hi Seb,
> 
> While I think the value of the feature you're proposing is great, I'm
> not a fan of the current shape of this series.
> 
> Reusing note_page() for the stage-2 dump is somewhat convenient, but the
> series pulls a **massive** amount of KVM details outside of KVM:
> 
>  - Open-coding the whole snapshotting interface with EL2 outside of KVM.
>    This is a complete non-starter for me; the kernel<->EL2 interface
>    needs to be owned by the EL1 portions of KVM.
> 
>  - Building page-table walkers using the KVM pgtable library outside of
>    KVM.
> 
>  - Copying (rather than directly calling) the logic responsible for
>    things like FWB and PGD concatenation.
> 
>  - Hoisting the definition of _software bits_ outside of KVM. I'm less
>    concerned about hardware bits since they have an unambiguous meaning.
> 
> I think exporting the necessary stuff from ptdump into KVM will lead to
> a much cleaner implementation.
> 

Right, I had to import a lot of definitions from KVM, especially for the
prot_bits array and for the IPA size retrieval. I think it would be less
intrusive the other way around, to pull some ptdump hooks into kvm.

> -- 
> Thanks,
> Oliver

Thanks,
Seb