mbox series

[v5,00/13] KVM/X86: Introduce a new guest mapping interface

Message ID 1547026933-31226-1-git-send-email-karahmed@amazon.de (mailing list archive)
Headers show
Series KVM/X86: Introduce a new guest mapping interface | expand

Message

KarimAllah Ahmed Jan. 9, 2019, 9:42 a.m. UTC
Guest memory can either be directly managed by the kernel (i.e. have a "struct
page") or they can simply live outside kernel control (i.e. do not have a
"struct page"). KVM mostly support these two modes, except in a few places
where the code seems to assume that guest memory must have a "struct page".

This patchset introduces a new mapping interface to map guest memory into host
kernel memory which also supports PFN-based memory (i.e. memory without 'struct
page'). It also converts all offending code to this interface or simply
read/write directly from guest memory. Patch 2 is additionally fixing an
incorrect page release and marking the page as dirty (i.e. as a side-effect of
using the helper function to write).

As far as I can see all offending code is now fixed except the APIC-access page
which I will handle in a seperate series along with dropping
kvm_vcpu_gfn_to_page and kvm_vcpu_gpa_to_page from the internal KVM API.

The current implementation of the new API uses memremap to map memory that does
not have a "struct page". This proves to be very slow for high frequency
mappings. Since this does not affect the normal use-case where a "struct page"
is available, the performance of this API will be handled by a seperate patch
series.

v4 -> v5:
- Introduce a new parameter 'dirty' into kvm_vcpu_unmap
- A horrible rebase due to nested.c :)
- Dropped a couple of hyperv patches as the code was fixed already as a
  side-effect of another patch.
- Added a new trivial cleanup patch.

v3 -> v4:
- Rebase
- Add a new patch to also fix the newly introduced enlightned VMCS.

v2 -> v3:
- Rebase
- Add a new patch to also fix the newly introduced shadow VMCS.

Filippo Sironi (1):
  X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs

KarimAllah Ahmed (12):
  X86/nVMX: handle_vmon: Read 4 bytes from guest memory
  X86/nVMX: Update the PML table without mapping and unmapping the page
  KVM: Introduce a new guest mapping API
  X86/nVMX: handle_vmptrld: Use kvm_vcpu_map when copying VMCS12 from
    guest memory
  KVM/nVMX: Use kvm_vcpu_map when mapping the L1 MSR bitmap
  KVM/nVMX: Use kvm_vcpu_map when mapping the virtual APIC page
  KVM/nVMX: Use kvm_vcpu_map when mapping the posted interrupt
    descriptor table
  KVM/X86: Use kvm_vcpu_map in emulator_cmpxchg_emulated
  KVM/nSVM: Use the new mapping API for mapping guest memory
  KVM/nVMX: Use kvm_vcpu_map for accessing the shadow VMCS
  KVM/nVMX: Use kvm_vcpu_map for accessing the enlightened VMCS
  KVM/nVMX: Use page_address_valid in a few more locations

 arch/x86/kvm/paging_tmpl.h |  38 ++++++++---
 arch/x86/kvm/svm.c         |  97 +++++++++++++--------------
 arch/x86/kvm/vmx/nested.c  | 160 ++++++++++++++++-----------------------------
 arch/x86/kvm/vmx/vmx.c     |  19 ++----
 arch/x86/kvm/vmx/vmx.h     |   9 ++-
 arch/x86/kvm/x86.c         |  13 ++--
 include/linux/kvm_host.h   |   9 +++
 virt/kvm/kvm_main.c        |  53 +++++++++++++++
 8 files changed, 217 insertions(+), 181 deletions(-)

Comments

Konrad Rzeszutek Wilk Jan. 23, 2019, 6:16 p.m. UTC | #1
On Wed, Jan 09, 2019 at 10:42:00AM +0100, KarimAllah Ahmed wrote:
> Guest memory can either be directly managed by the kernel (i.e. have a "struct
> page") or they can simply live outside kernel control (i.e. do not have a
> "struct page"). KVM mostly support these two modes, except in a few places
> where the code seems to assume that guest memory must have a "struct page".
> 
> This patchset introduces a new mapping interface to map guest memory into host
> kernel memory which also supports PFN-based memory (i.e. memory without 'struct
> page'). It also converts all offending code to this interface or simply
> read/write directly from guest memory. Patch 2 is additionally fixing an
> incorrect page release and marking the page as dirty (i.e. as a side-effect of
> using the helper function to write).
> 
> As far as I can see all offending code is now fixed except the APIC-access page
> which I will handle in a seperate series along with dropping
> kvm_vcpu_gfn_to_page and kvm_vcpu_gpa_to_page from the internal KVM API.
> 
> The current implementation of the new API uses memremap to map memory that does
> not have a "struct page". This proves to be very slow for high frequency
> mappings. Since this does not affect the normal use-case where a "struct page"
> is available, the performance of this API will be handled by a seperate patch
> series.

Where could one find this patchset?

Also is there an simple test-case (or a writeup) you have for testing
this code? Specifically I am thinking about the use-case of "memory
without the 'struct page'"

And thank you for posting this patchset. It was a pleasure reviewing the
code!
KarimAllah Ahmed Jan. 25, 2019, 6:28 p.m. UTC | #2
On Wed, 2019-01-23 at 13:16 -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 09, 2019 at 10:42:00AM +0100, KarimAllah Ahmed wrote:
> > 
> > Guest memory can either be directly managed by the kernel (i.e. have a "struct
> > page") or they can simply live outside kernel control (i.e. do not have a
> > "struct page"). KVM mostly support these two modes, except in a few places
> > where the code seems to assume that guest memory must have a "struct page".
> > 
> > This patchset introduces a new mapping interface to map guest memory into host
> > kernel memory which also supports PFN-based memory (i.e. memory without 'struct
> > page'). It also converts all offending code to this interface or simply
> > read/write directly from guest memory. Patch 2 is additionally fixing an
> > incorrect page release and marking the page as dirty (i.e. as a side-effect of
> > using the helper function to write).
> > 
> > As far as I can see all offending code is now fixed except the APIC-access page
> > which I will handle in a seperate series along with dropping
> > kvm_vcpu_gfn_to_page and kvm_vcpu_gpa_to_page from the internal KVM API.
> > 
> > The current implementation of the new API uses memremap to map memory that does
> > not have a "struct page". This proves to be very slow for high frequency
> > mappings. Since this does not affect the normal use-case where a "struct page"
> > is available, the performance of this API will be handled by a seperate patch
> > series.
> 
> Where could one find this patchset?

Let me clean it and send it out as well :)

> 
> Also is there an simple test-case (or a writeup) you have for testing
> this code? Specifically I am thinking about the use-case of "memory
> without the 'struct page'"

So the simple way to do it is:

1- Pass 'mem=' in the kernel command-line to limit the amount of memory managed 
   by the kernel.
2- Map this physical memory you want to give to the guest with
      mmap("/dev/mem", physical_address_offset, ..)
3- Use the user-space virtual address as the "userspace_addr" field 
   in KVM_SET_USER_MEMORY_REGION ioctl.

You will also need this patch (hopefully I will repost next week as well):
https://patchwork.kernel.org/patch/9191755/

I will make sure to expand on this in the cover letter in v6.

> 
> And thank you for posting this patchset. It was a pleasure reviewing the
> code!



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich
Ust-ID: DE 289 237 879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Paolo Bonzini Jan. 30, 2019, 5:14 p.m. UTC | #3
On 25/01/19 19:28, Raslan, KarimAllah wrote:
> So the simple way to do it is:
> 
> 1- Pass 'mem=' in the kernel command-line to limit the amount of memory managed 
>    by the kernel.
> 2- Map this physical memory you want to give to the guest with
>       mmap("/dev/mem", physical_address_offset, ..)
> 3- Use the user-space virtual address as the "userspace_addr" field 
>    in KVM_SET_USER_MEMORY_REGION ioctl.
> 
> You will also need this patch (hopefully I will repost next week as well):
> https://patchwork.kernel.org/patch/9191755/

I took a look again at that patch and I guess I've changed my mind now
that the kernel provides e820__mapped_any and e820__mapped_all.
However, please do use e820__mapped_any instead of adding a new function
e820_is_ram.

Thanks,

Paolo

> I will make sure to expand on this in the cover letter in v6.
KarimAllah Ahmed Jan. 31, 2019, 1:03 p.m. UTC | #4
On Wed, 2019-01-30 at 18:14 +0100, Paolo Bonzini wrote:
> On 25/01/19 19:28, Raslan, KarimAllah wrote:
> > 
> > So the simple way to do it is:
> > 
> > 1- Pass 'mem=' in the kernel command-line to limit the amount of memory managed 
> >    by the kernel.
> > 2- Map this physical memory you want to give to the guest with
> >       mmap("/dev/mem", physical_address_offset, ..)
> > 3- Use the user-space virtual address as the "userspace_addr" field 
> >    in KVM_SET_USER_MEMORY_REGION ioctl.
> > 
> > You will also need this patch (hopefully I will repost next week as well):
> > https://patchwork.kernel.org/patch/9191755/
> 
> I took a look again at that patch and I guess I've changed my mind now
> that the kernel provides e820__mapped_any and e820__mapped_all.
> However, please do use e820__mapped_any instead of adding a new function
> e820_is_ram.

The problem with e820__mapped_* is that they are iterating over 'e820_table'
which is already truncated from the 'mem=' and 'memmap=' parameters:

"""
 * - 'e820_table': this is the main E820 table that is massaged by the
 *   low level x86 platform code, or modified by boot parameters, before
 *   passed on to higher level MM layers.
"""

.. so I really still can not use it for this purpose. The structure that I want
to look at is actually 'e820_table_firmware' which is:

"""
 * - 'e820_table_firmware': the original firmware version passed to us by the
 *   bootloader - not modified by the kernel. It is composed of two parts:
 *   the first 128 E820 memory entries in boot_params.e820_table and the
remaining
 *   (if any) entries of the SETUP_E820_EXT nodes. We use this to:
 *
 *       - inform the user about the firmware's notion of memory layout
 *         via /sys/firmware/memmap
 *
 *       - the hibernation code uses it to generate a kernel-independent MD5
 *         fingerprint of the physical memory layout of a system.
"""

The users of e820__mapped_any expect these semantics, so even changing the 
implementation of these functions to use 'e820_table_firmware' to handle this 
will not be an option!

One option here would be to add 'e820__mapped_raw_any' (or whatever 
other name) and make it identical to the current implementation of 
e820__mapped_any at. Would that be slightly more acceptable? :)

> 
> Thanks,
> 
> Paolo
> 
> > 
> > I will make sure to expand on this in the cover letter in v6.
> 



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich
Ust-ID: DE 289 237 879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Paolo Bonzini Jan. 31, 2019, 1:10 p.m. UTC | #5
On 31/01/19 14:03, Raslan, KarimAllah wrote:
> 
> One option here would be to add 'e820__mapped_raw_any' (or whatever 
> other name) and make it identical to the current implementation of 
> e820__mapped_any at. Would that be slightly more acceptable? :)

Yes, of course it would (for me at least :)).

Paolo