
[v9,0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

Message ID 1544468734-32763-1-git-send-email-maran.wilson@oracle.com (mailing list archive)

Message

Maran Wilson Dec. 10, 2018, 7:05 p.m. UTC
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, Qemu should be able to boot directly into the
uncompressed Linux kernel binary without the need to run firmware.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

This patch series would enable Qemu to use that same entry point for
booting KVM guests.

Changes from v8:

 * Removed unused KVM_GUEST_PVH symbol.

Changes from v7:

 (No functional changes from v7 other than rebasing to latest upstream) 
 * Added Reviewed-by tags as provided by Juergen Gross (1,2,3,6,7)
 * Rebasing to upstream 4.18 caused a minor conflict in patch 4 that had
   to be hand merged due to this patch:
      1fe8388 xen: share start flags between PV and PVH
   I just had to make sure we were accounting for the xen_start_flags
   in the new code path.
 * Rebasing to upstream 4.20-rc4 caused a few minor conflicts in patches
   2,3,5,7 that needed to be resolved by hand. The conflicts were due to
   upstream non-functional code cleanup changes in arch/x86/xen/Makefile
   and arch/x86/platform/pvh/enlighten.c due to these patches:
      28c11b0 x86/xen: Move pv irq related functions under CONFIG_XEN_PV
              umbrella
      357d291 x86/xen: Fix boot loader version reported for PVH guests
      3cfa210 xen: don't include <xen/xen.h> from <asm/io.h> and
              <asm/dma-mapping.h>
 * Qemu and qboot RFC patches have been posted to show one example of how
   this functionality can be used. Some preliminary numbers are available
   in those cover letters showing the KVM guest boot time improvement.
      Qemu:
      http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00957.html
      qboot:
      http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00953.html

Changes from v6:

 * Addressed issues caught by the kbuild test robot:
    - Restored an #include line that had been dropped by mistake (patch 4)
    - Removed a pair of #include lines that were no longer needed in a
      common code file and causing problems for certain 32-bit configs
      (patches 4 and 7)

Changes from v5:

 * The interface changes to the x86/HVM start info layout have
   now been accepted into the Xen tree.
 * Rebase and merge upstream PVH file changes.
 * (Patch 6) Synced up to the final version of the header file that was
             acked and pulled into the Xen tree.
 * (Patch 1) Fixed typo and removed redundant "def_bool n" line.

Changes from v4:

Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches
1 and 7 since there were minor changes (mostly just addition of
CONFIG_KVM_GUEST_PVH as requested) that came afterwards.

 * Changed subject prefix from RFC to PATCH
 * Added CONFIG_KVM_GUEST_PVH as suggested
 * Relocated the PVH common files to
   arch/x86/platform/pvh/{enlighten.c,head.S}
 * Realized I also needed to move the objtool override for those files
 * Updated a few code comments per reviewer feedback
 * Sent out a patch of the hvm_start_info struct changes against the Xen
   tree since that is the canonical copy of the header. Discussions on
   that thread have resulted in some (non-functional) updates to
   start_info.h (patch 6/7) and those changes are reflected here as well
   in order to keep the files in sync. The header file has since been
   ack'ed for the Xen tree by Jan Beulich.

Changes from v3:

 * Implemented Juergen's suggestion for refactoring and moving the PVH
   code so that CONFIG_XEN is no longer required for booting KVM guests
   via the PVH entry point.
   Functionally, nothing has changed from v3 really, but the patches
   look completely different now because of all the code movement and
   refactoring. Some of these patches can be combined, but I've left
   them very small in some cases to make the refactoring and code
   movement easier to review.
   My approach for refactoring has been to create a PVH entry layer that
   still has understanding and knowledge about Xen vs non-Xen guest types
   so that it can make run time decisions to handle either case, as
   opposed to going all the way and re-writing it to be a completely
   hypervisor agnostic and architecturally pure layer that is separate
   from guest type details. The latter seemed a bit overkill in this
   situation. And I've handled the complexity of having to support
   Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a
   pair of xen specific __weak routines that can be overridden in kernels
   that support Xen guests. Importantly, the __weak routines are for
   xen specific code only (not generic "guest type" specific code) so
   there is no clashing between xen version of the strong routine and,
   say, a KVM version of the same routine. But I'm sure there are many
   ways to skin this cat, so I'm open to alternate suggestions if there
   is a compelling reason for not using __weak in this situation.

Changes from v2:

 * All structures (including memory map table entries) are padded and
   aligned to an 8 byte boundary.

 * Removed the "packed" attributes and made changes to comments as
   suggested by Jan.

Changes from v1:

 * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the
   e820 map instead of using the second module entry to pass the table.

 * Cleaned things up a bit to reduce the number of xen vs non-xen special
   cases.


Maran Wilson (7):
  xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
  xen/pvh: Move PVH entry code out of Xen specific tree
  xen/pvh: Create a new file for Xen specific PVH code
  xen/pvh: Move Xen specific PVH VM initialization out of common file
  xen/pvh: Move Xen code for getting mem map via hcall out of common
    file
  xen/pvh: Add memory map pointer to hvm_start_info struct
  KVM: x86: Allow Qemu/KVM to use PVH entry point

 MAINTAINERS                                     |   1 +
 arch/x86/Kbuild                                 |   2 +
 arch/x86/Kconfig                                |   6 ++
 arch/x86/kernel/head_64.S                       |   2 +-
 arch/x86/platform/pvh/Makefile                  |   5 +
 arch/x86/platform/pvh/enlighten.c               | 137 ++++++++++++++++++++++++
 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} |   0
 arch/x86/xen/Kconfig                            |   3 +-
 arch/x86/xen/Makefile                           |   2 -
 arch/x86/xen/enlighten_pvh.c                    |  94 ++++------------
 include/xen/interface/hvm/start_info.h          |  63 ++++++++++-
 include/xen/xen.h                               |   3 +
 12 files changed, 237 insertions(+), 81 deletions(-)
 create mode 100644 arch/x86/platform/pvh/Makefile
 create mode 100644 arch/x86/platform/pvh/enlighten.c
 rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%)

Comments

Borislav Petkov Dec. 11, 2018, 1:18 p.m. UTC | #1
On Mon, Dec 10, 2018 at 11:05:34AM -0800, Maran Wilson wrote:
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, Qemu should be able to boot directly into the
> uncompressed Linux kernel binary without the need to run firmware.
> 
> There already exists an ABI to allow this for Xen PVH guests and the ABI
> is supported by Linux and FreeBSD:
> 
>    https://xenbits.xen.org/docs/unstable/misc/pvh.html
> 
> This patch series would enable Qemu to use that same entry point for
> booting KVM guests.

How would I do that, practically?

Looking at those here:

>  * Qemu and qboot RFC patches have been posted to show one example of how
>    this functionality can be used. Some preliminary numbers are available
>    in those cover letters showing the KVM guest boot time improvement.
>       Qemu:
>       http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00957.html
>       qboot:
>       http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00953.html

I might still need to do some dancing to get stuff going.

Thx.
Maran Wilson Dec. 11, 2018, 7:29 p.m. UTC | #2
On 12/11/2018 5:18 AM, Borislav Petkov wrote:
> On Mon, Dec 10, 2018 at 11:05:34AM -0800, Maran Wilson wrote:
>> For certain applications it is desirable to rapidly boot a KVM virtual
>> machine. In cases where legacy hardware and software support within the
>> guest is not needed, Qemu should be able to boot directly into the
>> uncompressed Linux kernel binary without the need to run firmware.
>>
>> There already exists an ABI to allow this for Xen PVH guests and the ABI
>> is supported by Linux and FreeBSD:
>>
>>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>
>> This patch series would enable Qemu to use that same entry point for
>> booting KVM guests.
> How would I do that, practically?
>
> Looking at those here:
>
>>   * Qemu and qboot RFC patches have been posted to show one example of how
>>     this functionality can be used. Some preliminary numbers are available
>>     in those cover letters showing the KVM guest boot time improvement.
>>        Qemu:
>>        http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00957.html
>>        qboot:
>>        http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00953.html
> I might still need to do some dancing to get stuff going.

Is your question about what options you need to provide to Qemu? Or is 
your question about the SW implementation choices?

Assuming the former... once you have compiled all 3 new binaries 
(kernel, Qemu, and qboot) then you simply invoke qemu the same way you 
normally invoke qemu with qboot + kernel binary, except you provide the 
vmlinux (uncompressed) kernel binary when specifying the "-kernel" 
parameter. Qemu/qboot will automatically detect that you have provided 
an ELF binary, find the PVH ELF note to locate the entry point, and 
proceed to boot the kernel via that method. On the other hand, if you 
leave all the Qemu options as-is, but simply provide the bzImage 
(compressed) kernel binary from the same build, Qemu/qboot will boot the 
way it has always done and not look for PVH.

To make it more concrete, here's an example of how I had been invoking 
PVH boot recently:

    x86_64-softmmu/qemu-system-x86_64 \
      -name testvm01 \
      -machine q35,accel=kvm,nvdimm \
      -cpu host \
      -m 1024,maxmem=20G,slots=2 \
      -smp 1 \
      -nodefaults \
      -kernel binaries/vmlinux \
      -object memory-backend-file,id=mem0,share,mem-path=binaries/containers.img,size=235929600 \
      -device nvdimm,memdev=mem0,id=nv0 \
      -append 'console=ttyS0,115200,8n1 root=/dev/pmem0p1 panic=1 rw init=/usr/lib/systemd/systemd rootfstype=ext4' \
      -bios binaries/bios.bin \
      -serial mon:stdio

Thanks,
-Maran



> Thx.
>
Borislav Petkov Dec. 12, 2018, 8:39 p.m. UTC | #3
On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:
> Is your question about what options you need to provide to Qemu? Or is your
> question about the SW implementation choices?
> 
> Assuming the former...

Yeah, that's what I wanted to know. But looking at it, I'm booting
bzImage here just as quickly and as flexible so I don't see the
advantage of this new method for my use case here of booting kernels
in qemu.

But maybe there's a good use case where firmware is slow and one doesn't
really wanna noodle through it or when one does start a gazillion VMs
per second or whatever...

Thx.
Maran Wilson Dec. 12, 2018, 9:56 p.m. UTC | #4
On 12/12/2018 12:39 PM, Borislav Petkov wrote:
> On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:
>> Is your question about what options you need to provide to Qemu? Or is your
>> question about the SW implementation choices?
>>
>> Assuming the former...
> Yeah, that's what I wanted to know. But looking at it, I'm booting
> bzImage here just as quickly and as flexible so I don't see the
> advantage of this new method for my use case here of booting kernels
> in qemu.
>
> But maybe there's a good use case where firmware is slow and one doesn't
> really wanna noodle through it or when one does start a gazillion VMs
> per second or whatever...

Right, the time saved is not something you would notice while starting a 
VM manually. But it does reduce the time to reach startup_64() in Linux 
by about 50% (going from around 94ms to around 47ms) when booting a VM 
using Qemu+qboot (for example). That time savings becomes pretty 
important when you are trying to use VMs as containers (for instance, as 
is the case with Kata containers) and trying to get the latency for 
launching such a container really low -- to come as close as possible to 
match the latency for launching more traditional containers that don't 
have the additional security/isolation of running within a separate VM.

Thanks,
-Maran

>
> Thx.
>
Paolo Bonzini Dec. 13, 2018, 1:15 p.m. UTC | #5
On 12/12/18 21:39, Borislav Petkov wrote:
> On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:
>> Is your question about what options you need to provide to Qemu? Or is your
>> question about the SW implementation choices?
>>
>> Assuming the former...
> Yeah, that's what I wanted to know. But looking at it, I'm booting
> bzImage here just as quickly and as flexible so I don't see the
> advantage of this new method for my use case here of booting kernels
> in qemu.

It's not firmware that is slow, decompression is.  Unlike Xen, which is
using PVH with a regular bzImage and decompression in the host, KVM is
using PVH to boot a vmlinux with no decompression at all.

Paolo

> But maybe there's a good use case where firmware is slow and one doesn't
> really wanna noodle through it or when one does start a gazillion VMs
> per second or whatever...
Boris Ostrovsky Dec. 14, 2018, 11:13 p.m. UTC | #6
On 12/10/18 2:05 PM, Maran Wilson wrote:
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, Qemu should be able to boot directly into the
> uncompressed Linux kernel binary without the need to run firmware.
>
> There already exists an ABI to allow this for Xen PVH guests and the ABI
> is supported by Linux and FreeBSD:
>
>    https://xenbits.xen.org/docs/unstable/misc/pvh.html
>
> This patch series would enable Qemu to use that same entry point for
> booting KVM guests.
>


Applied to for-linus-4.21


-boris