From patchwork Tue Feb 25 19:17:54 2020
X-Patchwork-Submitter: Tamas K Lengyel
X-Patchwork-Id: 11404465
From: Tamas K Lengyel
To: xen-devel@lists.xenproject.org
Date: Tue, 25 Feb 2020 11:17:54 -0800
Subject: [Xen-devel] [PATCH v10 0/3] VM forking
Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
 Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, Anthony PERARD,
 Julien Grall, Roger Pau Monné

The following series implements VM forking for Intel HVM guests to allow
for the fast creation of identical VMs without the associated high
startup costs of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The fork operation is implemented as part of the "xl fork-vm" command:

    xl fork-vm -C -Q -m

By default a fully functional fork is created. The user is, however,
responsible for creating the appropriate config file for the fork and for
generating the QEMU save file before the fork-vm call is made. The config
file needs to give the fork a new name at minimum, but other settings may
also require changes. Certain settings in the config file of both the
parent and the fork have to be set to default. Details are documented.

The interface also allows the forking to be split into two steps:

    xl fork-vm --launch-dm no \
               -m \
               -p
    xl fork-vm --launch-dm late \
               -C \
               -Q \

The split creation model is useful when the VM needs to be created as
fast as possible. The forked VM can be unpaused without the device model
being launched, to be monitored and accessed via VMI.
Note, however, that without its device model running (depending on what
is executing in the VM) it is bound to misbehave or even crash when it
tries to access devices that would be emulated by QEMU. We anticipate
that for certain use-cases this would be an acceptable situation, for
example when fuzzing code segments that don't access such devices.

Launching the device model requires the QEMU Xen savefile to be generated
manually from the parent VM. This can be accomplished simply by
connecting to its QMP socket and issuing the "xen-save-devices-state"
command. For example, using the standard tool socat, these commands can
be used to generate the file:

    socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-
    { "execute": "qmp_capabilities" }
    { "execute": "xen-save-devices-state", \
        "arguments": { "filename": "/path/to/save/qemu_state", \
        "live": false} }

At runtime the forked VM starts running with an empty p2m which gets
lazily populated when the VM generates EPT faults, similar to how altp2m
views are populated. If the memory access is a read-only access, the p2m
entry is populated with a memory shared entry with its parent. For write
memory accesses, or in case memory sharing wasn't possible (for example
in case a reference is held by a third party), a new page is allocated
and the page contents are copied over from the parent VM. Forks can be
further forked if needed, thus allowing for further memory savings.

A VM fork reset hypercall is also added that allows the fork to be reset
to the state it was in just after the fork. It is also accessible via xl:

    xl fork-vm --fork-reset -p

This is an optimization for cases where the forks are very short-lived
and run without a device model, so resetting saves some time compared to
creating a brand new fork, provided the fork has not acquired a lot of
memory. If the fork has a lot of memory deduplicated it is likely going
to be faster to create a new fork from scratch and asynchronously destroy
the old one.
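
For readers who want a mental model of the lazy population and reset
behaviour described above, here is a toy sketch in Python. It is not the
hypervisor code: the Vm class, its method names and the (page, is_shared)
tuple are hypothetical and exist only for illustration. It also assumes
that a parent does not run while it has forks, and it models reset as
simply dropping every populated entry, which may differ from what the
hypercall actually does.

```python
PAGE_SIZE = 4096  # illustrative page size for the toy model


class Vm:
    """Toy VM whose p2m maps gfn -> (page, is_shared). Hypothetical model."""

    def __init__(self, parent=None):
        self.parent = parent
        self.p2m = {}  # a fork starts out with an empty p2m

    def fork(self):
        """Create a fork; nothing is copied at fork time."""
        return Vm(parent=self)

    def _ancestor_page(self, gfn):
        """Find the nearest ancestor that has the page populated."""
        vm = self.parent
        while vm is not None:
            entry = vm.p2m.get(gfn)
            if entry is not None:
                return entry[0]
            vm = vm.parent
        raise KeyError(f"gfn {gfn} is not mapped in any ancestor")

    def read(self, gfn):
        """Read fault: populate the entry as shared with the parent."""
        if gfn not in self.p2m:
            self.p2m[gfn] = (self._ancestor_page(gfn), True)
        return bytes(self.p2m[gfn][0])

    def write(self, gfn, offset, data):
        """Write fault: allocate a private page and copy the contents."""
        entry = self.p2m.get(gfn)
        if entry is None or entry[1]:
            source = entry[0] if entry else self._ancestor_page(gfn)
            entry = (bytearray(source), False)  # private copy
            self.p2m[gfn] = entry
        entry[0][offset:offset + len(data)] = data

    def private_pages(self):
        """Number of pages this VM had to allocate for itself."""
        return sum(1 for _, shared in self.p2m.values() if not shared)

    def reset(self):
        """Return to the just-forked state (modelled as an empty p2m)."""
        self.p2m.clear()


if __name__ == "__main__":
    parent = Vm()
    parent.p2m[0] = (bytearray(b"A" * PAGE_SIZE), False)
    child = parent.fork()
    child.read(0)            # shared entry, no allocation
    child.write(0, 0, b"Z")  # copy-on-write, one private page
    print(child.private_pages())
    child.reset()
    print(child.private_pages())
```

In this model the cost of reset() grows with the number of entries the
fork has populated, which mirrors the remark above that resetting only
pays off when the fork has not acquired much memory.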

The series has been tested with Windows VMs and functions as expected.
Linux VMs, when forked from a running VM, will have a frozen VNC screen.
Linux VMs at this time can only be forked with a working device model
when the parent VM was restored from a snapshot using "xl restore -p".
This is a known limitation.

Forking time has been measured to be 0.0007s, and device model launch to
be around 1s, depending largely on the number of devices being emulated.
Fork resets have been measured to be 0.0001s under optimal circumstances.

New in v10:
    Rebased on staging and minor fixes for things pointed out by Roger
    Allocate pages for vcpu_info if used by parent
    Document limitation of guest settings that have to be set to default
    Require max-vcpus to be specified by toolstack-side
    Code movement in toolstack & compile tested on ARM
    Implement hypercall continuation for reset operation

Patches 1-2 implement the hypervisor-side bits of the VM fork & reset
operations.
Patch 3 adds the toolstack-side code implementing VM forking and reset.

Tamas K Lengyel (3):
  xen/mem_sharing: VM forking
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 docs/man/xl.1.pod.in              |  44 ++++
 tools/libxc/include/xenctrl.h     |  13 +
 tools/libxc/xc_memshr.c           |  22 ++
 tools/libxl/libxl.h               |  11 +
 tools/libxl/libxl_create.c        | 361 ++++++++++++++------------
 tools/libxl/libxl_dm.c            |   2 +-
 tools/libxl/libxl_dom.c           |  43 +++-
 tools/libxl/libxl_internal.h      |   7 +
 tools/libxl/libxl_types.idl       |   1 +
 tools/libxl/libxl_x86.c           |  41 +++
 tools/xl/Makefile                 |   2 +-
 tools/xl/xl.h                     |   5 +
 tools/xl/xl_cmdtable.c            |  15 ++
 tools/xl/xl_forkvm.c              | 147 +++++++++++
 tools/xl/xl_vmcontrol.c           |  14 +
 xen/arch/x86/domain.c             |  11 +
 xen/arch/x86/hvm/hvm.c            |   4 +-
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_sharing.c     | 411 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |   9 +-
 xen/common/domain.c               |   3 +
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/hvm/hvm.h     |   2 +
 xen/include/asm-x86/mem_sharing.h |  17 ++
 xen/include/public/memory.h       |   9 +
 xen/include/xen/sched.h           |   5 +
 26 files changed, 1032 insertions(+), 171 deletions(-)
 create mode 100644 tools/xl/xl_forkvm.c