From patchwork Fri Feb 3 20:01:19 2017
X-Patchwork-Submitter: "Cao, Lei"
X-Patchwork-Id: 9555071
From: "Cao, Lei"
To: Paolo Bonzini, Radim Krčmář, "kvm@vger.kernel.org"
Subject: [PATCH v3 1/4] KVM: Add new generic capability for ring-based dirty memory logging
Date: Fri, 3 Feb 2017 20:01:19 +0000
References: <201702031949.v13JnJ9k032009@dev1.sn.stratus.com>

Add support for capabilities that can be enabled in a generic way.
Introduce a new capability: ring-based dirty memory logging.

Signed-off-by: Lei Cao
---
 Documentation/virtual/kvm/api.txt | 79 +++++++++++++++++++++++++++++++++++++--
 arch/powerpc/kvm/powerpc.c        | 14 +------
 arch/s390/kvm/kvm-s390.c          | 11 +-----
 arch/x86/kvm/x86.c                | 14 +------
 include/linux/kvm_host.h          |  2 +
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/kvm_main.c               | 32 ++++++++++++++++
 7 files changed, 115 insertions(+), 38 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 03145b7..453c520 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1006,10 +1006,15 @@ documentation when it pops into existence).
 
 4.37 KVM_ENABLE_CAP
 
-Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
-Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
-	       mips (only KVM_CAP_ENABLE_CAP), ppc, s390
-Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
+Capability: KVM_CAP_ENABLE_CAP
+Architectures: mips, ppc, s390
+Type: vcpu ioctl
+Parameters: struct kvm_enable_cap (in)
+Returns: 0 on success; -1 on error
+
+Capability: KVM_CAP_ENABLE_CAP_VM
+Architectures: all
+Type: vm ioctl
 Parameters: struct kvm_enable_cap (in)
 Returns: 0 on success; -1 on error
 
@@ -3942,3 +3947,69 @@ In order to use SynIC, it has to be activated by setting this
 capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this will disable
 the use of APIC hardware virtualization even if supported by the CPU, as it's
 incompatible with SynIC auto-EOI behavior.
+
+8.3 KVM_CAP_DIRTY_LOG_RING
+
+Architectures: x86
+Parameters: args[0] - size of the dirty log ring
+
+The kernel can track dirty memory using rings, which are
+stored in memory regions that can be mmapped into userspace.
+
+There is one dirty ring per vcpu and one global ring.
+
+The dirty ring has the following structure.
+
+struct kvm_dirty_gfn {
+	__u32 pad;
+	__u32 slot; /* as_id | slot_id */
+	__u64 offset;
+};
+
+struct kvm_dirty_list {
+	union {
+		struct {
+			__u16 avail_index; /* set by kernel */
+			__u16 fetch_index; /* set by userspace */
+		} indices;
+		struct kvm_dirty_gfn dirty_gfns[0];
+	};
+};
+
+Userspace calls the KVM_ENABLE_CAP ioctl right after KVM_CREATE_VM
+to enable this capability for the new guest and set the size of
+the rings. The size of each ring should be page aligned and at
+least 16 pages. The larger the ring, the less likely it is to
+fill up and force the VM to exit to userspace. The optimal
+size is workload dependent.
+
+After the capability is enabled, userspace mmaps the global
+dirty ring. The per-vcpu dirty ring is mmapped along with kvm_run
+when the vcpu is created.
+
+To enable dirty logging with rings, userspace calls the
+KVM_SET_USER_MEMORY_REGION ioctl on all the user memory regions
+with the KVM_MEM_LOG_DIRTY_PAGES bit set.
+
+To disable dirty logging with rings, userspace calls the
+KVM_SET_USER_MEMORY_REGION ioctl on all the user memory regions
+with the KVM_MEM_LOG_DIRTY_PAGES bit clear.
+
+Once dirty logging is enabled, userspace can start harvesting
+dirty pages.
+
+To harvest the dirty pages, userspace reads the dirty GFNs from
+the mmapped dirty list up to avail_index and sets fetch_index
+accordingly. Harvesting can be done while the guest is running
+or paused. The dirty pages don't need to be harvested all at
+once.
+
+To rearm the dirty traps, userspace calls the KVM_RESET_DIRTY_PAGES
+ioctl. This should be done only when the guest is paused and
+all the dirty pages have been harvested.
+
+If one of the dirty lists is full, the guest exits to userspace
+with the exit reason set to KVM_EXIT_DIRTY_LOG_FULL, and the
+KVM_RUN ioctl returns -EINTR. When that happens, userspace
+should pause all the vcpus, harvest all the dirty pages, and
+rearm the dirty traps. It can then unpause the guest.
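A minimal single-process sketch of the harvesting protocol documented above, not code against the real mmapped rings. Apart from the kvm_dirty_gfn layout, every name here (struct dirty_ring, dirty_ring_init(), dirty_ring_push(), dirty_ring_harvest(), dirty_ring_full()) is hypothetical. It assumes that avail_index and fetch_index are free-running 16-bit counters reduced modulo a power-of-two entry count, that args[0] is a size in bytes, and that pages are 4 KiB. Under those assumptions the 16-byte kvm_dirty_gfn means the 16-page minimum ring (65536 bytes) holds 4096 entries. The __atomic builtins are GCC/Clang extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as struct kvm_dirty_gfn above: 4 + 4 + 8 = 16 bytes. */
struct kvm_dirty_gfn {
	uint32_t pad;
	uint32_t slot;   /* as_id | slot_id */
	uint64_t offset;
};

/*
 * Hypothetical model of one ring. Assumption: avail_index and fetch_index
 * are free-running 16-bit counters, reduced modulo a power-of-two entry
 * count. The indices are kept outside the entry array for simplicity.
 */
struct dirty_ring {
	uint16_t avail_index;        /* advanced by the producer (the kernel) */
	uint16_t fetch_index;        /* advanced by the consumer (userspace) */
	uint32_t nentries;
	struct kvm_dirty_gfn *gfns;
};

/* Size the ring from a byte count (assumed to be what args[0] carries). */
static void dirty_ring_init(struct dirty_ring *ring,
			    struct kvm_dirty_gfn *buf, uint64_t size_bytes)
{
	ring->avail_index = 0;
	ring->fetch_index = 0;
	ring->nentries = (uint32_t)(size_bytes / sizeof(struct kvm_dirty_gfn));
	/* With power-of-two sizes, 16-bit counters can tell "full" from
	 * "empty" only up to 32768 entries. */
	assert(ring->nentries != 0 && ring->nentries <= 32768 &&
	       (ring->nentries & (ring->nentries - 1)) == 0);
	ring->gfns = buf;
}

static int dirty_ring_full(const struct dirty_ring *ring)
{
	return (uint16_t)(ring->avail_index - ring->fetch_index) >= ring->nentries;
}

/* Producer side; stands in for the kernel in this model. */
static void dirty_ring_push(struct dirty_ring *ring, uint32_t slot,
			    uint64_t offset)
{
	uint16_t avail = ring->avail_index;
	struct kvm_dirty_gfn *e = &ring->gfns[avail % ring->nentries];

	assert(!dirty_ring_full(ring));
	e->pad = 0;
	e->slot = slot;
	e->offset = offset;
	/* Publish the entry before the new avail_index becomes visible. */
	__atomic_store_n(&ring->avail_index, (uint16_t)(avail + 1),
			 __ATOMIC_RELEASE);
}

/*
 * Consumer side: copy at most max entries up to avail_index into out,
 * then advance fetch_index past the entries that were copied.
 */
static unsigned dirty_ring_harvest(struct dirty_ring *ring,
				   struct kvm_dirty_gfn *out, unsigned max)
{
	uint16_t avail = __atomic_load_n(&ring->avail_index, __ATOMIC_ACQUIRE);
	uint16_t fetch = ring->fetch_index;
	unsigned n = 0;

	while (fetch != avail && n < max) {
		out[n++] = ring->gfns[fetch % ring->nentries];
		fetch++;
	}
	/* Hand the consumed slots back to the producer. */
	__atomic_store_n(&ring->fetch_index, fetch, __ATOMIC_RELEASE);
	return n;
}
```

The max argument models the statement that dirty pages need not be harvested all at once: whatever is not copied stays between fetch_index and avail_index for the next call.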
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index cd892de..0edae1b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -507,7 +507,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_UNSET_IRQ:
 	case KVM_CAP_PPC_IRQ_LEVEL:
 	case KVM_CAP_ENABLE_CAP:
-	case KVM_CAP_ENABLE_CAP_VM:
 	case KVM_CAP_ONE_REG:
 	case KVM_CAP_IOEVENTFD:
 	case KVM_CAP_DEVICE_CTRL:
@@ -1358,8 +1357,8 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 }
 
 
-static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
-				   struct kvm_enable_cap *cap)
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+			    struct kvm_enable_cap *cap)
 {
 	int r;
 
@@ -1412,15 +1411,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 
-	case KVM_ENABLE_CAP:
-	{
-		struct kvm_enable_cap cap;
-		r = -EFAULT;
-		if (copy_from_user(&cap, argp, sizeof(cap)))
-			goto out;
-		r = kvm_vm_ioctl_enable_cap(kvm, &cap);
-		break;
-	}
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CREATE_SPAPR_TCE_64: {
 		struct kvm_create_spapr_tce_64 create_tce_64;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6484a25..3192e52 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -366,7 +366,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_CSS_SUPPORT:
 	case KVM_CAP_IOEVENTFD:
 	case KVM_CAP_DEVICE_CTRL:
-	case KVM_CAP_ENABLE_CAP_VM:
 	case KVM_CAP_S390_IRQCHIP:
 	case KVM_CAP_VM_ATTRIBUTES:
 	case KVM_CAP_MP_STATE:
@@ -480,7 +479,7 @@ static void icpt_operexc_on_all_vcpus(struct kvm *kvm)
 	}
 }
 
-static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 {
 	int r;
 
@@ -1232,14 +1231,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_s390_inject_vm(kvm, &s390int);
 		break;
 	}
-	case KVM_ENABLE_CAP: {
-		struct kvm_enable_cap cap;
-		r = -EFAULT;
-		if (copy_from_user(&cap, argp, sizeof(cap)))
-			break;
-		r = kvm_vm_ioctl_enable_cap(kvm, &cap);
-		break;
-	}
 	case KVM_CREATE_IRQCHIP: {
 		struct kvm_irq_routing_entry routing;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d153be8..1889f62 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2629,7 +2629,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_TIME:
 	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 	case KVM_CAP_TSC_DEADLINE_TIMER:
-	case KVM_CAP_ENABLE_CAP_VM:
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
 	case KVM_CAP_SPLIT_IRQCHIP:
@@ -3868,8 +3867,8 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 	return 0;
 }
 
-static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
-				   struct kvm_enable_cap *cap)
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+			    struct kvm_enable_cap *cap)
 {
 	int r;
 
@@ -4176,15 +4175,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
-	case KVM_ENABLE_CAP: {
-		struct kvm_enable_cap cap;
-
-		r = -EFAULT;
-		if (copy_from_user(&cap, argp, sizeof(cap)))
-			goto out;
-		r = kvm_vm_ioctl_enable_cap(kvm, &cap);
-		break;
-	}
 	default:
 		r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg);
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c5190d..33d9974 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -718,6 +718,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 			bool line_status);
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+			    struct kvm_enable_cap *cap);
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg);
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cac48ed..117f1f9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -871,6 +871,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_DIRTY_LOG_RING 133
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 482612b..f2744ce 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2927,6 +2927,7 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
 	case KVM_CAP_CHECK_EXTENSION_VM:
+	case KVM_CAP_ENABLE_CAP_VM:
 		return 1;
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 	case KVM_CAP_IRQ_ROUTING:
@@ -2944,6 +2945,28 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	return kvm_vm_ioctl_check_extension(kvm, arg);
 }
 
+static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, __u32 size)
+{
+	return -EINVAL;
+}
+
+int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+						  struct kvm_enable_cap *cap)
+{
+	return -EINVAL;
+}
+
+static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
+					   struct kvm_enable_cap *cap)
+{
+	switch (cap->cap) {
+	case KVM_CAP_DIRTY_LOG_RING:
+		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	default:
+		return kvm_vm_ioctl_enable_cap(kvm, cap);
+	}
+}
+
 static long kvm_vm_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -2957,6 +2980,15 @@ static long kvm_vm_ioctl(struct file *filp,
 	case KVM_CREATE_VCPU:
 		r = kvm_vm_ioctl_create_vcpu(kvm, arg);
 		break;
+	case KVM_ENABLE_CAP: {
+		struct kvm_enable_cap cap;
+
+		r = -EFAULT;
+		if (copy_from_user(&cap, argp, sizeof(cap)))
+			goto out;
+		r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap);
+		break;
+	}
 	case KVM_SET_USER_MEMORY_REGION: {
 		struct kvm_userspace_memory_region kvm_userspace_mem;
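The generic KVM_ENABLE_CAP dispatch added to kvm_main.c can be modelled in userspace to show its control flow: capabilities known to common code are handled first, and everything else falls through to the architecture hook, whose weak default returns -EINVAL. This is a sketch under stated assumptions, not kernel code. struct enable_cap, enable_dirty_log_ring(), arch_enable_cap() and enable_cap_generic() are hypothetical names, and the size check encodes only the "page aligned, at least 16 pages" rule from the documentation, assuming 4 KiB pages and a size in bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed userspace copy of struct kvm_enable_cap (the pad field is omitted). */
struct enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
};

#define CAP_DIRTY_LOG_RING 133u   /* number this patch assigns */
#define MODEL_PAGE_SIZE    4096u  /* assumption: 4 KiB pages */
#define MIN_RING_PAGES     16u    /* minimum stated in api.txt */

/*
 * Hypothetical stand-in for kvm_vm_ioctl_enable_dirty_log_ring(), which is
 * still a stub returning -EINVAL in this patch. It enforces only the sizing
 * rules from the documentation, assuming args[0] is a size in bytes.
 */
static int enable_dirty_log_ring(uint64_t size)
{
	if (size % MODEL_PAGE_SIZE != 0 ||
	    size < (uint64_t)MIN_RING_PAGES * MODEL_PAGE_SIZE)
		return -EINVAL;
	return 0;
}

/* Weak default, like the one added to kvm_main.c: an architecture that
 * does not provide its own handler rejects every other capability. */
int __attribute__((weak)) arch_enable_cap(struct enable_cap *cap)
{
	(void)cap;
	return -EINVAL;
}

/* Generic capabilities first, then the per-architecture hook. */
static int enable_cap_generic(struct enable_cap *cap)
{
	assert(cap != NULL);
	switch (cap->cap) {
	case CAP_DIRTY_LOG_RING:
		return enable_dirty_log_ring(cap->args[0]);
	default:
		return arch_enable_cap(cap);
	}
}
```

One difference from the patch is deliberate: the sketch keeps args[0] as a 64-bit value, whereas the patch's kvm_vm_ioctl_enable_dirty_log_ring() takes a __u32 size, so a value above 4 GiB would be silently truncated once that stub is filled in.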