From patchwork Fri Nov 29 21:34:51 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11267659
From: Peter Xu
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Sean Christopherson, Paolo Bonzini,
David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 01/15] KVM: Move running VCPU from ARM to common code Date: Fri, 29 Nov 2019 16:34:51 -0500 Message-Id: <20191129213505.18472-2-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 X-MC-Unique: _7ZoNR8aNu2IfZ6JIT30HQ-1 X-Mimecast-Spam-Score: 0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Paolo Bonzini For ring-based dirty log tracking, it will be more efficient to account writes during schedule-out or schedule-in to the currently running VCPU. We would like to do it even if the write doesn't use the current VCPU's address space, as is the case for cached writes (see commit 4e335d9e7ddb, "Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02). Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct. There is already something similar in KVM/ARM; one important difference is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c: we have to update both the architecture-independent vcpu_{load,put} and the preempt notifiers. Another change made in the process is to allow using kvm_get_running_vcpu() in preemptible code. This is allowed because preempt notifiers ensure that the value does not change even after the VCPU thread is migrated. Signed-off-by: Paolo Bonzini Signed-off-by: Peter Xu --- arch/arm/include/asm/kvm_host.h | 2 -- arch/arm64/include/asm/kvm_host.h | 2 -- include/linux/kvm_host.h | 3 +++ virt/kvm/arm/arm.c | 29 ----------------------------- virt/kvm/arm/perf.c | 6 +++--- virt/kvm/arm/vgic/vgic-mmio.c | 15 +++------------ virt/kvm/kvm_main.c | 25 ++++++++++++++++++++++++- 7 files changed, 33 insertions(+), 49 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 556cd818eccf..abc3f6f3ad76 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -284,8 +284,6 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices); int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end); int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); -struct kvm_vcpu *kvm_arm_get_running_vcpu(void); -struct kvm_vcpu __percpu **kvm_get_running_vcpus(void); void kvm_arm_halt_guest(struct kvm *kvm); void kvm_arm_resume_guest(struct kvm *kvm); diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index b36dae9ee5f9..d97855e41469 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -446,8 +446,6 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end); int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); -struct kvm_vcpu *kvm_arm_get_running_vcpu(void); -struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void); void kvm_arm_halt_guest(struct kvm *kvm); void kvm_arm_resume_guest(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7ed1e2f8641e..498a39462ac1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1342,6 +1342,9 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) } #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */ +struct kvm_vcpu *kvm_get_running_vcpu(void); +struct kvm_vcpu __percpu **kvm_get_running_vcpus(void); + #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS bool 
 bool kvm_arch_has_irq_bypass(void);
 int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 12e0280291ce..1df9c39024fa 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -51,9 +51,6 @@ __asm__(".arch_extension virt");
 DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data);
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 
-/* Per-CPU variable containing the currently running vcpu. */
-static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
-
 /* The VMID used in the VTTBR */
 static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u32 kvm_next_vmid;
@@ -62,31 +59,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
 static bool vgic_present;
 
 static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
-
-static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
-{
-        __this_cpu_write(kvm_arm_running_vcpu, vcpu);
-}
-
 DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
-/**
- * kvm_arm_get_running_vcpu - get the vcpu running on the current CPU.
- * Must be called from non-preemptible context
- */
-struct kvm_vcpu *kvm_arm_get_running_vcpu(void)
-{
-        return __this_cpu_read(kvm_arm_running_vcpu);
-}
-
-/**
- * kvm_arm_get_running_vcpus - get the per-CPU array of currently running vcpus.
- */
-struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
-{
-        return &kvm_arm_running_vcpu;
-}
-
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
         return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@@ -406,7 +380,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
         vcpu->cpu = cpu;
         vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
 
-        kvm_arm_set_running_vcpu(vcpu);
         kvm_vgic_load(vcpu);
         kvm_timer_vcpu_load(vcpu);
         kvm_vcpu_load_sysregs(vcpu);
@@ -432,8 +405,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
         kvm_vcpu_pmu_restore_host(vcpu);
 
         vcpu->cpu = -1;
-
-        kvm_arm_set_running_vcpu(NULL);
 }
 
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/perf.c b/virt/kvm/arm/perf.c
index 918cdc3839ea..d45b8b9a4415 100644
--- a/virt/kvm/arm/perf.c
+++ b/virt/kvm/arm/perf.c
@@ -13,14 +13,14 @@
 
 static int kvm_is_in_guest(void)
 {
-        return kvm_arm_get_running_vcpu() != NULL;
+        return kvm_get_running_vcpu() != NULL;
 }
 
 static int kvm_is_user_mode(void)
 {
         struct kvm_vcpu *vcpu;
 
-        vcpu = kvm_arm_get_running_vcpu();
+        vcpu = kvm_get_running_vcpu();
         if (vcpu)
                 return !vcpu_mode_priv(vcpu);
 
@@ -32,7 +32,7 @@ static unsigned long kvm_get_guest_ip(void)
 {
         struct kvm_vcpu *vcpu;
 
-        vcpu = kvm_arm_get_running_vcpu();
+        vcpu = kvm_get_running_vcpu();
         if (vcpu)
                 return *vcpu_pc(vcpu);
 
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 0d090482720d..d656ebd5f9d4 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -190,15 +190,6 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
  * value later will give us the same value as we update the per-CPU variable
  * in the preempt notifier handlers.
  */
-static struct kvm_vcpu *vgic_get_mmio_requester_vcpu(void)
-{
-        struct kvm_vcpu *vcpu;
-
-        preempt_disable();
-        vcpu = kvm_arm_get_running_vcpu();
-        preempt_enable();
-        return vcpu;
-}
 
 /* Must be called with irq->irq_lock held */
 static void vgic_hw_irq_spending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
@@ -221,7 +212,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
                               gpa_t addr, unsigned int len,
                               unsigned long val)
 {
-        bool is_uaccess = !vgic_get_mmio_requester_vcpu();
+        bool is_uaccess = !kvm_get_running_vcpu();
         u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
         int i;
         unsigned long flags;
@@ -274,7 +265,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
                               gpa_t addr, unsigned int len,
                               unsigned long val)
 {
-        bool is_uaccess = !vgic_get_mmio_requester_vcpu();
+        bool is_uaccess = !kvm_get_running_vcpu();
         u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
         int i;
         unsigned long flags;
@@ -335,7 +326,7 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
                                     bool active)
 {
         unsigned long flags;
-        struct kvm_vcpu *requester_vcpu = vgic_get_mmio_requester_vcpu();
+        struct kvm_vcpu *requester_vcpu = kvm_get_running_vcpu();
 
         raw_spin_lock_irqsave(&irq->irq_lock, flags);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 00268290dcbd..fac0760c870e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -108,6 +108,7 @@ struct kmem_cache *kvm_vcpu_cache;
 EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
 
 static __read_mostly struct preempt_ops kvm_preempt_ops;
+static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
 
 struct dentry *kvm_debugfs_dir;
 EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
@@ -197,6 +198,8 @@ bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 void vcpu_load(struct kvm_vcpu *vcpu)
 {
         int cpu = get_cpu();
+
+        __this_cpu_write(kvm_running_vcpu, vcpu);
         preempt_notifier_register(&vcpu->preempt_notifier);
         kvm_arch_vcpu_load(vcpu, cpu);
         put_cpu();
@@ -208,6 +211,7 @@ void vcpu_put(struct kvm_vcpu *vcpu)
         preempt_disable();
         kvm_arch_vcpu_put(vcpu);
         preempt_notifier_unregister(&vcpu->preempt_notifier);
+        __this_cpu_write(kvm_running_vcpu, NULL);
         preempt_enable();
 }
 EXPORT_SYMBOL_GPL(vcpu_put);
@@ -4304,8 +4308,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
         WRITE_ONCE(vcpu->preempted, false);
         WRITE_ONCE(vcpu->ready, false);
 
+        __this_cpu_write(kvm_running_vcpu, vcpu);
         kvm_arch_sched_in(vcpu, cpu);
-
         kvm_arch_vcpu_load(vcpu, cpu);
 }
 
@@ -4319,6 +4323,25 @@ static void kvm_sched_out(struct preempt_notifier *pn,
                 WRITE_ONCE(vcpu->ready, true);
         }
         kvm_arch_vcpu_put(vcpu);
+        __this_cpu_write(kvm_running_vcpu, NULL);
+}
+
+/**
+ * kvm_get_running_vcpu - get the vcpu running on the current CPU.
+ * Thanks to preempt notifiers, this can also be called from
+ * preemptible context.
+ */
+struct kvm_vcpu *kvm_get_running_vcpu(void)
+{
+        return __this_cpu_read(kvm_running_vcpu);
+}
+
+/**
+ * kvm_get_running_vcpus - get the per-CPU array of currently running vcpus.
+ */
+struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
+{
+        return &kvm_running_vcpu;
+}
 
 static void check_processor_compat(void *rtn)

From patchwork Fri Nov 29 21:34:52 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11267657
From: Peter Xu
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Sean Christopherson, Paolo Bonzini,
David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 02/15] KVM: Add kvm/vcpu argument to mark_dirty_page_in_slot Date: Fri, 29 Nov 2019 16:34:52 -0500 Message-Id: <20191129213505.18472-3-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 X-MC-Unique: pqzvYRSmOQKc3_HUFCfyQQ-1 X-Mimecast-Spam-Score: 0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: "Cao, Lei" Signed-off-by: Cao, Lei Signed-off-by: Paolo Bonzini Signed-off-by: Peter Xu --- virt/kvm/kvm_main.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index fac0760c870e..8f8940cc4b84 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -145,7 +145,10 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); -static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn); +static void mark_page_dirty_in_slot(struct kvm *kvm, + struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, + gfn_t gfn); __visible bool kvm_rebooting; EXPORT_SYMBOL_GPL(kvm_rebooting); @@ -2077,7 +2080,8 @@ int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa, } EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic); -static int __kvm_write_guest_page(struct kvm_memory_slot *memslot, gfn_t gfn, +static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, gfn_t gfn, const void *data, int offset, int len) { int r; @@ -2089,7 +2093,7 @@ static int __kvm_write_guest_page(struct kvm_memory_slot *memslot, gfn_t gfn, r = __copy_to_user((void __user *)addr + offset, data, len); if (r) return -EFAULT; - mark_page_dirty_in_slot(memslot, gfn); + mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn); return 0; } @@ -2098,7 +2102,8 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); - return __kvm_write_guest_page(slot, gfn, data, offset, len); + return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, + offset, len); } EXPORT_SYMBOL_GPL(kvm_write_guest_page); @@ -2107,7 +2112,8 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, { struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - return __kvm_write_guest_page(slot, gfn, data, offset, len); + return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data, + offset, len); } EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page); @@ -2221,7 +2227,7 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, r = __copy_to_user((void __user *)ghc->hva + offset, data, len); if (r) return -EFAULT; - mark_page_dirty_in_slot(ghc->memslot, gpa >> PAGE_SHIFT); + mark_page_dirty_in_slot(kvm, NULL, ghc->memslot, gpa >> PAGE_SHIFT); return 0; } @@ -2286,7 +2292,9 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) } EXPORT_SYMBOL_GPL(kvm_clear_guest); -static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, +static void mark_page_dirty_in_slot(struct kvm *kvm, + struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, gfn_t gfn) { if (memslot && memslot->dirty_bitmap) { @@ -2301,7 +2309,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn) struct kvm_memory_slot *memslot; memslot = gfn_to_memslot(kvm, gfn); - mark_page_dirty_in_slot(memslot, gfn); + mark_page_dirty_in_slot(kvm, NULL, memslot, 
                                 gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
 
@@ -2310,7 +2318,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
         struct kvm_memory_slot *memslot;
 
         memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-        mark_page_dirty_in_slot(memslot, gfn);
+        mark_page_dirty_in_slot(vcpu->kvm, vcpu, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

From patchwork Fri Nov 29 21:34:53 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11267653
From: Peter Xu
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert",
 peterx@redhat.com, Vitaly Kuznetsov
Subject: [PATCH RFC 03/15] KVM: Add build-time error check on kvm_run size
Date: Fri, 29 Nov 2019 16:34:53 -0500
Message-Id: <20191129213505.18472-4-peterx@redhat.com>
In-Reply-To: <20191129213505.18472-1-peterx@redhat.com>
References: <20191129213505.18472-1-peterx@redhat.com>

struct kvm_run is already going to reach 2400 bytes (which is over
half of the page size on 4K-page archs), so it's good to have this
build-time check in case the struct overflows its page when new
fields are added.

Signed-off-by: Peter Xu
---
 virt/kvm/kvm_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8f8940cc4b84..681452d288cd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -352,6 +352,8 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
         }
         vcpu->run = page_address(page);
 
+        BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE);
+
         kvm_vcpu_set_in_spin_loop(vcpu, false);
         kvm_vcpu_set_dy_eligible(vcpu, false);
         vcpu->preempted = false;

From patchwork Fri Nov 29 21:34:54 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11267649
From: Peter Xu
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert",
 peterx@redhat.com, Vitaly Kuznetsov
Subject: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking
Date: Fri, 29 Nov 2019 16:34:54 -0500
Message-Id: <20191129213505.18472-5-peterx@redhat.com>
In-Reply-To: <20191129213505.18472-1-peterx@redhat.com>
References: <20191129213505.18472-1-peterx@redhat.com>

This patch is heavily based on previous work from Lei Cao and Paolo
Bonzini. [1]

KVM currently uses large bitmaps to track dirty memory.  These bitmaps
are copied to userspace when userspace queries KVM for its dirty page
information.  The use of bitmaps is mostly sufficient for live
migration, as large parts of memory are dirtied from one log-dirty
pass to another.  However, in a checkpointing system, the number of
dirty pages is small and in fact it is often bounded---the VM is
paused when it has dirtied a pre-defined number of pages.  Traversing
a large, sparsely populated bitmap to find set bits is time-consuming,
as is copying the bitmap to user-space.

A similar issue arises for live migration when the guest memory is
huge but the page-dirtying workload is trivial: each dirty sync then
needs to pull the whole dirty bitmap to userspace and analyse every
bit, even if it's mostly zeros.

The preferred data structure for the above scenarios is a dense list
of guest frame numbers (GFN).  This patch series stores the dirty list
in kernel memory that can be memory mapped into userspace to allow
speedy harvesting.

We defined two new data structures:

        struct kvm_dirty_ring;
        struct kvm_dirty_ring_indexes;

Firstly, kvm_dirty_ring is defined to represent a ring of dirty
pages.  When dirty tracking is enabled, we can push dirty gfns onto
the ring.

Secondly, kvm_dirty_ring_indexes is defined to represent the
user/kernel interface of each ring.  Currently it contains two
indexes: (1) avail_index represents where we should push the next
dirty GFN (written by the kernel), while (2) fetch_index represents
where userspace should fetch the next dirty GFN (written by
userspace).
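(Editorial illustration, not part of the patch: a minimal sketch of how
two free-running indexes address a power-of-two ring. All names here
are invented for the example, and memory ordering is omitted.)

	#include <stdbool.h>
	#include <stdint.h>

	struct dirty_gfn { uint32_t slot; uint64_t offset; };

	struct ring {
		uint32_t size;        /* number of entries, power of two */
		uint32_t avail_index; /* free running, written by the kernel */
		uint32_t fetch_index; /* free running, written by userspace */
		struct dirty_gfn *gfns;
	};

	/* Producer side: the indexes themselves never wrap explicitly;
	 * only the array subscript is masked, so "full" is simply
	 * avail_index - fetch_index == size (unsigned arithmetic). */
	static bool ring_push(struct ring *r, struct dirty_gfn e)
	{
		if (r->avail_index - r->fetch_index == r->size)
			return false; /* full: consumer must catch up first */
		r->gfns[r->avail_index & (r->size - 1)] = e;
		r->avail_index++; /* publish the entry */
		return true;
	}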
One complete ring is composed of one kvm_dirty_ring plus its
corresponding kvm_dirty_ring_indexes.  Currently, we have N+1 rings
for each VM of N vcpus:

  - for each vcpu, we have one per-vcpu dirty ring,
  - for each vm, we have one per-vm dirty ring.

Please refer to the documentation update in this patch for more
details.

Note that this patch implements the core logic of the dirty ring
buffer; it's still disabled for all archs for now.  Also, we'll
address some of the other issues in follow-up patches before it's
first enabled on x86.

[1] https://patchwork.kernel.org/patch/10471409/

Signed-off-by: Lei Cao
Signed-off-by: Paolo Bonzini
Signed-off-by: Peter Xu
---
 Documentation/virt/kvm/api.txt | 109 +++++++++++++++
 arch/x86/kvm/Makefile          |   3 +-
 include/linux/kvm_dirty_ring.h |  67 +++++++++
 include/linux/kvm_host.h       |  33 +++++
 include/linux/kvm_types.h      |   1 +
 include/uapi/linux/kvm.h       |  36 +++++
 virt/kvm/dirty_ring.c          | 156 +++++++++++++++++++++
 virt/kvm/kvm_main.c            | 240 ++++++++++++++++++++++++++++++++-
 8 files changed, 642 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/kvm_dirty_ring.h
 create mode 100644 virt/kvm/dirty_ring.c

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 49183add44e7..fa622c9a2eb8 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -231,6 +231,7 @@ Based on their initialization different VMs may have different
 capabilities.  It is thus encouraged to use the vm ioctl to query for
 capabilities (available with KVM_CAP_CHECK_EXTENSION_VM on the vm fd)
 
+
 4.5 KVM_GET_VCPU_MMAP_SIZE
 
 Capability: basic
@@ -243,6 +244,18 @@ The KVM_RUN ioctl (cf.) communicates with userspace via a shared
 memory region.  This ioctl returns the size of that region.  See the
 KVM_RUN documentation for details.
 
+Besides the size of the KVM_RUN communication region, other areas of
+the VCPU file descriptor can be mmap-ed, including:
+
+- if KVM_CAP_COALESCED_MMIO is available, a page at
+  KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons,
+  this page is included in the result of KVM_GET_VCPU_MMAP_SIZE.
+  KVM_CAP_COALESCED_MMIO is not documented yet.
+
+- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at
+  KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE.  For more information on
+  KVM_CAP_DIRTY_LOG_RING, see section 8.22.
+
 4.6 KVM_SET_MEMORY_REGION
 
@@ -5358,6 +5371,7 @@ CPU when the exception is taken. If this virtual SError is taken to EL1 using
 AArch64, this value will be reported in the ISS field of ESR_ELx.
 
 See KVM_CAP_VCPU_EVENTS for more details.
+
 8.20 KVM_CAP_HYPERV_SEND_IPI
 
 Architectures: x86
@@ -5365,6 +5379,7 @@ Architectures: x86
 This capability indicates that KVM supports paravirtualized Hyper-V IPI send
 hypercalls: HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
+
 8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
 
 Architecture: x86
@@ -5378,3 +5393,97 @@ handling by KVM (as some KVM hypercall may be mistakenly treated as TLB
 flush hypercalls by Hyper-V) so userspace should disable KVM identification
 in CPUID and only exposes Hyper-V identification. In this case, guest
 thinks it's running on Hyper-V and only use Hyper-V hypercalls.
+
+8.22 KVM_CAP_DIRTY_LOG_RING
+
+Architectures: x86
+Parameters: args[0] - size of the dirty log ring
+
+KVM is capable of tracking dirty memory using ring buffers that are
+mmaped into userspace; there is one dirty ring per vcpu and one global
+ring per vm.
+
+One dirty ring has the following two major structures:
+
+struct kvm_dirty_ring {
+        u32 dirty_index;
+        u32 reset_index;
+        u32 size;
+        u32 soft_limit;
+        spinlock_t lock;
+        struct kvm_dirty_gfn *dirty_gfns;
+};
+
+struct kvm_dirty_ring_indexes {
+        __u32 avail_index; /* set by kernel */
+        __u32 fetch_index; /* set by userspace */
+};
+
+Each dirty entry is defined as:
+
+struct kvm_dirty_gfn {
+        __u32 pad;
+        __u32 slot; /* as_id | slot_id */
+        __u64 offset;
+};
+
+The fields in kvm_dirty_ring are internal to KVM itself, while the
+fields in kvm_dirty_ring_indexes are exposed to userspace to be either
+read or written.
+
+The two indices in the ring buffer are free running counters.
+
+In pseudocode, processing the ring buffer looks like this:
+
+        idx = load-acquire(&ring->fetch_index);
+        while (idx != ring->avail_index) {
+                struct kvm_dirty_gfn *entry;
+                entry = &ring->dirty_gfns[idx & (size - 1)];
+                ...
+
+                idx++;
+        }
+        ring->fetch_index = idx;
+
+Userspace calls the KVM_ENABLE_CAP ioctl right after the KVM_CREATE_VM
+ioctl to enable this capability for the new guest and set the size of
+the rings.  It is only allowed before creating any vCPU, and the size
+of the ring must be a power of two.  The larger the ring buffer, the
+less likely the ring is full and the VM is forced to exit to
+userspace.  The optimal size depends on the workload, but it is
+recommended that it be at least 64 KiB (4096 entries).
+
+After the capability is enabled, userspace can mmap the global ring
+buffer (kvm_dirty_gfn[], offset KVM_DIRTY_LOG_PAGE_OFFSET) and the
+indexes (kvm_dirty_ring_indexes, offset 0) from the VM file
+descriptor.  The per-vcpu dirty ring instead is mmapped when the vcpu
+is created, similar to the kvm_run struct (kvm_dirty_ring_indexes is
+located inside kvm_run, while kvm_dirty_gfn[] sits at offset
+KVM_DIRTY_LOG_PAGE_OFFSET).
+
+Just like for dirty page bitmaps, the buffer tracks writes to
+all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was
+set in KVM_SET_USER_MEMORY_REGION.  Once a memory region is registered
+with the flag set, userspace can start harvesting dirty pages from the
+ring buffer.
+
+To harvest the dirty pages, userspace accesses the mmaped ring buffer
+to read the dirty GFNs up to avail_index, and sets the fetch_index
+accordingly.  This can be done when the guest is running or paused,
+and dirty pages need not be collected all at once.  After processing
+one or more entries in the ring buffer, userspace calls the VM ioctl
+KVM_RESET_DIRTY_RINGS to notify the kernel that it has updated
+fetch_index and to mark those pages clean.  Therefore, the ioctl
+must be called *before* reading the content of the dirty pages.
+
+However, there is a major difference compared to the
+KVM_GET_DIRTY_LOG interface in that when reading the dirty ring from
+userspace it's still possible that the kernel has not yet flushed the
+hardware dirty buffers into the kernel buffer.  To achieve that, one
+needs to kick the vcpu out for a hardware buffer flush (vmexit).
+
+If one of the ring buffers is full, the guest will exit to userspace
+with the exit reason set to KVM_EXIT_DIRTY_RING_FULL, and the
+KVM_RUN ioctl will return -EINTR.  Once that happens, userspace
+should pause all the vcpus, then harvest all the dirty pages and
+rearm the dirty traps.  It can unpause the guest after that.
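(Editorial illustration, not part of the patch: a minimal userspace
sketch of the harvesting protocol documented above. It assumes the
uapi additions from this series are in linux/kvm.h, that the vcpu's
kvm_run and kvm_dirty_gfn[] areas have already been mmap-ed per
section 8.22, and that `entries` is the power-of-two ring size; error
handling is omitted.)

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h> /* assumes the uapi changes from this series */

	/* Walk fetch_index up to avail_index, record each dirty GFN, then
	 * ask the kernel to reset (re-protect) the harvested entries. */
	static void harvest_vcpu_ring(int vm_fd, struct kvm_run *run,
	                              struct kvm_dirty_gfn *gfns,
	                              uint32_t entries)
	{
	        struct kvm_dirty_ring_indexes *ix = &run->vcpu_ring_indexes;
	        uint32_t avail = __atomic_load_n(&ix->avail_index,
	                                         __ATOMIC_ACQUIRE);
	        uint32_t idx;

	        for (idx = ix->fetch_index; idx != avail; idx++) {
	                struct kvm_dirty_gfn *e = &gfns[idx & (entries - 1)];
	                uint32_t as_id = e->slot >> 16;   /* slot is as_id | slot_id */
	                uint32_t slot_id = e->slot & 0xffff;

	                /* ... record (as_id, slot_id, e->offset) as dirty ... */
	                (void)as_id; (void)slot_id;
	        }
	        ix->fetch_index = idx;

	        /* Per the documentation, reset before reading the pages'
	         * contents: this re-arms the dirty traps. */
	        ioctl(vm_fd, KVM_RESET_DIRTY_RINGS);
	}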
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b19ef421084d..0acee817adfb 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,7 +5,8 @@ ccflags-y += -Iarch/x86/kvm
 KVM := ../../../virt/kvm
 
 kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
-         $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
+         $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \
+         $(KVM)/dirty_ring.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 
 kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
new file mode 100644
index 000000000000..8335635b7ff7
--- /dev/null
+++ b/include/linux/kvm_dirty_ring.h
@@ -0,0 +1,67 @@
+#ifndef KVM_DIRTY_RING_H
+#define KVM_DIRTY_RING_H
+
+/*
+ * struct kvm_dirty_ring is defined in include/uapi/linux/kvm.h.
+ *
+ * dirty_ring:  shared with userspace via mmap. It is the compact list
+ *              that holds the dirty pages.
+ * dirty_index: free running counter that points to the next slot in
+ *              dirty_ring->dirty_gfns where a new dirty page should go.
+ * reset_index: free running counter that points to the next dirty page
+ *              in dirty_ring->dirty_gfns for which dirty trap needs to
+ *              be reenabled
+ * size:        size of the compact list, dirty_ring->dirty_gfns
+ * soft_limit:  when the number of dirty pages in the list reaches this
+ *              limit, vcpu that owns this ring should exit to userspace
+ *              to allow userspace to harvest all the dirty pages
+ * lock:        protects dirty_ring, only in use if this is the global
+ *              ring
+ *
+ * The number of dirty pages in the ring is calculated by,
+ * dirty_index - reset_index
+ *
+ * kernel increments dirty_ring->indices.avail_index after dirty index
+ * is incremented. When userspace harvests the dirty pages, it increments
+ * dirty_ring->indices.fetch_index up to dirty_ring->indices.avail_index.
+ * When kernel reenables dirty traps for the dirty pages, it increments
+ * reset_index up to dirty_ring->indices.fetch_index.
+ *
+ */
+struct kvm_dirty_ring {
+        u32 dirty_index;
+        u32 reset_index;
+        u32 size;
+        u32 soft_limit;
+        spinlock_t lock;
+        struct kvm_dirty_gfn *dirty_gfns;
+};
+
+u32 kvm_dirty_ring_get_rsvd_entries(void);
+int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring);
+
+/*
+ * called with kvm->slots_lock held, returns the number of
+ * processed pages.
+ */
+int kvm_dirty_ring_reset(struct kvm *kvm,
+                         struct kvm_dirty_ring *ring,
+                         struct kvm_dirty_ring_indexes *indexes);
+
+/*
+ * returns  0: successfully pushed
+ *          1: successfully pushed, soft limit reached,
+ *             vcpu should exit to userspace
+ *     -EBUSY: unable to push, dirty ring full.
+ */
+int kvm_dirty_ring_push(struct kvm_dirty_ring *ring,
+                        struct kvm_dirty_ring_indexes *indexes,
+                        u32 slot, u64 offset, bool lock);
+
+/* for use in vm_operations_struct */
+struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 i);
+
+void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
+bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring);
+
+#endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 498a39462ac1..7b747bc9ff3e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -34,6 +34,7 @@
 #include
 #include
+#include <linux/kvm_dirty_ring.h>
 
 #ifndef KVM_MAX_VCPU_ID
 #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
@@ -146,6 +147,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_PENDING_TIMER     2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQ_DIRTY_RING_FULL   4
 #define KVM_REQUEST_ARCH_BASE     8
 
 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -321,6 +323,7 @@ struct kvm_vcpu {
         bool ready;
         struct kvm_vcpu_arch arch;
         struct dentry *debugfs_dentry;
+        struct kvm_dirty_ring dirty_ring;
 };
 
@@ -501,6 +504,10 @@ struct kvm {
         struct srcu_struct srcu;
         struct srcu_struct irq_srcu;
         pid_t userspace_pid;
+        /* Data structure to be exported by mmap(kvm->fd, 0) */
+        struct kvm_vm_run *vm_run;
+        u32 dirty_ring_size;
+        struct kvm_dirty_ring vm_dirty_ring;
 };
 
@@ -832,6 +839,8 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
                                              gfn_t gfn_offset,
                                              unsigned long mask);
 
+void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask);
+
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
                                struct kvm_dirty_log *log);
 int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
@@ -1411,4 +1420,28 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
                                 uintptr_t data, const char *name,
                                 struct task_struct **thread_ptr);
 
+/*
+ * This defines how many reserved entries we want to keep before we
+ * kick the vcpu to the userspace to avoid dirty ring full.  This
+ * value can be tuned to higher if e.g. PML is enabled on the host.
+ */
+#define KVM_DIRTY_RING_RSVD_ENTRIES 64
+
+/* Max number of entries allowed for each kvm dirty ring */
+#define KVM_DIRTY_RING_MAX_ENTRIES 65536
+
+/*
+ * Arch needs to define these macros after implementing the dirty ring
+ * feature.  KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the
+ * starting page offset of the dirty ring structures, while
+ * KVM_DIRTY_RING_VERSION should be defined as >=1.  By default, this
+ * feature is off on all archs.
+ */
+#ifndef KVM_DIRTY_LOG_PAGE_OFFSET
+#define KVM_DIRTY_LOG_PAGE_OFFSET 0
+#endif
+#ifndef KVM_DIRTY_RING_VERSION
+#define KVM_DIRTY_RING_VERSION 0
+#endif
+
 #endif
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 1c88e69db3d9..d9d03eea145a 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -11,6 +11,7 @@ struct kvm_irq_routing_table;
 struct kvm_memory_slot;
 struct kvm_one_reg;
 struct kvm_run;
+struct kvm_vm_run;
 struct kvm_userspace_memory_region;
 struct kvm_vcpu;
 struct kvm_vcpu_init;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e6f17c8e2dba..0b88d76d6215 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -236,6 +236,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
 #define KVM_EXIT_ARM_NISV         28
+#define KVM_EXIT_DIRTY_RING_FULL  29
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -247,6 +248,11 @@ struct kvm_hyperv_exit {
 /* Encounter unexpected vm-exit reason */
 #define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON 4
 
+struct kvm_dirty_ring_indexes {
+        __u32 avail_index; /* set by kernel */
+        __u32 fetch_index; /* set by userspace */
+};
+
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
         /* in */
@@ -421,6 +427,13 @@ struct kvm_run {
                 struct kvm_sync_regs regs;
                 char padding[SYNC_REGS_SIZE_BYTES];
         } s;
+
+        struct kvm_dirty_ring_indexes vcpu_ring_indexes;
+};
+
+/* Returned by mmap(kvm->fd, offset=0) */
+struct kvm_vm_run {
+        struct kvm_dirty_ring_indexes vm_ring_indexes;
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
@@ -1009,6 +1022,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
 #define KVM_CAP_ARM_NISV_TO_USER 177
 #define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_DIRTY_LOG_RING 179
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1472,6 +1486,9 @@ struct kvm_enc_region {
 /* Available with KVM_CAP_ARM_SVE */
 #define KVM_ARM_VCPU_FINALIZE     _IOW(KVMIO,  0xc2, int)
 
+/* Available with KVM_CAP_DIRTY_LOG_RING */
+#define KVM_RESET_DIRTY_RINGS     _IO(KVMIO, 0xc3)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
         /* Guest initialization commands */
@@ -1622,4 +1639,23 @@ struct kvm_hyperv_eventfd {
 #define KVM_HYPERV_CONN_ID_MASK                0x00ffffff
 #define KVM_HYPERV_EVENTFD_DEASSIGN        (1 << 0)
 
+/*
+ * The following are the requirements for supporting dirty log ring
+ * (by enabling KVM_DIRTY_LOG_PAGE_OFFSET).
+ *
+ * 1. Memory accesses by KVM should call kvm_vcpu_write_* instead
+ *    of kvm_write_* so that the global dirty ring is not filled up
+ *    too quickly.
+ * 2. kvm_arch_mmu_enable_log_dirty_pt_masked should be defined for
+ *    enabling dirty logging.
+ * 3. There should not be a separate step to synchronize hardware
+ *    dirty bitmap with KVM's.
+ */
+
+struct kvm_dirty_gfn {
+        __u32 pad;
+        __u32 slot;
+        __u64 offset;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
new file mode 100644
index 000000000000..9264891f3c32
--- /dev/null
+++ b/virt/kvm/dirty_ring.c
@@ -0,0 +1,156 @@
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/vmalloc.h>
+#include <linux/kvm_dirty_ring.h>
+
+u32 kvm_dirty_ring_get_rsvd_entries(void)
+{
+        return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
+}
+
+int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring)
+{
+        u32 size = kvm->dirty_ring_size;
+
+        ring->dirty_gfns = vmalloc(size);
+        if (!ring->dirty_gfns)
+                return -ENOMEM;
+        memset(ring->dirty_gfns, 0, size);
+
+        ring->size = size / sizeof(struct kvm_dirty_gfn);
+        ring->soft_limit =
+            (kvm->dirty_ring_size / sizeof(struct kvm_dirty_gfn)) -
+            kvm_dirty_ring_get_rsvd_entries();
+        ring->dirty_index = 0;
+        ring->reset_index = 0;
+        spin_lock_init(&ring->lock);
+
+        return 0;
+}
+
+int kvm_dirty_ring_reset(struct kvm *kvm,
+                         struct kvm_dirty_ring *ring,
+                         struct kvm_dirty_ring_indexes *indexes)
+{
+        u32 cur_slot, next_slot;
+        u64 cur_offset, next_offset;
+        unsigned long mask;
+        u32 fetch;
+        int count = 0;
+        struct kvm_dirty_gfn *entry;
+
+        fetch = READ_ONCE(indexes->fetch_index);
+        if (fetch == ring->reset_index)
+                return 0;
+
+        entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
+        /*
+         * The ring buffer is shared with userspace, which might mmap
+         * it and concurrently modify slot and offset.  Userspace must
+         * not be trusted!  READ_ONCE prevents the compiler from changing
+         * the values after they've been range-checked (the checks are
+         * in kvm_reset_dirty_gfn).
+         */
+        smp_read_barrier_depends();
+        cur_slot = READ_ONCE(entry->slot);
+        cur_offset = READ_ONCE(entry->offset);
+        mask = 1;
+        count++;
+        ring->reset_index++;
+        while (ring->reset_index != fetch) {
+                entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
+                smp_read_barrier_depends();
+                next_slot = READ_ONCE(entry->slot);
+                next_offset = READ_ONCE(entry->offset);
+                ring->reset_index++;
+                count++;
+                /*
+                 * Try to coalesce the reset operations when the guest is
+                 * scanning pages in the same slot.
+                 */
+                if (next_slot == cur_slot) {
+                        int delta = next_offset - cur_offset;
+
+                        if (delta >= 0 && delta < BITS_PER_LONG) {
+                                mask |= 1ull << delta;
+                                continue;
+                        }
+
+                        /* Backwards visit, careful about overflows! */
+                        if (delta > -BITS_PER_LONG && delta < 0 &&
+                            (mask << -delta >> -delta) == mask) {
+                                cur_offset = next_offset;
+                                mask = (mask << -delta) | 1;
+                                continue;
+                        }
+                }
+                kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+                cur_slot = next_slot;
+                cur_offset = next_offset;
+                mask = 1;
+        }
+        kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+
+        return count;
+}
+
+static inline u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
+{
+        return ring->dirty_index - ring->reset_index;
+}
+
+bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
+{
+        return kvm_dirty_ring_used(ring) >= ring->size;
+}
+
+/*
+ * Returns:
+ *   >0 if we should kick the vcpu out,
+ *   =0 if the gfn pushed successfully, or,
+ *   <0 if error (e.g.
ring full) + */ +int kvm_dirty_ring_push(struct kvm_dirty_ring *ring, + struct kvm_dirty_ring_indexes *indexes, + u32 slot, u64 offset, bool lock) +{ + int ret; + struct kvm_dirty_gfn *entry; + + if (lock) + spin_lock(&ring->lock); + + if (kvm_dirty_ring_full(ring)) { + ret = -EBUSY; + goto out; + } + + entry = &ring->dirty_gfns[ring->dirty_index & (ring->size - 1)]; + entry->slot = slot; + entry->offset = offset; + smp_wmb(); + ring->dirty_index++; + WRITE_ONCE(indexes->avail_index, ring->dirty_index); + ret = kvm_dirty_ring_used(ring) >= ring->soft_limit; + pr_info("%s: slot %u offset %llu used %u\n", + __func__, slot, offset, kvm_dirty_ring_used(ring)); + +out: + if (lock) + spin_unlock(&ring->lock); + + return ret; +} + +struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 i) +{ + return vmalloc_to_page((void *)ring->dirty_gfns + i * PAGE_SIZE); +} + +void kvm_dirty_ring_free(struct kvm_dirty_ring *ring) +{ + if (ring->dirty_gfns) { + vfree(ring->dirty_gfns); + ring->dirty_gfns = NULL; + } +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 681452d288cd..8642c977629b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -64,6 +64,8 @@ #define CREATE_TRACE_POINTS #include +#include + /* Worst case buffer size needed for holding an integer. */ #define ITOA_MAX_LEN 12 @@ -149,6 +151,10 @@ static void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, gfn_t gfn); +static void mark_page_dirty_in_ring(struct kvm *kvm, + struct kvm_vcpu *vcpu, + struct kvm_memory_slot *slot, + gfn_t gfn); __visible bool kvm_rebooting; EXPORT_SYMBOL_GPL(kvm_rebooting); @@ -359,11 +365,22 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) vcpu->preempted = false; vcpu->ready = false; + if (kvm->dirty_ring_size) { + r = kvm_dirty_ring_alloc(vcpu->kvm, &vcpu->dirty_ring); + if (r) { + kvm->dirty_ring_size = 0; + goto fail_free_run; + } + } + r = kvm_arch_vcpu_init(vcpu); if (r < 0) - goto fail_free_run; + goto fail_free_ring; return 0; +fail_free_ring: + if (kvm->dirty_ring_size) + kvm_dirty_ring_free(&vcpu->dirty_ring); fail_free_run: free_page((unsigned long)vcpu->run); fail: @@ -381,6 +398,8 @@ void kvm_vcpu_uninit(struct kvm_vcpu *vcpu) put_pid(rcu_dereference_protected(vcpu->pid, 1)); kvm_arch_vcpu_uninit(vcpu); free_page((unsigned long)vcpu->run); + if (vcpu->kvm->dirty_ring_size) + kvm_dirty_ring_free(&vcpu->dirty_ring); } EXPORT_SYMBOL_GPL(kvm_vcpu_uninit); @@ -690,6 +709,7 @@ static struct kvm *kvm_create_vm(unsigned long type) struct kvm *kvm = kvm_arch_alloc_vm(); int r = -ENOMEM; int i; + struct page *page; if (!kvm) return ERR_PTR(-ENOMEM); @@ -705,6 +725,14 @@ static struct kvm *kvm_create_vm(unsigned long type) BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX); + page = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) { + r = -ENOMEM; + goto out_err_alloc_page; + } + kvm->vm_run = page_address(page); + BUILD_BUG_ON(sizeof(struct kvm_vm_run) > PAGE_SIZE); + if (init_srcu_struct(&kvm->srcu)) goto out_err_no_srcu; if (init_srcu_struct(&kvm->irq_srcu)) @@ -775,6 +803,9 @@ static struct kvm *kvm_create_vm(unsigned long type) out_err_no_irq_srcu: cleanup_srcu_struct(&kvm->srcu); out_err_no_srcu: + free_page((unsigned long)page); + kvm->vm_run = NULL; +out_err_alloc_page: kvm_arch_free_vm(kvm); mmdrop(current->mm); return ERR_PTR(r); @@ -800,6 +831,15 @@ static void kvm_destroy_vm(struct kvm *kvm) int i; struct mm_struct *mm = kvm->mm; + if (kvm->dirty_ring_size) { + 
+                kvm_dirty_ring_free(&kvm->vm_dirty_ring);
+        }
+
+        if (kvm->vm_run) {
+                free_page((unsigned long)kvm->vm_run);
+                kvm->vm_run = NULL;
+        }
+
         kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
         kvm_destroy_vm_debugfs(kvm);
         kvm_arch_sync_events(kvm);
@@ -2301,7 +2341,7 @@ static void mark_page_dirty_in_slot(struct kvm *kvm,
 {
         if (memslot && memslot->dirty_bitmap) {
                 unsigned long rel_gfn = gfn - memslot->base_gfn;
-
+                mark_page_dirty_in_ring(kvm, vcpu, memslot, gfn);
                 set_bit_le(rel_gfn, memslot->dirty_bitmap);
         }
 }
@@ -2649,6 +2689,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
 
+static bool kvm_fault_in_dirty_ring(struct kvm *kvm, struct vm_fault *vmf)
+{
+        return (vmf->pgoff >= KVM_DIRTY_LOG_PAGE_OFFSET) &&
+            (vmf->pgoff < KVM_DIRTY_LOG_PAGE_OFFSET +
+             kvm->dirty_ring_size / PAGE_SIZE);
+}
+
 static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
 {
         struct kvm_vcpu *vcpu = vmf->vma->vm_file->private_data;
@@ -2664,6 +2711,10 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
         else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
                 page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
+        else if (kvm_fault_in_dirty_ring(vcpu->kvm, vmf))
+                page = kvm_dirty_ring_get_page(
+                    &vcpu->dirty_ring,
+                    vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET);
         else
                 return kvm_arch_vcpu_fault(vcpu, vmf);
         get_page(page);
@@ -3259,12 +3310,162 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
         case KVM_CAP_NR_MEMSLOTS:
                 return KVM_USER_MEM_SLOTS;
+        case KVM_CAP_DIRTY_LOG_RING:
+                /* Version will be zero if arch didn't implement it */
+                return KVM_DIRTY_RING_VERSION;
         default:
                 break;
         }
         return kvm_vm_ioctl_check_extension(kvm, arg);
 }
 
+static void mark_page_dirty_in_ring(struct kvm *kvm,
+                                    struct kvm_vcpu *vcpu,
+                                    struct kvm_memory_slot *slot,
+                                    gfn_t gfn)
+{
+        u32 as_id = 0;
+        u64 offset;
+        int ret;
+        struct kvm_dirty_ring *ring;
+        struct kvm_dirty_ring_indexes *indexes;
+        bool is_vm_ring;
+
+        if (!kvm->dirty_ring_size)
+                return;
+
+        offset = gfn - slot->base_gfn;
+
+        if (vcpu) {
+                as_id = kvm_arch_vcpu_memslots_id(vcpu);
+        } else {
+                as_id = 0;
+                vcpu = kvm_get_running_vcpu();
+        }
+
+        if (vcpu) {
+                ring = &vcpu->dirty_ring;
+                indexes = &vcpu->run->vcpu_ring_indexes;
+                is_vm_ring = false;
+        } else {
+                /*
+                 * Put onto per vm ring because no vcpu context.  Kick
+                 * vcpu0 if ring is full.
+                 */
+                vcpu = kvm->vcpus[0];
+                ring = &kvm->vm_dirty_ring;
+                indexes = &kvm->vm_run->vm_ring_indexes;
+                is_vm_ring = true;
+        }
+
+        ret = kvm_dirty_ring_push(ring, indexes,
+                                  (as_id << 16)|slot->id, offset,
+                                  is_vm_ring);
+        if (ret < 0) {
+                if (is_vm_ring)
+                        pr_warn_once("vcpu %d dirty log overflow\n",
+                                     vcpu->vcpu_id);
+                else
+                        pr_warn_once("per-vm dirty log overflow\n");
+                return;
+        }
+
+        if (ret)
+                kvm_make_request(KVM_REQ_DIRTY_RING_FULL, vcpu);
+}
+
+void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
+{
+        struct kvm_memory_slot *memslot;
+        int as_id, id;
+
+        as_id = slot >> 16;
+        id = (u16)slot;
+        if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
+                return;
+
+        memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
+        if (offset >= memslot->npages)
+                return;
+
+        spin_lock(&kvm->mmu_lock);
+        /*
+         * FIXME: we should use a single AND operation, but there is no
+         * applicable atomic API.
+         */
+        while (mask) {
+                clear_bit_le(offset + __ffs(mask), memslot->dirty_bitmap);
+                mask &= mask - 1;
+        }
+
+        kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
+        spin_unlock(&kvm->mmu_lock);
+}
+
+static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
+{
+        int r;
+
+        /* the size should be a power of 2 */
+        if (!size || (size & (size - 1)))
+                return -EINVAL;
+
+        /* Should be big enough to keep the reserved entries, or a page */
+        if (size < kvm_dirty_ring_get_rsvd_entries() *
+            sizeof(struct kvm_dirty_gfn) || size < PAGE_SIZE)
+                return -EINVAL;
+
+        if (size > KVM_DIRTY_RING_MAX_ENTRIES *
+            sizeof(struct kvm_dirty_gfn))
+                return -E2BIG;
+
+        /* We only allow it to be set once */
+        if (kvm->dirty_ring_size)
+                return -EINVAL;
+
+        mutex_lock(&kvm->lock);
+
+        if (kvm->created_vcpus) {
+                /* We don't allow changing this value after vcpus are created */
+                r = -EINVAL;
+        } else {
+                kvm->dirty_ring_size = size;
+                r = kvm_dirty_ring_alloc(kvm, &kvm->vm_dirty_ring);
+                if (r) {
+                        /* Unset dirty ring */
+                        kvm->dirty_ring_size = 0;
+                }
+        }
+
+        mutex_unlock(&kvm->lock);
+        return r;
+}
+
+static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
+{
+        int i;
+        struct kvm_vcpu *vcpu;
+        int cleared = 0;
+
+        if (!kvm->dirty_ring_size)
+                return -EINVAL;
+
+        mutex_lock(&kvm->slots_lock);
+
+        cleared += kvm_dirty_ring_reset(kvm, &kvm->vm_dirty_ring,
+                                        &kvm->vm_run->vm_ring_indexes);
+
+        kvm_for_each_vcpu(i, vcpu, kvm)
+                cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring,
+                                                &vcpu->run->vcpu_ring_indexes);
+
+        mutex_unlock(&kvm->slots_lock);
+
+        if (cleared)
+                kvm_flush_remote_tlbs(kvm);
+
+        return cleared;
+}
+
 int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
                                                   struct kvm_enable_cap *cap)
 {
@@ -3282,6 +3483,8 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                 kvm->manual_dirty_log_protect = cap->args[0];
                 return 0;
 #endif
+        case KVM_CAP_DIRTY_LOG_RING:
+                return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
         default:
                 return kvm_vm_ioctl_enable_cap(kvm, cap);
         }
@@ -3469,6 +3672,9 @@ static long kvm_vm_ioctl(struct file *filp,
         case KVM_CHECK_EXTENSION:
                 r = kvm_vm_ioctl_check_extension_generic(kvm, arg);
                 break;
+        case KVM_RESET_DIRTY_RINGS:
+                r = kvm_vm_ioctl_reset_dirty_pages(kvm);
+                break;
         default:
                 r = kvm_arch_vm_ioctl(filp, ioctl, arg);
         }
@@ -3517,9 +3723,39 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 }
 #endif
 
+static vm_fault_t kvm_vm_fault(struct vm_fault *vmf)
+{
+        struct kvm *kvm = vmf->vma->vm_file->private_data;
+        struct page *page = NULL;
+
+        if (vmf->pgoff == 0)
+                page = virt_to_page(kvm->vm_run);
+        else if (kvm_fault_in_dirty_ring(kvm, vmf))
+                page = kvm_dirty_ring_get_page(
+                    &kvm->vm_dirty_ring,
+                    vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET);
+        else
+                return VM_FAULT_SIGBUS;
+
+        get_page(page);
+        vmf->page = page;
+        return 0;
+}
+
+static const struct vm_operations_struct kvm_vm_vm_ops = {
+        .fault = kvm_vm_fault,
+};
+
+static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
+{
+        vma->vm_ops = &kvm_vm_vm_ops;
+        return 0;
+}
+
 static struct file_operations kvm_vm_fops = {
         .release        = kvm_vm_release,
         .unlocked_ioctl = kvm_vm_ioctl,
+        .mmap           = kvm_vm_mmap,
         .llseek         = noop_llseek,
         KVM_COMPAT(kvm_vm_compat_ioctl),
 };

From patchwork Fri Nov 29 21:34:55 2019
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11267655
From patchwork Fri Nov 29 21:34:55 2019 X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267655 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert", peterx@redhat.com, Vitaly Kuznetsov
David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 05/15] KVM: Make dirty ring exclusive to dirty bitmap log Date: Fri, 29 Nov 2019 16:34:55 -0500 Message-Id: <20191129213505.18472-6-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 X-MC-Unique: hlo-OIEGNi2h1oNcLDJtZQ-1 X-Mimecast-Spam-Score: 0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org There's no good reason to use both the dirty bitmap logging and the new dirty ring buffer to track dirty bits. We should be able to even support both of them at the same time, but it could complicate things which could actually help little. Let's simply make it the rule before we enable dirty ring on any arch, that we don't allow these two interfaces to be used together. The big world switch would be KVM_CAP_DIRTY_LOG_RING capability enablement. That's where we'll switch from the default dirty logging way to the dirty ring way. As long as kvm->dirty_ring_size is setup correctly, we'll once and for all switch to the dirty ring buffer mode for the current virtual machine. Signed-off-by: Peter Xu --- Documentation/virt/kvm/api.txt | 7 +++++++ virt/kvm/kvm_main.c | 12 ++++++++++++ 2 files changed, 19 insertions(+) diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index fa622c9a2eb8..9f72ca1fd3e4 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -5487,3 +5487,10 @@ with the exit reason set to KVM_EXIT_DIRTY_LOG_FULL, and the KVM_RUN ioctl will return -EINTR. Once that happens, userspace should pause all the vcpus, then harvest all the dirty pages and rearm the dirty traps. It can unpause the guest after that. + +NOTE: the KVM_CAP_DIRTY_LOG_RING capability and the new ioctl +KVM_RESET_DIRTY_RINGS are exclusive to the existing KVM_GET_DIRTY_LOG +interface. After enabling KVM_CAP_DIRTY_LOG_RING with an acceptable +dirty ring size, the virtual machine will switch to the dirty ring +tracking mode, and KVM_GET_DIRTY_LOG, KVM_CLEAR_DIRTY_LOG ioctls will +stop working. 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8642c977629b..782127d11e9d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1236,6 +1236,10 @@ int kvm_get_dirty_log(struct kvm *kvm, unsigned long n; unsigned long any = 0; + /* Dirty ring tracking is mutually exclusive with dirty log tracking */ + if (kvm->dirty_ring_size) + return -EINVAL; + as_id = log->slot >> 16; id = (u16)log->slot; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) @@ -1293,6 +1297,10 @@ int kvm_get_dirty_log_protect(struct kvm *kvm, unsigned long *dirty_bitmap; unsigned long *dirty_bitmap_buffer; + /* Dirty ring tracking is mutually exclusive with dirty log tracking */ + if (kvm->dirty_ring_size) + return -EINVAL; + as_id = log->slot >> 16; id = (u16)log->slot; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) @@ -1364,6 +1372,10 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, unsigned long *dirty_bitmap; unsigned long *dirty_bitmap_buffer; + /* Dirty ring tracking is mutually exclusive with dirty log tracking */ + if (kvm->dirty_ring_size) + return -EINVAL; + as_id = log->slot >> 16; id = (u16)log->slot; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
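For illustration, roughly what this world switch looks like from userspace (a sketch against this RFC's uapi, not code from the series); once it succeeds, the bitmap ioctls guarded above start returning -EINVAL:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative only: flip a VM into dirty ring mode. bytes must be a
 * power of two, at least a page, and large enough for the reserved
 * entries; the call must happen before any vcpu is created. */
static int enable_dirty_ring(int vm_fd, __u64 bytes)
{
        struct kvm_enable_cap cap;

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_DIRTY_LOG_RING;
        cap.args[0] = bytes;

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}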
From patchwork Fri Nov 29 21:34:56 2019 X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267651 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert", peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 06/15] KVM: Introduce dirty ring wait queue Date: Fri, 29 Nov 2019 16:34:56 -0500 Message-Id: <20191129213505.18472-7-peterx@redhat.com>

When the dirty ring is completely full, right now we throw an error message and drop the dirty bit. A better approach is to put the thread onto a waitqueue and retry after the next KVM_RESET_DIRTY_RINGS. We should still allow the process to be killed, so handle that explicitly.

Signed-off-by: Peter Xu --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 22 ++++++++++++++++------ 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7b747bc9ff3e..a1c9ce5f23a1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -508,6 +508,7 @@ struct kvm { struct kvm_vm_run *vm_run; u32 dirty_ring_size; struct kvm_dirty_ring vm_dirty_ring; + wait_queue_head_t dirty_ring_waitqueue; }; #define kvm_err(fmt, ...) \ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 782127d11e9d..bd6172dbff1d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -722,6 +722,7 @@ static struct kvm *kvm_create_vm(unsigned long type) mutex_init(&kvm->irq_lock); mutex_init(&kvm->slots_lock); INIT_LIST_HEAD(&kvm->devices); + init_waitqueue_head(&kvm->dirty_ring_waitqueue); BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX); @@ -3370,16 +3371,23 @@ static void mark_page_dirty_in_ring(struct kvm *kvm, is_vm_ring = true; } +retry: ret = kvm_dirty_ring_push(ring, indexes, (as_id << 16)|slot->id, offset, is_vm_ring); if (ret < 0) { - if (is_vm_ring) - pr_warn_once("per-vm dirty log overflow\n"); - else - pr_warn_once("vcpu %d dirty log overflow\n", - vcpu->vcpu_id); - return; + /* + * Ring is full, put us onto the per-vm waitqueue and wait + * for another KVM_RESET_DIRTY_RINGS to retry + */ + wait_event_killable(kvm->dirty_ring_waitqueue, + !kvm_dirty_ring_full(ring)); + + /* If we're killed, no worry about losing dirty bits! 
*/ + if (fatal_signal_pending(current)) + return; + + goto retry; } if (ret) @@ -3475,6 +3483,8 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm) if (cleared) kvm_flush_remote_tlbs(kvm); + wake_up_all(&kvm->dirty_ring_waitqueue); + return cleared; }
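From the userspace side, the protocol this patch completes looks roughly like the sketch below (illustrative only; collect_dirty_gfns() is a hypothetical placeholder for the ring harvesting step): the reset ioctl now doubles as the wakeup for any thread blocked in the retry loop above.

#include <sys/ioctl.h>
#include <linux/kvm.h>

extern void collect_dirty_gfns(void);   /* hypothetical harvesting step */

/* Illustrative only: harvest the mapped rings, then reset them; the
 * reset also wakes writers sleeping on kvm->dirty_ring_waitqueue. */
static int harvest_and_reset(int vm_fd)
{
        collect_dirty_gfns();

        /* returns the number of reset entries, or a negative error */
        return ioctl(vm_fd, KVM_RESET_DIRTY_RINGS);
}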
From patchwork Fri Nov 29 21:34:57 2019 X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267647 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert", peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 07/15] KVM: X86: Implement ring-based dirty memory tracking Date: Fri, 29 Nov 2019 16:34:57 -0500 Message-Id: <20191129213505.18472-8-peterx@redhat.com>

From: "Cao, Lei" Add the new KVM exit reason KVM_EXIT_DIRTY_RING_FULL and connect KVM_REQ_DIRTY_RING_FULL to it. Signed-off-by: Lei Cao Signed-off-by: Paolo Bonzini [peterx: rebase, return 0 instead of -EINTR for user exits, emul_insn before exit to userspace] Signed-off-by: Peter Xu --- arch/x86/include/asm/kvm_host.h | 5 +++++ arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/mmu/mmu.c | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 7 +++++++ arch/x86/kvm/x86.c | 12 ++++++++++++ 5 files changed, 31 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b79cd6aa4075..67521627f9e4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -49,6 +49,8 @@ #define KVM_IRQCHIP_NUM_PINS KVM_IOAPIC_NUM_PINS +#define KVM_DIRTY_RING_VERSION 1 + /* x86-specific vcpu->requests bit members */ #define KVM_REQ_MIGRATE_TIMER KVM_ARCH_REQ(0) #define KVM_REQ_REPORT_TPR_ACCESS KVM_ARCH_REQ(1) @@ -1176,6 +1178,7 @@ struct kvm_x86_ops { struct kvm_memory_slot *slot, gfn_t offset, unsigned long mask); int (*write_log_dirty)(struct kvm_vcpu *vcpu); + int (*cpu_dirty_log_size)(void); /* pmu operations of sub-arch */ const struct kvm_pmu_ops *pmu_ops; @@ -1661,4 +1664,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu) #define GET_SMSTATE(type, buf, offset) \ (*(type *)((buf) + (offset) - 0x7e00)) +int kvm_cpu_dirty_log_size(void); + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 503d3f42da16..b59bf356c478 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -12,6 +12,7 @@ #define KVM_PIO_PAGE_OFFSET 1 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2 +#define KVM_DIRTY_LOG_PAGE_OFFSET 64 #define DE_VECTOR 0 #define DB_VECTOR 1 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f92b40d798c..f7efb69b089e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1818,7 +1818,13 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu) { if (kvm_x86_ops->write_log_dirty) return kvm_x86_ops->write_log_dirty(vcpu); + return 0; +} +int kvm_cpu_dirty_log_size(void) +{ + if (kvm_x86_ops->cpu_dirty_log_size) + return kvm_x86_ops->cpu_dirty_log_size(); return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d175429c91b0..871489d92d3c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7710,6 +7710,7 @@ static __init int hardware_setup(void) kvm_x86_ops->slot_disable_log_dirty = NULL; kvm_x86_ops->flush_log_dirty = NULL; kvm_x86_ops->enable_log_dirty_pt_masked = NULL; + kvm_x86_ops->cpu_dirty_log_size = NULL; } if (!cpu_has_vmx_preemption_timer()) @@ -7774,6 +7775,11 @@ static __exit void hardware_unsetup(void) free_kvm_area(); } +static int vmx_cpu_dirty_log_size(void) +{ + return enable_pml ? 
PML_ENTITY_NUM : 0; +} + static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .cpu_has_kvm_support = cpu_has_kvm_support, .disabled_by_bios = vmx_disabled_by_bios, @@ -7896,6 +7902,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .flush_log_dirty = vmx_flush_log_dirty, .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked, .write_log_dirty = vmx_write_pml_buffer, + .cpu_dirty_log_size = vmx_cpu_dirty_log_size, .pre_block = vmx_pre_block, .post_block = vmx_post_block, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3ed167e039e5..03ff34783fa1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8094,6 +8094,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) */ if (kvm_check_request(KVM_REQ_HV_STIMER, vcpu)) kvm_hv_process_stimers(vcpu); + + if (kvm_check_request(KVM_REQ_DIRTY_RING_FULL, vcpu)) { + vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL; + /* + * If this is requested, it means that we've + * marked the dirty bit in the dirty ring BUT + * we've not written the data. Do it now. + */ + r = kvm_emulate_instruction(vcpu, 0); + r = r >= 0 ? 0 : r; + goto out; + } } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
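To show how a VMM would consume this exit: with this patch, KVM_RUN returns 0 with exit_reason set to KVM_EXIT_DIRTY_RING_FULL once the ring's soft limit is hit. A hedged sketch of a vcpu thread loop follows (illustrative only; a real VMM would harvest the mapped rings and coordinate with a collector thread rather than reset inline):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative only: run points at this vcpu's mmap'ed kvm_run. */
static void vcpu_loop(int vcpu_fd, int vm_fd, struct kvm_run *run)
{
        for (;;) {
                if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
                        break;
                if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) {
                        /* harvest the rings here (not shown), then
                         * reset so the vcpu can re-enter the guest */
                        ioctl(vm_fd, KVM_RESET_DIRTY_RINGS);
                        continue;
                }
                /* ... handle the other exit reasons here ... */
        }
}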
From patchwork Fri Nov 29 21:34:58 2019 X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267645 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert", peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 08/15] KVM: selftests: Always clear dirty bitmap after iteration Date: Fri, 29 Nov 2019 16:34:58 -0500 Message-Id: <20191129213505.18472-9-peterx@redhat.com>

We don't clear the dirty bitmap before collecting because KVM_GET_DIRTY_LOG clears it for us before copying the dirty log into it. However, we'd better clear it explicitly instead of assuming the kernel will always do it for us. More importantly, the upcoming dirty ring tests will fetch dirty pages from a ring buffer, so no one is going to clear the dirty bitmap for us.
Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 5614222a6628..3c0ffd34b3b0 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -197,7 +197,7 @@ static void vm_dirty_log_verify(unsigned long *bmap) page); } - if (test_bit_le(page, bmap)) { + if (test_and_clear_bit_le(page, bmap)) { host_dirty_count++; /* * If the bit is set, the value written onto

From patchwork Fri Nov 29 21:34:59 2019 X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267641 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert", peterx@redhat.com, Vitaly Kuznetsov
Subject: [PATCH RFC 09/15] KVM: selftests: Sync uapi/linux/kvm.h to tools/ Date: Fri, 29 Nov 2019 16:34:59 -0500 Message-Id: <20191129213505.18472-10-peterx@redhat.com>

This will be needed to extend the kvm selftest program. Signed-off-by: Peter Xu --- tools/include/uapi/linux/kvm.h | 47 ++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 52641d8ca9e8..0b88d76d6215 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -235,6 +235,8 @@ struct kvm_hyperv_exit { #define KVM_EXIT_S390_STSI 25 #define KVM_EXIT_IOAPIC_EOI 26 #define KVM_EXIT_HYPERV 27 +#define KVM_EXIT_ARM_NISV 28 +#define KVM_EXIT_DIRTY_RING_FULL 29 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -246,6 +248,11 @@ struct kvm_hyperv_exit { /* Encounter unexpected vm-exit reason */ #define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON 4 +struct kvm_dirty_ring_indexes { + __u32 avail_index; /* set by kernel */ + __u32 fetch_index; /* set by userspace */ +}; + /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ struct kvm_run { /* in */ @@ -394,6 +401,11 @@ struct kvm_run { } eoi; /* KVM_EXIT_HYPERV */ struct kvm_hyperv_exit hyperv; + /* KVM_EXIT_ARM_NISV */ + struct { + __u64 esr_iss; + __u64 fault_ipa; + } arm_nisv; /* Fix the size of the union. 
*/ char padding[256]; }; @@ -415,6 +427,13 @@ struct kvm_run { struct kvm_sync_regs regs; char padding[SYNC_REGS_SIZE_BYTES]; } s; + + struct kvm_dirty_ring_indexes vcpu_ring_indexes; +}; + +/* Returned by mmap(kvm->fd, offset=0) */ +struct kvm_vm_run { + struct kvm_dirty_ring_indexes vm_ring_indexes; }; /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */ @@ -1000,6 +1019,10 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_PMU_EVENT_FILTER 173 #define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174 #define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175 +#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176 +#define KVM_CAP_ARM_NISV_TO_USER 177 +#define KVM_CAP_ARM_INJECT_EXT_DABT 178 +#define KVM_CAP_DIRTY_LOG_RING 179 #ifdef KVM_CAP_IRQ_ROUTING @@ -1227,6 +1250,8 @@ enum kvm_device_type { #define KVM_DEV_TYPE_ARM_VGIC_ITS KVM_DEV_TYPE_ARM_VGIC_ITS KVM_DEV_TYPE_XIVE, #define KVM_DEV_TYPE_XIVE KVM_DEV_TYPE_XIVE + KVM_DEV_TYPE_ARM_PV_TIME, +#define KVM_DEV_TYPE_ARM_PV_TIME KVM_DEV_TYPE_ARM_PV_TIME KVM_DEV_TYPE_MAX, }; @@ -1461,6 +1486,9 @@ struct kvm_enc_region { /* Available with KVM_CAP_ARM_SVE */ #define KVM_ARM_VCPU_FINALIZE _IOW(KVMIO, 0xc2, int) +/* Available with KVM_CAP_DIRTY_LOG_RING */ +#define KVM_RESET_DIRTY_RINGS _IO(KVMIO, 0xc3) + /* Secure Encrypted Virtualization command */ enum sev_cmd_id { /* Guest initialization commands */ @@ -1611,4 +1639,23 @@ struct kvm_hyperv_eventfd { #define KVM_HYPERV_CONN_ID_MASK 0x00ffffff #define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0) +/* + * The following are the requirements for supporting dirty log ring + * (by enabling KVM_DIRTY_LOG_PAGE_OFFSET). + * + * 1. Memory accesses by KVM should call kvm_vcpu_write_* instead + * of kvm_write_* so that the global dirty ring is not filled up + * too quickly. + * 2. kvm_arch_mmu_enable_log_dirty_pt_masked should be defined for + * enabling dirty logging. + * 3. There should not be a separate step to synchronize hardware + * dirty bitmap with KVM's. 
+ */ + +struct kvm_dirty_gfn { + __u32 pad; + __u32 slot; + __u64 offset; +}; + #endif /* __LINUX_KVM_H */
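To illustrate how these uapi structures fit together, a minimal harvesting sketch (assumptions: ring is the mmap'ed kvm_dirty_gfn array, nents its entry count, and mark_gfn_dirty() a hypothetical consumer; the selftest later in this series adds the READ_ONCE/rmb ordering a real consumer needs):

#include <linux/kvm.h>

extern void mark_gfn_dirty(__u32 slot, __u64 offset);   /* hypothetical */

/* Illustrative only: walk the entries between fetch_index and
 * avail_index; both are free-running counters, so index the array
 * modulo the entry count. */
static __u32 harvest_ring(struct kvm_dirty_gfn *ring, __u32 nents,
                          struct kvm_dirty_ring_indexes *idx)
{
        __u32 fetch = idx->fetch_index, count = 0;

        while (fetch != idx->avail_index) {
                struct kvm_dirty_gfn *e = &ring[fetch % nents];

                /* e->slot encodes (as_id << 16) | slot id */
                mark_gfn_dirty(e->slot, e->offset);
                fetch++;
                count++;
        }
        idx->fetch_index = fetch;       /* publish progress back to KVM */
        return count;
}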
David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 10/15] KVM: selftests: Use a single binary for dirty/clear log test Date: Fri, 29 Nov 2019 16:35:00 -0500 Message-Id: <20191129213505.18472-11-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 X-MC-Unique: aRhnBmPaMjKVtxlAO66SDA-1 X-Mimecast-Spam-Score: 0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Remove the clear_dirty_log test, instead merge it into the existing dirty_log_test. It should be cleaner to use this single binary to do both tests, also it's a preparation for the upcoming dirty ring test. The default test will still be the dirty_log test. To run the clear dirty log test, we need to specify "-M clear-log". Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/Makefile | 2 - .../selftests/kvm/clear_dirty_log_test.c | 2 - tools/testing/selftests/kvm/dirty_log_test.c | 131 +++++++++++++++--- 3 files changed, 110 insertions(+), 25 deletions(-) delete mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 3138a916574a..130a7b1c7ad6 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -26,11 +26,9 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test -TEST_GEN_PROGS_x86_64 += clear_dirty_log_test TEST_GEN_PROGS_x86_64 += dirty_log_test TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus -TEST_GEN_PROGS_aarch64 += clear_dirty_log_test TEST_GEN_PROGS_aarch64 += dirty_log_test TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus diff --git a/tools/testing/selftests/kvm/clear_dirty_log_test.c b/tools/testing/selftests/kvm/clear_dirty_log_test.c deleted file mode 100644 index 749336937d37..000000000000 --- a/tools/testing/selftests/kvm/clear_dirty_log_test.c +++ /dev/null @@ -1,2 +0,0 @@ -#define USE_CLEAR_DIRTY_LOG -#include "dirty_log_test.c" diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 3c0ffd34b3b0..a8ae8c0042a8 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -128,6 +128,66 @@ static uint64_t host_dirty_count; static uint64_t host_clear_count; static uint64_t host_track_next_count; +enum log_mode_t { + /* Only use KVM_GET_DIRTY_LOG for logging */ + LOG_MODE_DIRTY_LOG = 0, + + /* Use both KVM_[GET|CLEAR]_DIRTY_LOG for logging */ + LOG_MODE_CLERA_LOG = 1, + + LOG_MODE_NUM, +}; + +/* Mode of logging. 
Default is LOG_MODE_DIRTY_LOG */ +static enum log_mode_t host_log_mode; + +static void clear_log_create_vm_done(struct kvm_vm *vm) +{ + struct kvm_enable_cap cap = {}; + + if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)) { + fprintf(stderr, "KVM_CLEAR_DIRTY_LOG not available, skipping tests\n"); + exit(KSFT_SKIP); + } + + cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; + cap.args[0] = 1; + vm_enable_cap(vm, &cap); +} + +static void dirty_log_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + kvm_vm_get_dirty_log(vm, slot, bitmap); +} + +static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + kvm_vm_get_dirty_log(vm, slot, bitmap); + kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); +} + +struct log_mode { + const char *name; + /* Hook when the vm creation is done (before vcpu creation) */ + void (*create_vm_done)(struct kvm_vm *vm); + /* Hook to collect the dirty pages into the bitmap provided */ + void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages); +} log_modes[LOG_MODE_NUM] = { + { + .name = "dirty-log", + .create_vm_done = NULL, + .collect_dirty_pages = dirty_log_collect_dirty_pages, + }, + { + .name = "clear-log", + .create_vm_done = clear_log_create_vm_done, + .collect_dirty_pages = clear_log_collect_dirty_pages, + }, +}; + /* * We use this bitmap to track some pages that should have its dirty * bit set in the _next_ iteration. For example, if we detected the @@ -137,6 +197,33 @@ static uint64_t host_track_next_count; */ static unsigned long *host_bmap_track; +static void log_modes_dump(void) +{ + int i; + + for (i = 0; i < LOG_MODE_NUM; i++) + printf("%s, ", log_modes[i].name); + puts("\b\b \b\b"); +} + +static void log_mode_create_vm_done(struct kvm_vm *vm) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->create_vm_done) + mode->create_vm_done(vm); +} + +static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + TEST_ASSERT(mode->collect_dirty_pages != NULL, + "collect_dirty_pages() is required for any log mode!"); + mode->collect_dirty_pages(vm, slot, bitmap, num_pages); +} + static void generate_random_array(uint64_t *guest_array, uint64_t size) { uint64_t i; @@ -257,6 +344,7 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid, #ifdef __x86_64__ vm_create_irqchip(vm); #endif + log_mode_create_vm_done(vm); vm_vcpu_add_default(vm, vcpuid, guest_code); return vm; } @@ -316,14 +404,6 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, bmap = bitmap_alloc(host_num_pages); host_bmap_track = bitmap_alloc(host_num_pages); -#ifdef USE_CLEAR_DIRTY_LOG - struct kvm_enable_cap cap = {}; - - cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; - cap.args[0] = 1; - vm_enable_cap(vm, &cap); -#endif - /* Add an extra memory slot for testing dirty logging */ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, guest_test_phys_mem, @@ -364,11 +444,8 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, while (iteration < iterations) { /* Give the vcpu thread some time to dirty some pages */ usleep(interval * 1000); - kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap); -#ifdef USE_CLEAR_DIRTY_LOG - kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0, - host_num_pages); -#endif + log_mode_collect_dirty_pages(vm, TEST_MEM_SLOT_INDEX, + bmap, host_num_pages); 
vm_dirty_log_verify(bmap); iteration++; sync_global_to_guest(vm, iteration); @@ -413,6 +490,9 @@ static void help(char *name) TEST_HOST_LOOP_INTERVAL); printf(" -p: specify guest physical test memory offset\n" " Warning: a low offset can conflict with the loaded test code.\n"); + printf(" -M: specify the host logging mode " + "(default: dirty-log). Supported modes: \n\t"); + log_modes_dump(); printf(" -m: specify the guest mode ID to test " "(default: test all supported modes)\n" " This option may be used multiple times.\n" @@ -437,13 +517,6 @@ int main(int argc, char *argv[]) unsigned int host_ipa_limit; #endif -#ifdef USE_CLEAR_DIRTY_LOG - if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)) { - fprintf(stderr, "KVM_CLEAR_DIRTY_LOG not available, skipping tests\n"); - exit(KSFT_SKIP); - } -#endif - #ifdef __x86_64__ vm_guest_mode_params_init(VM_MODE_PXXV48_4K, true, true); #endif @@ -463,7 +536,7 @@ int main(int argc, char *argv[]) vm_guest_mode_params_init(VM_MODE_P40V48_4K, true, true); #endif - while ((opt = getopt(argc, argv, "hi:I:p:m:")) != -1) { + while ((opt = getopt(argc, argv, "hi:I:p:m:M:")) != -1) { switch (opt) { case 'i': iterations = strtol(optarg, NULL, 10); @@ -485,6 +558,22 @@ int main(int argc, char *argv[]) "Guest mode ID %d too big", mode); vm_guest_mode_params[mode].enabled = true; break; + case 'M': + for (i = 0; i < LOG_MODE_NUM; i++) { + if (!strcmp(optarg, log_modes[i].name)) { + DEBUG("Setting log mode to: '%s'\n", + optarg); + host_log_mode = i; + break; + } + } + if (i == LOG_MODE_NUM) { + printf("Log mode '%s' is invalid. " + "Please choose from: ", optarg); + log_modes_dump(); + exit(-1); + } + break; case 'h': default: help(argv[0]);
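As a usage sketch (assuming the merged binary built above): ./dirty_log_test -M clear-log -i 32 -I 10 runs the clear-log variant for 32 iterations at a 10 ms interval, while a plain ./dirty_log_test keeps the default dirty-log mode; an unknown -M argument prints the list of supported modes and exits.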
From patchwork Fri Nov 29 21:35:01 2019 X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267639 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson, Paolo Bonzini, "Dr. David Alan Gilbert", peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 11/15] KVM: selftests: Introduce after_vcpu_run hook for dirty log test Date: Fri, 29 Nov 2019 16:35:01 -0500 Message-Id: <20191129213505.18472-12-peterx@redhat.com>

Provide a hook for the checks that run after vcpu_run() completes. This prepares for the dirty ring test, where we'll need to handle another exit reason. While at it, drop pages_count, since the statistics now give a better summary, and clean the loop up a bit. 
Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 39 ++++++++++++-------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index a8ae8c0042a8..3542311f56ff 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -168,6 +168,15 @@ static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); } +static void default_after_vcpu_run(struct kvm_vm *vm) +{ + struct kvm_run *run = vcpu_state(vm, VCPU_ID); + + TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC, + "Invalid guest sync status: exit_reason=%s\n", + exit_reason_str(run->exit_reason)); +} + struct log_mode { const char *name; /* Hook when the vm creation is done (before vcpu creation) */ @@ -175,16 +184,20 @@ struct log_mode { /* Hook to collect the dirty pages into the bitmap provided */ void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, void *bitmap, uint32_t num_pages); + /* Hook to call after each vcpu run */ + void (*after_vcpu_run)(struct kvm_vm *vm); } log_modes[LOG_MODE_NUM] = { { .name = "dirty-log", .create_vm_done = NULL, .collect_dirty_pages = dirty_log_collect_dirty_pages, + .after_vcpu_run = default_after_vcpu_run, }, { .name = "clear-log", .create_vm_done = clear_log_create_vm_done, .collect_dirty_pages = clear_log_collect_dirty_pages, + .after_vcpu_run = default_after_vcpu_run, }, }; @@ -224,6 +237,14 @@ static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, mode->collect_dirty_pages(vm, slot, bitmap, num_pages); } +static void log_mode_after_vcpu_run(struct kvm_vm *vm) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->after_vcpu_run) + mode->after_vcpu_run(vm); +} + static void generate_random_array(uint64_t *guest_array, uint64_t size) { uint64_t i; @@ -237,31 +258,17 @@ static void *vcpu_worker(void *data) int ret; struct kvm_vm *vm = data; uint64_t *guest_array; - uint64_t pages_count = 0; - struct kvm_run *run; - - run = vcpu_state(vm, VCPU_ID); guest_array = addr_gva2hva(vm, (vm_vaddr_t)random_array); - generate_random_array(guest_array, TEST_PAGES_PER_LOOP); while (!READ_ONCE(host_quit)) { + generate_random_array(guest_array, TEST_PAGES_PER_LOOP); /* Let the guest dirty the random pages */ ret = _vcpu_run(vm, VCPU_ID); TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); - if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) { - pages_count += TEST_PAGES_PER_LOOP; - generate_random_array(guest_array, TEST_PAGES_PER_LOOP); - } else { - TEST_ASSERT(false, - "Invalid guest sync status: " - "exit_reason=%s\n", - exit_reason_str(run->exit_reason)); - } + log_mode_after_vcpu_run(vm); } - DEBUG("Dirtied %"PRIu64" pages\n", pages_count); - return NULL; }
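The hook dispatch above follows a common table-of-optional-callbacks pattern; for illustration, a standalone miniature of the same idea (names here are placeholders, not the selftest's own):

#include <stdio.h>

struct mode {
        const char *name;
        void (*after_run)(void);        /* optional hook, may be NULL */
};

static void verbose_after_run(void) { puts("checked vcpu exit"); }

static struct mode modes[] = {
        { .name = "quiet",   .after_run = NULL },
        { .name = "verbose", .after_run = verbose_after_run },
};

/* unset hooks are simply skipped, as in log_mode_after_vcpu_run() */
static void mode_after_run(int m)
{
        if (modes[m].after_run)
                modes[m].after_run();
}

int main(void)
{
        mode_after_run(0);      /* no-op */
        mode_after_run(1);      /* prints */
        return 0;
}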
header.b="SuX3Ruu9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387437AbfK2Vfe (ORCPT ); Fri, 29 Nov 2019 16:35:34 -0500 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:24751 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2387419AbfK2Vfc (ORCPT ); Fri, 29 Nov 2019 16:35:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1575063330; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XHvz4uCnwPahuSSwiWfF4UrUk2alj3qViK65/D923A4=; b=SuX3Ruu9pLH4AgjvUXTZ9LHJOyiJlQcDelXBBz9a/UTiXoNhwSyrs9zdprCBxVmxKJawtV K0Y7oML6olt/gdxIS+0KiKONo5mpKX+p/iEU37Orv60kQKtBYw1HaCNEbfGWleZq1VahTx MjAV5MskbICGI4Sbtxn0ARZt9E1UHBk= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-277-eeKeLtMSNcS2tkJimIMY8g-1; Fri, 29 Nov 2019 16:35:29 -0500 Received: by mail-qt1-f198.google.com with SMTP id s8so19654322qtq.17 for ; Fri, 29 Nov 2019 13:35:29 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=AQHPtukpR8BIC9uU90sfq2RrCEOxqZPLoV+AnhfQkQ0=; b=Le0XIQNC8scxLZPPS4RVyJ7R03PXUrIMBR3z27wmjrA82Tj3Q5jAVZwUqtI885nO+I PZbBlk4rQrJfopTIz4D1qr+e/48wA61QVz1J28JTtwJdqHtDJFsTVLcIi2xQmDvTldUP 8QIf0rpJkTeSTahvZVpzlXyLKyMMCjkNqRES9sr7lJzY0eYZq8i8duWhaJorWu9avkbw n3ESymzmPRrKHDiuzAfvhSPucyY5/rMtlka14qA63MG4wOt/EqrtwcNXeexMIarrFStI FTiOVrDaLJSKfL+kT6/srBnI1w1Nm/yoms/6fDVftiAs0LwRl5AQglw+nAQ7kXizPM1X V+sA== X-Gm-Message-State: APjAAAXN+gl9+8GrNWoLmgq6LseGqDEnaU5BuWANemBNWFlxVV3XJF/f jvAFGMtUlPovIg7tBL0Im7L41my8TKGS9VuQoLxs6V/lwQsbJUsFaF6TyyB9iFbSUcJhOSZRM6C o0efM/p7iCEyK X-Received: by 2002:ac8:43da:: with SMTP id w26mr43275235qtn.272.1575063328684; Fri, 29 Nov 2019 13:35:28 -0800 (PST) X-Google-Smtp-Source: APXvYqxDmuGEjts35ZAA78yzU46ONcDtlygTqVON5jLy8y+XNRNbspZdBrfALEfw2fF5MLkkh53b1w== X-Received: by 2002:ac8:43da:: with SMTP id w26mr43275199qtn.272.1575063328230; Fri, 29 Nov 2019 13:35:28 -0800 (PST) Received: from xz-x1.yyz.redhat.com ([104.156.64.74]) by smtp.gmail.com with ESMTPSA id h186sm10679046qkf.64.2019.11.29.13.35.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Nov 2019 13:35:27 -0800 (PST) From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson , Paolo Bonzini , "Dr . David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 12/15] KVM: selftests: Add dirty ring buffer test Date: Fri, 29 Nov 2019 16:35:02 -0500 Message-Id: <20191129213505.18472-13-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 X-MC-Unique: eeKeLtMSNcS2tkJimIMY8g-1 X-Mimecast-Spam-Score: 0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add the initial dirty ring buffer test. The current test implements the userspace dirty ring collection, by only reaping the dirty ring when the ring is full. So it's still running asynchronously like this: vcpu main thread 1. vcpu dirties pages 2. vcpu gets dirty ring full (userspace exit) 3. 
We can't collect the dirty bits directly while the vcpu is running, because we can't guarantee that the hardware dirty bits have been flushed to the ring at collection time; the verification is strict about dirty bits, so a missed flush could fail the future verify procedure. A follow-up patch will make this test support async collection just like the existing dirty log test, by adding a vcpu kick mechanism.

Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 148 ++++++++++++++++++ .../testing/selftests/kvm/include/kvm_util.h | 5 + tools/testing/selftests/kvm/lib/kvm_util.c | 95 +++++++++++ .../selftests/kvm/lib/kvm_util_internal.h | 5 + 4 files changed, 253 insertions(+) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 3542311f56ff..968e35c5d380 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -12,8 +12,10 @@ #include #include #include +#include #include #include +#include #include "test_util.h" #include "kvm_util.h" @@ -57,6 +59,8 @@ # define test_and_clear_bit_le test_and_clear_bit #endif +#define TEST_DIRTY_RING_COUNT 1024 + /* * Guest/Host shared variables. Ensure addr_gva2hva() and/or * sync_global_to/from_guest() are used when accessing from @@ -128,6 +132,10 @@ static uint64_t host_dirty_count; static uint64_t host_clear_count; static uint64_t host_track_next_count; +/* Whether dirty ring reset is requested, or finished */ +static sem_t dirty_ring_vcpu_stop; +static sem_t dirty_ring_vcpu_cont; + enum log_mode_t { /* Only use KVM_GET_DIRTY_LOG for logging */ LOG_MODE_DIRTY_LOG = 0, @@ -135,6 +143,9 @@ enum log_mode_t { /* Use both KVM_[GET|CLEAR]_DIRTY_LOG for logging */ LOG_MODE_CLEAR_LOG = 1, + /* Use dirty ring for logging */ + LOG_MODE_DIRTY_RING = 2, + LOG_MODE_NUM, }; @@ -177,6 +188,123 @@ static void default_after_vcpu_run(struct kvm_vm *vm) exit_reason_str(run->exit_reason)); } +static void dirty_ring_create_vm_done(struct kvm_vm *vm) +{ + /* + * Switch to dirty ring mode after VM creation but before any + * of the vcpu creation. + */ + vm_enable_dirty_ring(vm, TEST_DIRTY_RING_COUNT * + sizeof(struct kvm_dirty_gfn)); +} + +static uint32_t dirty_ring_collect_one(struct kvm_dirty_gfn *dirty_gfns, + struct kvm_dirty_ring_indexes *indexes, + int slot, void *bitmap, + uint32_t num_pages, int index) +{ + struct kvm_dirty_gfn *cur; + uint32_t avail, fetch, count = 0; + + /* + * Userspace should keep its own copy of fetch_index, but to + * keep this simple we read it back from the shared page too. 
+ */ + fetch = READ_ONCE(indexes->fetch_index); + avail = READ_ONCE(indexes->avail_index); + + /* Make sure we read valid entries always */ + rmb(); + + DEBUG("ring %d: fetch: 0x%x, avail: 0x%x\n", index, fetch, avail); + + while (fetch != avail) { + cur = &dirty_gfns[fetch % test_dirty_ring_count]; + TEST_ASSERT(cur->pad == 0, "Padding is non-zero: 0x%x", cur->pad); + TEST_ASSERT(cur->slot == slot, "Slot number didn't match: " + "%u != %u", cur->slot, slot); + TEST_ASSERT(cur->offset < num_pages, "Offset overflow: " + "0x%llx >= 0x%llx", cur->offset, num_pages); + //DEBUG("slot %d offset %llu\n", cur->slot, cur->offset); + test_and_set_bit(cur->offset, bitmap); + fetch++; + count++; + } + WRITE_ONCE(indexes->fetch_index, fetch); + + return count; +} + +static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + /* We only have one vcpu */ + struct kvm_run *state = vcpu_state(vm, VCPU_ID); + struct kvm_vm_run *vm_run = vm_state(vm); + uint32_t count = 0, cleared; + + /* + * Before fetching the dirty pages, we need a vmexit of the + * worker vcpu to make sure the hardware dirty buffers were + * flushed. This is not needed for dirty-log/clear-log tests + * because get dirty log will natually do so. + * + * For now we do it in the simple way - we simply wait until + * the vcpu uses up the soft dirty ring, then it'll always + * do a vmexit to make sure that PML buffers will be flushed. + * In real hypervisors, we probably need a vcpu kick or to + * stop the vcpus (before the final sync) to make sure we'll + * get all the existing dirty PFNs even cached in hardware. + */ + sem_wait(&dirty_ring_vcpu_stop); + + count += dirty_ring_collect_one(kvm_map_dirty_ring(vm), + &vm_run->vm_ring_indexes, + slot, bitmap, num_pages, -1); + + /* Only have one vcpu */ + count += dirty_ring_collect_one(vcpu_map_dirty_ring(vm, VCPU_ID), + &state->vcpu_ring_indexes, + slot, bitmap, num_pages, VCPU_ID); + + cleared = kvm_vm_reset_dirty_ring(vm); + + /* Cleared pages should be the same as collected */ + TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch " + "with collected (%u)", cleared, count); + + DEBUG("Notifying vcpu to continue\n"); + sem_post(&dirty_ring_vcpu_cont); + + DEBUG("Iteration %ld collected %u pages\n", iteration, count); +} + +static void dirty_ring_after_vcpu_run(struct kvm_vm *vm) +{ + struct kvm_run *run = vcpu_state(vm, VCPU_ID); + + /* A ucall-sync or ring-full event is allowed */ + if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) { + /* We should allow this to continue */ + ; + } else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) { + sem_post(&dirty_ring_vcpu_stop); + DEBUG("vcpu stops because dirty ring full...\n"); + sem_wait(&dirty_ring_vcpu_cont); + DEBUG("vcpu continues now.\n"); + } else { + TEST_ASSERT(false, "Invalid guest sync status: " + "exit_reason=%s\n", + exit_reason_str(run->exit_reason)); + } +} + +static void dirty_ring_before_vcpu_join(void) +{ + /* Kick another round of vcpu just to make sure it will quit */ + sem_post(&dirty_ring_vcpu_cont); +} + struct log_mode { const char *name; /* Hook when the vm creation is done (before vcpu creation) */ @@ -186,6 +314,7 @@ struct log_mode { void *bitmap, uint32_t num_pages); /* Hook to call when after each vcpu run */ void (*after_vcpu_run)(struct kvm_vm *vm); + void (*before_vcpu_join) (void); } log_modes[LOG_MODE_NUM] = { { .name = "dirty-log", @@ -199,6 +328,13 @@ struct log_mode { .collect_dirty_pages = clear_log_collect_dirty_pages, .after_vcpu_run = 
default_after_vcpu_run, }, + { + .name = "dirty-ring", + .create_vm_done = dirty_ring_create_vm_done, + .collect_dirty_pages = dirty_ring_collect_dirty_pages, + .before_vcpu_join = dirty_ring_before_vcpu_join, + .after_vcpu_run = dirty_ring_after_vcpu_run, + }, }; /* @@ -245,6 +381,14 @@ static void log_mode_after_vcpu_run(struct kvm_vm *vm) mode->after_vcpu_run(vm); } +static void log_mode_before_vcpu_join(void) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->before_vcpu_join) + mode->before_vcpu_join(); +} + static void generate_random_array(uint64_t *guest_array, uint64_t size) { uint64_t i; @@ -460,6 +604,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, /* Tell the vcpu thread to quit */ host_quit = true; + log_mode_before_vcpu_join(); pthread_join(vcpu_thread, NULL); DEBUG("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), " @@ -524,6 +669,9 @@ int main(int argc, char *argv[]) unsigned int host_ipa_limit; #endif + sem_init(&dirty_ring_vcpu_stop, 0, 0); + sem_init(&dirty_ring_vcpu_cont, 0, 0); + #ifdef __x86_64__ vm_guest_mode_params_init(VM_MODE_PXXV48_4K, true, true); #endif diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index 29cccaf96baf..5ad52f38af8d 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -67,6 +67,7 @@ enum vm_mem_backing_src_type { int kvm_check_cap(long cap); int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap); +void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size); struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm); struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm); @@ -76,6 +77,7 @@ void kvm_vm_release(struct kvm_vm *vmp); void kvm_vm_get_dirty_log(struct kvm_vm *vm, int slot, void *log); void kvm_vm_clear_dirty_log(struct kvm_vm *vm, int slot, void *log, uint64_t first_page, uint32_t num_pages); +uint32_t kvm_vm_reset_dirty_ring(struct kvm_vm *vm); int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, const vm_vaddr_t gva, size_t len); @@ -111,6 +113,7 @@ vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva); vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva); struct kvm_run *vcpu_state(struct kvm_vm *vm, uint32_t vcpuid); +struct kvm_vm_run *vm_state(struct kvm_vm *vm); void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid); @@ -137,6 +140,8 @@ void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid, int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_nested_state *state, bool ignore_error); #endif +void *vcpu_map_dirty_ring(struct kvm_vm *vm, uint32_t vcpuid); +void *kvm_map_dirty_ring(struct kvm_vm *vm); const char *exit_reason_str(unsigned int exit_reason); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 41cf45416060..3a71e66a0b58 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -85,6 +85,26 @@ int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap) return ret; } +void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size) +{ + struct kvm_enable_cap cap = {}; + int ret; + + ret = kvm_check_cap(KVM_CAP_DIRTY_LOG_RING); + + TEST_ASSERT(ret >= 0, "KVM_CAP_DIRTY_LOG_RING"); + + if (ret == 0) { + fprintf(stderr, "KVM does not 
support dirty ring, skipping tests\n"); + exit(KSFT_SKIP); + } + + cap.cap = KVM_CAP_DIRTY_LOG_RING; + cap.args[0] = ring_size; + vm_enable_cap(vm, &cap); + vm->dirty_ring_size = ring_size; +} + static void vm_open(struct kvm_vm *vm, int perm) { vm->kvm_fd = open(KVM_DEV_PATH, perm); @@ -297,6 +317,11 @@ void kvm_vm_clear_dirty_log(struct kvm_vm *vm, int slot, void *log, strerror(-ret)); } +uint32_t kvm_vm_reset_dirty_ring(struct kvm_vm *vm) +{ + return ioctl(vm->fd, KVM_RESET_DIRTY_RINGS); +} + /* * Userspace Memory Region Find * @@ -408,6 +433,13 @@ static void vm_vcpu_rm(struct kvm_vm *vm, uint32_t vcpuid) struct vcpu *vcpu = vcpu_find(vm, vcpuid); int ret; + if (vcpu->dirty_gfns) { + ret = munmap(vcpu->dirty_gfns, vm->dirty_ring_size); + TEST_ASSERT(ret == 0, "munmap of VCPU dirty ring failed, " + "rc: %i errno: %i", ret, errno); + vcpu->dirty_gfns = NULL; + } + ret = munmap(vcpu->state, sizeof(*vcpu->state)); TEST_ASSERT(ret == 0, "munmap of VCPU fd failed, rc: %i " "errno: %i", ret, errno); @@ -447,6 +479,16 @@ void kvm_vm_free(struct kvm_vm *vmp) { int ret; + if (vmp->vm_run) { + munmap(vmp->vm_run, sizeof(struct kvm_vm_run)); + vmp->vm_run = NULL; + } + + if (vmp->vm_dirty_gfns) { + munmap(vmp->vm_dirty_gfns, vmp->dirty_ring_size); + vmp->vm_dirty_gfns = NULL; + } + if (vmp == NULL) return; @@ -1122,6 +1164,18 @@ struct kvm_run *vcpu_state(struct kvm_vm *vm, uint32_t vcpuid) return vcpu->state; } +struct kvm_vm_run *vm_state(struct kvm_vm *vm) +{ + if (!vm->vm_run) { + vm->vm_run = (struct kvm_vm_run *) + mmap(NULL, sizeof(struct kvm_vm_run), + PROT_READ | PROT_WRITE, MAP_SHARED, vm->fd, 0); + TEST_ASSERT(vm->vm_run != MAP_FAILED, + "kvm vm run mapping failed"); + } + return vm->vm_run; +} + /* * VM VCPU Run * @@ -1409,6 +1463,46 @@ int _vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid, return ret; } +void *vcpu_map_dirty_ring(struct kvm_vm *vm, uint32_t vcpuid) +{ + struct vcpu *vcpu; + uint32_t size = vm->dirty_ring_size; + + TEST_ASSERT(size > 0, "Should enable dirty ring first"); + + vcpu = vcpu_find(vm, vcpuid); + + TEST_ASSERT(vcpu, "Cannot find vcpu %u", vcpuid); + + if (!vcpu->dirty_gfns) { + vcpu->dirty_gfns_count = size / sizeof(struct kvm_dirty_gfn); + vcpu->dirty_gfns = mmap(NULL, size, PROT_READ | PROT_WRITE, + MAP_SHARED, vcpu->fd, vm->page_size * + KVM_DIRTY_LOG_PAGE_OFFSET); + TEST_ASSERT(vcpu->dirty_gfns != MAP_FAILED, + "Dirty ring map failed"); + } + + return vcpu->dirty_gfns; +} + +void *kvm_map_dirty_ring(struct kvm_vm *vm) +{ + uint32_t size = vm->dirty_ring_size; + + TEST_ASSERT(size > 0, "Should enable dirty ring first"); + + if (!vm->vm_dirty_gfns) { + vm->vm_dirty_gfns = mmap(NULL, size, PROT_READ | PROT_WRITE, + MAP_SHARED, vm->fd, vm->page_size * + KVM_DIRTY_LOG_PAGE_OFFSET); + TEST_ASSERT(vm->vm_dirty_gfns != MAP_FAILED, + "Dirty ring map failed"); + } + + return vm->vm_dirty_gfns; +} + /* * VM Ioctl * @@ -1503,6 +1597,7 @@ static struct exit_reason { {KVM_EXIT_INTERNAL_ERROR, "INTERNAL_ERROR"}, {KVM_EXIT_OSI, "OSI"}, {KVM_EXIT_PAPR_HCALL, "PAPR_HCALL"}, + {KVM_EXIT_DIRTY_RING_FULL, "DIRTY_RING_FULL"}, #ifdef KVM_EXIT_MEMORY_NOT_PRESENT {KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"}, #endif diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h index ac50c42750cf..3423d78d7993 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h @@ -39,6 +39,8 @@ struct vcpu { uint32_t id; int fd; struct kvm_run *state; + struct 
kvm_dirty_gfn *dirty_gfns; + uint32_t dirty_gfns_count; }; struct kvm_vm { @@ -61,6 +63,9 @@ vm_paddr_t pgd; vm_vaddr_t gdt; vm_vaddr_t tss; + uint32_t dirty_ring_size; + struct kvm_vm_run *vm_run; + struct kvm_dirty_gfn *vm_dirty_gfns; }; struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid);
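As an aside, the stop/cont handshake that the test above builds with dirty_ring_vcpu_stop/dirty_ring_vcpu_cont is easy to model without KVM. Here is a minimal sketch under our own names (a plain counter stands in for ring occupancy; compile with -lpthread):

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

static sem_t vcpu_stop, vcpu_cont;
static volatile bool quit;
static int ring_used;                   /* stand-in for ring occupancy */
#define RING_FULL 4

static void *vcpu_worker(void *arg)
{
        (void)arg;
        while (!quit) {
                if (++ring_used == RING_FULL) { /* "dirty ring full" exit */
                        sem_post(&vcpu_stop);   /* let the collector run */
                        sem_wait(&vcpu_cont);   /* wait for the ring reset */
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t thread;
        int iterations = 3;

        sem_init(&vcpu_stop, 0, 0);
        sem_init(&vcpu_cont, 0, 0);
        pthread_create(&thread, NULL, vcpu_worker, NULL);

        while (iterations--) {
                sem_wait(&vcpu_stop);           /* 3. wait until full */
                printf("collected %d entries\n", ring_used);    /* 4. */
                ring_used = 0;                  /* model the ring reset */
                sem_post(&vcpu_cont);           /* 5. continue vcpu */
        }

        quit = true;
        /*
         * One extra post, like dirty_ring_before_vcpu_join() above, in
         * case the worker is already blocked again waiting for us.
         */
        sem_post(&vcpu_cont);
        pthread_join(thread, NULL);
        return 0;
}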
From patchwork Fri Nov 29 21:35:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267637 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson , Paolo Bonzini , "Dr . David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 13/15] KVM: selftests: Let dirty_log_test async for dirty ring test Date: Fri, 29 Nov 2019 16:35:03 -0500 Message-Id: <20191129213505.18472-14-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

Previously the dirty ring test worked in a synchronous way, because only with a vmexit (here, the ring-full event) do we know that the hardware dirty bits have been flushed to the dirty ring. This patch first introduces a vcpu kick mechanism using SIGUSR1, which guarantees a vmexit and with it the flushing of the hardware dirty bits. With that, the vcpu dirtying work can be kept asynchronous to the whole collection procedure. Further, increase the dirty ring size to the current maximum, to make sure we torture the no-ring-full case more, which should be the major scenario when hypervisors like QEMU use this feature.

Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 74 ++++++++++++------- .../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 8 ++ 3 files changed, 57 insertions(+), 26 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 968e35c5d380..4799db91e919 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -13,6 +13,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -59,7 +62,9 @@ # define test_and_clear_bit_le test_and_clear_bit #endif -#define TEST_DIRTY_RING_COUNT 1024 +#define TEST_DIRTY_RING_COUNT 65536 + +#define SIG_IPI SIGUSR1 /* * Guest/Host shared variables. Ensure addr_gva2hva() and/or @@ -151,6 +156,20 @@ enum log_mode_t { /* Mode of logging. Default is LOG_MODE_DIRTY_LOG */ static enum log_mode_t host_log_mode; +pthread_t vcpu_thread; + +/* Only way to pass this to the signal handler */ +struct kvm_vm *current_vm; + +static void vcpu_sig_handler(int sig) +{ + TEST_ASSERT(sig == SIG_IPI, "unknown signal: %d", sig); +} + +static void vcpu_kick(void) +{ + pthread_kill(vcpu_thread, SIG_IPI); +} static void clear_log_create_vm_done(struct kvm_vm *vm) { @@ -179,10 +198,13 @@ static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); } -static void default_after_vcpu_run(struct kvm_vm *vm) +static void default_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct kvm_run *run = vcpu_state(vm, VCPU_ID); + TEST_ASSERT(ret == 0 || (ret == -1 && err == EINTR), + "vcpu run failed: errno=%d", err); + TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC, "Invalid guest sync status: exit_reason=%s\n", exit_reason_str(run->exit_reason)); @@ -244,19 +266,15 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, uint32_t count = 0, cleared; /* - * Before fetching the dirty pages, we need a vmexit of the - * worker vcpu to make sure the hardware dirty buffers were - * flushed.
This is not needed for dirty-log/clear-log tests - * because get dirty log will natually do so. - * - * For now we do it in the simple way - we simply wait until - * the vcpu uses up the soft dirty ring, then it'll always - * do a vmexit to make sure that PML buffers will be flushed. - * In real hypervisors, we probably need a vcpu kick or to - * stop the vcpus (before the final sync) to make sure we'll - * get all the existing dirty PFNs even cached in hardware. + * These steps will make sure hardware buffer flushed to dirty + * ring. Now with the vcpu kick mechanism we can keep the + * vcpu running even during collecting dirty bits without ring + * full. */ + vcpu_kick(); sem_wait(&dirty_ring_vcpu_stop); + DEBUG("Notifying vcpu to continue\n"); + sem_post(&dirty_ring_vcpu_cont); count += dirty_ring_collect_one(kvm_map_dirty_ring(vm), &vm_run->vm_ring_indexes, @@ -273,13 +291,10 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch " "with collected (%u)", cleared, count); - DEBUG("Notifying vcpu to continue\n"); - sem_post(&dirty_ring_vcpu_cont); - DEBUG("Iteration %ld collected %u pages\n", iteration, count); } -static void dirty_ring_after_vcpu_run(struct kvm_vm *vm) +static void dirty_ring_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct kvm_run *run = vcpu_state(vm, VCPU_ID); @@ -287,9 +302,11 @@ static void dirty_ring_after_vcpu_run(struct kvm_vm *vm) if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) { /* We should allow this to continue */ ; - } else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) { + } else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL || + (ret == -1 && err == EINTR)) { + /* Either ring full, or we're probably kicked out */ sem_post(&dirty_ring_vcpu_stop); - DEBUG("vcpu stops because dirty ring full...\n"); + DEBUG("vcpu stops because dirty ring full or kicked...\n"); sem_wait(&dirty_ring_vcpu_cont); DEBUG("vcpu continues now.\n"); } else { @@ -313,7 +330,7 @@ struct log_mode { void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, void *bitmap, uint32_t num_pages); /* Hook to call when after each vcpu run */ - void (*after_vcpu_run)(struct kvm_vm *vm); + void (*after_vcpu_run)(struct kvm_vm *vm, int ret, int err); void (*before_vcpu_join) (void); } log_modes[LOG_MODE_NUM] = { { @@ -373,12 +390,12 @@ static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, mode->collect_dirty_pages(vm, slot, bitmap, num_pages); } -static void log_mode_after_vcpu_run(struct kvm_vm *vm) +static void log_mode_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct log_mode *mode = &log_modes[host_log_mode]; if (mode->after_vcpu_run) - mode->after_vcpu_run(vm); + mode->after_vcpu_run(vm, ret, err); } static void log_mode_before_vcpu_join(void) @@ -402,15 +419,21 @@ static void *vcpu_worker(void *data) int ret; struct kvm_vm *vm = data; uint64_t *guest_array; + struct sigaction sigact; + + current_vm = vm; + memset(&sigact, 0, sizeof(sigact)); + sigact.sa_handler = vcpu_sig_handler; + sigaction(SIG_IPI, &sigact, NULL); guest_array = addr_gva2hva(vm, (vm_vaddr_t)random_array); while (!READ_ONCE(host_quit)) { + /* Clear any existing kick signals */ generate_random_array(guest_array, TEST_PAGES_PER_LOOP); /* Let the guest dirty the random pages */ - ret = _vcpu_run(vm, VCPU_ID); - TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); - log_mode_after_vcpu_run(vm); + ret = __vcpu_run(vm, VCPU_ID); + log_mode_after_vcpu_run(vm, ret, errno); } return NULL; @@ -506,7 +529,6 
@@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid, static void run_test(enum vm_guest_mode mode, unsigned long iterations, unsigned long interval, uint64_t phys_offset) { - pthread_t vcpu_thread; struct kvm_vm *vm; unsigned long *bmap; diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index 5ad52f38af8d..fe5db2da7e73 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -116,6 +116,7 @@ struct kvm_run *vcpu_state(struct kvm_vm *vm, uint32_t vcpuid); struct kvm_vm_run *vm_state(struct kvm_vm *vm); void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); +int __vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_set_mp_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_mp_state *mp_state); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 3a71e66a0b58..2addd0a7310f 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1209,6 +1209,14 @@ int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid) return rc; } +int __vcpu_run(struct kvm_vm *vm, uint32_t vcpuid) +{ + struct vcpu *vcpu = vcpu_find(vm, vcpuid); + + TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); + return ioctl(vcpu->fd, KVM_RUN, NULL); +} + void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid) { struct vcpu *vcpu = vcpu_find(vm, vcpuid);
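The kick in the patch above leans on plain POSIX behavior: a signal delivered to a thread through a handler installed without SA_RESTART makes a blocking syscall return -1 with errno == EINTR, which is exactly what __vcpu_run() now reports. A stand-alone sketch of just that mechanism, with read() standing in for ioctl(KVM_RUN) and all names ours (compile with -lpthread, run with an idle stdin so the read actually blocks):

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SIG_IPI SIGUSR1

static void sig_handler(int sig)
{
        (void)sig;      /* exists only to interrupt blocking syscalls */
}

static void *worker(void *arg)
{
        struct sigaction sigact;
        char c;

        (void)arg;
        memset(&sigact, 0, sizeof(sigact));
        sigact.sa_handler = sig_handler;        /* note: no SA_RESTART */
        sigaction(SIG_IPI, &sigact, NULL);

        /* Stand-in for ioctl(vcpu_fd, KVM_RUN): block "in the guest". */
        if (read(STDIN_FILENO, &c, 1) == -1 && errno == EINTR)
                printf("worker: kicked out of the blocking call\n");
        return NULL;
}

int main(void)
{
        pthread_t thread;

        pthread_create(&thread, NULL, worker, NULL);
        sleep(1);                       /* crude: let the worker block */
        pthread_kill(thread, SIG_IPI);  /* the "vcpu kick" */
        pthread_join(thread, NULL);
        return 0;
}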
From patchwork Fri Nov 29 21:35:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267635 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson , Paolo Bonzini , "Dr . David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 14/15] KVM: selftests: Add "-c" parameter to dirty log test Date: Fri, 29 Nov 2019 16:35:04 -0500 Message-Id: <20191129213505.18472-15-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

The new parameter is only used to override the existing dirty ring size/count. With a bigger ring count we test the async path of the dirty ring; with a smaller ring count we test the ring-full code path. It has no use for the non-dirty-ring tests.

Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 4799db91e919..c9db136a1f12 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -157,6 +157,7 @@ enum log_mode_t { /* Mode of logging. Default is LOG_MODE_DIRTY_LOG */ static enum log_mode_t host_log_mode; pthread_t vcpu_thread; +static uint32_t test_dirty_ring_count = TEST_DIRTY_RING_COUNT; /* Only way to pass this to the signal handler */ struct kvm_vm *current_vm; @@ -216,7 +217,7 @@ static void dirty_ring_create_vm_done(struct kvm_vm *vm) * Switch to dirty ring mode after VM creation but before any * of the vcpu creation.
*/ - vm_enable_dirty_ring(vm, TEST_DIRTY_RING_COUNT * + vm_enable_dirty_ring(vm, test_dirty_ring_count * sizeof(struct kvm_dirty_gfn)); } @@ -658,6 +659,9 @@ static void help(char *name) printf("usage: %s [-h] [-i iterations] [-I interval] " "[-p offset] [-m mode]\n", name); puts(""); + printf(" -c: specify dirty ring size, in number of entries\n"); + printf(" (only useful for dirty-ring test; default: %"PRIu32")\n", + TEST_DIRTY_RING_COUNT); printf(" -i: specify iteration counts (default: %"PRIu64")\n", TEST_HOST_LOOP_N); printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n", @@ -713,8 +717,11 @@ int main(int argc, char *argv[]) vm_guest_mode_params_init(VM_MODE_P40V48_4K, true, true); #endif - while ((opt = getopt(argc, argv, "hi:I:p:m:M:")) != -1) { + while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) { switch (opt) { + case 'c': + test_dirty_ring_count = strtol(optarg, NULL, 10); + break; case 'i': iterations = strtol(optarg, NULL, 10); break;
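One small note on the parsing above: strtol(optarg, NULL, 10) silently accepts trailing junk and out-of-range values, so for example "-c 10k" would quietly become 10. If that ever matters, a stricter variant could look like the sketch below; parse_ring_count() is our name and not part of the series:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint32_t parse_ring_count(const char *arg)
{
        char *end;
        long val;

        errno = 0;
        val = strtol(arg, &end, 10);
        /* Reject empty input, trailing junk, and out-of-range values. */
        if (errno || end == arg || *end != '\0' ||
            val <= 0 || (unsigned long)val > UINT32_MAX) {
                fprintf(stderr, "invalid ring count: '%s'\n", arg);
                exit(1);
        }
        return (uint32_t)val;
}

int main(int argc, char **argv)
{
        if (argc > 1)
                printf("ring count: %u\n", parse_ring_count(argv[1]));
        return 0;
}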
From patchwork Fri Nov 29 21:35:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11267633 From: Peter Xu To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: Sean Christopherson , Paolo Bonzini , "Dr . David Alan Gilbert" , peterx@redhat.com, Vitaly Kuznetsov Subject: [PATCH RFC 15/15] KVM: selftests: Test dirty ring waitqueue Date: Fri, 29 Nov 2019 16:35:05 -0500 Message-Id: <20191129213505.18472-16-peterx@redhat.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20191129213505.18472-1-peterx@redhat.com> References: <20191129213505.18472-1-peterx@redhat.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This is a bit tricky, but should still be reasonable. First we introduce a totally new dirty log test type, because we need to force the vcpu into a blocked state by dead-looping on vcpu_run even when it wants to quit to userspace. The tricky part is that we need to read procfs to make sure the vcpu thread is in TASK_UNINTERRUPTIBLE. After that, we reset the ring, and the reset should kick the vcpu again by moving it out of that state.
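The procfs poll described here can be tried stand-alone. The sketch below fetches the "State:" field of /proc/<tid>/status ('R' running, 'S' interruptible sleep, 'D' uninterruptible sleep); note the field is actually named "State:". Unlike read_tid_status_char() in the diff that follows, which skips a fixed number of lines, this version searches for the field by name; read_task_state() is our helper, not part of the patch.

#define _GNU_SOURCE             /* for syscall() */
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static char read_task_state(pid_t tid)
{
        char path[64], line[256], state = '?';
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/status", (int)tid);
        f = fopen(path, "r");
        if (!f)
                return state;
        while (fgets(line, sizeof(line), f)) {
                /* The whitespace after the colon (tab or spaces) is
                 * skipped by the space directive before %c. */
                if (sscanf(line, "State: %c", &state) == 1)
                        break;
        }
        fclose(f);
        return state;
}

int main(void)
{
        pid_t tid = syscall(SYS_gettid);

        /* A thread reading its own state is, of course, running. */
        printf("tid %d state: %c\n", (int)tid, read_task_state(tid));
        return 0;
}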
Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 101 +++++++++++++++++++ 1 file changed, 101 insertions(+) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index c9db136a1f12..41bc015131e1 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -151,12 +152,16 @@ enum log_mode_t { /* Use dirty ring for logging */ LOG_MODE_DIRTY_RING = 2, + /* Dirty ring test but tailored for the waitqueue */ + LOG_MODE_DIRTY_RING_WP = 3, + LOG_MODE_NUM, }; /* Mode of logging. Default is LOG_MODE_DIRTY_LOG */ static enum log_mode_t host_log_mode; pthread_t vcpu_thread; +pid_t vcpu_thread_tid; static uint32_t test_dirty_ring_count = TEST_DIRTY_RING_COUNT; /* Only way to pass this to the signal handler */ struct kvm_vm *current_vm; @@ -221,6 +226,18 @@ static void dirty_ring_create_vm_done(struct kvm_vm *vm) sizeof(struct kvm_dirty_gfn)); } +static void dirty_ring_wq_create_vm_done(struct kvm_vm *vm) +{ + /* + * Force to use a relatively small ring size, so easier to get + * full. Better bigger than PML size, hence 1024. + */ + test_dirty_ring_count = 1024; + DEBUG("Forcing ring size: %u\n", test_dirty_ring_count); + vm_enable_dirty_ring(vm, test_dirty_ring_count * + sizeof(struct kvm_dirty_gfn)); +} + static uint32_t dirty_ring_collect_one(struct kvm_dirty_gfn *dirty_gfns, struct kvm_dirty_ring_indexes *indexes, int slot, void *bitmap, @@ -295,6 +312,81 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, DEBUG("Iteration %ld collected %u pages\n", iteration, count); } +/* + * Return 'D' for uninterruptible, 'R' for running, 'S' for + * interruptible, etc. + */ +static char read_tid_status_char(unsigned int tid) +{ + int fd, ret, line = 0; + char buf[128], *c; + + snprintf(buf, sizeof(buf) - 1, "/proc/%u/status", tid); + fd = open(buf, O_RDONLY); + TEST_ASSERT(fd >= 0, "open status file failed: %s", buf); + ret = read(fd, buf, sizeof(buf) - 1); + TEST_ASSERT(ret > 0, "read status file failed: %d, %d", ret, errno); + close(fd); + + /* Skip 2 lines */ + for (c = buf; c < buf + sizeof(buf) && line < 2; c++) { + if (*c == '\n') { + line++; + continue; + } + } + + /* Skip "Status: " */ + while (*c != ':') c++; + c++; + while (*c == ' ') c++; + c++; + + return *c; +} + +static void dirty_ring_wq_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + uint32_t count = test_dirty_ring_count; + struct kvm_run *state = vcpu_state(vm, VCPU_ID); + struct kvm_dirty_ring_indexes *indexes = &state->vcpu_ring_indexes; + uint32_t avail; + + while (count--) { + /* + * Force vcpu to run enough time to make sure we + * trigger the ring full case + */ + sem_post(&dirty_ring_vcpu_cont); + } + + /* Make sure it's stuck */ + TEST_ASSERT(vcpu_thread_tid, "TID not inited"); + /* + * Wait for /proc/pid/status "Status:" changes to "D". "D" + * stands for "D (disk sleep)", TASK_UNINTERRUPTIBLE + */ + while (read_tid_status_char(vcpu_thread_tid) != 'D') { + usleep(1000); + } + DEBUG("Now VCPU thread dirty ring full\n"); + + avail = READ_ONCE(indexes->avail_index); + /* Assuming we've consumed all */ + WRITE_ONCE(indexes->fetch_index, avail); + + kvm_vm_reset_dirty_ring(vm); + + /* Wait for it to be awake */ + while (read_tid_status_char(vcpu_thread_tid) == 'D') { + usleep(1000); + } + DEBUG("VCPU Thread is successfully waked up\n"); + + exit(0); +} + static void dirty_ring_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct kvm_run *run = vcpu_state(vm, VCPU_ID); @@ -353,6 +445,12 @@ struct log_mode { .before_vcpu_join = dirty_ring_before_vcpu_join, .after_vcpu_run = dirty_ring_after_vcpu_run, }, + { + .name = "dirty-ring-wait-queue", + .create_vm_done = dirty_ring_wq_create_vm_done, + .collect_dirty_pages = dirty_ring_wq_collect_dirty_pages, + .after_vcpu_run = dirty_ring_after_vcpu_run, + }, }; /* @@ -422,6 +520,9 @@ static void *vcpu_worker(void *data) uint64_t *guest_array; struct sigaction sigact; + vcpu_thread_tid = syscall(SYS_gettid); + printf("VCPU Thread ID: %u\n", vcpu_thread_tid); + current_vm = vm; memset(&sigact, 0, sizeof(sigact)); sigact.sa_handler = vcpu_sig_handler;