From patchwork Fri Oct 23 18:33:45 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11854399
From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, peterx@redhat.com, Andrew Jones,
    "Dr . David Alan Gilbert", Paolo Bonzini
Subject: [PATCH v15 01/14] KVM: Documentation: Update entry for KVM_X86_SET_MSR_FILTER
Date: Fri, 23 Oct 2020 14:33:45 -0400
Message-Id: <20201023183358.50607-2-peterx@redhat.com>
In-Reply-To: <20201023183358.50607-1-peterx@redhat.com>
References: <20201023183358.50607-1-peterx@redhat.com>

This was most likely an accident during a rebase, since we already have a
section 8.25 (which is KVM_CAP_S390_DIAG318).  Fix the number.

Should be squashed into 1a155254ff937ac92cf9940d.

Signed-off-by: Peter Xu
---
 Documentation/virt/kvm/api.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 76317221d29f..094e128634d2 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6367,7 +6367,7 @@ accesses that would usually trigger a #GP by KVM into the guest will instead get bounced to user space through the KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications. -8.25 KVM_X86_SET_MSR_FILTER +8.27 KVM_X86_SET_MSR_FILTER --------------------------- :Architectures: x86

From patchwork Fri Oct 23 18:33:46 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11854383
From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, peterx@redhat.com, Andrew Jones,
    "Dr . David Alan Gilbert", Paolo Bonzini
Subject: [PATCH v15 02/14] KVM: Documentation: Update entry for KVM_CAP_ENFORCE_PV_CPUID
Date: Fri, 23 Oct 2020 14:33:46 -0400
Message-Id: <20201023183358.50607-3-peterx@redhat.com>
In-Reply-To: <20201023183358.50607-1-peterx@redhat.com>
References: <20201023183358.50607-1-peterx@redhat.com>

Should be squashed into 66570e966dd9cb4f.

Signed-off-by: Peter Xu
---
 Documentation/virt/kvm/api.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 094e128634d2..f78307e77371 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6381,8 +6381,7 @@ In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to trap and emulate MSRs that are outside of the scope of KVM as well as limit the attack surface on KVM's MSR emulation code.
- -8.26 KVM_CAP_ENFORCE_PV_CPUID +8.28 KVM_CAP_ENFORCE_PV_CPUID ----------------------------- Architectures: x86

From patchwork Fri Oct 23 18:33:47 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11854381
From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, peterx@redhat.com, Andrew Jones,
    "Dr . David Alan Gilbert", Paolo Bonzini
Subject: [PATCH v15 03/14] KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
Date: Fri, 23 Oct 2020 14:33:47 -0400
Message-Id: <20201023183358.50607-4-peterx@redhat.com>
In-Reply-To: <20201023183358.50607-1-peterx@redhat.com>
References: <20201023183358.50607-1-peterx@redhat.com>

Originally, we have three code paths that can dirty a page without a vcpu
context on X86:

  - init_rmode_identity_map
  - init_rmode_tss
  - kvmgt_rw_gpa

init_rmode_identity_map and init_rmode_tss will be set up on the destination
VM no matter what (and the guest cannot even see them), so it does not make
sense to track them at all.

To do this, allow __x86_set_memory_region() to return the userspace address
that was just allocated to the caller.  Then, in both of these functions, we
write directly to the userspace address instead of calling the kvm_write_*()
APIs.

Another trivial change is that we don't need to explicitly clear the identity
page table root in init_rmode_identity_map(), because we will overwrite the
whole page with 4M huge page entries no matter what.

Suggested-by: Paolo Bonzini
Reviewed-by: Sean Christopherson
Signed-off-by: Peter Xu
---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/svm/avic.c         |  9 ++--
 arch/x86/kvm/vmx/vmx.c          | 88 ++++++++++++++++-----------------
 arch/x86/kvm/x86.c              | 37 +++++++++++---
 4 files changed, 81 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d44858b69353..bf4f6f7c4d52 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1694,7 +1694,8 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu); int kvm_is_in_guest(void); -int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size); +void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, + u32 size); bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu); bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 8c550999ace0..0ef84d57b72e 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -233,7 +233,8 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu, */ static int avic_update_access_page(struct kvm *kvm, bool activate) { - int ret = 0; + void __user *ret; + int r = 0; mutex_lock(&kvm->slots_lock); /* @@ -249,13 +250,15 @@ static int avic_update_access_page(struct kvm *kvm, bool activate) APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, APIC_DEFAULT_PHYS_BASE, activate ?
PAGE_SIZE : 0); - if (ret) + if (IS_ERR(ret)) { + r = PTR_ERR(ret); goto out; + } kvm->arch.apic_access_page_done = activate; out: mutex_unlock(&kvm->slots_lock); - return ret; + return r; } static int avic_init_backing_page(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 755896797a81..388a7d9f0a5d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3513,42 +3513,33 @@ bool __vmx_guest_state_valid(struct kvm_vcpu *vcpu) return true; } -static int init_rmode_tss(struct kvm *kvm) +static int init_rmode_tss(struct kvm *kvm, void __user *ua) { - gfn_t fn; - u16 data = 0; - int idx, r; + const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); + u16 data; + int i; + + for (i = 0; i < 3; i++) { + if (__copy_to_user(ua + PAGE_SIZE * i, zero_page, PAGE_SIZE)) + return -EFAULT; + } - idx = srcu_read_lock(&kvm->srcu); - fn = to_kvm_vmx(kvm)->tss_addr >> PAGE_SHIFT; - r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE); - if (r < 0) - goto out; data = TSS_BASE_SIZE + TSS_REDIRECTION_SIZE; - r = kvm_write_guest_page(kvm, fn++, &data, - TSS_IOPB_BASE_OFFSET, sizeof(u16)); - if (r < 0) - goto out; - r = kvm_clear_guest_page(kvm, fn++, 0, PAGE_SIZE); - if (r < 0) - goto out; - r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE); - if (r < 0) - goto out; + if (__copy_to_user(ua + TSS_IOPB_BASE_OFFSET, &data, sizeof(u16))) + return -EFAULT; + data = ~0; - r = kvm_write_guest_page(kvm, fn, &data, - RMODE_TSS_SIZE - 2 * PAGE_SIZE - 1, - sizeof(u8)); -out: - srcu_read_unlock(&kvm->srcu, idx); - return r; + if (__copy_to_user(ua + RMODE_TSS_SIZE - 1, &data, sizeof(u8))) + return -EFAULT; + + return 0; } static int init_rmode_identity_map(struct kvm *kvm) { struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm); int i, r = 0; - kvm_pfn_t identity_map_pfn; + void __user *uaddr; u32 tmp; /* Protect kvm_vmx->ept_identity_pagetable_done. 
*/ @@ -3559,24 +3550,24 @@ static int init_rmode_identity_map(struct kvm *kvm) if (!kvm_vmx->ept_identity_map_addr) kvm_vmx->ept_identity_map_addr = VMX_EPT_IDENTITY_PAGETABLE_ADDR; - identity_map_pfn = kvm_vmx->ept_identity_map_addr >> PAGE_SHIFT; - r = __x86_set_memory_region(kvm, IDENTITY_PAGETABLE_PRIVATE_MEMSLOT, - kvm_vmx->ept_identity_map_addr, PAGE_SIZE); - if (r < 0) + uaddr = __x86_set_memory_region(kvm, + IDENTITY_PAGETABLE_PRIVATE_MEMSLOT, + kvm_vmx->ept_identity_map_addr, + PAGE_SIZE); + if (IS_ERR(uaddr)) { + r = PTR_ERR(uaddr); goto out; + } - r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE); - if (r < 0) - goto out; /* Set up identity-mapping pagetable for EPT in real mode */ for (i = 0; i < PT32_ENT_PER_PAGE; i++) { tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); - r = kvm_write_guest_page(kvm, identity_map_pfn, - &tmp, i * sizeof(tmp), sizeof(tmp)); - if (r < 0) + if (__copy_to_user(uaddr + i * sizeof(tmp), &tmp, sizeof(tmp))) { + r = -EFAULT; goto out; + } } kvm_vmx->ept_identity_pagetable_done = true; @@ -3603,19 +3594,22 @@ static void seg_setup(int seg) static int alloc_apic_access_page(struct kvm *kvm) { struct page *page; - int r = 0; + void __user *hva; + int ret = 0; mutex_lock(&kvm->slots_lock); if (kvm->arch.apic_access_page_done) goto out; - r = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, - APIC_DEFAULT_PHYS_BASE, PAGE_SIZE); - if (r) + hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, + APIC_DEFAULT_PHYS_BASE, PAGE_SIZE); + if (IS_ERR(hva)) { + ret = PTR_ERR(hva); goto out; + } page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); if (is_error_page(page)) { - r = -EFAULT; + ret = -EFAULT; goto out; } @@ -3627,7 +3621,7 @@ static int alloc_apic_access_page(struct kvm *kvm) kvm->arch.apic_access_page_done = true; out: mutex_unlock(&kvm->slots_lock); - return r; + return ret; } int allocate_vpid(void) @@ -4636,7 +4630,7 @@ static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr) { - int ret; + void __user *ret; if (enable_unrestricted_guest) return 0; @@ -4646,10 +4640,12 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr) PAGE_SIZE * 3); mutex_unlock(&kvm->slots_lock); - if (ret) - return ret; + if (IS_ERR(ret)) + return PTR_ERR(ret); + to_kvm_vmx(kvm)->tss_addr = addr; - return init_rmode_tss(kvm); + + return init_rmode_tss(kvm, ret); } static int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 397f599b20e5..f34d1e463d16 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10309,7 +10309,32 @@ void kvm_arch_sync_events(struct kvm *kvm) kvm_free_pit(kvm); } -int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) +#define ERR_PTR_USR(e) ((void __user *)ERR_PTR(e)) + +/** + * __x86_set_memory_region: Setup KVM internal memory slot + * + * @kvm: the kvm pointer to the VM. + * @id: the slot ID to setup. + * @gpa: the GPA to install the slot (unused when @size == 0). + * @size: the size of the slot. Set to zero to uninstall a slot. + * + * This function helps to setup a KVM internal memory slot. Specify + * @size > 0 to install a new slot, while @size == 0 to uninstall a + * slot. 
The return code can be one of the following: + * + * HVA: on success (uninstall will return a bogus HVA) + * -errno: on error + * + * The caller should always use IS_ERR() to check the return value + * before use. Note, the KVM internal memory slots are guaranteed to + * remain valid and unchanged until the VM is destroyed, i.e., the + * GPA->HVA translation will not change. However, the HVA is a user + * address, i.e. its accessibility is not guaranteed, and must be + * accessed via __copy_{to,from}_user(). + */ +void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, + u32 size) { int i, r; unsigned long hva, old_npages; @@ -10318,12 +10343,12 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) /* Called with kvm->slots_lock held. */ if (WARN_ON(id >= KVM_MEM_SLOTS_NUM)) - return -EINVAL; + return ERR_PTR_USR(-EINVAL); slot = id_to_memslot(slots, id); if (size) { if (slot && slot->npages) - return -EEXIST; + return ERR_PTR_USR(-EEXIST); /* * MAP_SHARED to prevent internal slot pages from being moved @@ -10332,7 +10357,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) hva = vm_mmap(NULL, 0, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, 0); if (IS_ERR((void *)hva)) - return PTR_ERR((void *)hva); + return (void __user *)hva; } else { if (!slot || !slot->npages) return 0; @@ -10351,13 +10376,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) m.memory_size = size; r = __kvm_set_memory_region(kvm, &m); if (r < 0) - return r; + return ERR_PTR_USR(r); } if (!size) vm_munmap(hva, old_npages * PAGE_SIZE); - return 0; + return (void __user *)hva; } EXPORT_SYMBOL_GPL(__x86_set_memory_region);
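To make the new contract easier to follow, here is a condensed sketch of the
calling convention (not part of the patch; the helper name and the values
written are illustrative, modeled on the init_rmode_tss() and
init_rmode_identity_map() changes above):

	/*
	 * Illustrative only: __x86_set_memory_region() now returns the HVA
	 * of the internal slot (or an ERR_PTR-encoded errno).  The caller
	 * writes through that user address with __copy_to_user() instead of
	 * kvm_write_guest_page(), so no page is dirtied without a vcpu
	 * context.
	 */
	static int example_fill_private_slot(struct kvm *kvm, int id, gpa_t gpa)
	{
		void __user *uaddr;
		u32 tmp = 0;

		mutex_lock(&kvm->slots_lock);
		uaddr = __x86_set_memory_region(kvm, id, gpa, PAGE_SIZE);
		mutex_unlock(&kvm->slots_lock);

		if (IS_ERR(uaddr))
			return PTR_ERR(uaddr);

		/* The HVA stays valid until the VM is destroyed, but may fault. */
		if (__copy_to_user(uaddr, &tmp, sizeof(tmp)))
			return -EFAULT;

		return 0;
	}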
From patchwork Fri Oct 23 18:33:48 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11854401

From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, peterx@redhat.com, Andrew Jones,
    "Dr . David Alan Gilbert", Paolo Bonzini
Subject: [PATCH v15 04/14] KVM: Pass in kvm pointer into mark_page_dirty_in_slot()
Date: Fri, 23 Oct 2020 14:33:48 -0400
Message-Id: <20201023183358.50607-5-peterx@redhat.com>
In-Reply-To: <20201023183358.50607-1-peterx@redhat.com>
References: <20201023183358.50607-1-peterx@redhat.com>

The context will be needed to implement the kvm dirty ring.
Signed-off-by: Peter Xu --- arch/x86/kvm/mmu/tdp_mmu.c | 2 +- include/linux/kvm_host.h | 3 ++- virt/kvm/kvm_main.c | 29 ++++++++++++++++------------- 3 files changed, 19 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index e246d71b8ea2..19e3aaadb504 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -178,7 +178,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn, if ((!is_writable_pte(old_spte) || pfn_changed) && is_writable_pte(new_spte)) { slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn); - mark_page_dirty_in_slot(slot, gfn); + mark_page_dirty_in_slot(kvm, slot, gfn); } } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7f2e2a09ebbd..6789500364c2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -798,7 +798,8 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn); -void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn); +void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, + gfn_t gfn); void mark_page_dirty(struct kvm *kvm, gfn_t gfn); struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2541a17ff1c4..59ad45ef9b55 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2196,7 +2196,8 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map) } EXPORT_SYMBOL_GPL(kvm_vcpu_map); -static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot, +static void __kvm_unmap_gfn(struct kvm *kvm, + struct kvm_memory_slot *memslot, struct kvm_host_map *map, struct gfn_to_pfn_cache *cache, bool dirty, bool atomic) @@ -2221,7 +2222,7 @@ static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot, #endif if (dirty) - mark_page_dirty_in_slot(memslot, map->gfn); + mark_page_dirty_in_slot(kvm, memslot, map->gfn); if (cache) cache->dirty |= dirty; @@ -2235,7 +2236,7 @@ static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot, int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map, struct gfn_to_pfn_cache *cache, bool dirty, bool atomic) { - __kvm_unmap_gfn(gfn_to_memslot(vcpu->kvm, map->gfn), map, + __kvm_unmap_gfn(vcpu->kvm, gfn_to_memslot(vcpu->kvm, map->gfn), map, cache, dirty, atomic); return 0; } @@ -2243,8 +2244,8 @@ EXPORT_SYMBOL_GPL(kvm_unmap_gfn); void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty) { - __kvm_unmap_gfn(kvm_vcpu_gfn_to_memslot(vcpu, map->gfn), map, NULL, - dirty, false); + __kvm_unmap_gfn(vcpu->kvm, kvm_vcpu_gfn_to_memslot(vcpu, map->gfn), + map, NULL, dirty, false); } EXPORT_SYMBOL_GPL(kvm_vcpu_unmap); @@ -2418,7 +2419,8 @@ int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa, } EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic); -static int __kvm_write_guest_page(struct kvm_memory_slot *memslot, gfn_t gfn, +static int __kvm_write_guest_page(struct kvm *kvm, + struct kvm_memory_slot *memslot, gfn_t gfn, const void *data, int offset, int len) { int r; @@ -2430,7 +2432,7 @@ static int __kvm_write_guest_page(struct kvm_memory_slot *memslot, gfn_t gfn, r = __copy_to_user((void __user *)addr + offset, data, len); if (r) return -EFAULT; - mark_page_dirty_in_slot(memslot, gfn); + mark_page_dirty_in_slot(kvm, memslot, gfn); return 0; } @@ -2439,7 
+2441,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); - return __kvm_write_guest_page(slot, gfn, data, offset, len); + return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_write_guest_page); @@ -2448,7 +2450,7 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, { struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - return __kvm_write_guest_page(slot, gfn, data, offset, len); + return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page); @@ -2567,7 +2569,7 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, r = __copy_to_user((void __user *)ghc->hva + offset, data, len); if (r) return -EFAULT; - mark_page_dirty_in_slot(ghc->memslot, gpa >> PAGE_SHIFT); + mark_page_dirty_in_slot(kvm, ghc->memslot, gpa >> PAGE_SHIFT); return 0; } @@ -2643,7 +2645,8 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) } EXPORT_SYMBOL_GPL(kvm_clear_guest); -void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn) +void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, + gfn_t gfn) { if (memslot && memslot->dirty_bitmap) { unsigned long rel_gfn = gfn - memslot->base_gfn; @@ -2658,7 +2661,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn) struct kvm_memory_slot *memslot; memslot = gfn_to_memslot(kvm, gfn); - mark_page_dirty_in_slot(memslot, gfn); + mark_page_dirty_in_slot(kvm, memslot, gfn); } EXPORT_SYMBOL_GPL(mark_page_dirty); @@ -2667,7 +2670,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) { struct kvm_memory_slot *memslot; memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - mark_page_dirty_in_slot(memslot, gfn); + mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn); } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

From patchwork Fri Oct 23 18:33:49 2020
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 11854393
From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, peterx@redhat.com, Andrew Jones,
    "Dr . David Alan Gilbert", Paolo Bonzini, Lei Cao
Subject: [PATCH v15 05/14] KVM: X86: Implement ring-based dirty memory tracking
Date: Fri, 23 Oct 2020 14:33:49 -0400
Message-Id: <20201023183358.50607-6-peterx@redhat.com>
In-Reply-To: <20201023183358.50607-1-peterx@redhat.com>
References: <20201023183358.50607-1-peterx@redhat.com>

This patch is heavily based on previous work from Lei Cao and Paolo
Bonzini. [1]

KVM currently uses large bitmaps to track dirty memory.  These bitmaps
are copied to userspace when userspace queries KVM for its dirty page
information.  The use of bitmaps is mostly sufficient for live
migration, as large parts of memory are dirtied from one log-dirty
pass to another.  However, in a checkpointing system, the number of
dirty pages is small and in fact it is often bounded---the VM is
paused when it has dirtied a pre-defined number of pages.  Traversing
a large, sparsely populated bitmap to find set bits is time-consuming,
as is copying the bitmap to user-space.

A similar issue exists for live migration when the guest memory is huge
while the page dirty procedure is trivial.
In that case for each dirty sync we need to pull the whole dirty bitmap to userspace and analyse every bit even if it's mostly zeros. The preferred data structure for above scenarios is a dense list of guest frame numbers (GFN). This patch series stores the dirty list in kernel memory that can be memory mapped into userspace to allow speedy harvesting. This patch enables dirty ring for X86 only. However it should be easily extended to other archs as well. [1] https://patchwork.kernel.org/patch/10471409/ Signed-off-by: Lei Cao Signed-off-by: Paolo Bonzini Signed-off-by: Peter Xu --- Documentation/virt/kvm/api.rst | 117 +++++++++++++++++++ arch/x86/include/asm/kvm_host.h | 3 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/Makefile | 3 +- arch/x86/kvm/mmu/mmu.c | 8 ++ arch/x86/kvm/vmx/vmx.c | 8 +- arch/x86/kvm/x86.c | 9 ++ include/linux/kvm_dirty_ring.h | 103 +++++++++++++++++ include/linux/kvm_host.h | 13 +++ include/trace/events/kvm.h | 63 ++++++++++ include/uapi/linux/kvm.h | 53 +++++++++ virt/kvm/dirty_ring.c | 197 ++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 113 +++++++++++++++++- 13 files changed, 688 insertions(+), 3 deletions(-) create mode 100644 include/linux/kvm_dirty_ring.h create mode 100644 virt/kvm/dirty_ring.c diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index f78307e77371..00da3f01d632 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -249,6 +249,7 @@ Based on their initialization different VMs may have different capabilities. It is thus encouraged to use the vm ioctl to query for capabilities (available with KVM_CAP_CHECK_EXTENSION_VM on the vm fd) + 4.5 KVM_GET_VCPU_MMAP_SIZE -------------------------- @@ -262,6 +263,18 @@ The KVM_RUN ioctl (cf.) communicates with userspace via a shared memory region. This ioctl returns the size of that region. See the KVM_RUN documentation for details. +Besides the size of the KVM_RUN communication region, other areas of +the VCPU file descriptor can be mmap-ed, including: + +- if KVM_CAP_COALESCED_MMIO is available, a page at + KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons, + this page is included in the result of KVM_GET_VCPU_MMAP_SIZE. + KVM_CAP_COALESCED_MMIO is not documented yet. + +- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at + KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on + KVM_CAP_DIRTY_LOG_RING, see section 8.3. + 4.6 KVM_SET_MEMORY_REGION ------------------------- @@ -6390,3 +6403,107 @@ When enabled, KVM will disable paravirtual features provided to the guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf (0x40000001). Otherwise, a guest may use the paravirtual features regardless of what has actually been exposed through the CPUID leaf. + +8.29 KVM_CAP_DIRTY_LOG_RING +--------------------------- + +:Architectures: x86 +:Parameters: args[0] - size of the dirty log ring + +KVM is capable of tracking dirty memory using ring buffers that are +mmaped into userspace; there is one dirty ring per vcpu. + +One dirty ring is defined as below internally:: + + struct kvm_dirty_ring { + u32 dirty_index; + u32 reset_index; + u32 size; + u32 soft_limit; + struct kvm_dirty_gfn *dirty_gfns; + int index; + }; + +Dirty GFNs (Guest Frame Numbers) are stored in the dirty_gfns array. +For each of the dirty entry it's defined as:: + + struct kvm_dirty_gfn { + __u32 flags; + __u32 slot; /* as_id | slot_id */ + __u64 offset; + }; + +Each GFN is a state machine itself. 
The state is embedded in the flags +field, as defined in the uapi header:: + + /* + * KVM dirty GFN flags, defined as: + * + * |---------------+---------------+--------------| + * | bit 1 (reset) | bit 0 (dirty) | Status | + * |---------------+---------------+--------------| + * | 0 | 0 | Invalid GFN | + * | 0 | 1 | Dirty GFN | + * | 1 | X | GFN to reset | + * |---------------+---------------+--------------| + * + * Lifecycle of a dirty GFN goes like: + * + * dirtied collected reset + * 00 -----------> 01 -------------> 1X -------+ + * ^ | + * | | + * +------------------------------------------+ + * + * The userspace program is only responsible for the 01->1X state + * conversion (to collect dirty bits). Also, it must not skip any + * dirty bits so that dirty bits are always collected in sequence. + */ + #define KVM_DIRTY_GFN_F_DIRTY BIT(0) + #define KVM_DIRTY_GFN_F_RESET BIT(1) + #define KVM_DIRTY_GFN_F_MASK 0x3 +
+Userspace calls the KVM_ENABLE_CAP ioctl right after the KVM_CREATE_VM ioctl +to enable this capability for the new guest and set the size of the +rings. It is only allowed before creating any vCPU, and the size of +the ring must be a power of two. The larger the ring buffer, the less +likely the ring is full and the VM is forced to exit to userspace. The +optimal size depends on the workload, but it is recommended that it be +at least 64 KiB (4096 entries). +
+Just like for dirty page bitmaps, the buffer tracks writes to +all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was +set in KVM_SET_USER_MEMORY_REGION. Once a memory region is registered +with the flag set, userspace can start harvesting dirty pages from the +ring buffer. +
+To harvest the dirty pages, userspace accesses the mmaped ring buffer +to read the dirty GFNs starting from zero. If the flags field has the DIRTY +bit set (at this stage the RESET bit must be cleared), then it means +this GFN is a dirty GFN. The userspace should collect this GFN and +mark the flags from state 01b to 1Xb (bit 0 will be ignored by KVM, +but bit 1 must be set to show that this GFN is collected and waiting +for a reset), and move on to the next GFN. The userspace should +continue to do this until it finds a GFN whose flags have the DIRTY bit +cleared; at that point all the dirty GFNs currently available have been +collected. Userspace does not have to collect all the dirty GFNs in a +single pass. However, it must collect the dirty GFNs in sequence, i.e., the +userspace program cannot skip one dirty GFN to collect the one next to +it. +
+After processing one or more entries in the ring buffer, userspace +calls the VM ioctl KVM_RESET_DIRTY_RINGS to notify the kernel about +it, so that the kernel will reprotect those collected GFNs. +Therefore, the ioctl must be called *before* reading the content of +the dirty pages. +
+The dirty ring interface has a major difference compared to the +KVM_GET_DIRTY_LOG interface: when reading the dirty ring from +userspace, it is still possible that the kernel has not yet flushed the +hardware dirty buffers into the kernel buffer (the flushing was +previously done by the KVM_GET_DIRTY_LOG ioctl). To achieve that flush, one +needs to kick the vcpu out for a hardware buffer flush (vmexit) to +make sure all the existing dirty GFNs are flushed to the dirty rings. +
+The dirty ring can get full. When that happens, the KVM_RUN of the +vcpu will return with exit reason KVM_EXIT_DIRTY_RING_FULL.
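As a concrete illustration of the flow described above, below is a minimal
userspace sketch (not part of the patch).  It assumes a host kernel with this
series applied and uapi headers that provide KVM_CAP_DIRTY_LOG_RING,
KVM_DIRTY_LOG_PAGE_OFFSET, struct kvm_dirty_gfn and the GFN flag bits; the
function names, the 64 KiB ring size and the omitted error handling are
illustrative only.

	#include <stdint.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <linux/kvm.h>

	#define RING_BYTES	(64 * 1024)	/* power of two: 4096 entries */
	#define RING_ENTRIES	(RING_BYTES / sizeof(struct kvm_dirty_gfn))

	/* Enable the capability right after KVM_CREATE_VM, before any vCPU. */
	static int enable_dirty_ring(int vm_fd)
	{
		struct kvm_enable_cap cap = {
			.cap = KVM_CAP_DIRTY_LOG_RING,
			.args[0] = RING_BYTES,
		};

		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}

	/* Each vCPU's ring is mmap-ed from its fd at KVM_DIRTY_LOG_PAGE_OFFSET pages. */
	static struct kvm_dirty_gfn *map_dirty_ring(int vcpu_fd)
	{
		return mmap(NULL, RING_BYTES, PROT_READ | PROT_WRITE, MAP_SHARED,
			    vcpu_fd, (off_t)KVM_DIRTY_LOG_PAGE_OFFSET * getpagesize());
	}

	/*
	 * Collect dirty GFNs in sequence starting from *fetch (a free-running
	 * counter kept by userspace), then ask KVM to reprotect them.
	 */
	static int harvest_and_reset(int vm_fd, struct kvm_dirty_gfn *ring,
				     uint32_t *fetch)
	{
		int count = 0;

		for (;;) {
			struct kvm_dirty_gfn *e = &ring[*fetch & (RING_ENTRIES - 1)];
			uint32_t flags = __atomic_load_n(&e->flags, __ATOMIC_ACQUIRE);

			if (!(flags & KVM_DIRTY_GFN_F_DIRTY))
				break;	/* all currently available GFNs collected */

			/* e->slot is (as_id << 16) | slot_id; e->offset is the GFN. */
			/* ...queue the page for transmission/checkpointing here... */

			/* 01b -> 1Xb: mark as collected and waiting for reset. */
			__atomic_store_n(&e->flags, flags | KVM_DIRTY_GFN_F_RESET,
					 __ATOMIC_RELEASE);
			(*fetch)++;
			count++;
		}

		if (count)	/* VM-wide ioctl: reprotects the collected GFNs */
			ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);

		return count;
	}

Note that while each ring is per-vcpu, KVM_RESET_DIRTY_RINGS is a VM ioctl:
one call reprotects whatever has been marked as collected on all rings.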
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index bf4f6f7c4d52..76983b6a46fb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1230,6 +1230,7 @@ struct kvm_x86_ops { void (*enable_log_dirty_pt_masked)(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t offset, unsigned long mask); + int (*cpu_dirty_log_size)(void); /* pmu operations of sub-arch */ const struct kvm_pmu_ops *pmu_ops; @@ -1742,4 +1743,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu) #define GET_SMSTATE(type, buf, offset) \ (*(type *)((buf) + (offset) - 0x7e00)) +int kvm_cpu_dirty_log_size(void); + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 89e5f3d1bba8..8e76d3701db3 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -12,6 +12,7 @@ #define KVM_PIO_PAGE_OFFSET 1 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2 +#define KVM_DIRTY_LOG_PAGE_OFFSET 64 #define DE_VECTOR 0 #define DB_VECTOR 1 diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index b804444e16d4..4bd14ab01323 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -10,7 +10,8 @@ endif KVM := ../../../virt/kvm kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ - $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o + $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \ + $(KVM)/dirty_ring.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 17587f496ec7..d3cc173dcf55 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1287,6 +1287,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); } +int kvm_cpu_dirty_log_size(void) +{ + if (kvm_x86_ops.cpu_dirty_log_size) + return kvm_x86_ops.cpu_dirty_log_size(); + + return 0; +} + bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, struct kvm_memory_slot *slot, u64 gfn) { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 388a7d9f0a5d..ebde9ac1c9c7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7583,6 +7583,11 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit) return supported & BIT(bit); } +static int vmx_cpu_dirty_log_size(void) +{ + return enable_pml ? 
PML_ENTITY_NUM : 0; +} + static struct kvm_x86_ops vmx_x86_ops __initdata = { .hardware_unsetup = hardware_unsetup, @@ -7711,6 +7716,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .migrate_timers = vmx_migrate_timers, .msr_filter_changed = vmx_msr_filter_changed, + .cpu_dirty_log_size = vmx_cpu_dirty_log_size, }; static __init int hardware_setup(void) @@ -7828,6 +7834,7 @@ static __init int hardware_setup(void) vmx_x86_ops.slot_disable_log_dirty = NULL; vmx_x86_ops.flush_log_dirty = NULL; vmx_x86_ops.enable_log_dirty_pt_masked = NULL; + vmx_x86_ops.cpu_dirty_log_size = NULL; } if (!cpu_has_vmx_preemption_timer()) @@ -7891,7 +7898,6 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = { .disabled_by_bios = vmx_disabled_by_bios, .check_processor_compatibility = vmx_check_processor_compat, .hardware_setup = hardware_setup, - .runtime_ops = &vmx_x86_ops, }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f34d1e463d16..ff34346cce7b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8708,6 +8708,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) bool req_immediate_exit = false; + /* Forbid vmenter if vcpu dirty ring is soft-full */ + if (unlikely(vcpu->kvm->dirty_ring_size && + kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) { + vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL; + trace_kvm_dirty_ring_exit(vcpu); + r = 0; + goto out; + } + if (kvm_request_pending(vcpu)) { if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) { if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) { diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h new file mode 100644 index 000000000000..120e5e90fa1d --- /dev/null +++ b/include/linux/kvm_dirty_ring.h @@ -0,0 +1,103 @@ +#ifndef KVM_DIRTY_RING_H +#define KVM_DIRTY_RING_H + +#include + +/** + * kvm_dirty_ring: KVM internal dirty ring structure + * + * @dirty_index: free running counter that points to the next slot in + * dirty_ring->dirty_gfns, where a new dirty page should go + * @reset_index: free running counter that points to the next dirty page + * in dirty_ring->dirty_gfns for which dirty trap needs to + * be reenabled + * @size: size of the compact list, dirty_ring->dirty_gfns + * @soft_limit: when the number of dirty pages in the list reaches this + * limit, vcpu that owns this ring should exit to userspace + * to allow userspace to harvest all the dirty pages + * @dirty_gfns: the array to keep the dirty gfns + * @index: index of this dirty ring + */ +struct kvm_dirty_ring { + u32 dirty_index; + u32 reset_index; + u32 size; + u32 soft_limit; + struct kvm_dirty_gfn *dirty_gfns; + int index; +}; + +#if (KVM_DIRTY_LOG_PAGE_OFFSET == 0) +/* + * If KVM_DIRTY_LOG_PAGE_OFFSET not defined, kvm_dirty_ring.o should + * not be included as well, so define these nop functions for the arch. 
+ */ +static inline u32 kvm_dirty_ring_get_rsvd_entries(void) +{ + return 0; +} + +static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, + int index, u32 size) +{ + return 0; +} + +static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm) +{ + return NULL; +} + +static inline int kvm_dirty_ring_reset(struct kvm *kvm, + struct kvm_dirty_ring *ring) +{ + return 0; +} + +static inline void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, + u32 slot, u64 offset) +{ +} + +static inline struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, + u32 offset) +{ + return NULL; +} + +static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring) +{ +} + +static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring) +{ + return true; +} + +#else /* KVM_DIRTY_LOG_PAGE_OFFSET == 0 */ + +u32 kvm_dirty_ring_get_rsvd_entries(void); +int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size); +struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm); + +/* + * called with kvm->slots_lock held, returns the number of + * processed pages. + */ +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring); + +/* + * returns =0: successfully pushed + * <0: unable to push, need to wait + */ +void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset); + +/* for use in vm_operations_struct */ +struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset); + +void kvm_dirty_ring_free(struct kvm_dirty_ring *ring); +bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring); + +#endif /* KVM_DIRTY_LOG_PAGE_OFFSET == 0 */ + +#endif /* KVM_DIRTY_RING_H */ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6789500364c2..c10cf91bde19 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -34,6 +34,7 @@ #include #include +#include #ifndef KVM_MAX_VCPU_ID #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS @@ -319,6 +320,7 @@ struct kvm_vcpu { bool preempted; bool ready; struct kvm_vcpu_arch arch; + struct kvm_dirty_ring dirty_ring; }; static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu) @@ -505,6 +507,7 @@ struct kvm { struct srcu_struct irq_srcu; pid_t userspace_pid; unsigned int max_halt_poll_ns; + u32 dirty_ring_size; }; #define kvm_err(fmt, ...) \ @@ -1479,4 +1482,14 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) } #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */ +/* + * This defines how many reserved entries we want to keep before we + * kick the vcpu to the userspace to avoid dirty ring full. This + * value can be tuned to higher if e.g. PML is enabled on the host. 
+ */ +#define KVM_DIRTY_RING_RSVD_ENTRIES 64 + +/* Max number of entries allowed for each kvm dirty ring */ +#define KVM_DIRTY_RING_MAX_ENTRIES 65536 + #endif diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 26cfb0fa8e7e..49d7d0fe29f6 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -399,6 +399,69 @@ TRACE_EVENT(kvm_halt_poll_ns, #define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \ trace_kvm_halt_poll_ns(false, vcpu_id, new, old) +TRACE_EVENT(kvm_dirty_ring_push, + TP_PROTO(struct kvm_dirty_ring *ring, u32 slot, u64 offset), + TP_ARGS(ring, slot, offset), + + TP_STRUCT__entry( + __field(int, index) + __field(u32, dirty_index) + __field(u32, reset_index) + __field(u32, slot) + __field(u64, offset) + ), + + TP_fast_assign( + __entry->index = ring->index; + __entry->dirty_index = ring->dirty_index; + __entry->reset_index = ring->reset_index; + __entry->slot = slot; + __entry->offset = offset; + ), + + TP_printk("ring %d: dirty 0x%x reset 0x%x " + "slot %u offset 0x%llx (used %u)", + __entry->index, __entry->dirty_index, + __entry->reset_index, __entry->slot, __entry->offset, + __entry->dirty_index - __entry->reset_index) +); + +TRACE_EVENT(kvm_dirty_ring_reset, + TP_PROTO(struct kvm_dirty_ring *ring), + TP_ARGS(ring), + + TP_STRUCT__entry( + __field(int, index) + __field(u32, dirty_index) + __field(u32, reset_index) + ), + + TP_fast_assign( + __entry->index = ring->index; + __entry->dirty_index = ring->dirty_index; + __entry->reset_index = ring->reset_index; + ), + + TP_printk("ring %d: dirty 0x%x reset 0x%x (used %u)", + __entry->index, __entry->dirty_index, __entry->reset_index, + __entry->dirty_index - __entry->reset_index) +); + +TRACE_EVENT(kvm_dirty_ring_exit, + TP_PROTO(struct kvm_vcpu *vcpu), + TP_ARGS(vcpu), + + TP_STRUCT__entry( + __field(int, vcpu_id) + ), + + TP_fast_assign( + __entry->vcpu_id = vcpu->vcpu_id; + ), + + TP_printk("vcpu %d", __entry->vcpu_id) +); + #endif /* _TRACE_KVM_MAIN_H */ /* This part must be outside protection */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ca41220b40b8..de4c69a61524 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -250,6 +250,7 @@ struct kvm_hyperv_exit { #define KVM_EXIT_ARM_NISV 28 #define KVM_EXIT_X86_RDMSR 29 #define KVM_EXIT_X86_WRMSR 30 +#define KVM_EXIT_DIRTY_RING_FULL 31 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -1053,6 +1054,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_USER_SPACE_MSR 188 #define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_DIRTY_LOG_RING 191 #ifdef KVM_CAP_IRQ_ROUTING @@ -1557,6 +1559,9 @@ struct kvm_pv_cmd { /* Available with KVM_CAP_X86_MSR_FILTER */ #define KVM_X86_SET_MSR_FILTER _IOW(KVMIO, 0xc6, struct kvm_msr_filter) +/* Available with KVM_CAP_DIRTY_LOG_RING */ +#define KVM_RESET_DIRTY_RINGS _IO(KVMIO, 0xc7) + /* Secure Encrypted Virtualization command */ enum sev_cmd_id { /* Guest initialization commands */ @@ -1710,4 +1715,52 @@ struct kvm_hyperv_eventfd { #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) +/* + * Arch needs to define the macro after implementing the dirty ring + * feature. KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the + * starting page offset of the dirty ring structures. 
+ */ +#ifndef KVM_DIRTY_LOG_PAGE_OFFSET +#define KVM_DIRTY_LOG_PAGE_OFFSET 0 +#endif + +/* + * KVM dirty GFN flags, defined as: + * + * |---------------+---------------+--------------| + * | bit 1 (reset) | bit 0 (dirty) | Status | + * |---------------+---------------+--------------| + * | 0 | 0 | Invalid GFN | + * | 0 | 1 | Dirty GFN | + * | 1 | X | GFN to reset | + * |---------------+---------------+--------------| + * + * Lifecycle of a dirty GFN goes like: + * + * dirtied collected reset + * 00 -----------> 01 -------------> 1X -------+ + * ^ | + * | | + * +------------------------------------------+ + * + * The userspace program is only responsible for the 01->1X state + * conversion (to collect dirty bits). Also, it must not skip any + * dirty bits so that dirty bits are always collected in sequence. + */ +#define KVM_DIRTY_GFN_F_DIRTY BIT(0) +#define KVM_DIRTY_GFN_F_RESET BIT(1) +#define KVM_DIRTY_GFN_F_MASK 0x3 + +/* + * KVM dirty rings should be mapped at KVM_DIRTY_LOG_PAGE_OFFSET of + * per-vcpu mmaped regions as an array of struct kvm_dirty_gfn. The + * size of the gfn buffer is decided by the first argument when + * enabling KVM_CAP_DIRTY_LOG_RING. + */ +struct kvm_dirty_gfn { + __u32 flags; + __u32 slot; + __u64 offset; +}; + #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c new file mode 100644 index 000000000000..e6a57fdac20d --- /dev/null +++ b/virt/kvm/dirty_ring.c @@ -0,0 +1,197 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * KVM dirty ring implementation + * + * Copyright 2019 Red Hat, Inc. + */ +#include +#include +#include +#include +#include + +int __weak kvm_cpu_dirty_log_size(void) +{ + return 0; +} + +u32 kvm_dirty_ring_get_rsvd_entries(void) +{ + return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size(); +} + +static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring) +{ + return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index); +} + +bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring) +{ + return kvm_dirty_ring_used(ring) >= ring->soft_limit; +} + +static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring) +{ + return kvm_dirty_ring_used(ring) >= ring->size; +} + +struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm) +{ + struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); + + WARN_ON_ONCE(vcpu->kvm != kvm); + + return &vcpu->dirty_ring; +} + +static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask) +{ + struct kvm_memory_slot *memslot; + int as_id, id; + + as_id = slot >> 16; + id = (u16)slot; + + if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) + return; + + memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id); + + if (!memslot || (offset + __fls(mask)) >= memslot->npages) + return; + + spin_lock(&kvm->mmu_lock); + kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask); + spin_unlock(&kvm->mmu_lock); +} + +int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size) +{ + ring->dirty_gfns = vmalloc(size); + if (!ring->dirty_gfns) + return -ENOMEM; + memset(ring->dirty_gfns, 0, size); + + ring->size = size / sizeof(struct kvm_dirty_gfn); + ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries(); + ring->dirty_index = 0; + ring->reset_index = 0; + ring->index = index; + + return 0; +} + +static inline void kvm_dirty_gfn_set_invalid(struct kvm_dirty_gfn *gfn) +{ + gfn->flags = 0; +} + +static inline void kvm_dirty_gfn_set_dirtied(struct kvm_dirty_gfn *gfn) +{ + gfn->flags = KVM_DIRTY_GFN_F_DIRTY; +} + +static inline 
bool kvm_dirty_gfn_invalid(struct kvm_dirty_gfn *gfn) +{ + return gfn->flags == 0; +} + +static inline bool kvm_dirty_gfn_collected(struct kvm_dirty_gfn *gfn) +{ + return gfn->flags & KVM_DIRTY_GFN_F_RESET; +} + +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring) +{ + u32 cur_slot, next_slot; + u64 cur_offset, next_offset; + unsigned long mask; + int count = 0; + struct kvm_dirty_gfn *entry; + bool first_round = true; + + /* This is only needed to make compilers happy */ + cur_slot = cur_offset = mask = 0; + + while (true) { + entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)]; + + if (!kvm_dirty_gfn_collected(entry)) + break; + + next_slot = READ_ONCE(entry->slot); + next_offset = READ_ONCE(entry->offset); + + /* Update the flags to reflect that this GFN is reset */ + kvm_dirty_gfn_set_invalid(entry); + + ring->reset_index++; + count++; + /* + * Try to coalesce the reset operations when the guest is + * scanning pages in the same slot. + */ + if (!first_round && next_slot == cur_slot) { + s64 delta = next_offset - cur_offset; + + if (delta >= 0 && delta < BITS_PER_LONG) { + mask |= 1ull << delta; + continue; + } + + /* Backwards visit, careful about overflows! */ + if (delta > -BITS_PER_LONG && delta < 0 && + (mask << -delta >> -delta) == mask) { + cur_offset = next_offset; + mask = (mask << -delta) | 1; + continue; + } + } + kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask); + cur_slot = next_slot; + cur_offset = next_offset; + mask = 1; + first_round = false; + } + + kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask); + + trace_kvm_dirty_ring_reset(ring); + + return count; +} + +void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset) +{ + struct kvm_dirty_gfn *entry; + + /* It should never get full */ + WARN_ON_ONCE(kvm_dirty_ring_full(ring)); + + entry = &ring->dirty_gfns[ring->dirty_index & (ring->size - 1)]; + + /* It should always be an invalid entry to fill in */ + WARN_ON_ONCE(!kvm_dirty_gfn_invalid(entry)); + + entry->slot = slot; + entry->offset = offset; + /* + * Make sure the data is filled in before we publish this to + * the userspace program. There's no paired kernel-side reader. + */ + smp_wmb(); + kvm_dirty_gfn_set_dirtied(entry); + ring->dirty_index++; + trace_kvm_dirty_ring_push(ring, slot, offset); +} + +struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset) +{ + return vmalloc_to_page((void *)ring->dirty_gfns + offset * PAGE_SIZE); +} + +void kvm_dirty_ring_free(struct kvm_dirty_ring *ring) +{ + vfree(ring->dirty_gfns); + ring->dirty_gfns = NULL; +} diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 59ad45ef9b55..357b3f7b1127 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -63,6 +63,8 @@ #define CREATE_TRACE_POINTS #include +#include + /* Worst case buffer size needed for holding an integer. 
*/ #define ITOA_MAX_LEN 12 @@ -415,6 +417,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) void kvm_vcpu_destroy(struct kvm_vcpu *vcpu) { + kvm_dirty_ring_free(&vcpu->dirty_ring); kvm_arch_vcpu_destroy(vcpu); /* @@ -2650,8 +2653,13 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, { if (memslot && memslot->dirty_bitmap) { unsigned long rel_gfn = gfn - memslot->base_gfn; + u32 slot = (memslot->as_id << 16) | memslot->id; - set_bit_le(rel_gfn, memslot->dirty_bitmap); + if (kvm->dirty_ring_size) + kvm_dirty_ring_push(kvm_dirty_ring_get(kvm), + slot, rel_gfn); + else + set_bit_le(rel_gfn, memslot->dirty_bitmap); } } EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot); @@ -3011,6 +3019,17 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode) } EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin); +static bool kvm_page_in_dirty_ring(struct kvm *kvm, unsigned long pgoff) +{ +#if KVM_DIRTY_LOG_PAGE_OFFSET > 0 + return (pgoff >= KVM_DIRTY_LOG_PAGE_OFFSET) && + (pgoff < KVM_DIRTY_LOG_PAGE_OFFSET + + kvm->dirty_ring_size / PAGE_SIZE); +#else + return false; +#endif +} + static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf) { struct kvm_vcpu *vcpu = vmf->vma->vm_file->private_data; @@ -3026,6 +3045,10 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf) else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET) page = virt_to_page(vcpu->kvm->coalesced_mmio_ring); #endif + else if (kvm_page_in_dirty_ring(vcpu->kvm, vmf->pgoff)) + page = kvm_dirty_ring_get_page( + &vcpu->dirty_ring, + vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET); else return kvm_arch_vcpu_fault(vcpu, vmf); get_page(page); @@ -3039,6 +3062,14 @@ static const struct vm_operations_struct kvm_vcpu_vm_ops = { static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma) { + struct kvm_vcpu *vcpu = file->private_data; + unsigned long pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + + if ((kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff) || + kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff + pages - 1)) && + ((vma->vm_flags & VM_EXEC) || !(vma->vm_flags & VM_SHARED))) + return -EINVAL; + vma->vm_ops = &kvm_vcpu_vm_ops; return 0; } @@ -3132,6 +3163,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) if (r) goto vcpu_free_run_page; + if (kvm->dirty_ring_size) { + r = kvm_dirty_ring_alloc(&vcpu->dirty_ring, + id, kvm->dirty_ring_size); + if (r) + goto arch_vcpu_destroy; + } + mutex_lock(&kvm->lock); if (kvm_get_vcpu_by_id(kvm, id)) { r = -EEXIST; @@ -3165,6 +3203,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) unlock_vcpu_destroy: mutex_unlock(&kvm->lock); + kvm_dirty_ring_free(&vcpu->dirty_ring); +arch_vcpu_destroy: kvm_arch_vcpu_destroy(vcpu); vcpu_free_run_page: free_page((unsigned long)vcpu->run); @@ -3637,12 +3677,78 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) #endif case KVM_CAP_NR_MEMSLOTS: return KVM_USER_MEM_SLOTS; + case KVM_CAP_DIRTY_LOG_RING: +#ifdef CONFIG_X86 + return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn); +#else + return 0; +#endif default: break; } return kvm_vm_ioctl_check_extension(kvm, arg); } +static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size) +{ + int r; + + if (!KVM_DIRTY_LOG_PAGE_OFFSET) + return -EINVAL; + + /* the size should be power of 2 */ + if (!size || (size & (size - 1))) + return -EINVAL; + + /* Should be bigger to keep the reserved entries, or a page */ + if (size < kvm_dirty_ring_get_rsvd_entries() * + sizeof(struct kvm_dirty_gfn) || 
size < PAGE_SIZE) + return -EINVAL; + + if (size > KVM_DIRTY_RING_MAX_ENTRIES * + sizeof(struct kvm_dirty_gfn)) + return -E2BIG; + + /* We only allow it to set once */ + if (kvm->dirty_ring_size) + return -EINVAL; + + mutex_lock(&kvm->lock); + + if (kvm->created_vcpus) { + /* We don't allow to change this value after vcpu created */ + r = -EINVAL; + } else { + kvm->dirty_ring_size = size; + r = 0; + } + + mutex_unlock(&kvm->lock); + return r; +} + +static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm) +{ + int i; + struct kvm_vcpu *vcpu; + int cleared = 0; + + if (!kvm->dirty_ring_size) + return -EINVAL; + + mutex_lock(&kvm->slots_lock); + + kvm_for_each_vcpu(i, vcpu, kvm) + cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring); + + mutex_unlock(&kvm->slots_lock); + + if (cleared) + kvm_flush_remote_tlbs(kvm); + + return cleared; +} + int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { @@ -3673,6 +3779,8 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, kvm->max_halt_poll_ns = cap->args[0]; return 0; } + case KVM_CAP_DIRTY_LOG_RING: + return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]); default: return kvm_vm_ioctl_enable_cap(kvm, cap); } @@ -3857,6 +3965,9 @@ static long kvm_vm_ioctl(struct file *filp, case KVM_CHECK_EXTENSION: r = kvm_vm_ioctl_check_extension_generic(kvm, arg); break; + case KVM_RESET_DIRTY_RINGS: + r = kvm_vm_ioctl_reset_dirty_pages(kvm); + break; default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } From patchwork Fri Oct 23 18:33:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75F1EC5517A for ; Fri, 23 Oct 2020 18:35:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 097E820BED for ; Fri, 23 Oct 2020 18:35:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hqSYUs5p" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754835AbgJWSfi (ORCPT ); Fri, 23 Oct 2020 14:35:38 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:55660 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754862AbgJWSeN (ORCPT ); Fri, 23 Oct 2020 14:34:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478052; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L3ZJsaKS1gNf0C0NZKbX4rkEwggNcdAZdu3ekfUHbZY=; b=hqSYUs5pNjN9+RCXc6rWPiDtC/dKCcvpsIe3KGu+qOkbRKkC1NhWNoEF0s4cDO8eQOPkuq AxmCiAlRASc8Dpu8uLPQbNU8J0Z18fMdEP2mFmEJAx5n8wfLTL/5qv59MhwA998h5Zt+zE SUXVaGHpi076NoG/BfS8iJE/6J7A+ZU= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com 
[209.85.160.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-420-519dfBqYObqkufHJoNtkgQ-1; Fri, 23 Oct 2020 14:34:10 -0400 X-MC-Unique: 519dfBqYObqkufHJoNtkgQ-1 Received: by mail-qt1-f197.google.com with SMTP id t4so1652345qtd.23 for ; Fri, 23 Oct 2020 11:34:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=L3ZJsaKS1gNf0C0NZKbX4rkEwggNcdAZdu3ekfUHbZY=; b=L9tjDVjtJ0JFR4oSNKQubJ6JA9VMVKBERxlsQwljOzbL/mwiPL1qLpxls1yUismV50 Ryjdgwx86h264B3YbPfqN85JyR++78K5kFPCwAb/KpiwBAOskFyy9aAHHs6YBhgLBfkF YhPtHxgFvT0MOLdC/z0Y5Qu9runVJGL4q/PNqqUm4Wp0KXKQKr951p+hRtaYnQnGAYjF Jv0OcgQcd+piDJe0SJ62mTxWnWqzVzTVZ4K8WswHMdGWTzu+YHdCp25qTkk/z7h+4ymg ILiufDHG6h1i40pzBUX0XejRKNuv1wpa0gIYHkNJ/jGKxmk7s4/mutfD33ulGsJUSs96 ydYg== X-Gm-Message-State: AOAM5315iHi5FhjxUJN5natdQ+MgWQOMKaDByO9gSA05hRuikrxDydL7 xddTqaKEIS7U3x+RCK8ZZ0DianF/OEJFfJVofOFlTMPDYd/SZ2eo0qXov8VhOqywMig6Byfd3gx 1rnriWfENaOCb X-Received: by 2002:a05:620a:1f3:: with SMTP id x19mr3642317qkn.62.1603478050032; Fri, 23 Oct 2020 11:34:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwRGmmVmVvSiP53ER+mNj/yPmjpiiLZ0TmvgyeT1c1TU7ocoQdHqPRJtCsXzUuQa26q5Tsp2w== X-Received: by 2002:a05:620a:1f3:: with SMTP id x19mr3642286qkn.62.1603478049660; Fri, 23 Oct 2020 11:34:09 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:09 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 06/14] KVM: Make dirty ring exclusive to dirty bitmap log Date: Fri, 23 Oct 2020 14:33:50 -0400 Message-Id: <20201023183358.50607-7-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org There's no good reason to use both the dirty bitmap logging and the new dirty ring buffer to track dirty bits. We should be able to even support both of them at the same time, but it could complicate things which could actually help little. Let's simply make it the rule before we enable dirty ring on any arch, that we don't allow these two interfaces to be used together. The big world switch would be KVM_CAP_DIRTY_LOG_RING capability enablement. That's where we'll switch from the default dirty logging way to the dirty ring way. As long as kvm->dirty_ring_size is setup correctly, we'll once and for all switch to the dirty ring buffer mode for the current virtual machine. Signed-off-by: Peter Xu --- Documentation/virt/kvm/api.rst | 7 +++++++ virt/kvm/kvm_main.c | 12 ++++++++++++ 2 files changed, 19 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 00da3f01d632..33e393759444 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6507,3 +6507,10 @@ make sure all the existing dirty gfns are flushed to the dirty rings. The dirty ring can gets full. When it happens, the KVM_RUN of the vcpu will return with exit reason KVM_EXIT_DIRTY_LOG_FULL. 
+ +NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding +ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctl +KVM_GET_DIRTY_LOG. After enabling KVM_CAP_DIRTY_LOG_RING with an +acceptable dirty ring size, the virtual machine will switch to the +dirty ring tracking mode. Further ioctls to either KVM_GET_DIRTY_LOG +or KVM_CLEAR_DIRTY_LOG will fail. diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 357b3f7b1127..c05b94696b21 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1426,6 +1426,10 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log, unsigned long n; unsigned long any = 0; + /* Dirty ring tracking is exclusive to dirty log tracking */ + if (kvm->dirty_ring_size) + return -EINVAL; + *memslot = NULL; *is_dirty = 0; @@ -1487,6 +1491,10 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log) unsigned long *dirty_bitmap_buffer; bool flush; + /* Dirty ring tracking is exclusive to dirty log tracking */ + if (kvm->dirty_ring_size) + return -EINVAL; + as_id = log->slot >> 16; id = (u16)log->slot; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) @@ -1595,6 +1603,10 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm, unsigned long *dirty_bitmap_buffer; bool flush; + /* Dirty ring tracking is exclusive to dirty log tracking */ + if (kvm->dirty_ring_size) + return -EINVAL; + as_id = log->slot >> 16; id = (u16)log->slot; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) From patchwork Fri Oct 23 18:33:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854375 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 957ABC55179 for ; Fri, 23 Oct 2020 18:34:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 41B3A20BED for ; Fri, 23 Oct 2020 18:34:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bMuhZCsx" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754898AbgJWSeU (ORCPT ); Fri, 23 Oct 2020 14:34:20 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:26904 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754869AbgJWSeP (ORCPT ); Fri, 23 Oct 2020 14:34:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7bU0wiJLATmj7mwD+p1vewMpZhS8DPN+XFbmzRlSmU4=; b=bMuhZCsxE/7zRUlAaYGC5nrcQnnf6HjK+va+rfdblbOTTAbJ5jk9HJ0SAgI8ImDKMn4IGo eRUXuJYMV8Ct9IpZ/OiFpaUujHKmsa0Iymopip2LWLHWpO5LsUoGPPPnWLsU6nIMT239hJ Pmf4/PNca1bfawDdByEtiHQeFmN8wEs= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com 
[209.85.219.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-3-FYuJrBIsOe2GlmIOQs1dvg-1; Fri, 23 Oct 2020 14:34:12 -0400
X-MC-Unique: FYuJrBIsOe2GlmIOQs1dvg-1
Received: by mail-qv1-f72.google.com with SMTP id l8so1572637qvz.2 for ; Fri, 23 Oct 2020 11:34:12 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7bU0wiJLATmj7mwD+p1vewMpZhS8DPN+XFbmzRlSmU4=; b=s5Gq88VkrS9D/z34kwNNgmZdBsvH0rXovqBgLUymyZGvWKGJ2vC/0C66LJ7t2tsI1u JRzsboxy+spQ+ImeVsdafl7BQTvK3Ht8LfYmxdkieGVOABm3sHJH07a4Gk62h9oRMk/a g+hrlMMAaxhMB0q1GQ3v91+afLHG9f18aXFdDCTeW3hsBd2GPZAdz96Q7I/MVM0yIWv3 OcFGPB3CJse4gihsCNYH3Amdn0SApRufVLlJ6FMPUrGAGgFWpvqYblwA7xZ7q+l1Wp4h 53xIpM8nsKvNc2EDRDgubdlKKwCUPJBUAz2nzpwmsF3S+u3GqgFnS6+7+f7HZ+DTIRgM gvTg==
X-Gm-Message-State: AOAM5317X4oyNb0+ijwKlrlpsim30586kbkfLGLDTuhUMH2meD9+FrFr +rtZmGxb0tsWkjU8I06jkS4rnf7yLpKzR1UAsw4FKxE7uLeClyuqKhjFahBgJzAt5LYUds5BQe5 n1x+xUeO2pte9
X-Received: by 2002:a37:5ca:: with SMTP id 193mr3631130qkf.44.1603478051491; Fri, 23 Oct 2020 11:34:11 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJx2BvSDmeA/DQo1WlKLH5Sh49c8JiIjEnWjrJdZ2KaYBDlqsWlSfnI2XBSixjBAGFGyb76Czg==
X-Received: by 2002:a37:5ca:: with SMTP id 193mr3631100qkf.44.1603478051236; Fri, 23 Oct 2020 11:34:11 -0700 (PDT)
Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:10 -0700 (PDT)
From: Peter Xu
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini
Subject: [PATCH v15 07/14] KVM: Don't allocate dirty bitmap if dirty ring is enabled
Date: Fri, 23 Oct 2020 14:33:51 -0400
Message-Id: <20201023183358.50607-8-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20201023183358.50607-1-peterx@redhat.com>
References: <20201023183358.50607-1-peterx@redhat.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Because KVM dirty rings and the KVM dirty log are used in an exclusive
way, let's avoid creating the dirty_bitmap when the KVM dirty ring is
enabled. In the meantime, since the dirty_bitmap will now be created
conditionally, we can't use it as a sign of "whether this memory slot
has dirty tracking enabled". Change such users to check against the KVM
memory slot flags instead.

Note that there can still be cases where a KVM memory slot gets its
dirty_bitmap allocated: _if_ the memory slots are created before the
dirty rings are enabled, and the dirty tracking capability is set at the
same time, they'll still carry the dirty_bitmap. However, it should not
hurt much (e.g., the bitmaps will always be freed if they are there),
and real users normally won't trigger this, because the dirty bit
tracking flag is in most cases only applied to KVM slots right before
migration starts, which is far later than KVM initialization (VM start).
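For illustration only (not part of this patch: the helper name and the
fd/address arguments are invented here), a minimal userspace sketch of
the ordering assumed above, enabling the ring right after VM creation
and only then registering the dirty-logging slot, so that
kvm_alloc_dirty_bitmap() is never reached for that slot:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: enable KVM_CAP_DIRTY_LOG_RING before any memslot
 * is registered with KVM_MEM_LOG_DIRTY_PAGES, so the slot never gets a
 * dirty_bitmap.  Error handling is elided. */
static int setup_ring_then_slot(int vm_fd, __u32 ring_bytes,
				__u64 gpa, __u64 size, void *hva)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_LOG_RING,
		.args[0] = ring_bytes,	/* power of two, >= PAGE_SIZE */
	};
	struct kvm_userspace_memory_region slot = {
		.slot = 0,
		.flags = KVM_MEM_LOG_DIRTY_PAGES,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = (__u64)(unsigned long)hva,
	};

	/* must happen before the first vcpu is created */
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -1;
	/* with kvm->dirty_ring_size already set, no bitmap is allocated */
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &slot);
}

If the slot were registered with KVM_MEM_LOG_DIRTY_PAGES before the
KVM_ENABLE_CAP call, it would still get a bitmap, which is the harmless
corner case described above.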
Signed-off-by: Peter Xu --- arch/x86/kvm/mmu/mmu.c | 2 +- include/linux/kvm_host.h | 5 +++++ virt/kvm/kvm_main.c | 4 ++-- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d3cc173dcf55..af2460c42922 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -820,7 +820,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); if (!slot || slot->flags & KVM_MEMSLOT_INVALID) return NULL; - if (no_dirty_log && slot->dirty_bitmap) + if (no_dirty_log && kvm_slot_dirty_track_enabled(slot)) return NULL; return slot; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c10cf91bde19..dcc1f0e44366 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -351,6 +351,11 @@ struct kvm_memory_slot { u16 as_id; }; +static inline bool kvm_slot_dirty_track_enabled(struct kvm_memory_slot *slot) +{ + return slot->flags & KVM_MEM_LOG_DIRTY_PAGES; +} + static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) { return ALIGN(memslot->npages, BITS_PER_LONG) / 8; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c05b94696b21..abe921f90f35 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1365,7 +1365,7 @@ int __kvm_set_memory_region(struct kvm *kvm, /* Allocate/free page dirty bitmap as needed */ if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) new.dirty_bitmap = NULL; - else if (!new.dirty_bitmap) { + else if (!new.dirty_bitmap && !kvm->dirty_ring_size) { r = kvm_alloc_dirty_bitmap(&new); if (r) return r; @@ -2663,7 +2663,7 @@ EXPORT_SYMBOL_GPL(kvm_clear_guest); void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn) { - if (memslot && memslot->dirty_bitmap) { + if (memslot && kvm_slot_dirty_track_enabled(memslot)) { unsigned long rel_gfn = gfn - memslot->base_gfn; u32 slot = (memslot->as_id << 16) | memslot->id; From patchwork Fri Oct 23 18:33:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854377 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 125C7C5517A for ; Fri, 23 Oct 2020 18:34:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ADE432192A for ; Fri, 23 Oct 2020 18:34:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="cTvR8xPh" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754903AbgJWSeV (ORCPT ); Fri, 23 Oct 2020 14:34:21 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:43417 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754879AbgJWSeS (ORCPT ); Fri, 23 Oct 2020 14:34:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478056; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R2ALy67CtgBF8fDQMZT8Ir08uhC8QFjLGT5DkxROP8I=; b=cTvR8xPhBTnctesD+UwXV0vJvGddfU/U+PeIHcm+QMIRXObmB6jlSWbNPtLlI6D5NG8KHq 7IMQNWGbAMvlqonWgtuMb3aBKs3a7oZz550lZjEWRGAazDbMaLgyCbj/euhWy9MLRa688X lRfENadsmSYQNhpzuzrTWmCI4MFxEfc= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-520-Zz6nPlHKPW2id6W1Rf5U1A-1; Fri, 23 Oct 2020 14:34:15 -0400 X-MC-Unique: Zz6nPlHKPW2id6W1Rf5U1A-1 Received: by mail-qk1-f200.google.com with SMTP id j20so1624229qkl.7 for ; Fri, 23 Oct 2020 11:34:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=R2ALy67CtgBF8fDQMZT8Ir08uhC8QFjLGT5DkxROP8I=; b=H5C3/NcFxkdQIh4bagZPNWdO0aqfuI+AYiUceFqudkL2dGdN6UMODiKpilFyRSHK32 rFDy5+HEfJvngJmSZscAuWX4shPTQ8r/pOqy8Me87Jj0qfpLQNH8IkVJQkyquNrDPk1t zkC2K0b4KTnlGx3nZ98Gba1JVWNmlaPlWw9DEbPXK5IhhIl9t6AwcUAmYhPAw24kV/zd 2pmxugWTYWwNi5AyESZSYiZNH68mvOvcjd2M0NfmOy95RPv2MZ4M/nFkxdhp9i/hqSsm RWQ6JRlI6LHq1pe/J2YQ4Ggf+YkxVhFMhCQbmj1EvFMXqpKUrA6e+EQZMz/uaCD97OJD Lo2A== X-Gm-Message-State: AOAM532H8MRnS63JIPGvQ9TqhVa4VX6SB2zNPeYN7qEOS1h+612X3/Nm +q0SYLQJpLF3sbeDBP/SicA2+rUtHTQHunLUSOWETvxLx9SzT4Aju2r3PxwYO2MvEQPMNt8Mqti YNLEuR21uAxOD X-Received: by 2002:a05:620a:1e7:: with SMTP id x7mr3511151qkn.465.1603478054152; Fri, 23 Oct 2020 11:34:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzdbEKO+mgZn7fwYiD7deukv0UF5TGReMh4GNOJSVqM5OzUudOoLbS84dolU1UuPtlhENiuCw== X-Received: by 2002:a05:620a:1e7:: with SMTP id x7mr3511035qkn.465.1603478052715; Fri, 23 Oct 2020 11:34:12 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:11 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 08/14] KVM: selftests: Always clear dirty bitmap after iteration Date: Fri, 23 Oct 2020 14:33:52 -0400 Message-Id: <20201023183358.50607-9-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org We don't clear the dirty bitmap before because KVM_GET_DIRTY_LOG will clear it for us before copying the dirty log onto it. However we'd still better to clear it explicitly instead of assuming the kernel will always do it for us. More importantly, in the upcoming dirty ring tests we'll start to fetch dirty pages from a ring buffer, so no one is going to clear the dirty bitmap for us. 
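As a hedged illustration of the "fetch dirty pages from a ring buffer"
flow mentioned above (not this patch's code: the function and variable
names are invented for this example, and the real ring consumer only
arrives with the dirty ring test patches), a collector like the sketch
below only ever sets bits in the caller's bitmap, which is exactly why
the verify loop now has to consume each bit itself with
test_and_clear_bit_le():

#include <stdint.h>
#include <linux/kvm.h>

/*
 * Sketch only: walk one vcpu's mmap()ed kvm_dirty_gfn array following
 * the 00 -> 01 -> 1X lifecycle documented in the uapi header.  'gfns',
 * 'fetch', 'nents' and the byte-wise bitmap layout are assumptions made
 * for this example; a real consumer would also honour cur->slot.
 */
static uint32_t ring_collect_into_bitmap(struct kvm_dirty_gfn *gfns,
					 uint32_t *fetch, uint32_t nents,
					 uint8_t *bitmap)
{
	struct kvm_dirty_gfn *cur;
	uint32_t count = 0;

	for (;;) {
		cur = &gfns[*fetch % nents];
		/* a careful consumer would use an acquire load here */
		if (!(cur->flags & KVM_DIRTY_GFN_F_DIRTY))
			break;		/* nothing more has been published */
		/* record the page, as KVM_GET_DIRTY_LOG would have done */
		bitmap[cur->offset / 8] |= 1u << (cur->offset % 8);
		/* hand the entry back: dirty (01) becomes collected (1X) */
		cur->flags |= KVM_DIRTY_GFN_F_RESET;
		(*fetch)++;
		count++;
	}
	/* the caller later issues KVM_RESET_DIRTY_RINGS on the VM fd */
	return count;
}

After such a pass the bitmap is only ever added to, never rewritten, so
clearing each bit while verifying is what guarantees a clean bitmap for
the next iteration.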
Reviewed-by: Andrew Jones Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 752ec158ac59..6a8275a22861 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -195,7 +195,7 @@ static void vm_dirty_log_verify(enum vm_guest_mode mode, unsigned long *bmap) page); } - if (test_bit_le(page, bmap)) { + if (test_and_clear_bit_le(page, bmap)) { host_dirty_count++; /* * If the bit is set, the value written onto From patchwork Fri Oct 23 18:33:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854385 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42166C388F9 for ; Fri, 23 Oct 2020 18:34:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D9E4520882 for ; Fri, 23 Oct 2020 18:34:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="TkLZeKpI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754936AbgJWSec (ORCPT ); Fri, 23 Oct 2020 14:34:32 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:51138 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754918AbgJWSea (ORCPT ); Fri, 23 Oct 2020 14:34:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478068; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=a57vVrUIpIowCryfJBvA6gkT7bU/YnMsmf35dW0iiOc=; b=TkLZeKpIFCz7JLfeW6+QkSBLFg1MSEZ0xUvPbdkqBBgfnc6NaMRMQcIYqvfRA7NXTGWzUi xfRRi17HX2xiTdSUbrfrdPC297Llf4diMysD2L06E/KUUceGjtHgW2vMLowhAlG8ec7xpe cQZE0f29h77JoNp1TO1lNtNpcDIr0hI= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-202-NKdKTzcfOv-0ri8r8XpSfQ-1; Fri, 23 Oct 2020 14:34:26 -0400 X-MC-Unique: NKdKTzcfOv-0ri8r8XpSfQ-1 Received: by mail-qv1-f72.google.com with SMTP id s8so1530217qvv.18 for ; Fri, 23 Oct 2020 11:34:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=a57vVrUIpIowCryfJBvA6gkT7bU/YnMsmf35dW0iiOc=; b=HOopOL6MXuF5g+qVnCsZbdWoD+9kG5TM0axQAzepUZE2CTWlNEQ/0NRLhSb7CLIMMj d2ugZCjoS0xv5EH3MEUiTogdApyOkD5PhlZCkKkIH5ifdLMBdtOPdGuDrbFdxg7i0vWp cHmwPMrNHNLma9N3YUXibABHGJhuF+dlB+4kJUy2H+9oXXYpAzjoRRKCDUdCfBN8CF1F Z0dpdl1ECObmnLw3kaVukrt+C3KcKS4JAEBIfTNw8441OxKDKHNY0FqH60ICcZmGlwR/ hXw6dqWAZQN9qDSVwQbrGtQ2tDpc0tbz7umFrqYKpybRnWNBnL9cCvbfUC+TznboDpPz 
jEPw== X-Gm-Message-State: AOAM531FG8tLZr8o1qpB8G0qDW4hHOuwctUudHu2fyJ4XWzZu580/Ayl R/k3ASYg1cVB9GptJAclnap9p0QF5r4CVM/5KoQ5uFX5JCgBavx1FwIQb+P5wyfshIYQcLfwUHh h1X7HwCw+hoI5 X-Received: by 2002:a05:622a:43:: with SMTP id y3mr3215092qtw.93.1603478060881; Fri, 23 Oct 2020 11:34:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz6/Ebnzwnu/6h81bMUFvwrQiqcy5mAPDTqLNib8+JEmF31hWxxXO+x6/3gvy0nZJOOhQREkg== X-Received: by 2002:a05:622a:43:: with SMTP id y3mr3214721qtw.93.1603478054486; Fri, 23 Oct 2020 11:34:14 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:13 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 09/14] KVM: selftests: Sync uapi/linux/kvm.h to tools/ Date: Fri, 23 Oct 2020 14:33:53 -0400 Message-Id: <20201023183358.50607-10-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This will be needed to extend the kvm selftest program. Signed-off-by: Peter Xu --- tools/include/uapi/linux/kvm.h | 78 +++++++++++++++++++++++++++++++++- 1 file changed, 76 insertions(+), 2 deletions(-) diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index f6d86033c4fa..de4c69a61524 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -248,6 +248,9 @@ struct kvm_hyperv_exit { #define KVM_EXIT_IOAPIC_EOI 26 #define KVM_EXIT_HYPERV 27 #define KVM_EXIT_ARM_NISV 28 +#define KVM_EXIT_X86_RDMSR 29 +#define KVM_EXIT_X86_WRMSR 30 +#define KVM_EXIT_DIRTY_RING_FULL 31 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -413,6 +416,17 @@ struct kvm_run { __u64 esr_iss; __u64 fault_ipa; } arm_nisv; + /* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */ + struct { + __u8 error; /* user -> kernel */ + __u8 pad[7]; +#define KVM_MSR_EXIT_REASON_INVAL (1 << 0) +#define KVM_MSR_EXIT_REASON_UNKNOWN (1 << 1) +#define KVM_MSR_EXIT_REASON_FILTER (1 << 2) + __u32 reason; /* kernel -> user */ + __u32 index; /* kernel -> user */ + __u64 data; /* kernel <-> user */ + } msr; /* Fix the size of the union. 
*/ char padding[256]; }; @@ -790,9 +804,10 @@ struct kvm_ppc_resize_hpt { #define KVM_VM_PPC_HV 1 #define KVM_VM_PPC_PR 2 -/* on MIPS, 0 forces trap & emulate, 1 forces VZ ASE */ -#define KVM_VM_MIPS_TE 0 +/* on MIPS, 0 indicates auto, 1 forces VZ ASE, 2 forces trap & emulate */ +#define KVM_VM_MIPS_AUTO 0 #define KVM_VM_MIPS_VZ 1 +#define KVM_VM_MIPS_TE 2 #define KVM_S390_SIE_PAGE_OFFSET 1 @@ -1035,6 +1050,11 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_LAST_CPU 184 #define KVM_CAP_SMALLER_MAXPHYADDR 185 #define KVM_CAP_S390_DIAG318 186 +#define KVM_CAP_STEAL_TIME 187 +#define KVM_CAP_X86_USER_SPACE_MSR 188 +#define KVM_CAP_X86_MSR_FILTER 189 +#define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_DIRTY_LOG_RING 191 #ifdef KVM_CAP_IRQ_ROUTING @@ -1536,6 +1556,12 @@ struct kvm_pv_cmd { /* Available with KVM_CAP_S390_PROTECTED */ #define KVM_S390_PV_COMMAND _IOWR(KVMIO, 0xc5, struct kvm_pv_cmd) +/* Available with KVM_CAP_X86_MSR_FILTER */ +#define KVM_X86_SET_MSR_FILTER _IOW(KVMIO, 0xc6, struct kvm_msr_filter) + +/* Available with KVM_CAP_DIRTY_LOG_RING */ +#define KVM_RESET_DIRTY_RINGS _IO(KVMIO, 0xc7) + /* Secure Encrypted Virtualization command */ enum sev_cmd_id { /* Guest initialization commands */ @@ -1689,4 +1715,52 @@ struct kvm_hyperv_eventfd { #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) +/* + * Arch needs to define the macro after implementing the dirty ring + * feature. KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the + * starting page offset of the dirty ring structures. + */ +#ifndef KVM_DIRTY_LOG_PAGE_OFFSET +#define KVM_DIRTY_LOG_PAGE_OFFSET 0 +#endif + +/* + * KVM dirty GFN flags, defined as: + * + * |---------------+---------------+--------------| + * | bit 1 (reset) | bit 0 (dirty) | Status | + * |---------------+---------------+--------------| + * | 0 | 0 | Invalid GFN | + * | 0 | 1 | Dirty GFN | + * | 1 | X | GFN to reset | + * |---------------+---------------+--------------| + * + * Lifecycle of a dirty GFN goes like: + * + * dirtied collected reset + * 00 -----------> 01 -------------> 1X -------+ + * ^ | + * | | + * +------------------------------------------+ + * + * The userspace program is only responsible for the 01->1X state + * conversion (to collect dirty bits). Also, it must not skip any + * dirty bits so that dirty bits are always collected in sequence. + */ +#define KVM_DIRTY_GFN_F_DIRTY BIT(0) +#define KVM_DIRTY_GFN_F_RESET BIT(1) +#define KVM_DIRTY_GFN_F_MASK 0x3 + +/* + * KVM dirty rings should be mapped at KVM_DIRTY_LOG_PAGE_OFFSET of + * per-vcpu mmaped regions as an array of struct kvm_dirty_gfn. The + * size of the gfn buffer is decided by the first argument when + * enabling KVM_CAP_DIRTY_LOG_RING. 
+ */ +struct kvm_dirty_gfn { + __u32 flags; + __u32 slot; + __u64 offset; +}; + #endif /* __LINUX_KVM_H */ From patchwork Fri Oct 23 18:33:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19FC7C55178 for ; Fri, 23 Oct 2020 18:35:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9D29520882 for ; Fri, 23 Oct 2020 18:35:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="a1mFPz/K" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755030AbgJWSfQ (ORCPT ); Fri, 23 Oct 2020 14:35:16 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:49066 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754908AbgJWSeZ (ORCPT ); Fri, 23 Oct 2020 14:34:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478064; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KiD3PSm1HruAcyGWQFTaD4JNh5HqjFztRdyDy4Rmc8s=; b=a1mFPz/K9F5Gay7Z9kiJooDU4kqGRU3BaxnVjUgI+RrxPdPPtz15ekOiBcEwIcMys2sIXn MoC/ceSRxpOLXnhLd5exbdVCVA0NiekArQpT0qz5JTr65Y5rGGS0v+4/Qd7F0MkgZcKDsf OVJniidyyfV5m5lU5n8QXYMjmLx6QX4= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-596-_9d0sJ17OuyrmgAvg3sxQw-1; Fri, 23 Oct 2020 14:34:22 -0400 X-MC-Unique: _9d0sJ17OuyrmgAvg3sxQw-1 Received: by mail-qk1-f198.google.com with SMTP id a81so1620744qkg.10 for ; Fri, 23 Oct 2020 11:34:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KiD3PSm1HruAcyGWQFTaD4JNh5HqjFztRdyDy4Rmc8s=; b=Eu1SS3ReSU0dGO4IQbuxemQpNeu+hFeJUJviOndm8glYmiZVSu2gQRqXcksPx94uxu i9K0QRSX9nIGA/Mo6lrHvz2jZAR2CuGgvP4ulwsO7Zh7cdHx5Jwj+Mu0zjAmVJU6hHwk xbY2EOsJzT+HS9HFJqIn7/iux2M2fyq3td+4RVX46Fi9DNGX0m6zabnLLsV6IOZVAf6j uBNOl+2CYgZMgt4oCkLfwthl4+Y+DxRhuYXoaEVYW3KIkSt/2QsNrba6jSZpSUbiANbP W79iUMl18dUIZKJQ7YT4yv4prNkC+NVxsiswMCGif2r967Y5y7zoH4M4WZMuePfqZIA+ 7HUA== X-Gm-Message-State: AOAM5305DxWPIvUi67Z2iWTryAlIpUFQlllHGKfk4pXTrqwy/GDxqkOe 0lCT4yqcz8fL4jzm1xWF80NQuMwQivem3OJYuVPLboIu9Q6xaGoDgaNyDMMagbt+ldhuDlW9xgS veYs4KgjbvxWH X-Received: by 2002:a37:a304:: with SMTP id m4mr3483133qke.287.1603478061785; Fri, 23 Oct 2020 11:34:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwbpQ/SHS0znI/WhM/AqDQ7dMUnKRKh7xLa9Xn567EdPeAq6V4vzO0vevfhdQcdFpbe4GnWeQ== X-Received: by 2002:a37:a304:: with SMTP id m4mr3483109qke.287.1603478061480; Fri, 23 Oct 2020 11:34:21 -0700 (PDT) Received: from 
xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:20 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 10/14] KVM: selftests: Use a single binary for dirty/clear log test Date: Fri, 23 Oct 2020 14:33:54 -0400 Message-Id: <20201023183358.50607-11-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Remove the clear_dirty_log test, instead merge it into the existing dirty_log_test. It should be cleaner to use this single binary to do both tests, also it's a preparation for the upcoming dirty ring test. The default behavior will run all the modes in sequence. Reviewed-by: Andrew Jones Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/Makefile | 2 - .../selftests/kvm/clear_dirty_log_test.c | 6 - tools/testing/selftests/kvm/dirty_log_test.c | 187 +++++++++++++++--- 3 files changed, 156 insertions(+), 39 deletions(-) delete mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 7ebe71fbca53..47b998f090ef 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -57,14 +57,12 @@ TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test TEST_GEN_PROGS_x86_64 += x86_64/debug_regs TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test -TEST_GEN_PROGS_x86_64 += clear_dirty_log_test TEST_GEN_PROGS_x86_64 += demand_paging_test TEST_GEN_PROGS_x86_64 += dirty_log_test TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus TEST_GEN_PROGS_x86_64 += set_memory_region_test TEST_GEN_PROGS_x86_64 += steal_time -TEST_GEN_PROGS_aarch64 += clear_dirty_log_test TEST_GEN_PROGS_aarch64 += demand_paging_test TEST_GEN_PROGS_aarch64 += dirty_log_test TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus diff --git a/tools/testing/selftests/kvm/clear_dirty_log_test.c b/tools/testing/selftests/kvm/clear_dirty_log_test.c deleted file mode 100644 index 11672ec6f74e..000000000000 --- a/tools/testing/selftests/kvm/clear_dirty_log_test.c +++ /dev/null @@ -1,6 +0,0 @@ -#define USE_CLEAR_DIRTY_LOG -#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) -#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) -#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ - KVM_DIRTY_LOG_INITIALLY_SET) -#include "dirty_log_test.c" diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 6a8275a22861..139ccb550618 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -128,6 +128,78 @@ static uint64_t host_dirty_count; static uint64_t host_clear_count; static uint64_t host_track_next_count; +enum log_mode_t { + /* Only use KVM_GET_DIRTY_LOG for logging */ + LOG_MODE_DIRTY_LOG = 0, + + /* Use both KVM_[GET|CLEAR]_DIRTY_LOG for logging */ + LOG_MODE_CLEAR_LOG = 1, + + LOG_MODE_NUM, + + /* Run all supported modes */ + LOG_MODE_ALL = LOG_MODE_NUM, +}; + +/* Mode of logging to test. 
Default is to run all supported modes */ +static enum log_mode_t host_log_mode_option = LOG_MODE_ALL; +/* Logging mode for current run */ +static enum log_mode_t host_log_mode; + +static bool clear_log_supported(void) +{ + return kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2); +} + +static void clear_log_create_vm_done(struct kvm_vm *vm) +{ + struct kvm_enable_cap cap = {}; + u64 manual_caps; + + manual_caps = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2); + TEST_ASSERT(manual_caps, "MANUAL_CAPS is zero!"); + manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | + KVM_DIRTY_LOG_INITIALLY_SET); + cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; + cap.args[0] = manual_caps; + vm_enable_cap(vm, &cap); +} + +static void dirty_log_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + kvm_vm_get_dirty_log(vm, slot, bitmap); +} + +static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + kvm_vm_get_dirty_log(vm, slot, bitmap); + kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); +} + +struct log_mode { + const char *name; + /* Return true if this mode is supported, otherwise false */ + bool (*supported)(void); + /* Hook when the vm creation is done (before vcpu creation) */ + void (*create_vm_done)(struct kvm_vm *vm); + /* Hook to collect the dirty pages into the bitmap provided */ + void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages); +} log_modes[LOG_MODE_NUM] = { + { + .name = "dirty-log", + .collect_dirty_pages = dirty_log_collect_dirty_pages, + }, + { + .name = "clear-log", + .supported = clear_log_supported, + .create_vm_done = clear_log_create_vm_done, + .collect_dirty_pages = clear_log_collect_dirty_pages, + }, +}; + /* * We use this bitmap to track some pages that should have its dirty * bit set in the _next_ iteration. 
For example, if we detected the @@ -137,6 +209,44 @@ static uint64_t host_track_next_count; */ static unsigned long *host_bmap_track; +static void log_modes_dump(void) +{ + int i; + + printf("all"); + for (i = 0; i < LOG_MODE_NUM; i++) + printf(", %s", log_modes[i].name); + printf("\n"); +} + +static bool log_mode_supported(void) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->supported) + return mode->supported(); + + return true; +} + +static void log_mode_create_vm_done(struct kvm_vm *vm) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->create_vm_done) + mode->create_vm_done(vm); +} + +static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + TEST_ASSERT(mode->collect_dirty_pages != NULL, + "collect_dirty_pages() is required for any log mode!"); + mode->collect_dirty_pages(vm, slot, bitmap, num_pages); +} + static void generate_random_array(uint64_t *guest_array, uint64_t size) { uint64_t i; @@ -257,6 +367,7 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid, #ifdef __x86_64__ vm_create_irqchip(vm); #endif + log_mode_create_vm_done(vm); vm_vcpu_add_default(vm, vcpuid, guest_code); return vm; } @@ -264,10 +375,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid, #define DIRTY_MEM_BITS 30 /* 1G */ #define PAGE_SHIFT_4K 12 -#ifdef USE_CLEAR_DIRTY_LOG -static u64 dirty_log_manual_caps; -#endif - static void run_test(enum vm_guest_mode mode, unsigned long iterations, unsigned long interval, uint64_t phys_offset) { @@ -275,6 +382,12 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, struct kvm_vm *vm; unsigned long *bmap; + if (!log_mode_supported()) { + print_skip("Log mode '%s' not supported", + log_modes[host_log_mode].name); + return; + } + /* * We reserve page table for 2 times of extra dirty mem which * will definitely cover the original (1G+) test range. Here @@ -317,14 +430,6 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, bmap = bitmap_alloc(host_num_pages); host_bmap_track = bitmap_alloc(host_num_pages); -#ifdef USE_CLEAR_DIRTY_LOG - struct kvm_enable_cap cap = {}; - - cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; - cap.args[0] = dirty_log_manual_caps; - vm_enable_cap(vm, &cap); -#endif - /* Add an extra memory slot for testing dirty logging */ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, guest_test_phys_mem, @@ -362,11 +467,8 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, while (iteration < iterations) { /* Give the vcpu thread some time to dirty some pages */ usleep(interval * 1000); - kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap); -#ifdef USE_CLEAR_DIRTY_LOG - kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0, - host_num_pages); -#endif + log_mode_collect_dirty_pages(vm, TEST_MEM_SLOT_INDEX, + bmap, host_num_pages); vm_dirty_log_verify(mode, bmap); iteration++; sync_global_to_guest(vm, iteration); @@ -410,6 +512,9 @@ static void help(char *name) TEST_HOST_LOOP_INTERVAL); printf(" -p: specify guest physical test memory offset\n" " Warning: a low offset can conflict with the loaded test code.\n"); + printf(" -M: specify the host logging mode " + "(default: run all log modes). 
Supported modes: \n\t"); + log_modes_dump(); printf(" -m: specify the guest mode ID to test " "(default: test all supported modes)\n" " This option may be used multiple times.\n" @@ -429,18 +534,7 @@ int main(int argc, char *argv[]) bool mode_selected = false; uint64_t phys_offset = 0; unsigned int mode; - int opt, i; - -#ifdef USE_CLEAR_DIRTY_LOG - dirty_log_manual_caps = - kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2); - if (!dirty_log_manual_caps) { - print_skip("KVM_CLEAR_DIRTY_LOG not available"); - exit(KSFT_SKIP); - } - dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | - KVM_DIRTY_LOG_INITIALLY_SET); -#endif + int opt, i, j; #ifdef __x86_64__ guest_mode_init(VM_MODE_PXXV48_4K, true, true); @@ -464,7 +558,7 @@ int main(int argc, char *argv[]) guest_mode_init(VM_MODE_P40V48_4K, true, true); #endif - while ((opt = getopt(argc, argv, "hi:I:p:m:")) != -1) { + while ((opt = getopt(argc, argv, "hi:I:p:m:M:")) != -1) { switch (opt) { case 'i': iterations = strtol(optarg, NULL, 10); @@ -486,6 +580,26 @@ int main(int argc, char *argv[]) "Guest mode ID %d too big", mode); guest_modes[mode].enabled = true; break; + case 'M': + if (!strcmp(optarg, "all")) { + host_log_mode_option = LOG_MODE_ALL; + break; + } + for (i = 0; i < LOG_MODE_NUM; i++) { + if (!strcmp(optarg, log_modes[i].name)) { + pr_info("Setting log mode to: '%s'\n", + optarg); + host_log_mode_option = i; + break; + } + } + if (i == LOG_MODE_NUM) { + printf("Log mode '%s' invalid. Please choose " + "from: ", optarg); + log_modes_dump(); + exit(1); + } + break; case 'h': default: help(argv[0]); @@ -507,7 +621,18 @@ int main(int argc, char *argv[]) TEST_ASSERT(guest_modes[i].supported, "Guest mode ID %d (%s) not supported.", i, vm_guest_mode_string(i)); - run_test(i, iterations, interval, phys_offset); + if (host_log_mode_option == LOG_MODE_ALL) { + /* Run each log mode */ + for (j = 0; j < LOG_MODE_NUM; j++) { + pr_info("Testing Log Mode '%s'\n", + log_modes[j].name); + host_log_mode = j; + run_test(i, iterations, interval, phys_offset); + } + } else { + host_log_mode = host_log_mode_option; + run_test(i, iterations, interval, phys_offset); + } } return 0; From patchwork Fri Oct 23 18:33:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854391 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67C40C55178 for ; Fri, 23 Oct 2020 18:35:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 161112464E for ; Fri, 23 Oct 2020 18:35:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gSw866OF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755013AbgJWSfM (ORCPT ); Fri, 23 Oct 2020 14:35:12 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:43884 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754911AbgJWSe1 
(ORCPT ); Fri, 23 Oct 2020 14:34:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478065; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1Frsr00VTIsYjdqDeTzSRco7BaTYBstaIvbV6rdyHlg=; b=gSw866OFRFAjdk1qPqJwGGuFw0LGP0gqXbXImiMhQ8dlDNYWxxxIODZD0hqBESJg6ualwD TOI7+SN6xu8kSCaGKNSkJwcrQ2qT1sc1+Ac7IhoiGReBhwVEKjhwrnBje/PPUGcXXN0dGe GF9PWuRpMap+kqkHy/f01kbDC9lAPEc= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-173-6XCw4TKEM2yue6KB4yEjeQ-1; Fri, 23 Oct 2020 14:34:24 -0400 X-MC-Unique: 6XCw4TKEM2yue6KB4yEjeQ-1 Received: by mail-qk1-f200.google.com with SMTP id a81so1620789qkg.10 for ; Fri, 23 Oct 2020 11:34:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1Frsr00VTIsYjdqDeTzSRco7BaTYBstaIvbV6rdyHlg=; b=EHoCnFIWqgeM9HiODEZRs8kol+PgyPUKNW+gmozypaoJvWGemeS6ES0e85DYLU6v3i ar36KEEq6jtFszfoYXt3qfszmSrS//nV+6YsLzSHzcBAYzOfQCBURbqQE64gHVzXnndw XTJ421nJsiNOo2RFt55szlkJAW31c53AyrqZ+vtJSpaYZPSdkhu+fylSTSWkeTHkPFPF hzuM7Kjv7fRyQD/2pl4+jPRuRgJPaundaCiIHOKotWKEe6p3FzeSP92Hy7JS4hxKaXau bUVOwwRqRmW/wiJ4gCV+0Inwrq06tzbsfyOp+BFk0gAHD4h2PtWeEI3S2oPTJanUZtJn Vmbw== X-Gm-Message-State: AOAM532lGdhV7tC3YotF1UyJsqrQXvvgjPZ8jRGW7xR6EkJ9Q6k+TvSO ebN+qco47cvlsDF6pU+DdZL0rzn8PFnfJgSyaf5uda8H5mw/0od6lnyHKrwO69etMStaRXPdOOf aoUTs3EpqYVEK X-Received: by 2002:a0c:a046:: with SMTP id b64mr355645qva.55.1603478063249; Fri, 23 Oct 2020 11:34:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwOYTeZ0ac3uGzRsmzmGfUqoKhn3pSnvG8Kmi4QqfMXryXBKDUJGYNAxSV7Vo6420pD0yg2aA== X-Received: by 2002:a0c:a046:: with SMTP id b64mr355628qva.55.1603478063023; Fri, 23 Oct 2020 11:34:23 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:22 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 11/14] KVM: selftests: Introduce after_vcpu_run hook for dirty log test Date: Fri, 23 Oct 2020 14:33:55 -0400 Message-Id: <20201023183358.50607-12-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Provide a hook for the checks after vcpu_run() completes. Preparation for the dirty ring test because we'll need to take care of another exit reason. 
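To make "another exit reason" concrete, one possible shape of a
dirty-ring handler for this hook is sketched below. This is only an
illustration of what the hook enables (the actual handler is introduced
by the dirty ring test patch later in the series); it assumes
<semaphore.h>, the usual selftest helpers, and the
dirty_ring_vcpu_stop/dirty_ring_vcpu_cont semaphores declared by the
next patch:

static void dirty_ring_after_vcpu_run_sketch(struct kvm_vm *vm)
{
	struct kvm_run *run = vcpu_state(vm, VCPU_ID);

	if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) {
		/* normal per-iteration sync with the guest, nothing to do */
	} else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) {
		/* park the vcpu until the main thread has reaped the ring */
		sem_post(&dirty_ring_vcpu_stop);
		sem_wait(&dirty_ring_vcpu_cont);
	} else {
		TEST_FAIL("Invalid guest sync status: exit_reason=%s",
			  exit_reason_str(run->exit_reason));
	}
}

Hooked up through a new .after_vcpu_run entry in log_modes[], a handler
like this keeps vcpu_worker() itself unaware of which logging mode is
being exercised.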
Reviewed-by: Andrew Jones Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 36 +++++++++++++------- 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 139ccb550618..a2160946bcf5 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -178,6 +178,15 @@ static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); } +static void default_after_vcpu_run(struct kvm_vm *vm) +{ + struct kvm_run *run = vcpu_state(vm, VCPU_ID); + + TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC, + "Invalid guest sync status: exit_reason=%s\n", + exit_reason_str(run->exit_reason)); +} + struct log_mode { const char *name; /* Return true if this mode is supported, otherwise false */ @@ -187,16 +196,20 @@ struct log_mode { /* Hook to collect the dirty pages into the bitmap provided */ void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, void *bitmap, uint32_t num_pages); + /* Hook to call when after each vcpu run */ + void (*after_vcpu_run)(struct kvm_vm *vm); } log_modes[LOG_MODE_NUM] = { { .name = "dirty-log", .collect_dirty_pages = dirty_log_collect_dirty_pages, + .after_vcpu_run = default_after_vcpu_run, }, { .name = "clear-log", .supported = clear_log_supported, .create_vm_done = clear_log_create_vm_done, .collect_dirty_pages = clear_log_collect_dirty_pages, + .after_vcpu_run = default_after_vcpu_run, }, }; @@ -247,6 +260,14 @@ static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, mode->collect_dirty_pages(vm, slot, bitmap, num_pages); } +static void log_mode_after_vcpu_run(struct kvm_vm *vm) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->after_vcpu_run) + mode->after_vcpu_run(vm); +} + static void generate_random_array(uint64_t *guest_array, uint64_t size) { uint64_t i; @@ -261,25 +282,16 @@ static void *vcpu_worker(void *data) struct kvm_vm *vm = data; uint64_t *guest_array; uint64_t pages_count = 0; - struct kvm_run *run; - - run = vcpu_state(vm, VCPU_ID); guest_array = addr_gva2hva(vm, (vm_vaddr_t)random_array); - generate_random_array(guest_array, TEST_PAGES_PER_LOOP); while (!READ_ONCE(host_quit)) { + generate_random_array(guest_array, TEST_PAGES_PER_LOOP); + pages_count += TEST_PAGES_PER_LOOP; /* Let the guest dirty the random pages */ ret = _vcpu_run(vm, VCPU_ID); TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); - if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) { - pages_count += TEST_PAGES_PER_LOOP; - generate_random_array(guest_array, TEST_PAGES_PER_LOOP); - } else { - TEST_FAIL("Invalid guest sync status: " - "exit_reason=%s\n", - exit_reason_str(run->exit_reason)); - } + log_mode_after_vcpu_run(vm); } pr_info("Dirtied %"PRIu64" pages\n", pages_count); From patchwork Fri Oct 23 18:33:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854379 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org 
[198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D7E0C55178 for ; Fri, 23 Oct 2020 18:34:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9BEB920BED for ; Fri, 23 Oct 2020 18:34:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gF1vAAsp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754950AbgJWSef (ORCPT ); Fri, 23 Oct 2020 14:34:35 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:20813 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754917AbgJWSeb (ORCPT ); Fri, 23 Oct 2020 14:34:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478068; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0z/RWnyx6asV3vaFYvVvq9mFsBydI658yV3XFUiS5T0=; b=gF1vAAspOiO0Clg05ylp7ahdvwKQqV8tZn4fubGg2uES/UIfDowSB63+dUg6aDonGSXp/5 PaBwBbxulvOe450NkeQSLXdRo/Bw+RIU73oxg5p+0AXdZJ46s/r4qnir2A3XZKQotrupOR VYEFZBWGB+mXoyqn6gr03K7Ub3n6RPE= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-38-9LddTZaCOUSAywZH0UMoGw-1; Fri, 23 Oct 2020 14:34:26 -0400 X-MC-Unique: 9LddTZaCOUSAywZH0UMoGw-1 Received: by mail-qv1-f71.google.com with SMTP id s9so1533242qvd.17 for ; Fri, 23 Oct 2020 11:34:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0z/RWnyx6asV3vaFYvVvq9mFsBydI658yV3XFUiS5T0=; b=IpO2ylkRUekEbFSrBH/0MEMO7+G9j18jZBb11FGSdPkZn/BOJ4D3jaKORz04nX+U69 /cx81mtncWX3IsGWNxnG+Ngb6dzQDssjkd+uAUcFInkdzLRJdffKo7AVKMBagIq6/pQo /Y3vw+RSB0qJn30NB0YgxDU/+dGTGQNwzHk/Q3OJ6m0gjFnJl2UzyK1TvsTdKrYSj3Vd 8xM7KR/+BpxmVarBVKTR5fgwAiIGz+yWgq0eVSP8gseiD/bMcxMz/cpjGGwHRNtirsVV 52xrmlqpCCPL5eso7Jf+6sksactN8pLhb+bhdcf6WQDZjL2V0ojjWPyNLkmFSpyMvZ74 Yr7Q== X-Gm-Message-State: AOAM5329rz//Z29dDY01nTOnUeB88iFIL/lwWVJbyWypb1x4iw82wOHI iOWzQloi7CHdDstP1PMymIP+lDuSciPxkiNvR3nnXmTDUtJX9ccO/cXMw9h2vCx7ky9BbtARCGW 9PX0tGiPuac8B X-Received: by 2002:ad4:59cf:: with SMTP id el15mr3917487qvb.17.1603478064871; Fri, 23 Oct 2020 11:34:24 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzSbx7IL+PYw0jL0dprlw+lmrmAI1Acrn+1rYqh6WUnXvUtp/OIz6aAZWm82LPEDcB2ygxf5A== X-Received: by 2002:ad4:59cf:: with SMTP id el15mr3917452qvb.17.1603478064479; Fri, 23 Oct 2020 11:34:24 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:23 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . 
David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 12/14] KVM: selftests: Add dirty ring buffer test Date: Fri, 23 Oct 2020 14:33:56 -0400 Message-Id: <20201023183358.50607-13-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add the initial dirty ring buffer test. The test implements userspace dirty ring collection by reaping the dirty ring only when the ring is full, so it still runs synchronously, like this:

  vcpu thread:   1. vcpu dirties pages
                 2. vcpu gets dirty ring full (userspace exit)
  main thread:   3. main thread waits until full (so hardware buffers flushed)
                 4. main thread collects
                 5. main thread continues vcpu
  vcpu thread:   6. vcpu continues, goes back to 1

We can't directly collect dirty bits during vcpu execution, because otherwise we can't guarantee that the hardware dirty bits were flushed when we collect; and since we are very strict about the dirty bits, the later verify procedure could fail. A follow-up patch will make this test support async collection, just like the existing dirty log test, by adding a vcpu kick mechanism. Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 212 +++++++++++++++++- .../testing/selftests/kvm/include/kvm_util.h | 3 + tools/testing/selftests/kvm/lib/kvm_util.c | 63 +++++- .../selftests/kvm/lib/kvm_util_internal.h | 4 + 4 files changed, 278 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index a2160946bcf5..9fcd15b8e999 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -12,8 +12,10 @@ #include #include #include +#include #include #include +#include #include "test_util.h" #include "kvm_util.h" @@ -57,6 +59,8 @@ # define test_and_clear_bit_le test_and_clear_bit #endif +#define TEST_DIRTY_RING_COUNT 1024 + /* * Guest/Host shared variables. Ensure addr_gva2hva() and/or * sync_global_to/from_guest() are used when accessing from @@ -128,6 +132,24 @@ static uint64_t host_dirty_count; static uint64_t host_clear_count; static uint64_t host_track_next_count; +/* Whether dirty ring reset is requested, or finished */ +static sem_t dirty_ring_vcpu_stop; +static sem_t dirty_ring_vcpu_cont; +/* + * This is only used for verifying the dirty pages. Dirty ring has a very + * tricky case when the ring just got full, kvm will do userspace exit due to + * ring full. When that happens, the very last PFN is very tricky in that it's + * set but actually the data is not changed (the guest WRITE is not really + * applied yet), because when we find that the dirty ring is full, we refused + * to continue the vcpu, hence we got the dirty gfn recorded without the new + * data. For this specific case, it's safe to skip checking this pfn for this + * bit, because it's a redundant bit, and when the write happens later the bit + * will be set again. We use this variable to always keep track of the latest + * dirty gfn we've collected, so that if a mismatch of data found later in the + * verifying process, we let it pass. 
+ */ +static uint64_t dirty_ring_last_page; + enum log_mode_t { /* Only use KVM_GET_DIRTY_LOG for logging */ LOG_MODE_DIRTY_LOG = 0, @@ -135,6 +157,9 @@ enum log_mode_t { /* Use both KVM_[GET|CLEAR]_DIRTY_LOG for logging */ LOG_MODE_CLEAR_LOG = 1, + /* Use dirty ring for logging */ + LOG_MODE_DIRTY_RING = 2, + LOG_MODE_NUM, /* Run all supported modes */ @@ -187,6 +212,121 @@ static void default_after_vcpu_run(struct kvm_vm *vm) exit_reason_str(run->exit_reason)); } +static bool dirty_ring_supported(void) +{ + return kvm_check_cap(KVM_CAP_DIRTY_LOG_RING); +} + +static void dirty_ring_create_vm_done(struct kvm_vm *vm) +{ + /* + * Switch to dirty ring mode after VM creation but before any + * of the vcpu creation. + */ + vm_enable_dirty_ring(vm, TEST_DIRTY_RING_COUNT * + sizeof(struct kvm_dirty_gfn)); +} + +static inline bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn) +{ + return gfn->flags == KVM_DIRTY_GFN_F_DIRTY; +} + +static inline void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn) +{ + gfn->flags = KVM_DIRTY_GFN_F_RESET; +} + +static uint32_t dirty_ring_collect_one(struct kvm_dirty_gfn *dirty_gfns, + int slot, void *bitmap, + uint32_t num_pages, uint32_t *fetch_index) +{ + struct kvm_dirty_gfn *cur; + uint32_t count = 0; + + while (true) { + cur = &dirty_gfns[*fetch_index % TEST_DIRTY_RING_COUNT]; + if (!dirty_gfn_is_dirtied(cur)) + break; + TEST_ASSERT(cur->slot == slot, "Slot number didn't match: " + "%u != %u", cur->slot, slot); + TEST_ASSERT(cur->offset < num_pages, "Offset overflow: " + "0x%llx >= 0x%x", cur->offset, num_pages); + pr_info("fetch 0x%x page %llu\n", *fetch_index, cur->offset); + set_bit_le(cur->offset, bitmap); + dirty_ring_last_page = cur->offset; + dirty_gfn_set_collected(cur); + (*fetch_index)++; + count++; + } + + return count; +} + +static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, + void *bitmap, uint32_t num_pages) +{ + /* We only have one vcpu */ + static uint32_t fetch_index = 0; + uint32_t count = 0, cleared; + + /* + * Before fetching the dirty pages, we need a vmexit of the + * worker vcpu to make sure the hardware dirty buffers were + * flushed. This is not needed for dirty-log/clear-log tests + * because get dirty log will natually do so. + * + * For now we do it in the simple way - we simply wait until + * the vcpu uses up the soft dirty ring, then it'll always + * do a vmexit to make sure that PML buffers will be flushed. + * In real hypervisors, we probably need a vcpu kick or to + * stop the vcpus (before the final sync) to make sure we'll + * get all the existing dirty PFNs even cached in hardware. 
+ */ + sem_wait(&dirty_ring_vcpu_stop); + + /* Only have one vcpu */ + count = dirty_ring_collect_one(vcpu_map_dirty_ring(vm, VCPU_ID), + slot, bitmap, num_pages, &fetch_index); + + cleared = kvm_vm_reset_dirty_ring(vm); + + /* Cleared pages should be the same as collected */ + TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch " + "with collected (%u)", cleared, count); + + pr_info("Notifying vcpu to continue\n"); + sem_post(&dirty_ring_vcpu_cont); + + pr_info("Iteration %ld collected %u pages\n", iteration, count); +} + +static void dirty_ring_after_vcpu_run(struct kvm_vm *vm) +{ + struct kvm_run *run = vcpu_state(vm, VCPU_ID); + + /* A ucall-sync or ring-full event is allowed */ + if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) { + /* We should allow this to continue */ + ; + } else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) { + sem_post(&dirty_ring_vcpu_stop); + pr_info("vcpu stops because dirty ring full...\n"); + sem_wait(&dirty_ring_vcpu_cont); + pr_info("vcpu continues now.\n"); + } else { + TEST_ASSERT(false, "Invalid guest sync status: " + "exit_reason=%s\n", + exit_reason_str(run->exit_reason)); + } +} + +static void dirty_ring_before_vcpu_join(void) +{ + /* Kick another round of vcpu just to make sure it will quit */ + sem_post(&dirty_ring_vcpu_cont); +} + struct log_mode { const char *name; /* Return true if this mode is supported, otherwise false */ @@ -198,6 +338,7 @@ struct log_mode { void *bitmap, uint32_t num_pages); /* Hook to call when after each vcpu run */ void (*after_vcpu_run)(struct kvm_vm *vm); + void (*before_vcpu_join) (void); } log_modes[LOG_MODE_NUM] = { { .name = "dirty-log", @@ -211,6 +352,14 @@ struct log_mode { .collect_dirty_pages = clear_log_collect_dirty_pages, .after_vcpu_run = default_after_vcpu_run, }, + { + .name = "dirty-ring", + .supported = dirty_ring_supported, + .create_vm_done = dirty_ring_create_vm_done, + .collect_dirty_pages = dirty_ring_collect_dirty_pages, + .before_vcpu_join = dirty_ring_before_vcpu_join, + .after_vcpu_run = dirty_ring_after_vcpu_run, + }, }; /* @@ -268,6 +417,14 @@ static void log_mode_after_vcpu_run(struct kvm_vm *vm) mode->after_vcpu_run(vm); } +static void log_mode_before_vcpu_join(void) +{ + struct log_mode *mode = &log_modes[host_log_mode]; + + if (mode->before_vcpu_join) + mode->before_vcpu_join(); +} + static void generate_random_array(uint64_t *guest_array, uint64_t size) { uint64_t i; @@ -318,14 +475,61 @@ static void vm_dirty_log_verify(enum vm_guest_mode mode, unsigned long *bmap) } if (test_and_clear_bit_le(page, bmap)) { + bool matched; + host_dirty_count++; + /* * If the bit is set, the value written onto * the corresponding page should be either the * previous iteration number or the current one. */ - TEST_ASSERT(*value_ptr == iteration || - *value_ptr == iteration - 1, + matched = (*value_ptr == iteration || + *value_ptr == iteration - 1); + + if (host_log_mode == LOG_MODE_DIRTY_RING && !matched) { + if (*value_ptr == iteration - 2) { + /* + * Short answer: this case is special + * only for dirty ring test where the + * page is the last page before a kvm + * dirty ring full in iteration N-2. + * + * Long answer: Assuming ring size R, + * one possible condition is: + * + * main thr vcpu thr + * -------- -------- + * iter=1 + * write 1 to page 0~(R-1) + * full, vmexit + * collect 0~(R-1) + * kick vcpu + * write 1 to (R-1)~(2R-2) + * full, vmexit + * iter=2 + * collect (R-1)~(2R-2) + * kick vcpu + * write 1 to (2R-2) + * (NOTE!!! 
"1" cached in cpu reg) + * write 2 to (2R-1)~(3R-3) + * full, vmexit + * iter=3 + * collect (2R-2)~(3R-3) + * (here if we read value on page + * "2R-2" is 1, while iter=3!!!) + */ + continue; + } else if (page == dirty_ring_last_page) { + /* + * Please refer to comments in + * dirty_ring_last_page. + */ + continue; + } + } + + TEST_ASSERT(matched, "Set page %"PRIu64" value %"PRIu64 " incorrect (iteration=%"PRIu64")", page, *value_ptr, iteration); @@ -488,6 +692,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, /* Tell the vcpu thread to quit */ host_quit = true; + log_mode_before_vcpu_join(); pthread_join(vcpu_thread, NULL); pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), " @@ -548,6 +753,9 @@ int main(int argc, char *argv[]) unsigned int mode; int opt, i, j; + sem_init(&dirty_ring_vcpu_stop, 0, 0); + sem_init(&dirty_ring_vcpu_cont, 0, 0); + #ifdef __x86_64__ guest_mode_init(VM_MODE_PXXV48_4K, true, true); #endif diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index 919e161dd289..dce07d354a70 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -63,6 +63,7 @@ enum vm_mem_backing_src_type { int kvm_check_cap(long cap); int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap); +void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size); struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm); struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm); @@ -72,6 +73,7 @@ void kvm_vm_release(struct kvm_vm *vmp); void kvm_vm_get_dirty_log(struct kvm_vm *vm, int slot, void *log); void kvm_vm_clear_dirty_log(struct kvm_vm *vm, int slot, void *log, uint64_t first_page, uint32_t num_pages); +uint32_t kvm_vm_reset_dirty_ring(struct kvm_vm *vm); int kvm_memcmp_hva_gva(void *hva, struct kvm_vm *vm, const vm_vaddr_t gva, size_t len); @@ -196,6 +198,7 @@ void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid, int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_nested_state *state, bool ignore_error); #endif +void *vcpu_map_dirty_ring(struct kvm_vm *vm, uint32_t vcpuid); const char *exit_reason_str(unsigned int exit_reason); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 74776ee228f2..831f986674ca 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -85,6 +85,16 @@ int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap) return ret; } +void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size) +{ + struct kvm_enable_cap cap = { 0 }; + + cap.cap = KVM_CAP_DIRTY_LOG_RING; + cap.args[0] = ring_size; + vm_enable_cap(vm, &cap); + vm->dirty_ring_size = ring_size; +} + static void vm_open(struct kvm_vm *vm, int perm) { vm->kvm_fd = open(KVM_DEV_PATH, perm); @@ -304,6 +314,11 @@ void kvm_vm_clear_dirty_log(struct kvm_vm *vm, int slot, void *log, __func__, strerror(-ret)); } +uint32_t kvm_vm_reset_dirty_ring(struct kvm_vm *vm) +{ + return ioctl(vm->fd, KVM_RESET_DIRTY_RINGS); +} + /* * Userspace Memory Region Find * @@ -408,10 +423,17 @@ struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid) * * Removes a vCPU from a VM and frees its resources. 
*/ -static void vm_vcpu_rm(struct vcpu *vcpu) +static void vm_vcpu_rm(struct kvm_vm *vm, struct vcpu *vcpu) { int ret; + if (vcpu->dirty_gfns) { + ret = munmap(vcpu->dirty_gfns, vm->dirty_ring_size); + TEST_ASSERT(ret == 0, "munmap of VCPU dirty ring failed, " + "rc: %i errno: %i", ret, errno); + vcpu->dirty_gfns = NULL; + } + ret = munmap(vcpu->state, sizeof(*vcpu->state)); TEST_ASSERT(ret == 0, "munmap of VCPU fd failed, rc: %i " "errno: %i", ret, errno); @@ -429,7 +451,7 @@ void kvm_vm_release(struct kvm_vm *vmp) int ret; list_for_each_entry_safe(vcpu, tmp, &vmp->vcpus, list) - vm_vcpu_rm(vcpu); + vm_vcpu_rm(vmp, vcpu); ret = close(vmp->fd); TEST_ASSERT(ret == 0, "Close of vm fd failed,\n" @@ -1497,6 +1519,42 @@ int _vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid, return ret; } +void *vcpu_map_dirty_ring(struct kvm_vm *vm, uint32_t vcpuid) +{ + struct vcpu *vcpu; + uint32_t size = vm->dirty_ring_size; + + TEST_ASSERT(size > 0, "Should enable dirty ring first"); + + vcpu = vcpu_find(vm, vcpuid); + + TEST_ASSERT(vcpu, "Cannot find vcpu %u", vcpuid); + + if (!vcpu->dirty_gfns) { + void *addr; + + addr = mmap(NULL, size, PROT_READ, + MAP_PRIVATE, vcpu->fd, + vm->page_size * KVM_DIRTY_LOG_PAGE_OFFSET); + TEST_ASSERT(addr == MAP_FAILED, "Dirty ring mapped private"); + + addr = mmap(NULL, size, PROT_READ | PROT_EXEC, + MAP_PRIVATE, vcpu->fd, + vm->page_size * KVM_DIRTY_LOG_PAGE_OFFSET); + TEST_ASSERT(addr == MAP_FAILED, "Dirty ring mapped exec"); + + addr = mmap(NULL, size, PROT_READ | PROT_WRITE, + MAP_SHARED, vcpu->fd, + vm->page_size * KVM_DIRTY_LOG_PAGE_OFFSET); + TEST_ASSERT(addr != MAP_FAILED, "Dirty ring map failed"); + + vcpu->dirty_gfns = addr; + vcpu->dirty_gfns_count = size / sizeof(struct kvm_dirty_gfn); + } + + return vcpu->dirty_gfns; +} + /* * VM Ioctl * @@ -1590,6 +1648,7 @@ static struct exit_reason { {KVM_EXIT_INTERNAL_ERROR, "INTERNAL_ERROR"}, {KVM_EXIT_OSI, "OSI"}, {KVM_EXIT_PAPR_HCALL, "PAPR_HCALL"}, + {KVM_EXIT_DIRTY_RING_FULL, "DIRTY_RING_FULL"}, #ifdef KVM_EXIT_MEMORY_NOT_PRESENT {KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"}, #endif diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h index 2ef446520748..6a271c876e38 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h @@ -28,6 +28,9 @@ struct vcpu { uint32_t id; int fd; struct kvm_run *state; + struct kvm_dirty_gfn *dirty_gfns; + uint32_t fetch_index; + uint32_t dirty_gfns_count; }; struct kvm_vm { @@ -50,6 +53,7 @@ struct kvm_vm { vm_paddr_t pgd; vm_vaddr_t gdt; vm_vaddr_t tss; + uint32_t dirty_ring_size; }; struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid); From patchwork Fri Oct 23 18:33:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A25FC388F9 for ; Fri, 23 Oct 2020 18:34:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 056B520882 for ; Fri, 23 Oct 2020 18:34:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PSZzgmCf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754942AbgJWSee (ORCPT ); Fri, 23 Oct 2020 14:34:34 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:50917 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754925AbgJWSeb (ORCPT ); Fri, 23 Oct 2020 14:34:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478069; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uv2c/TR/fmJXSUaX+2BC7rQA6mCtzvLrsJeSDoaQmNs=; b=PSZzgmCfKYRH4lT/zuDuO8IY9WMPzq7mLkMM/NurAeZ98BcCojI237KrGf3oDwOUAQ7VYk 4qUpsXtD+iXOyvjwcStkAawbtsfY0dih/42YiwUE0XewRKt0a5MZvEuZT8jHqsqYhPRs6y 77hWULSLO+MKfRHuEAf7sl8CW2BCBek= Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-451-FzDuEBtDNYC4yh-2MCAsTQ-1; Fri, 23 Oct 2020 14:34:27 -0400 X-MC-Unique: FzDuEBtDNYC4yh-2MCAsTQ-1 Received: by mail-qk1-f197.google.com with SMTP id k12so1608771qkj.18 for ; Fri, 23 Oct 2020 11:34:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uv2c/TR/fmJXSUaX+2BC7rQA6mCtzvLrsJeSDoaQmNs=; b=ZqwsaRXPg/oth5qATtqXiq8OzDmOxex133CFVoW3PpfwrYwBbpZy+5XQmQtpfvaQcc Qux0nxDWGK+/2u3GH+mTCxGhEO4sIpV58tr0QtOGPl5hSfNHc+VGP8Z6jXKKjDiUVvaN hB8+DUUlttH2SSKkLoZMIlC6YXiFLcrdRpsXyhNdlQbHgEORkVF3l85ax7HUOaTaMthM rVhOnC19BOLsAdGhjMUoY/3lNme3DZ1zLfvGmcspfUCDs+pFzAnJwotmV5QEzFRy2y9J uYGmb8knjwACC0ELKXck4nMuLv1pPrUC3KKLVw+D9Rd8NzgyIZ8V0+oGW7Qab2PvNsxZ xidg== X-Gm-Message-State: AOAM533QrPAUb7NIHsqxx/d4XddmgBMJLHNpaSTv0RPVUGym+rnoKzgm V26blahpmPbUCNt8YErxn/C0K0y+wuc8mUBi1+xp7viajiByuo2BppgTcisSrnzomYARfvDR3Zx rcJ6uBaYjrSEU X-Received: by 2002:ac8:160f:: with SMTP id p15mr3447897qtj.57.1603478066326; Fri, 23 Oct 2020 11:34:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxpsfX9btSjkssYIs1bGhqyIW56bwditiTMbsOtrjnD+a+O5uE/i6va/CmKRqwkLO8B3m4FiA== X-Received: by 2002:ac8:160f:: with SMTP id p15mr3447863qtj.57.1603478065976; Fri, 23 Oct 2020 11:34:25 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:25 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . 
David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 13/14] KVM: selftests: Let dirty_log_test async for dirty ring test Date: Fri, 23 Oct 2020 14:33:57 -0400 Message-Id: <20201023183358.50607-14-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Previously the dirty ring test ran synchronously, because only a vmexit (here, the ring-full event) guarantees that the hardware dirty bits have been flushed to the dirty ring. This patch first introduces a vcpu kick mechanism based on SIGUSR1, which likewise guarantees a vmexit and hence the flushing of the hardware dirty bits. With that, the vcpu dirtying work can run asynchronously with the whole collection procedure. Still, we need to be careful: collection can only stay asynchronous as long as the vcpu has not hit the soft limit (no KVM_EXIT_DIRTY_RING_FULL); otherwise we must collect the dirty bits before letting the vcpu continue. Further increase the dirty ring size to the current maximum, so we torture the no-ring-full case more, since that should be the major scenario when hypervisors like QEMU use this feature. Reviewed-by: Andrew Jones Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 126 +++++++++++++----- .../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c | 9 ++ 3 files changed, 106 insertions(+), 30 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 9fcd15b8e999..913583cb22b8 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -13,6 +13,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -59,7 +62,9 @@ # define test_and_clear_bit_le test_and_clear_bit #endif -#define TEST_DIRTY_RING_COUNT 1024 +#define TEST_DIRTY_RING_COUNT 65536 + +#define SIG_IPI SIGUSR1 /* * Guest/Host shared variables. Ensure addr_gva2hva() and/or @@ -149,6 +154,12 @@ static sem_t dirty_ring_vcpu_cont; * verifying process, we let it pass. */ static uint64_t dirty_ring_last_page; +/* + * This is updated by the vcpu thread to tell the host whether it's a + * ring-full event. It should only be read until a sem_wait() of * dirty_ring_vcpu_stop and before vcpu continues to run. 
+ */ +static bool dirty_ring_vcpu_ring_full; enum log_mode_t { /* Only use KVM_GET_DIRTY_LOG for logging */ @@ -170,6 +181,33 @@ enum log_mode_t { static enum log_mode_t host_log_mode_option = LOG_MODE_ALL; /* Logging mode for current run */ static enum log_mode_t host_log_mode; +static pthread_t vcpu_thread; + +/* Only way to pass this to the signal handler */ +static struct kvm_vm *current_vm; + +static void vcpu_sig_handler(int sig) +{ + TEST_ASSERT(sig == SIG_IPI, "unknown signal: %d", sig); +} + +static void vcpu_kick(void) +{ + pthread_kill(vcpu_thread, SIG_IPI); +} + +/* + * In our test we do signal tricks, let's use a better version of + * sem_wait to avoid signal interrupts + */ +static void sem_wait_until(sem_t *sem) +{ + int ret; + + do + ret = sem_wait(sem); + while (ret == -1 && errno == EINTR); +} static bool clear_log_supported(void) { @@ -203,10 +241,13 @@ static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot, kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages); } -static void default_after_vcpu_run(struct kvm_vm *vm) +static void default_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct kvm_run *run = vcpu_state(vm, VCPU_ID); + TEST_ASSERT(ret == 0 || (ret == -1 && err == EINTR), + "vcpu run failed: errno=%d", err); + TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC, "Invalid guest sync status: exit_reason=%s\n", exit_reason_str(run->exit_reason)); @@ -263,27 +304,37 @@ static uint32_t dirty_ring_collect_one(struct kvm_dirty_gfn *dirty_gfns, return count; } +static void dirty_ring_wait_vcpu(void) +{ + /* This makes sure that hardware PML cache flushed */ + vcpu_kick(); + sem_wait_until(&dirty_ring_vcpu_stop); +} + +static void dirty_ring_continue_vcpu(void) +{ + pr_info("Notifying vcpu to continue\n"); + sem_post(&dirty_ring_vcpu_cont); +} + static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, void *bitmap, uint32_t num_pages) { /* We only have one vcpu */ static uint32_t fetch_index = 0; uint32_t count = 0, cleared; + bool continued_vcpu = false; - /* - * Before fetching the dirty pages, we need a vmexit of the - * worker vcpu to make sure the hardware dirty buffers were - * flushed. This is not needed for dirty-log/clear-log tests - * because get dirty log will natually do so. - * - * For now we do it in the simple way - we simply wait until - * the vcpu uses up the soft dirty ring, then it'll always - * do a vmexit to make sure that PML buffers will be flushed. - * In real hypervisors, we probably need a vcpu kick or to - * stop the vcpus (before the final sync) to make sure we'll - * get all the existing dirty PFNs even cached in hardware. 
- */ - sem_wait(&dirty_ring_vcpu_stop); + dirty_ring_wait_vcpu(); + + if (!dirty_ring_vcpu_ring_full) { + /* + * This is not a ring-full event, it's safe to allow + * vcpu to continue + */ + dirty_ring_continue_vcpu(); + continued_vcpu = true; + } /* Only have one vcpu */ count = dirty_ring_collect_one(vcpu_map_dirty_ring(vm, VCPU_ID), @@ -295,13 +346,16 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot, TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch " "with collected (%u)", cleared, count); - pr_info("Notifying vcpu to continue\n"); - sem_post(&dirty_ring_vcpu_cont); + if (!continued_vcpu) { + TEST_ASSERT(dirty_ring_vcpu_ring_full, + "Didn't continue vcpu even without ring full"); + dirty_ring_continue_vcpu(); + } pr_info("Iteration %ld collected %u pages\n", iteration, count); } -static void dirty_ring_after_vcpu_run(struct kvm_vm *vm) +static void dirty_ring_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct kvm_run *run = vcpu_state(vm, VCPU_ID); @@ -309,10 +363,16 @@ static void dirty_ring_after_vcpu_run(struct kvm_vm *vm) if (get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC) { /* We should allow this to continue */ ; - } else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) { + } else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL || + (ret == -1 && err == EINTR)) { + /* Update the flag first before pause */ + WRITE_ONCE(dirty_ring_vcpu_ring_full, + run->exit_reason == KVM_EXIT_DIRTY_RING_FULL); sem_post(&dirty_ring_vcpu_stop); - pr_info("vcpu stops because dirty ring full...\n"); - sem_wait(&dirty_ring_vcpu_cont); + pr_info("vcpu stops because %s...\n", + dirty_ring_vcpu_ring_full ? + "dirty ring is full" : "vcpu is kicked out"); + sem_wait_until(&dirty_ring_vcpu_cont); pr_info("vcpu continues now.\n"); } else { TEST_ASSERT(false, "Invalid guest sync status: " @@ -337,7 +397,7 @@ struct log_mode { void (*collect_dirty_pages) (struct kvm_vm *vm, int slot, void *bitmap, uint32_t num_pages); /* Hook to call when after each vcpu run */ - void (*after_vcpu_run)(struct kvm_vm *vm); + void (*after_vcpu_run)(struct kvm_vm *vm, int ret, int err); void (*before_vcpu_join) (void); } log_modes[LOG_MODE_NUM] = { { @@ -409,12 +469,12 @@ static void log_mode_collect_dirty_pages(struct kvm_vm *vm, int slot, mode->collect_dirty_pages(vm, slot, bitmap, num_pages); } -static void log_mode_after_vcpu_run(struct kvm_vm *vm) +static void log_mode_after_vcpu_run(struct kvm_vm *vm, int ret, int err) { struct log_mode *mode = &log_modes[host_log_mode]; if (mode->after_vcpu_run) - mode->after_vcpu_run(vm); + mode->after_vcpu_run(vm, ret, err); } static void log_mode_before_vcpu_join(void) @@ -435,20 +495,27 @@ static void generate_random_array(uint64_t *guest_array, uint64_t size) static void *vcpu_worker(void *data) { - int ret; + int ret, vcpu_fd; struct kvm_vm *vm = data; uint64_t *guest_array; uint64_t pages_count = 0; + struct sigaction sigact; + + current_vm = vm; + vcpu_fd = vcpu_get_fd(vm, VCPU_ID); + memset(&sigact, 0, sizeof(sigact)); + sigact.sa_handler = vcpu_sig_handler; + sigaction(SIG_IPI, &sigact, NULL); guest_array = addr_gva2hva(vm, (vm_vaddr_t)random_array); while (!READ_ONCE(host_quit)) { + /* Clear any existing kick signals */ generate_random_array(guest_array, TEST_PAGES_PER_LOOP); pages_count += TEST_PAGES_PER_LOOP; /* Let the guest dirty the random pages */ - ret = _vcpu_run(vm, VCPU_ID); - TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); - log_mode_after_vcpu_run(vm); + ret = ioctl(vcpu_fd, KVM_RUN, NULL); + 
log_mode_after_vcpu_run(vm, ret, errno); } pr_info("Dirtied %"PRIu64" pages\n", pages_count); @@ -594,7 +661,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, uint32_t vcpuid, static void run_test(enum vm_guest_mode mode, unsigned long iterations, unsigned long interval, uint64_t phys_offset) { - pthread_t vcpu_thread; struct kvm_vm *vm; unsigned long *bmap; diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index dce07d354a70..f3b5da383bb5 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -146,6 +146,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva); struct kvm_run *vcpu_state(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid); +int vcpu_get_fd(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid); void vcpu_set_guest_debug(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_guest_debug *debug); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 831f986674ca..4625e193074e 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1220,6 +1220,15 @@ int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid) return rc; } +int vcpu_get_fd(struct kvm_vm *vm, uint32_t vcpuid) +{ + struct vcpu *vcpu = vcpu_find(vm, vcpuid); + + TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); + + return vcpu->fd; +} + void vcpu_run_complete_io(struct kvm_vm *vm, uint32_t vcpuid) { struct vcpu *vcpu = vcpu_find(vm, vcpuid); From patchwork Fri Oct 23 18:33:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 11854389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16F43C55178 for ; Fri, 23 Oct 2020 18:35:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AFBF322254 for ; Fri, 23 Oct 2020 18:35:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="fijiQJYZ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754951AbgJWSfD (ORCPT ); Fri, 23 Oct 2020 14:35:03 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:36494 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754928AbgJWSec (ORCPT ); Fri, 23 Oct 2020 14:34:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603478070; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sVF5RFO9TPcnd1hmSsZzSNQ/sV8XWPAAmLyQ5cmt6Ek=; 
b=fijiQJYZg/FDfS7kS3aSBBML/0ra2bszQJfyLbOt1gukydcrW0r9wjdm/yYY05lke/a2QA 5j7yKvKMoHxBiDLpnFBBlg9Ebn9jtUAvkZeZifyO1DDK4d1f2t8Bug9IEIei8B/DmoNxpl 8Gumv+H66zO+Ai0U8Ci03cMoHYT4SIo= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-379-iOlfcw8YOZ228o4LKYt93Q-1; Fri, 23 Oct 2020 14:34:28 -0400 X-MC-Unique: iOlfcw8YOZ228o4LKYt93Q-1 Received: by mail-qk1-f200.google.com with SMTP id b7so1616852qkh.20 for ; Fri, 23 Oct 2020 11:34:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=sVF5RFO9TPcnd1hmSsZzSNQ/sV8XWPAAmLyQ5cmt6Ek=; b=Q4xTGKUj00FXCgrpBoC7rSwchHLwT09aqZPET2XKuXmNblOXskEFVeFvhcr3Hw04wO wdE/0IMvMxCTDtkSFKxRw1elQRDE1tYwLaCTJp9IC/FxVqwYrqNDqFjq/w9V2dut8vcO oLgFFc2mDlOt4eJ/rCpUX0v9BdK4StDjPxJ+kmZiwjcHFKKH/y8GczZIV91BeMMMUAum U0s3p++yplTmjzk57az96wL009/zrjMeo/X9JrN2Nggkrk2tP0/4GkUX+NxTvRupEzYz YG6bgKKWJ9QJW+lni+WgNVElGZRkRnpNQed4KpEGDAmdGb5GHqb+x0HjHuqrcp7yQ1c/ r9bw== X-Gm-Message-State: AOAM5313IiE++qBgSXh4JRFB/xUVVALNilYlE94oGv4Qex6sPsQoFc5x gMx6xCBSA9LhJcVlfeq55P1/Jb563YD25/fCwV+6CTIRm7Pbai/HarUWwjEhW6E6ueyUS5AE5Jd oqTmxMkao5Cul X-Received: by 2002:ad4:55ea:: with SMTP id bu10mr3897135qvb.28.1603478067511; Fri, 23 Oct 2020 11:34:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzXRAVp/Qedn6rtznTnWgwn3EMEI5OC2DvzGQZ3CsgeuShgCK9OZB9T7BHXFVd52vlydhY75w== X-Received: by 2002:ad4:55ea:: with SMTP id bu10mr3897118qvb.28.1603478067247; Fri, 23 Oct 2020 11:34:27 -0700 (PDT) Received: from xz-x1.redhat.com (toroon474qw-lp140-04-174-95-215-133.dsl.bell.ca. [174.95.215.133]) by smtp.gmail.com with ESMTPSA id u11sm1490407qtk.61.2020.10.23.11.34.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Oct 2020 11:34:26 -0700 (PDT) From: Peter Xu To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Sean Christopherson , peterx@redhat.com, Andrew Jones , "Dr . David Alan Gilbert" , Paolo Bonzini Subject: [PATCH v15 14/14] KVM: selftests: Add "-c" parameter to dirty log test Date: Fri, 23 Oct 2020 14:33:58 -0400 Message-Id: <20201023183358.50607-15-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201023183358.50607-1-peterx@redhat.com> References: <20201023183358.50607-1-peterx@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org It's only used to override the existing dirty ring size/count. If with a bigger ring count, we test async of dirty ring. If with a smaller ring count, we test ring full code path. Async is default. It has no use for non-dirty-ring tests. 
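For example, assuming the selftest binary is built at tools/testing/selftests/kvm/dirty_log_test and the host kernel supports the dirty ring (both are assumptions, not stated in the patch):

  ./dirty_log_test              # default ring of 65536 entries: mostly exercises the async, no-ring-full path
  ./dirty_log_test -c 1024      # small ring: forces frequent ring-full exits to stress that code path

The -c value is only consumed by the dirty-ring log mode; the other log modes simply ignore it.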
Reviewed-by: Andrew Jones Signed-off-by: Peter Xu --- tools/testing/selftests/kvm/dirty_log_test.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 913583cb22b8..c2cf11e24e0a 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -182,6 +182,7 @@ static enum log_mode_t host_log_mode_option = LOG_MODE_ALL; /* Logging mode for current run */ static enum log_mode_t host_log_mode; static pthread_t vcpu_thread; +static uint32_t test_dirty_ring_count = TEST_DIRTY_RING_COUNT; /* Only way to pass this to the signal handler */ static struct kvm_vm *current_vm; @@ -264,7 +265,7 @@ static void dirty_ring_create_vm_done(struct kvm_vm *vm) * Switch to dirty ring mode after VM creation but before any * of the vcpu creation. */ - vm_enable_dirty_ring(vm, TEST_DIRTY_RING_COUNT * + vm_enable_dirty_ring(vm, test_dirty_ring_count * sizeof(struct kvm_dirty_gfn)); } @@ -286,7 +287,7 @@ static uint32_t dirty_ring_collect_one(struct kvm_dirty_gfn *dirty_gfns, uint32_t count = 0; while (true) { - cur = &dirty_gfns[*fetch_index % TEST_DIRTY_RING_COUNT]; + cur = &dirty_gfns[*fetch_index % test_dirty_ring_count]; if (!dirty_gfn_is_dirtied(cur)) break; TEST_ASSERT(cur->slot == slot, "Slot number didn't match: " @@ -789,6 +790,9 @@ static void help(char *name) printf("usage: %s [-h] [-i iterations] [-I interval] " "[-p offset] [-m mode]\n", name); puts(""); + printf(" -c: specify dirty ring size, in number of entries\n"); + printf(" (only useful for dirty-ring test; default: %"PRIu32")\n", + TEST_DIRTY_RING_COUNT); printf(" -i: specify iteration counts (default: %"PRIu64")\n", TEST_HOST_LOOP_N); printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n", @@ -844,8 +848,11 @@ int main(int argc, char *argv[]) guest_mode_init(VM_MODE_P40V48_4K, true, true); #endif - while ((opt = getopt(argc, argv, "hi:I:p:m:M:")) != -1) { + while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) { switch (opt) { + case 'c': + test_dirty_ring_count = strtol(optarg, NULL, 10); + break; case 'i': iterations = strtol(optarg, NULL, 10); break;