From patchwork Wed Mar 24 18:39:44 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12162019
From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, Keqian Zhu, Hyman, "Dr. David Alan Gilbert",
    peterx@redhat.com
Subject: [PATCH v6 00/10] KVM: Dirty ring support (QEMU part)
Date: Wed, 24 Mar 2021 14:39:44 -0400
Message-Id: <20210324183954.345629-1-peterx@redhat.com>

This is v6 of the qemu dirty ring interface support.

v6:
- Fix slots_lock init [Keqian, Paolo]
- Add a comment above KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 with a todo (to
  enable KVM_CLEAR_DIRTY_LOG for the dirty ring too) [Keqian, Paolo]
- Fix comment for CPUState [Keqian]

v5:
- Rebase
- Dropped patch "update-linux-headers: Include const.h" after the rebase
- Dropped patch "KVM: Fixup kvm_log_clear_one_slot() ioctl return check"
  since a similar patch was merged recently (38e0b7904eca7cd32f8953c3)

========= v4 cover letter below =============

Content-wise it is the same as v3, but there are a few things to mention
besides the rebase itself:

- I picked up two patches from Eric Farman for the linux-header updates
  (from Eric's v3 series) for convenience, in case any of the series gets
  queued by a maintainer.

- One more patch is added: "KVM: Disable manual dirty log when dirty ring
  enabled". I found when testing the branch, after rebasing to the latest
  qemu, that not only is the manual dirty log capability unnecessary with
  the kvm dirty ring, but, more importantly, INITIALLY_ALL_SET is
  fundamentally incompatible with the dirty ring and can silently crash
  the guest after migration. For this new commit, I touched up "KVM: Add
  dirty-gfn-count property" a bit. (A capability-enabling sketch follows
  after this list.)

- A few more documentation lines in qemu-options.hx.

- I removed the RFC tag after the kernel series got merged.
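To illustrate the capability interplay mentioned above, here's a minimal
C sketch of how the dirty ring can be probed and enabled on a VM fd while
leaving KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 off. It assumes linux-headers
5.11+; enable_dirty_ring() and the trimmed-down error handling are
illustrative only, not the exact logic in accel/kvm/kvm-all.c:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdint.h>

    /* dirty_gfn_count: per-vcpu ring size in entries (e.g. 4096). */
    static int enable_dirty_ring(int vm_fd, uint32_t dirty_gfn_count)
    {
        uint64_t ring_bytes = dirty_gfn_count * sizeof(struct kvm_dirty_gfn);

        /* KVM_CHECK_EXTENSION returns the max supported ring size in bytes. */
        int max_bytes = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING);
        if (max_bytes <= 0 || ring_bytes > (uint64_t)max_bytes) {
            return -1;   /* unsupported, or the requested ring is too big */
        }

        struct kvm_enable_cap cap = {
            .cap = KVM_CAP_DIRTY_LOG_RING,
            .args[0] = ring_bytes,
        };
        /*
         * Deliberately do NOT enable KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
         * here: as noted above, INITIALLY_ALL_SET does not mix with the
         * dirty ring.
         */
        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }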
Again, this is only the first step to support the dirty ring. Ideally the
dirty ring should let QEMU remove the whole layered dirty bitmap, so that
the dirty ring would work similarly to auto-converge, but better: we would
throttle vcpus with the dirty ring kvm exit rather than explicitly adding
a timer to stop the vcpu threads from entering the guest again (as current
migration auto-converge does). Some more information can be found in the
kvm forum 2020 talk on the kvm dirty ring (slides 21/22 [1]).

That next step (removing all the dirty bitmaps, as mentioned above) is
still open for discussion: firstly, I don't know whether there's anything
I've overlooked there. Also, it only serves the huge-VM cases and may not
help much in the many common scenarios where VMs are not that huge. There
are probably other ways to fix huge-VM migration issues, mostly focusing
on responsiveness and convergence. For example, Google has proposed a new
userfaultfd kernel capability called "minor mode" [2] to track minor page
faults, which could ultimately serve the same purpose via postcopy. That's
another long story, so I'll stop here; I mention it alongside the dirty
ring series so there's still a record to reference.

That said, I still think this series is well worth merging even if we
don't pursue the next steps yet, since the dirty ring is disabled by
default and we can always build upon this series.

Please review, thanks.

V3: https://lore.kernel.org/qemu-devel/20200523232035.1029349-1-peterx@redhat.com/
    (V3 contains all the pre-v3 changelog)

QEMU branch for testing (requires kernel version 5.11-rc1+):
    https://github.com/xzpeter/qemu/tree/kvm-dirty-ring

[1] https://static.sched.com/hosted_files/kvmforum2020/97/kvm_dirty_ring_peter.pdf
[2] https://lore.kernel.org/lkml/20210107190453.3051110-1-axelrasmussen@google.com/

---------------------------8<---------------------------------

Overview
========

The KVM dirty ring is a new interface for passing dirty bits from the
kernel to userspace. Instead of using a bitmap for each memory region, the
dirty ring contains an array of dirtied GPAs to fetch, one ring per vcpu.

There are a few major changes compared to how the old dirty logging
interface works:

- Granularity of dirty bits

  The KVM dirty ring interface does not offer memory-region-level
  granularity for collecting dirty bits (i.e., per KVM memory slot);
  instead, dirty bits are collected globally for all vcpus at once. The
  major effect is on the VGA part, because VGA dirty tracking is enabled
  as long as the device is created, and it used to be at memory region
  granularity; now that operation is amplified into a whole-VM sync.
  Maybe there's a smarter way to do the same thing for VGA with the new
  interface, but so far I don't see it affecting much, at least on
  regular VMs.

- Collection of dirty bits

  The old dirty logging interface collects KVM dirty bits when
  synchronizing dirty bits. The KVM dirty ring interface instead uses a
  standalone thread for that, so when another thread (e.g., the migration
  thread) wants to synchronize the dirty bits, it simply kicks that
  thread and waits until it has flushed all the dirty bits to the
  ramblock dirty bitmap. (A sketch of the per-vcpu harvest loop follows
  below.)

A new parameter "dirty-ring-size" is added to "-accel kvm". By default the
dirty ring is still disabled (size==0). To enable it, use:

    -accel kvm,dirty-ring-size=65536

This establishes a 64K-entry dirty ring buffer per vcpu. Then, if we
migrate, it'll switch to the dirty ring.
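To make that collection flow concrete, here is a hedged sketch of how the
reaper thread might drain one vcpu's ring. struct kvm_dirty_gfn and the
KVM_DIRTY_GFN_F_* flags come from linux/kvm.h (5.11+); mark_page_dirty()
and the dirty_ring bookkeeping struct are hypothetical stand-ins for the
real ramblock plumbing:

    #include <linux/kvm.h>
    #include <stdint.h>

    /* Hypothetical hook: set the (slot, offset) bit in the ramblock bitmap. */
    void mark_page_dirty(uint32_t slot, uint64_t offset);

    /* One vcpu's view of its ring: mmap'ed entries plus a fetch index. */
    struct dirty_ring {
        struct kvm_dirty_gfn *gfns; /* mmap'ed from the vcpu fd at
                                     * KVM_DIRTY_LOG_PAGE_OFFSET */
        uint32_t size;              /* number of entries, a power of two */
        uint32_t fetch;             /* next entry to read (free-running) */
    };

    /* Drain every entry KVM has published since the last harvest. */
    static uint32_t dirty_ring_harvest(struct dirty_ring *ring)
    {
        uint32_t count = 0;

        for (;;) {
            struct kvm_dirty_gfn *e = &ring->gfns[ring->fetch % ring->size];

            /* Acquire load: only consume entries KVM fully published. */
            if (!(__atomic_load_n(&e->flags, __ATOMIC_ACQUIRE) &
                  KVM_DIRTY_GFN_F_DIRTY)) {
                break;              /* ring drained */
            }
            mark_page_dirty(e->slot, e->offset);

            /* Hand the entry back so KVM can recycle it... */
            e->flags = KVM_DIRTY_GFN_F_RESET;
            ring->fetch++;
            count++;
        }
        /* ...once the caller runs ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0). */
        return count;
    }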
To get some numbers, I gave it a shot with a 24G guest, 8 vcpus, using a
10G NIC as the migration channel. When the guest is idle or the dirty
workload is small, I don't observe a major difference in total migration
time. With a higher random dirty workload (800MB/s dirty rate over 20G of
memory), the dirty ring fares worse. Total migration time (ping-pong
migrated 6 times, in seconds):

    |-------------------------+---------------|
    | dirty ring (4k entries) | dirty logging |
    |-------------------------+---------------|
    |                      70 |            58 |
    |                      78 |            70 |
    |                      72 |            48 |
    |                      74 |            52 |
    |                      83 |            49 |
    |                      65 |            54 |
    |-------------------------+---------------|

Summary:

    dirty ring average:    73s
    dirty logging average: 55s

The KVM dirty ring is slower in the above case. The numbers suggest that
dirty logging should remain the default, because small/medium VMs are
still the major use case and high dirty workloads happen frequently too.
And that's what this series does.

TODO:

- Consider dropping the BQL dependency: then we can run the reaper thread
  in parallel with the main thread. Needs some thought around the race
  conditions.

- Consider dropping the kvmslot bitmap: logically this can be dropped
  with the kvm dirty ring, not only to save space, but also because it is
  yet another layer linear in guest memory size, which goes against the
  whole idea of the kvm dirty ring. This should make the above numbers
  (for the kvm dirty ring) even smaller (though still perhaps not as good
  as dirty logging under such a high workload).

Please refer to the code and comments themselves for more information.

Thanks,

Peter Xu (10):
  memory: Introduce log_sync_global() to memory listener
  KVM: Use a big lock to replace per-kml slots_lock
  KVM: Create the KVMSlot dirty bitmap on flag changes
  KVM: Provide helper to get kvm dirty log
  KVM: Provide helper to sync dirty bitmap from slot to ramblock
  KVM: Simplify dirty log sync in kvm_set_phys_mem
  KVM: Cache kvm slot dirty bitmap size
  KVM: Add dirty-gfn-count property
  KVM: Disable manual dirty log when dirty ring enabled
  KVM: Dirty ring support

 accel/kvm/kvm-all.c      | 593 +++++++++++++++++++++++++++++++++------
 accel/kvm/trace-events   |   7 +
 include/exec/memory.h    |  12 +
 include/hw/core/cpu.h    |   7 +
 include/sysemu/kvm_int.h |   7 +-
 qemu-options.hx          |  12 +
 softmmu/memory.c         |  33 ++-
 7 files changed, 572 insertions(+), 99 deletions(-)