From patchwork Sat Jan 12 08:20:59 2019
X-Patchwork-Submitter: Zhuang Yanying
X-Patchwork-Id: 10760903
From: Zhuangyanying
Subject: [PATCH] KVM: MMU: fast cleanup D bit based on fast write protect
Date: Sat, 12 Jan 2019 08:20:59 +0000
Message-ID: <1547281259-26180-1-git-send-email-ann.zhuangyanying@huawei.com>
X-Mailing-List: kvm@vger.kernel.org

From: Zhuang Yanying

Recently I tested live migration with large-memory guests and found that vCPUs
may hang for a long time while migration is starting, e.g. about 9s for a 2048G
guest (linux-4.20.1 + qemu-3.1.0). The reason is that
memory_global_dirty_log_start() takes too long while the vCPUs are waiting for
the BQL, and the page-by-page D-bit clearing is the main time consumer.

I think the idea of "KVM: MMU: fast write protect" by Xiao Guangrong, especially
the function kvm_mmu_write_protect_all_pages(), is very helpful here. With a
small modification on top of his patch the problem is solved: 9s drops to 0.5s.

At the beginning of live migration, write protection is applied only to the
top-level SPTEs. A write from the VM then triggers an EPT violation, and write
protection is pushed down the direct map with for_each_shadow_entry(). Finally
the Dirty bit of the target page (in the level-1 page table) is cleared and
dirty page tracking starts. Of course, the page the GPA refers to is marked
dirty in mmu_set_spte(). Xen has a similar implementation, just using emt
instead of write protection.

What do you think about this solution?
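To make the mechanism more concrete, here is a minimal user-space sketch of the
lazy-propagation idea (not KVM code; the page_entry structure, SKETCH_LEAF_LEVEL
and the function names are invented for illustration). Intermediate levels only
drop the writable bit, while the leaf level clears the Dirty bit instead, which
is what the mmu.c hunk below does with spte_clear_dirty() at
PT_PAGE_TABLE_LEVEL:

/*
 * Sketch only: models one walk from the root to the leaf after a write
 * fault.  Non-leaf entries get write-protected, the leaf entry has its
 * Dirty bit cleared so dirty tracking can start for that page.
 */
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_LEAF_LEVEL 1	/* analogous to PT_PAGE_TABLE_LEVEL */

struct page_entry {
	int level;
	bool writable;
	bool dirty;
};

static bool sketch_protect_entry(struct page_entry *e)
{
	bool flush;

	if (e->level == SKETCH_LEAF_LEVEL) {
		/* Leaf: clear the D bit to start dirty tracking. */
		flush = e->dirty;
		e->dirty = false;
		return flush;
	}
	/* Non-leaf: keep pushing write protection downwards. */
	flush = e->writable;
	e->writable = false;
	return flush;
}

int main(void)
{
	struct page_entry walk[] = {
		{ .level = 4, .writable = true, .dirty = false },
		{ .level = 3, .writable = true, .dirty = false },
		{ .level = 2, .writable = true, .dirty = false },
		{ .level = 1, .writable = true, .dirty = true  },
	};
	bool flush = false;
	unsigned int i;

	for (i = 0; i < sizeof(walk) / sizeof(walk[0]); i++)
		flush |= sketch_protect_entry(&walk[i]);

	printf("TLB flush needed: %s\n", flush ? "yes" : "no");
	return 0;
}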
---
 mmu.c | 5 ++++-
 vmx.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mmu.c b/mmu.c
index b079d74..f49d316 100755
--- a/mmu.c
+++ b/mmu.c
@@ -3210,7 +3210,10 @@ static bool mmu_load_shadow_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 			break;
 
 		if (is_last_spte(spte, sp->role.level)) {
-			flush |= spte_write_protect(sptep, false);
+			if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+				flush |= spte_clear_dirty(sptep);
+			else
+				flush |= spte_write_protect(sptep, false);
 			continue;
 		}
 
diff --git a/vmx.c b/vmx.c
index 95784bc..7ec717f 100755
--- a/vmx.c
+++ b/vmx.c
@@ -14421,8 +14421,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				      struct kvm_memory_slot *slot)
 {
-	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
-	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
+	kvm_mmu_write_protect_all_pages(kvm, true);
 }
 
 static void vmx_slot_disable_log_dirty(struct kvm *kvm,