From patchwork Wed Oct 23 13:29:23 2013
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: gleb@redhat.com
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Subject: [PATCH v3 05/15] KVM: MMU: update spte and add it into rmap before dirty log
Date: Wed, 23 Oct 2013 21:29:23 +0800
Message-Id: <1382534973-13197-6-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
In-Reply-To: <1382534973-13197-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
References: <1382534973-13197-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.8.1.4

kvm_vm_ioctl_get_dirty_log() write-protects sptes based on its dirty
bitmap, so we must ensure that a writable spte can be found in the rmap
before the corresponding bit in the dirty bitmap becomes visible.
Otherwise, we may clear the dirty bitmap but fail to write-protect the
page; the races are detailed in the comments added by this patch.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 arch/x86/kvm/mmu.c | 84 ++++++++++++++++++++++++++++++++++++++++++------------
 arch/x86/kvm/x86.c | 10 +++++++
 2 files changed, 76 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 337d173..e85eed6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2427,6 +2427,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 {
 	u64 spte;
 	int ret = 0;
+	bool remap = is_rmap_spte(*sptep);
 
 	if (set_mmio_spte(vcpu->kvm, sptep, gfn, pfn, pte_access))
 		return 0;
@@ -2488,12 +2489,73 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		}
 	}
 
-	if (pte_access & ACC_WRITE_MASK)
-		mark_page_dirty(vcpu->kvm, gfn);
-
 set_pte:
 	if (mmu_spte_update(sptep, spte))
 		kvm_flush_remote_tlbs(vcpu->kvm);
+
+	if (!remap) {
+		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
+			rmap_recycle(vcpu, sptep, gfn);
+
+		if (level > PT_PAGE_TABLE_LEVEL)
+			++vcpu->kvm->stat.lpages;
+	}
+
+	/*
+	 * The orderings we require are:
+	 *
+	 * 1) set the spte to writable __before__ setting the dirty bitmap.
+	 *    This ensures that dirty logging is not missed in the final
+	 *    step of live migration, where kvm stops the guest and pushes
+	 *    the remaining dirty pages recorded in the dirty bitmap to
+	 *    the destination. Similar cases exist in
+	 *    fast_pf_fix_direct_spte() and kvm_write_guest_page().
+	 *
+	 * 2) add the spte into the rmap __before__ setting the dirty
+	 *    bitmap.
+	 *
+	 * Together these guarantee that a writable spte can be found on
+	 * the rmap during lockless write-protection, since
+	 * kvm_vm_ioctl_get_dirty_log() write-protects the pages based on
+	 * its dirty bitmap. Otherwise these races can happen:
+	 *
+	 *      CPU 0                         CPU 1
+	 *                              kvm ioctl doing get-dirty-pages
+	 * mark_page_dirty(gfn), which
+	 * sets the gfn in the dirty bitmap
+	 *                              mask = xchg(dirty_bitmap, 0)
+	 *
+	 *                              try to write-protect the gfns set
+	 *                              in "mask", walk the rmap, and see
+	 *                              no spte on that rmap
+	 * add the spte into the rmap
+	 *
+	 * The page can then be freely written without being recorded in
+	 * the dirty bitmap.
+	 *
+	 * And:
+	 *
+	 *      CPU 0                         CPU 1
+	 *                              kvm ioctl doing get-dirty-pages
+	 * mark_page_dirty(gfn), which
+	 * sets the gfn in the dirty bitmap
+	 *
+	 * add the spte into the rmap
+	 *                              mask = xchg(dirty_bitmap, 0)
+	 *
+	 *                              try to write-protect the gfns set
+	 *                              in "mask", walk the rmap, and see
+	 *                              the spte on the rmap, but it is
+	 *                              readonly or nonpresent
+	 * mark the spte writable
+	 *
+	 * Again, the page can be freely written without being recorded
+	 * in the dirty bitmap.
+	 *
+	 * See the comments in kvm_vm_ioctl_get_dirty_log().
+	 */
+	smp_wmb();
+
+	if (pte_access & ACC_WRITE_MASK)
+		mark_page_dirty(vcpu->kvm, gfn);
 done:
 	return ret;
 }
@@ -2503,9 +2565,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
 			 bool host_writable)
 {
-	int was_rmapped = 0;
-	int rmap_count;
-
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
@@ -2527,8 +2586,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 				 spte_to_pfn(*sptep), pfn);
 			drop_spte(vcpu->kvm, sptep);
 			kvm_flush_remote_tlbs(vcpu->kvm);
-		} else
-			was_rmapped = 1;
+		}
 	}
 
 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
@@ -2546,16 +2604,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
"2MB" : "4kB", *sptep & PT_PRESENT_MASK ?"RW":"R", gfn, *sptep, sptep); - if (!was_rmapped && is_large_pte(*sptep)) - ++vcpu->kvm->stat.lpages; - - if (is_shadow_present_pte(*sptep)) { - if (!was_rmapped) { - rmap_count = rmap_add(vcpu, sptep, gfn); - if (rmap_count > RMAP_RECYCLE_THRESHOLD) - rmap_recycle(vcpu, sptep, gfn); - } - } kvm_release_pfn_clean(pfn); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 573c6b3..4ac3a27 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3566,6 +3566,16 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) is_dirty = true; mask = xchg(&dirty_bitmap[i], 0); + /* + * smp_rmb(); + * + * The read-barrier is implied in xchg() acting as a + * full barrier and it ensures getting dirty bitmap + * is before walking the rmap and spte write-protection. + * + * See the comments in set_spte(). + */ + dirty_bitmap_buffer[i] = mask; offset = i * BITS_PER_LONG;