From patchwork Thu May 8 00:40:15 2014
X-Patchwork-Submitter: Mario Smarduch
X-Patchwork-Id: 4132761
From: Mario Smarduch
To: kvmarm@lists.cs.columbia.edu, marc.zyngier@arm.com, christoffer.dall@linaro.org
Subject: [PATCH v5 3/4] live migration support for VM dirty log management
Date: Wed, 07 May 2014 17:40:15 -0700
Message-id: <1399509616-4632-4-git-send-email-m.smarduch@samsung.com>
X-Mailer: git-send-email 1.7.9.5
In-reply-to: <1399509616-4632-1-git-send-email-m.smarduch@samsung.com>
References: <1399509616-4632-1-git-send-email-m.smarduch@samsung.com>
Cc: peter.maydell@linaro.org, kvm@vger.kernel.org, steve.capper@arm.com,
 linux-arm-kernel@lists.infradead.org, jays.lee@samsung.com,
 sungjinn.chung@samsung.com, gavin.guo@canonical.com, Mario Smarduch

This patch adds support for keeping track of VM dirty pages by updating the
per-memslot dirty bitmap and write protecting the page again.

Signed-off-by: Mario Smarduch
---
 arch/arm/include/asm/kvm_host.h |    3 ++
 arch/arm/kvm/arm.c              |    5 --
 arch/arm/kvm/mmu.c              |   99 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |   86 ----------------------------------
 virt/kvm/kvm_main.c             |   83 ++++++++++++++++++++++++++++++++
 5 files changed, 185 insertions(+), 91 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 91744c3..e2db1b5 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -239,5 +239,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 void kvm_tlb_flush_vmid(struct kvm *kvm);
 
 int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+               struct kvm_memory_slot *slot,
+               gfn_t gfn_offset, unsigned long mask);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1055266..0b847b5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -777,11 +777,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
        }
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-       return -EINVAL;
-}
-
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
                                        struct kvm_arm_device_addr *dev_addr)
 {
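Background note (not part of the patch): each memslot's dirty_bitmap carries
one bit per guest page, indexed by the gfn's offset from the slot's base_gfn.
A minimal sketch of that mapping, with hypothetical helper names and assuming
the usual one-bit-per-page layout:

#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Hypothetical illustration: find the bitmap word and bit that track 'gfn'
 * in a slot whose first page is 'base_gfn'. */
static inline void dirty_bit_position(unsigned long gfn, unsigned long base_gfn,
                                      size_t *word, unsigned long *bit)
{
        unsigned long rel_gfn = gfn - base_gfn; /* page offset within the slot */

        *word = rel_gfn / BITS_PER_LONG;        /* index into dirty_bitmap[] */
        *bit  = rel_gfn % BITS_PER_LONG;        /* bit within that word */
}

The gfn_offset/mask pair passed to kvm_mmu_write_protect_pt_masked() below
corresponds to one such bitmap word: gfn_offset is the first gfn covered by
the word (i * BITS_PER_LONG relative to the slot base) and mask is the word's
snapshotted contents.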
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 85145d8..1458b6e 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -922,6 +922,105 @@ out:
        return ret;
 }
 
+/**
+ * kvm_mmu_write_protect_pt_masked - after the migration thread write protects
+ *     the entire VM address space, iterative calls are made to get dirty pages
+ *     as the VM pages are being migrated. New dirty pages may be a subset of
+ *     the initially write-protected VM or newly faulted-in writes. Here we
+ *     write protect the new dirty pages again in preparation for the next
+ *     dirty log read. This function is called as a result of the
+ *     KVM_GET_DIRTY_LOG ioctl to determine which pages need to be migrated.
+ *     'kvm->mmu_lock' must be held to protect against concurrent modification
+ *     of page tables (2nd stage fault, mmu modifiers, ...)
+ *
+ * @kvm:       The KVM pointer
+ * @slot:      The memory slot the dirty log is retrieved for
+ * @gfn_offset:        The gfn offset in memory slot
+ * @mask:      The mask of dirty pages at offset 'gfn_offset' in this memory
+ *             slot to be write protected
+ */
+
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+               struct kvm_memory_slot *slot,
+               gfn_t gfn_offset, unsigned long mask)
+{
+       phys_addr_t ipa, next, offset_ipa;
+       pgd_t *pgdp = kvm->arch.pgd, *pgd;
+       pud_t *pud;
+       pmd_t *pmd;
+       pte_t *pte;
+       gfn_t gfnofst = slot->base_gfn + gfn_offset;
+       bool crosses_pmd;
+
+       ipa = (gfnofst + __ffs(mask)) << PAGE_SHIFT;
+       offset_ipa = gfnofst << PAGE_SHIFT;
+       next = (gfnofst + (BITS_PER_LONG - 1)) << PAGE_SHIFT;
+
+       /* Check if the mask width crosses the 2nd level page table range, and
+        * possibly the 3rd, 4th. If not, skip upper table lookups. Unlikely
+        * to be true: machine memory regions tend to start on at least a PMD
+        * boundary and the mask is a power of 2.
+        */
+       crosses_pmd = ((offset_ipa & PMD_MASK) ^ (next & PMD_MASK)) ? true :
+                                                                       false;
+
+       /* If pgd, pud, pmd are not present and we cross the pmd range, check
+        * the next index. It is unlikely that the pgd and pud would not be
+        * present. Between dirty page marking and now, the page tables may
+        * have been altered.
+        */
+       pgd = pgdp + pgd_index(ipa);
+       if (unlikely(crosses_pmd && !pgd_present(*pgd))) {
+               pgd = pgdp + pgd_index(next);
+               if (!pgd_present(*pgd))
+                       return;
+       }
+
+       pud = pud_offset(pgd, ipa);
+       if (unlikely(crosses_pmd && !pud_present(*pud))) {
+               pud = pud_offset(pgd, next);
+               if (!pud_present(*pud))
+                       return;
+       }
+
+       pmd = pmd_offset(pud, ipa);
+       if (unlikely(crosses_pmd && !pmd_present(*pmd))) {
+               pmd = pmd_offset(pud, next);
+               if (!pmd_present(*pmd))
+                       return;
+       }
+
+       for (;;) {
+               pte = pte_offset_kernel(pmd, ipa);
+               if (!pte_present(*pte))
+                       goto next_ipa;
+
+               if (kvm_s2pte_readonly(pte))
+                       goto next_ipa;
+               kvm_set_s2pte_readonly(pte);
+next_ipa:
+               mask &= mask - 1;
+               if (!mask)
+                       break;
+
+               /* find next page */
+               ipa = (gfnofst + __ffs(mask)) << PAGE_SHIFT;
+
+               /* skip upper page table lookups */
+               if (!crosses_pmd)
+                       continue;
+
+               pgd = pgdp + pgd_index(ipa);
+               if (unlikely(!pgd_present(*pgd)))
+                       goto next_ipa;
+               pud = pud_offset(pgd, ipa);
+               if (unlikely(!pud_present(*pud)))
+                       goto next_ipa;
+               pmd = pmd_offset(pud, ipa);
+               if (unlikely(!pmd_present(*pmd)))
+                       goto next_ipa;
+       }
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                          struct kvm_memory_slot *memslot,
                          unsigned long fault_status)
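Aside (not part of the patch): the loop above only visits pages whose bit is
set in 'mask', locating the lowest set bit with __ffs() and clearing it with
mask &= mask - 1. A standalone user-space sketch of the same per-word walk,
with hypothetical names:

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Illustrative only: enumerate the gfns one dirty-bitmap word refers to. */
static void for_each_dirty_gfn(unsigned long base_gfn, unsigned long gfn_offset,
                               unsigned long mask)
{
        while (mask) {
                unsigned long bit = __builtin_ctzl(mask);  /* lowest set bit */
                unsigned long gfn = base_gfn + gfn_offset + bit;

                printf("would re-write-protect gfn 0x%lx\n", gfn);

                mask &= mask - 1;  /* clear that bit, move to the next one */
        }
}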
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5582c3..a603ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
        return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently. So, to avoid losing data, we keep the following order for
- * each bit:
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Flush TLB's if needed.
- *   4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry. This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-       int r;
-       struct kvm_memory_slot *memslot;
-       unsigned long n, i;
-       unsigned long *dirty_bitmap;
-       unsigned long *dirty_bitmap_buffer;
-       bool is_dirty = false;
-
-       mutex_lock(&kvm->slots_lock);
-
-       r = -EINVAL;
-       if (log->slot >= KVM_USER_MEM_SLOTS)
-               goto out;
-
-       memslot = id_to_memslot(kvm->memslots, log->slot);
-
-       dirty_bitmap = memslot->dirty_bitmap;
-       r = -ENOENT;
-       if (!dirty_bitmap)
-               goto out;
-
-       n = kvm_dirty_bitmap_bytes(memslot);
-
-       dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-       memset(dirty_bitmap_buffer, 0, n);
-
-       spin_lock(&kvm->mmu_lock);
-
-       for (i = 0; i < n / sizeof(long); i++) {
-               unsigned long mask;
-               gfn_t offset;
-
-               if (!dirty_bitmap[i])
-                       continue;
-
-               is_dirty = true;
-
-               mask = xchg(&dirty_bitmap[i], 0);
-               dirty_bitmap_buffer[i] = mask;
-
-               offset = i * BITS_PER_LONG;
-               kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
-       }
-
-       spin_unlock(&kvm->mmu_lock);
-
-       /* See the comments in kvm_mmu_slot_remove_write_access(). */
-       lockdep_assert_held(&kvm->slots_lock);
-
-       /*
-        * All the TLBs can be flushed out of mmu lock, see the comments in
-        * kvm_mmu_slot_remove_write_access().
-        */
-       if (is_dirty)
-               kvm_flush_remote_tlbs(kvm);
-
-       r = -EFAULT;
-       if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-               goto out;
-
-       r = 0;
-out:
-       mutex_unlock(&kvm->slots_lock);
-       return r;
-}
-
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
                        bool line_status)
 {
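For reference (not part of the patch), a minimal user-space sketch of how a
migration thread might retrieve the per-slot log through the KVM_GET_DIRTY_LOG
ioctl. The vm_fd/slot/slot_size_bytes parameters are assumptions of the
example, as are the 4K page size and 64-bit longs used for bitmap sizing:

#include <linux/kvm.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Illustrative only: fetch the dirty bitmap for one memory slot. */
static void *get_dirty_log(int vm_fd, unsigned int slot, size_t slot_size_bytes)
{
        size_t pages = slot_size_bytes / 4096;         /* assumes 4K pages */
        size_t bitmap_bytes = ((pages + 63) / 64) * 8; /* one bit per page,
                                                          rounded to a long */
        void *bitmap = calloc(1, bitmap_bytes);
        struct kvm_dirty_log log;

        if (!bitmap)
                return NULL;

        memset(&log, 0, sizeof(log));
        log.slot = slot;
        log.dirty_bitmap = bitmap;

        if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
                free(bitmap);
                return NULL;
        }

        /* Each set bit is a page dirtied since the previous call; the kernel
         * side also re-write-protects those pages for the next round. */
        return bitmap;
}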
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e49f976..7d95700 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -433,6 +433,89 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
        return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
 }
 
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * Shared by x86 and ARM.
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently. So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry. This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+
+int __weak kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+                                       struct kvm_dirty_log *log)
+{
+       int r;
+       struct kvm_memory_slot *memslot;
+       unsigned long n, i;
+       unsigned long *dirty_bitmap;
+       unsigned long *dirty_bitmap_buffer;
+       bool is_dirty = false;
+
+       mutex_lock(&kvm->slots_lock);
+
+       r = -EINVAL;
+       if (log->slot >= KVM_USER_MEM_SLOTS)
+               goto out;
+
+       memslot = id_to_memslot(kvm->memslots, log->slot);
+
+       dirty_bitmap = memslot->dirty_bitmap;
+       r = -ENOENT;
+       if (!dirty_bitmap)
+               goto out;
+
+       n = kvm_dirty_bitmap_bytes(memslot);
+
+       dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+       memset(dirty_bitmap_buffer, 0, n);
+
+       spin_lock(&kvm->mmu_lock);
+
+       for (i = 0; i < n / sizeof(long); i++) {
+               unsigned long mask;
+               gfn_t offset;
+
+               if (!dirty_bitmap[i])
+                       continue;
+
+               is_dirty = true;
+
+               mask = xchg(&dirty_bitmap[i], 0);
+               dirty_bitmap_buffer[i] = mask;
+
+               offset = i * BITS_PER_LONG;
+               kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+       }
+       if (is_dirty)
+               kvm_flush_remote_tlbs(kvm);
+
+       spin_unlock(&kvm->mmu_lock);
+
+       r = -EFAULT;
+       if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+               goto out;
+
+       r = 0;
+out:
+       mutex_unlock(&kvm->slots_lock);
+       return r;
+}
+
 #else  /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)