From patchwork Fri Aug 25 09:35:28 2023
X-Patchwork-Submitter: Shameerali Kolothum Thodi
X-Patchwork-Id: 13365370
From: Shameer Kolothum
Subject: [RFC PATCH v2 8/8] KVM: arm64: Start up SW/HW combined dirty log
Date: Fri, 25 Aug 2023 10:35:28 +0100
Message-ID: <20230825093528.1637-9-shameerali.kolothum.thodi@huawei.com>
In-Reply-To: <20230825093528.1637-1-shameerali.kolothum.thodi@huawei.com>
References: <20230825093528.1637-1-shameerali.kolothum.thodi@huawei.com>

From: Keqian Zhu

When a user has enabled HW DBM support for live migration, set the HW DBM
bit for the nearby pages (64 pages) of a write-faulting page.
We track the DBM-set pages in a separate bitmap and use that during dirty
log sync, avoiding a full scan of the PTEs.

Signed-off-by: Keqian Zhu
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_host.h |   6 ++
 arch/arm64/kvm/arm.c              | 125 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c      |  10 +--
 arch/arm64/kvm/mmu.c              |  11 ++-
 4 files changed, 144 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 17ac53150a1d..5f0be57eebc4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -181,6 +181,8 @@ struct kvm_s2_mmu {
 };
 
 struct kvm_arch_memory_slot {
+#define HWDBM_GRANULE_SHIFT 6 /* 64 pages per bit */
+        unsigned long *hwdbm_bitmap;
 };
 
 /**
@@ -901,6 +903,10 @@ struct kvm_vcpu_stat {
         u64 exits;
 };
 
+int kvm_arm_init_hwdbm_bitmap(struct kvm *kvm, struct kvm_memory_slot *memslot);
+void kvm_arm_destroy_hwdbm_bitmap(struct kvm_memory_slot *memslot);
+void kvm_arm_enable_nearby_hwdbm(struct kvm *kvm, gfn_t gfn);
+
 void kvm_vcpu_preferred_target(struct kvm_vcpu_init *init);
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0dbf2cda40d7..ab1e2da3bf0d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1540,9 +1540,134 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
         return r;
 }
 
+static unsigned long kvm_hwdbm_bitmap_bytes(struct kvm_memory_slot *memslot)
+{
+        unsigned long nbits = DIV_ROUND_UP(memslot->npages, 1 << HWDBM_GRANULE_SHIFT);
+
+        return ALIGN(nbits, BITS_PER_LONG) / 8;
+}
+
+static unsigned long *kvm_second_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+        unsigned long len = kvm_hwdbm_bitmap_bytes(memslot);
+
+        return (void *)memslot->arch.hwdbm_bitmap + len;
+}
+
+/*
+ * Allocate twice the space. Refer to kvm_arch_sync_dirty_log() to see why
+ * the second bitmap is needed.
+ */
+int kvm_arm_init_hwdbm_bitmap(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+        unsigned long bytes = 2 * kvm_hwdbm_bitmap_bytes(memslot);
+
+        if (!kvm->arch.mmu.hwdbm_enabled)
+                return 0;
+
+        if (memslot->arch.hwdbm_bitmap) {
+                /* Inherited from old memslot */
+                bitmap_clear(memslot->arch.hwdbm_bitmap, 0, bytes * 8);
+        } else {
+                memslot->arch.hwdbm_bitmap = kvzalloc(bytes, GFP_KERNEL_ACCOUNT);
+                if (!memslot->arch.hwdbm_bitmap)
+                        return -ENOMEM;
+        }
+
+        return 0;
+}
+
+void kvm_arm_destroy_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+        if (!memslot->arch.hwdbm_bitmap)
+                return;
+
+        kvfree(memslot->arch.hwdbm_bitmap);
+        memslot->arch.hwdbm_bitmap = NULL;
+}
+
+/* Add DBM for nearby pagetables but do not cross the memslot boundary */
+void kvm_arm_enable_nearby_hwdbm(struct kvm *kvm, gfn_t gfn)
+{
+        struct kvm_memory_slot *memslot;
+
+        memslot = gfn_to_memslot(kvm, gfn);
+        if (memslot && kvm_slot_dirty_track_enabled(memslot) &&
+            memslot->arch.hwdbm_bitmap) {
+                unsigned long rel_gfn = gfn - memslot->base_gfn;
+                unsigned long dbm_idx = rel_gfn >> HWDBM_GRANULE_SHIFT;
+                unsigned long start_page, npages;
+
+                if (!test_and_set_bit(dbm_idx, memslot->arch.hwdbm_bitmap)) {
+                        start_page = dbm_idx << HWDBM_GRANULE_SHIFT;
+                        npages = 1 << HWDBM_GRANULE_SHIFT;
+                        npages = min(memslot->npages - start_page, npages);
+                        kvm_stage2_set_dbm(kvm, memslot, start_page, npages);
+                }
+        }
+}
+
+/*
+ * We have to find a place to clear the hwdbm_bitmap, and clearing the
+ * hwdbm_bitmap means clearing the DBM bit of all related pgtables.
+ * Note that between clearing the DBM bit and flushing the TLB, HW dirty log
+ * may occur, so we must scan all related pgtables after the TLB flush. Given
+ * that, it is best to clear the hwdbm_bitmap before syncing the HW dirty log.
+ */
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
+        unsigned long *second_bitmap = kvm_second_hwdbm_bitmap(memslot);
+        unsigned long start_page, npages;
+        unsigned int end, rs, re;
+        bool has_hwdbm = false;
+
+        if (!memslot->arch.hwdbm_bitmap)
+                return;
+
+        end = kvm_hwdbm_bitmap_bytes(memslot) * 8;
+        bitmap_clear(second_bitmap, 0, end);
+
+        write_lock(&kvm->mmu_lock);
+        for_each_set_bitrange(rs, re, memslot->arch.hwdbm_bitmap, end) {
+                has_hwdbm = true;
+                /*
+                 * Must clear the bitmap before clearing DBM. While we clear
+                 * DBM (the mmu spinlock is released periodically), SW dirty
+                 * tracking has a chance to add DBM bits that overlap what we
+                 * are clearing. So if we cleared the bitmap after clearing
+                 * DBM, we would end up with the bitmap cleared but DBM bits
+                 * left behind, and we might never scan those pgtables again.
+                 */
+                bitmap_clear(memslot->arch.hwdbm_bitmap, rs, re - rs);
+
+                /* Record the bitmap cleared */
+                bitmap_set(second_bitmap, rs, re - rs);
+
+                start_page = rs << HWDBM_GRANULE_SHIFT;
+                npages = (re - rs) << HWDBM_GRANULE_SHIFT;
+                npages = min(memslot->npages - start_page, npages);
+                kvm_stage2_clear_dbm(kvm, memslot, start_page, npages);
+        }
+        write_unlock(&kvm->mmu_lock);
+
+        if (!has_hwdbm)
+                return;
+
+        /*
+         * Ensure vcpu writes that occur after we clear the hwdbm_bitmap can
+         * be caught by the guest memory abort handler.
+         */
+        kvm_flush_remote_tlbs_memslot(kvm, memslot);
+
+        read_lock(&kvm->mmu_lock);
+        for_each_set_bitrange(rs, re, second_bitmap, end) {
+                start_page = rs << HWDBM_GRANULE_SHIFT;
+                npages = (re - rs) << HWDBM_GRANULE_SHIFT;
+                npages = min(memslot->npages - start_page, npages);
+                kvm_stage2_sync_dirty(kvm, memslot, start_page, npages);
+        }
+        read_unlock(&kvm->mmu_lock);
 }
 
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4552bfb1f274..330912d647c7 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -651,10 +651,10 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 
 #ifdef CONFIG_ARM64_HW_AFDBM
         /*
-         * Enable the Hardware Access Flag management, unconditionally
-         * on all CPUs. In systems that have asymmetric support for the feature
-         * this allows KVM to leverage hardware support on the subset of cores
-         * that implement the feature.
+         * Enable the Hardware Access Flag management and Dirty State management,
+         * unconditionally on all CPUs. In systems that have asymmetric support for
+         * the feature this allows KVM to leverage hardware support on the subset of
+         * cores that implement the feature.
          *
          * The architecture requires VTCR_EL2.HA to be RES0 (thus ignored by
          * hardware) on implementations that do not advertise support for the
@@ -663,7 +663,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
          * HAFDBS. Here be dragons.
          */
         if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
-                vtcr |= VTCR_EL2_HA;
+                vtcr |= VTCR_EL2_HA | VTCR_EL2_HD;
 #endif /* CONFIG_ARM64_HW_AFDBM */
 
         /* Set the vmid bits */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 34251932560e..b2fdcd762d70 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1569,14 +1569,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
          * permissions only if vma_pagesize equals fault_granule. Otherwise,
          * kvm_pgtable_stage2_map() should be called to change block size.
          */
-        if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
+        if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule) {
                 ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
-        else
+                /* Try to enable HW DBM for nearby pages */
+                if (!ret && vma_pagesize == PAGE_SIZE && writable)
+                        kvm_arm_enable_nearby_hwdbm(kvm, gfn);
+        } else {
                 ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
                                              __pfn_to_phys(pfn), prot,
                                              memcache,
                                              KVM_PGTABLE_WALK_HANDLE_FAULT |
                                              KVM_PGTABLE_WALK_SHARED);
+        }
 
         /* Mark the page dirty only if the fault is handled successfully */
         if (writable && !ret) {
@@ -2046,11 +2050,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
         } while (hva < reg_end);
 
         mmap_read_unlock(current->mm);
-        return ret;
+        return ret ? : kvm_arm_init_hwdbm_bitmap(kvm, new);
 }
 
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
+        kvm_arm_destroy_hwdbm_bitmap(slot);
 }
 
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
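
Not part of the patch: below is a minimal standalone sketch (plain userspace C)
of the bitmap sizing arithmetic that kvm_hwdbm_bitmap_bytes() above performs,
assuming 4KiB pages and 64-bit longs; DIV_ROUND_UP and the alignment helper are
reimplemented locally only for the illustration. With HWDBM_GRANULE_SHIFT = 6,
each bit covers 64 pages, so a 4GiB memslot needs a 2KiB bitmap, and
kvm_arm_init_hwdbm_bitmap() allocates twice that to also hold the second bitmap
used by kvm_arch_sync_dirty_log().

/* Illustration only -- not kernel code. */
#include <stdio.h>

#define HWDBM_GRANULE_SHIFT 6   /* 64 pages per bit, as in the patch */
#define BITS_PER_LONG       64  /* assumption: 64-bit longs */

#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
#define ALIGN_UP(x, a)      (DIV_ROUND_UP(x, a) * (a))

/* Mirrors the arithmetic of kvm_hwdbm_bitmap_bytes() for a slot of npages. */
static unsigned long hwdbm_bitmap_bytes(unsigned long npages)
{
        unsigned long nbits = DIV_ROUND_UP(npages, 1UL << HWDBM_GRANULE_SHIFT);

        return ALIGN_UP(nbits, BITS_PER_LONG) / 8;
}

int main(void)
{
        unsigned long npages = 1UL << 20;   /* 4GiB memslot with 4KiB pages */
        unsigned long bytes = hwdbm_bitmap_bytes(npages);

        /* The patch allocates 2x: the primary bitmap plus the second bitmap. */
        printf("primary bitmap: %lu bytes, total allocation: %lu bytes\n",
               bytes, 2 * bytes);
        return 0;
}

For the example slot this prints 2048 and 4096 bytes, which is why tracking at
a 64-page granule keeps the per-memslot overhead small compared with a per-page
bitmap.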