From patchwork Thu Jan 14 12:13:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yanan Wang X-Patchwork-Id: 12019389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAFE0C43331 for ; Thu, 14 Jan 2021 12:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7EA9C23877 for ; Thu, 14 Jan 2021 12:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729044AbhANMOt (ORCPT ); Thu, 14 Jan 2021 07:14:49 -0500 Received: from szxga07-in.huawei.com ([45.249.212.35]:11385 "EHLO szxga07-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728036AbhANMOr (ORCPT ); Thu, 14 Jan 2021 07:14:47 -0500 Received: from DGGEMS414-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga07-in.huawei.com (SkyGuard) with ESMTP id 4DGjrD6PJFz7VLh; Thu, 14 Jan 2021 20:13:00 +0800 (CST) Received: from DESKTOP-TMVL5KK.china.huawei.com (10.174.187.128) by DGGEMS414-HUB.china.huawei.com (10.3.19.214) with Microsoft SMTP Server id 14.3.498.0; Thu, 14 Jan 2021 20:13:53 +0800 From: Yanan Wang To: Marc Zyngier , Will Deacon , "Catalin Marinas" , , , , CC: James Morse , Julien Thierry , Suzuki K Poulose , Gavin Shan , Quentin Perret , , , , , Yanan Wang Subject: [PATCH v3 1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map Date: Thu, 14 Jan 2021 20:13:48 +0800 Message-ID: <20210114121350.123684-2-wangyanan55@huawei.com> X-Mailer: git-send-email 2.8.4.windows.1 In-Reply-To: <20210114121350.123684-1-wangyanan55@huawei.com> References: <20210114121350.123684-1-wangyanan55@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.187.128] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Procedures of hyp stage-1 map and guest stage-2 map are quite different, but they are tied closely by function kvm_set_valid_leaf_pte(). So adjust the relative code for ease of code maintenance in the future. Signed-off-by: Will Deacon Signed-off-by: Yanan Wang --- arch/arm64/kvm/hyp/pgtable.c | 55 ++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 27 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index bdf8e55ed308..a11ac874bc2a 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -170,10 +170,9 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp) smp_store_release(ptep, pte); } -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr, - u32 level) +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level) { - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa); + kvm_pte_t pte = kvm_phys_to_pte(pa); u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE : KVM_PTE_TYPE_BLOCK; @@ -181,12 +180,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr, pte |= FIELD_PREP(KVM_PTE_TYPE, type); pte |= KVM_PTE_VALID; - /* Tolerate KVM recreating the exact same mapping. */ - if (kvm_pte_valid(old)) - return old == pte; - - smp_store_release(ptep, pte); - return true; + return pte; } static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr, @@ -341,12 +335,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot, static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct hyp_map_data *data) { + kvm_pte_t new, old = *ptep; u64 granule = kvm_granule_size(level), phys = data->phys; if (!kvm_block_mapping_supported(addr, end, phys, level)) return false; - WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level)); + /* Tolerate KVM recreating the exact same mapping */ + new = kvm_init_valid_leaf_pte(phys, data->attr, level); + if (old != new && !WARN_ON(kvm_pte_valid(old))) + smp_store_release(ptep, new); + data->phys += granule; return true; } @@ -465,27 +464,30 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct stage2_map_data *data) { + kvm_pte_t new, old = *ptep; u64 granule = kvm_granule_size(level), phys = data->phys; + struct page *page = virt_to_page(ptep); if (!kvm_block_mapping_supported(addr, end, phys, level)) return false; - /* - * If the PTE was already valid, drop the refcount on the table - * early, as it will be bumped-up again in stage2_map_walk_leaf(). - * This ensures that the refcount stays constant across a valid to - * valid PTE update. - */ - if (kvm_pte_valid(*ptep)) - put_page(virt_to_page(ptep)); + new = kvm_init_valid_leaf_pte(phys, data->attr, level); + if (kvm_pte_valid(old)) { + /* Tolerate KVM recreating the exact same mapping */ + if (old == new) + goto out; - if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level)) - goto out; + /* + * There's an existing different valid leaf entry, so perform + * break-before-make. + */ + kvm_set_invalid_pte(ptep); + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level); + put_page(page); + } - /* There's an existing valid leaf entry, so perform break-before-make */ - kvm_set_invalid_pte(ptep); - kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level); - kvm_set_valid_leaf_pte(ptep, phys, data->attr, level); + smp_store_release(ptep, new); + get_page(page); out: data->phys += granule; return true; @@ -527,7 +529,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, } if (stage2_map_walker_try_leaf(addr, end, level, ptep, data)) - goto out_get_page; + return 0; if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1)) return -EINVAL; @@ -551,9 +553,8 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, } kvm_set_table_pte(ptep, childp); - -out_get_page: get_page(page); + return 0; } From patchwork Thu Jan 14 12:13:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yanan Wang X-Patchwork-Id: 12019393 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0394BC433E0 for ; Thu, 14 Jan 2021 12:15:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BA5CB2376E for ; Thu, 14 Jan 2021 12:15:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728980AbhANMOq (ORCPT ); Thu, 14 Jan 2021 07:14:46 -0500 Received: from szxga07-in.huawei.com ([45.249.212.35]:11387 "EHLO szxga07-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728604AbhANMOq (ORCPT ); Thu, 14 Jan 2021 07:14:46 -0500 Received: from DGGEMS414-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga07-in.huawei.com (SkyGuard) with ESMTP id 4DGjrD5ckxz7VLg; Thu, 14 Jan 2021 20:13:00 +0800 (CST) Received: from DESKTOP-TMVL5KK.china.huawei.com (10.174.187.128) by DGGEMS414-HUB.china.huawei.com (10.3.19.214) with Microsoft SMTP Server id 14.3.498.0; Thu, 14 Jan 2021 20:13:54 +0800 From: Yanan Wang To: Marc Zyngier , Will Deacon , "Catalin Marinas" , , , , CC: James Morse , Julien Thierry , Suzuki K Poulose , Gavin Shan , Quentin Perret , , , , , Yanan Wang Subject: [PATCH v3 2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path Date: Thu, 14 Jan 2021 20:13:49 +0800 Message-ID: <20210114121350.123684-3-wangyanan55@huawei.com> X-Mailer: git-send-email 2.8.4.windows.1 In-Reply-To: <20210114121350.123684-1-wangyanan55@huawei.com> References: <20210114121350.123684-1-wangyanan55@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.187.128] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org (1) During running time of a a VM with numbers of vCPUs, if some vCPUs access the same GPA almost at the same time and the stage-2 mapping of the GPA has not been built yet, as a result they will all cause translation faults. The first vCPU builds the mapping, and the followed ones end up updating the valid leaf PTE. Note that these vCPUs might want different access permissions (RO, RW, RX, RWX, etc.). (2) It's inevitable that we sometimes will update an existing valid leaf PTE in the map path, and we perform break-before-make in this case. Then more unnecessary translation faults could be caused if the *break stage* of BBM is just catched by other vCPUS. With (1) and (2), something unsatisfactory could happen: vCPU A causes a translation fault and builds the mapping with RW permissions, vCPU B then update the valid leaf PTE with break-before-make and permissions are updated back to RO. Besides, *break stage* of BBM may trigger more translation faults. Finally, some useless small loops could occur. We can make some optimization to solve above problems: When we need to update a valid leaf PTE in the map path, let's filter out the case where this update only change access permissions, and don't update the valid leaf PTE here in this case. Instead, let the vCPU enter back the guest and it will exit next time to go through the relax_perms path without break-before-make if it still wants more permissions. Signed-off-by: Yanan Wang --- arch/arm64/include/asm/kvm_pgtable.h | 5 +++++ arch/arm64/kvm/hyp/pgtable.c | 32 ++++++++++++++++++---------- 2 files changed, 26 insertions(+), 11 deletions(-) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 52ab38db04c7..8886d43cfb11 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -157,6 +157,11 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt); * If device attributes are not explicitly requested in @prot, then the * mapping will be normal, cacheable. * + * Note that the update of a valid leaf PTE in this function will be aborted, + * if it's trying to recreate the exact same mapping or only change the access + * permissions. Instead, the vCPU will exit one more time from guest if still + * needed and then go through the path of relaxing permissions. + * * Note that this function will both coalesce existing table entries and split * existing block mappings, relying on page-faults to fault back areas outside * of the new mapping lazily. diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index a11ac874bc2a..4d177ce1d536 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -45,6 +45,10 @@ #define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54) +#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \ + KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \ + KVM_PTE_LEAF_ATTR_HI_S2_XN) + struct kvm_pgtable_walk_data { struct kvm_pgtable *pgt; struct kvm_pgtable_walker *walker; @@ -460,22 +464,27 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot, return 0; } -static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, - struct stage2_map_data *data) +static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, + kvm_pte_t *ptep, + struct stage2_map_data *data) { kvm_pte_t new, old = *ptep; u64 granule = kvm_granule_size(level), phys = data->phys; struct page *page = virt_to_page(ptep); if (!kvm_block_mapping_supported(addr, end, phys, level)) - return false; + return -E2BIG; new = kvm_init_valid_leaf_pte(phys, data->attr, level); if (kvm_pte_valid(old)) { - /* Tolerate KVM recreating the exact same mapping */ - if (old == new) - goto out; + /* + * Skip updating the PTE if we are trying to recreate the exact + * same mapping or only change the access permissions. Instead, + * the vCPU will exit one more time from guest if still needed + * and then go through the path of relaxing permissions. + */ + if (!((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS))) + return -EAGAIN; /* * There's an existing different valid leaf entry, so perform @@ -488,9 +497,8 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, smp_store_release(ptep, new); get_page(page); -out: data->phys += granule; - return true; + return 0; } static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, @@ -518,6 +526,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct stage2_map_data *data) { + int ret; kvm_pte_t *childp, pte = *ptep; struct page *page = virt_to_page(ptep); @@ -528,8 +537,9 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, return 0; } - if (stage2_map_walker_try_leaf(addr, end, level, ptep, data)) - return 0; + ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data); + if (ret != -E2BIG) + return ret; if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1)) return -EINVAL; From patchwork Thu Jan 14 12:13:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yanan Wang X-Patchwork-Id: 12019387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92F92C4332D for ; Thu, 14 Jan 2021 12:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 503B72376E for ; Thu, 14 Jan 2021 12:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729063AbhANMOv (ORCPT ); Thu, 14 Jan 2021 07:14:51 -0500 Received: from szxga07-in.huawei.com ([45.249.212.35]:11384 "EHLO szxga07-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727416AbhANMOv (ORCPT ); Thu, 14 Jan 2021 07:14:51 -0500 Received: from DGGEMS414-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga07-in.huawei.com (SkyGuard) with ESMTP id 4DGjrD61f6z7VLk; Thu, 14 Jan 2021 20:13:00 +0800 (CST) Received: from DESKTOP-TMVL5KK.china.huawei.com (10.174.187.128) by DGGEMS414-HUB.china.huawei.com (10.3.19.214) with Microsoft SMTP Server id 14.3.498.0; Thu, 14 Jan 2021 20:13:55 +0800 From: Yanan Wang To: Marc Zyngier , Will Deacon , "Catalin Marinas" , , , , CC: James Morse , Julien Thierry , Suzuki K Poulose , Gavin Shan , Quentin Perret , , , , , Yanan Wang Subject: [PATCH v3 3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully Date: Thu, 14 Jan 2021 20:13:50 +0800 Message-ID: <20210114121350.123684-4-wangyanan55@huawei.com> X-Mailer: git-send-email 2.8.4.windows.1 In-Reply-To: <20210114121350.123684-1-wangyanan55@huawei.com> References: <20210114121350.123684-1-wangyanan55@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.187.128] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org We now set the pfn dirty and mark the page dirty before calling fault handlers in user_mem_abort(), so we might end up having spurious dirty pages if update of permissions or mapping has failed. Let's move these two operations after the fault handlers, and they will be done only if the fault has been handled successfully. When an -EAGAIN errno is returned from the map handler, we hope to the vcpu to enter guest directly instead of exiting back to userspace, so adjust the return value at the end of function. Signed-off-by: Yanan Wang --- arch/arm64/kvm/mmu.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 7d2257cc5438..77cb2d28f2a4 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -879,11 +879,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (vma_pagesize == PAGE_SIZE && !force_pte) vma_pagesize = transparent_hugepage_adjust(memslot, hva, &pfn, &fault_ipa); - if (writable) { + if (writable) prot |= KVM_PGTABLE_PROT_W; - kvm_set_pfn_dirty(pfn); - mark_page_dirty(kvm, gfn); - } if (fault_status != FSC_PERM && !device) clean_dcache_guest_page(pfn, vma_pagesize); @@ -911,11 +908,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, memcache); } + /* Mark the page dirty only if the fault is handled successfully */ + if (writable && !ret) { + kvm_set_pfn_dirty(pfn); + mark_page_dirty(kvm, gfn); + } + out_unlock: spin_unlock(&kvm->mmu_lock); kvm_set_pfn_accessed(pfn); kvm_release_pfn_clean(pfn); - return ret; + return ret != -EAGAIN ? ret : 0; } /* Resolve the access fault by making the page young again. */