From patchwork Tue Aug 30 19:41:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959807 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34B43ECAAA1 for ; Tue, 30 Aug 2022 19:41:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230424AbiH3Tl5 (ORCPT ); Tue, 30 Aug 2022 15:41:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36060 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229513AbiH3Tlz (ORCPT ); Tue, 30 Aug 2022 15:41:55 -0400 Received: from out1.migadu.com (out1.migadu.com [91.121.223.63]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1650F726A0; Tue, 30 Aug 2022 12:41:54 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888512; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QlZz4xsWZh51ZQP7HNgRWu8mkrkQyxoC5UQCh4dA9Gc=; b=oW0OkHRXICpe1qsMt0Vkf05qJypn89W1g6AlFhTy1KOz5tK4JqHClut6fFaXYCX49Vc1QJ jhWNFu4tgN1b56RXLkVgSjCkxoSdx4TVq936DjQMlyZOnhuv8/848Ta6ygx6EIPnwQZlbP IAcGQ4HxQopueoqeyeKY2SIVQBOPmhQ= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 01/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees Date: Tue, 30 Aug 2022 19:41:19 +0000 Message-Id: <20220830194132.962932-2-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org A subsequent change to KVM will move the tear down of an unlinked stage-2 subtree out of the critical path of the break-before-make sequence. Introduce a new helper for tearing down unlinked stage-2 subtrees. Leverage the existing stage-2 free walkers to do so, with a deep call into __kvm_pgtable_walk() as the subtree is no longer reachable from the root. Signed-off-by: Oliver Upton --- arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++ arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 9f339dffbc1a..d71fb92dc913 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -316,6 +316,17 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, */ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt); +/** + * kvm_pgtable_stage2_free_removed() - Free a removed stage-2 paging structure. + * @pgtable: Unlinked stage-2 paging structure to be freed. 
+ * @level: Level of the stage-2 paging structure to be freed. + * @arg: Page-table structure initialised by kvm_pgtable_stage2_init*() + * + * The page-table is assumed to be unreachable by any hardware walkers prior to + * freeing and therefore no TLB invalidation is performed. + */ +void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg); + /** * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table. * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*(). diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 2cb3867eb7c2..d8127c25424c 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -1233,3 +1233,29 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt) pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz); pgt->pgd = NULL; } + +void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg) +{ + struct kvm_pgtable *pgt = (struct kvm_pgtable *)arg; + kvm_pte_t *ptep = (kvm_pte_t *)pgtable; + struct kvm_pgtable_walker walker = { + .cb = stage2_free_walker, + .flags = KVM_PGTABLE_WALK_LEAF | + KVM_PGTABLE_WALK_TABLE_POST, + .arg = pgt->mm_ops, + }; + struct kvm_pgtable_walk_data data = { + .pgt = pgt, + .walker = &walker, + + /* + * At this point the IPA really doesn't matter, as the page + * table being traversed has already been removed from the stage + * 2. Set an appropriate range to cover the entire page table. + */ + .addr = 0, + .end = kvm_granule_size(level), + }; + + WARN_ON(__kvm_pgtable_walk(&data, ptep, level)); +} From patchwork Tue Aug 30 19:41:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959808 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA1ABECAAA1 for ; Tue, 30 Aug 2022 19:42:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230502AbiH3TmD (ORCPT ); Tue, 30 Aug 2022 15:42:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36236 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229674AbiH3TmA (ORCPT ); Tue, 30 Aug 2022 15:42:00 -0400 Received: from out1.migadu.com (out1.migadu.com [91.121.223.63]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A14F479A7B; Tue, 30 Aug 2022 12:41:58 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888516; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5sWbvmiQR0z049lVsjNnTPN0TZTiQqLExNGRDbkgz5I=; b=do6y73UztHCQleZRHs8w3lr1o3uVhCdrpHSBmIpJm8DfT8v1vNA8c4/CM9iztkvfL04B5T I4VK7tBQzNqgv+ySUzpKg/kwxO1M79nW5h6b8cU6Bjj8eOKlTq+c0FMtCeXDjSoFdXJC+d zyRLHQrz7Br1BT9BrRxRqhwtshSAhXE= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Date: Tue, 30 Aug 2022 19:41:20 +0000 Message-Id: <20220830194132.962932-3-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The break-before-make sequence is a bit annoying as it opens a window wherein memory is unmapped from the guest. KVM should replace the PTE as quickly as possible and avoid unnecessary work in between. Presently, the stage-2 map walker tears down a removed table before installing a block mapping when coalescing a table into a block. As the removed table is no longer visible to hardware walkers after the DSB+TLBI, it is possible to move the remaining cleanup to happen after installing the new PTE. Reshuffle the stage-2 map walker to install the new block entry in the pre-order callback. Unwire all of the teardown logic and replace it with a call to kvm_pgtable_stage2_free_removed() after fixing the PTE. The post-order visitor is now completely unnecessary, so drop it. Finally, touch up the comments to better represent the now simplified map walker. Note that the call to tear down the unlinked stage-2 is indirected as a subsequent change will use an RCU callback to trigger tear down. RCU is not available to pKVM, so there is a need to use different implementations on pKVM and non-pKVM VMs. Signed-off-by: Oliver Upton --- arch/arm64/include/asm/kvm_pgtable.h | 3 + arch/arm64/kvm/hyp/nvhe/mem_protect.c | 1 + arch/arm64/kvm/hyp/pgtable.c | 83 ++++++++------------------- arch/arm64/kvm/mmu.c | 1 + 4 files changed, 28 insertions(+), 60 deletions(-) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index d71fb92dc913..c25633f53b2b 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -77,6 +77,8 @@ static inline bool kvm_level_supports_block_mapping(u32 level) * allocation is physically contiguous. * @free_pages_exact: Free an exact number of memory pages previously * allocated by zalloc_pages_exact. + * @free_removed_table: Free a removed paging structure by unlinking and + * dropping references. * @get_page: Increment the refcount on a page. * @put_page: Decrement the refcount on a page. 
When the * refcount reaches 0 the page is automatically @@ -95,6 +97,7 @@ struct kvm_pgtable_mm_ops { void* (*zalloc_page)(void *arg); void* (*zalloc_pages_exact)(size_t size); void (*free_pages_exact)(void *addr, size_t size); + void (*free_removed_table)(void *addr, u32 level, void *arg); void (*get_page)(void *addr); void (*put_page)(void *addr); int (*page_count)(void *addr); diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c index 1e78acf9662e..a930fdee6fce 100644 --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c @@ -93,6 +93,7 @@ static int prepare_s2_pool(void *pgt_pool_base) host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) { .zalloc_pages_exact = host_s2_zalloc_pages_exact, .zalloc_page = host_s2_zalloc_page, + .free_removed_table = kvm_pgtable_stage2_free_removed, .phys_to_virt = hyp_phys_to_virt, .virt_to_phys = hyp_virt_to_phys, .page_count = hyp_page_count, diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index d8127c25424c..5c0c8028d71c 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -763,17 +763,21 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, return 0; } +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, + struct stage2_map_data *data); + static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct stage2_map_data *data) { - if (data->anchor) - return 0; + struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; + kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops); + struct kvm_pgtable *pgt = data->mmu->pgt; + int ret; if (!stage2_leaf_mapping_allowed(addr, end, level, data)) return 0; - data->childp = kvm_pte_follow(*ptep, data->mm_ops); kvm_clear_pte(ptep); /* @@ -782,8 +786,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, * individually. */ kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu); - data->anchor = ptep; - return 0; + + ret = stage2_map_walk_leaf(addr, end, level, ptep, data); + + mm_ops->put_page(ptep); + mm_ops->free_removed_table(childp, level + 1, pgt); + + return ret; } static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, @@ -793,13 +802,6 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *childp, pte = *ptep; int ret; - if (data->anchor) { - if (stage2_pte_is_counted(pte)) - mm_ops->put_page(ptep); - - return 0; - } - ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data); if (ret != -E2BIG) return ret; @@ -828,50 +830,14 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, return 0; } -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, - struct stage2_map_data *data) -{ - struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; - kvm_pte_t *childp; - int ret = 0; - - if (!data->anchor) - return 0; - - if (data->anchor == ptep) { - childp = data->childp; - data->anchor = NULL; - data->childp = NULL; - ret = stage2_map_walk_leaf(addr, end, level, ptep, data); - } else { - childp = kvm_pte_follow(*ptep, mm_ops); - } - - mm_ops->put_page(childp); - mm_ops->put_page(ptep); - - return ret; -} - /* - * This is a little fiddly, as we use all three of the walk flags. The idea - * is that the TABLE_PRE callback runs for table entries on the way down, - * looking for table entries which we could conceivably replace with a - * block entry for this mapping. 
If it finds one, then it sets the 'anchor' - * field in 'struct stage2_map_data' to point at the table entry, before - * clearing the entry to zero and descending into the now detached table. - * - * The behaviour of the LEAF callback then depends on whether or not the - * anchor has been set. If not, then we're not using a block mapping higher - * up the table and we perform the mapping at the existing leaves instead. - * If, on the other hand, the anchor _is_ set, then we drop references to - * all valid leaves so that the pages beneath the anchor can be freed. + * The TABLE_PRE callback runs for table entries on the way down, looking + * for table entries which we could conceivably replace with a block entry + * for this mapping. If it finds one it replaces the entry and calls + * kvm_pgtable_mm_ops::free_removed_table() to tear down the detached table. * - * Finally, the TABLE_POST callback does nothing if the anchor has not - * been set, but otherwise frees the page-table pages while walking back up - * the page-table, installing the block entry when it revisits the anchor - * pointer and clearing the anchor to NULL. + * Otherwise, the LEAF callback performs the mapping at the existing leaves + * instead. */ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, enum kvm_pgtable_walk_flags flag, void * const arg) @@ -883,11 +849,9 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, return stage2_map_walk_table_pre(addr, end, level, ptep, data); case KVM_PGTABLE_WALK_LEAF: return stage2_map_walk_leaf(addr, end, level, ptep, data); - case KVM_PGTABLE_WALK_TABLE_POST: - return stage2_map_walk_table_post(addr, end, level, ptep, data); + default: + return -EINVAL; } - - return -EINVAL; } int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, @@ -905,8 +869,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, struct kvm_pgtable_walker walker = { .cb = stage2_map_walker, .flags = KVM_PGTABLE_WALK_TABLE_PRE | - KVM_PGTABLE_WALK_LEAF | - KVM_PGTABLE_WALK_TABLE_POST, + KVM_PGTABLE_WALK_LEAF, .arg = &map_data, }; diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index c9a13e487187..91521f4aab97 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -627,6 +627,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = { .zalloc_page = stage2_memcache_zalloc_page, .zalloc_pages_exact = kvm_host_zalloc_pages_exact, .free_pages_exact = free_pages_exact, + .free_removed_table = kvm_pgtable_stage2_free_removed, .get_page = kvm_host_get_page, .put_page = kvm_host_put_page, .page_count = kvm_host_page_count, From patchwork Tue Aug 30 19:41:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959809 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43410ECAAD5 for ; Tue, 30 Aug 2022 19:42:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231276AbiH3TmJ (ORCPT ); Tue, 30 Aug 2022 15:42:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231272AbiH3TmE (ORCPT ); Tue, 30 Aug 2022 15:42:04 -0400 Received: from out1.migadu.com (out1.migadu.com [91.121.223.63]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CD7575FE0; Tue, 30 Aug 2022 12:42:03 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888521; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Xy7sGLxjAT/lUb6VxuHy5G9CxzSH9danSLyrRA8SNo8=; b=YWhPezTnq9e2aJBPAISR7BKtVevt4gSoGZZr0fvBiC9+/vCIdM0mN6owEMQrvhKDyeCros rxi1MU/gBeGwWVYGk8vk0QtJi6NA82RCdujVR8y3X2gzYtCNz4LzR+OQMJISEBRDYdvGo+ rA6kt1uyY+JJwu/CQ9MKOfx7IuqPbyE= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 03/14] KVM: arm64: Directly read owner id field in stage2_pte_is_counted() Date: Tue, 30 Aug 2022 19:41:21 +0000 Message-Id: <20220830194132.962932-4-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org A subsequent change to KVM will make use of additional bits in invalid ptes. Prepare for said change by explicitly checking the valid bit and owner fields in stage2_pte_is_counted() Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 5c0c8028d71c..b6ce786ae570 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -172,6 +172,11 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id) return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id); } +static u8 kvm_invalid_pte_owner(kvm_pte_t pte) +{ + return FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte); +} + static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr, u32 level, kvm_pte_t *ptep, enum kvm_pgtable_walk_flags flag) @@ -679,7 +684,7 @@ static bool stage2_pte_is_counted(kvm_pte_t pte) * encode ownership of a page to another entity than the page-table * owner, whose id is 0. 
*/ - return !!pte; + return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte); } static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr, From patchwork Tue Aug 30 19:41:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959810 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 721B7C0502A for ; Tue, 30 Aug 2022 19:42:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231401AbiH3TmU (ORCPT ); Tue, 30 Aug 2022 15:42:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231387AbiH3TmM (ORCPT ); Tue, 30 Aug 2022 15:42:12 -0400 Received: from out1.migadu.com (out1.migadu.com [IPv6:2001:41d0:2:863f::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CC9479ED1; Tue, 30 Aug 2022 12:42:08 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888526; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=540ijd3DSISikR4MfZOPq9ymXvASuAcxTCB5BqenS8g=; b=Wtd7wVmgOM8SS945uvAyWmwRa4U4EJrNf2/GNcY+qdiln8p1rNKwfbVBe9fQ1AUCNtE4YR xo1zOAK0rJx10pPwtZFj7T+CxMeK+Pv2Ohl4GJKfI/E6WlPJRQ56R2WF+zLzkmCmOZWmZz PZq8+6SSFO7ZOhC3/mJDRN04baGgGuI= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 04/14] KVM: arm64: Read the PTE once per visit Date: Tue, 30 Aug 2022 19:41:22 +0000 Message-Id: <20220830194132.962932-5-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The page table walkers read the PTE multiple times per visit. Presently, that is safe as changes to the non-leaf PTEs are serialized. A subsequent change to KVM will enable parallel modifications to the stage 2 page tables. Prepare by ensuring a PTE is read only once per visit. Promote the PTE read in __kvm_pgtable_visit() to READ_ONCE() and pass the observed value through to callbacks. Note that the PTE is passed as a pointer to the callbacks; visitors that install new tables need to aim traversal at the new table. 
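To illustrate the hazard (a sketch built from the series' existing helpers, not part of the diff): once walkers run concurrently with writers, every extra dereference of ptep is a chance to observe a different PTE.

	/* Racy: a concurrent writer may change *ptep between dereferences. */
	if (kvm_pte_table(*ptep, level))
		childp = kvm_pte_follow(*ptep, mm_ops);	/* may follow a different PTE */

	/* Safe: snapshot once, decide everything against that one value. */
	kvm_pte_t pte = READ_ONCE(*ptep);

	if (kvm_pte_table(pte, level))
		childp = kvm_pte_follow(pte, mm_ops);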
Signed-off-by: Oliver Upton --- arch/arm64/include/asm/kvm_pgtable.h | 8 ++- arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 +- arch/arm64/kvm/hyp/nvhe/setup.c | 4 +- arch/arm64/kvm/hyp/pgtable.c | 73 ++++++++++++++------------- 4 files changed, 48 insertions(+), 41 deletions(-) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index c25633f53b2b..47920ae3f7e7 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -195,7 +195,7 @@ enum kvm_pgtable_walk_flags { }; typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, + kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg); @@ -561,4 +561,10 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte); * kvm_pgtable_prot format. */ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte); + +static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep) +{ + return READ_ONCE(*ptep); +} + #endif /* __ARM64_KVM_PGTABLE_H__ */ diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c index a930fdee6fce..61cf223e0796 100644 --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c @@ -419,12 +419,12 @@ struct check_walk_data { }; static int __check_page_state_visitor(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, + kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct check_walk_data *d = arg; - kvm_pte_t pte = *ptep; + kvm_pte_t pte = *old; if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte))) return -EINVAL; diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c index e8d4ea2fcfa0..2b62ca58ebd4 100644 --- a/arch/arm64/kvm/hyp/nvhe/setup.c +++ b/arch/arm64/kvm/hyp/nvhe/setup.c @@ -187,14 +187,14 @@ static void hpool_put_page(void *addr) } static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, + kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct kvm_pgtable_mm_ops *mm_ops = arg; enum kvm_pgtable_prot prot; enum pkvm_page_state state; - kvm_pte_t pte = *ptep; + kvm_pte_t pte = *old; phys_addr_t phys; if (!kvm_pte_valid(pte)) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index b6ce786ae570..430753fbb727 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -178,11 +178,11 @@ static u8 kvm_invalid_pte_owner(kvm_pte_t pte) } static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr, - u32 level, kvm_pte_t *ptep, + u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag) { struct kvm_pgtable_walker *walker = data->walker; - return walker->cb(addr, data->end, level, ptep, flag, walker->arg); + return walker->cb(addr, data->end, level, ptep, old, flag, walker->arg); } static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data, @@ -193,17 +193,17 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, { int ret = 0; u64 addr = data->addr; - kvm_pte_t *childp, pte = *ptep; + kvm_pte_t *childp, pte = kvm_pte_read(ptep); bool table = kvm_pte_table(pte, level); enum kvm_pgtable_walk_flags flags = data->walker->flags; if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) { - ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, + ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte, KVM_PGTABLE_WALK_TABLE_PRE); } if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) { - ret = 
kvm_pgtable_visitor_cb(data, addr, level, ptep, + ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte, KVM_PGTABLE_WALK_LEAF); pte = *ptep; table = kvm_pte_table(pte, level); @@ -224,7 +224,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, goto out; if (flags & KVM_PGTABLE_WALK_TABLE_POST) { - ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, + ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte, KVM_PGTABLE_WALK_TABLE_POST); } @@ -297,12 +297,12 @@ struct leaf_walk_data { u32 level; }; -static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, +static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct leaf_walk_data *data = arg; - data->pte = *ptep; + data->pte = *old; data->level = level; return 0; @@ -388,10 +388,10 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte) return prot; } -static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, struct hyp_map_data *data) +static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, + kvm_pte_t old, struct hyp_map_data *data) { - kvm_pte_t new, old = *ptep; + kvm_pte_t new; u64 granule = kvm_granule_size(level), phys = data->phys; if (!kvm_block_mapping_supported(addr, end, phys, level)) @@ -410,14 +410,14 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, return true; } -static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, +static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { kvm_pte_t *childp; struct hyp_map_data *data = arg; struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; - if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg)) + if (hyp_map_walker_try_leaf(addr, end, level, ptep, *old, arg)) return 0; if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1)) @@ -461,10 +461,10 @@ struct hyp_unmap_data { struct kvm_pgtable_mm_ops *mm_ops; }; -static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, +static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { - kvm_pte_t pte = *ptep, *childp = NULL; + kvm_pte_t pte = *old, *childp = NULL; u64 granule = kvm_granule_size(level); struct hyp_unmap_data *data = arg; struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; @@ -537,11 +537,11 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits, return 0; } -static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, +static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct kvm_pgtable_mm_ops *mm_ops = arg; - kvm_pte_t pte = *ptep; + kvm_pte_t pte = *old; if (!kvm_pte_valid(pte)) return 0; @@ -723,10 +723,10 @@ static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level, } static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, - kvm_pte_t *ptep, + kvm_pte_t *ptep, kvm_pte_t old, struct stage2_map_data *data) { - kvm_pte_t new, old = *ptep; + kvm_pte_t new; u64 granule = kvm_granule_size(level), phys = data->phys; struct kvm_pgtable *pgt = data->mmu->pgt; struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; @@ -772,11 +772,11 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct stage2_map_data *data); static int stage2_map_walk_table_pre(u64 addr, u64 
end, u32 level, - kvm_pte_t *ptep, + kvm_pte_t *ptep, kvm_pte_t *old, struct stage2_map_data *data) { struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; - kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops); + kvm_pte_t *childp = kvm_pte_follow(*old, mm_ops); struct kvm_pgtable *pgt = data->mmu->pgt; int ret; @@ -801,13 +801,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, } static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, - struct stage2_map_data *data) + kvm_pte_t *old, struct stage2_map_data *data) { struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; - kvm_pte_t *childp, pte = *ptep; + kvm_pte_t *childp, pte = *old; int ret; - ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data); + ret = stage2_map_walker_try_leaf(addr, end, level, ptep, pte, data); + if (ret != -E2BIG) return ret; @@ -844,16 +845,16 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, * Otherwise, the LEAF callback performs the mapping at the existing leaves * instead. */ -static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct stage2_map_data *data = arg; switch (flag) { case KVM_PGTABLE_WALK_TABLE_PRE: - return stage2_map_walk_table_pre(addr, end, level, ptep, data); + return stage2_map_walk_table_pre(addr, end, level, ptep, old, data); case KVM_PGTABLE_WALK_LEAF: - return stage2_map_walk_leaf(addr, end, level, ptep, data); + return stage2_map_walk_leaf(addr, end, level, ptep, old, data); default: return -EINVAL; } @@ -918,13 +919,13 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size, } static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, - enum kvm_pgtable_walk_flags flag, + kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct kvm_pgtable *pgt = arg; struct kvm_s2_mmu *mmu = pgt->mmu; struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops; - kvm_pte_t pte = *ptep, *childp = NULL; + kvm_pte_t pte = *old, *childp = NULL; bool need_flush = false; if (!kvm_pte_valid(pte)) { @@ -981,10 +982,10 @@ struct stage2_attr_data { }; static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, - enum kvm_pgtable_walk_flags flag, + kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { - kvm_pte_t pte = *ptep; + kvm_pte_t pte = *old; struct stage2_attr_data *data = arg; struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; @@ -1007,7 +1008,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, * stage-2 PTE if we are going to add executable permission. 
*/ if (mm_ops->icache_inval_pou && - stage2_pte_executable(pte) && !stage2_pte_executable(*ptep)) + stage2_pte_executable(pte) && !stage2_pte_executable(data->pte)) mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops), kvm_granule_size(level)); WRITE_ONCE(*ptep, pte); @@ -1109,12 +1110,12 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, } static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, - enum kvm_pgtable_walk_flags flag, + kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct kvm_pgtable *pgt = arg; struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops; - kvm_pte_t pte = *ptep; + kvm_pte_t pte = *old; if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte)) return 0; @@ -1169,11 +1170,11 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, } static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, - enum kvm_pgtable_walk_flags flag, + kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { struct kvm_pgtable_mm_ops *mm_ops = arg; - kvm_pte_t pte = *ptep; + kvm_pte_t pte = *old; if (!stage2_pte_is_counted(pte)) return 0; From patchwork Tue Aug 30 19:41:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959811 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 092AEECAAD5 for ; Tue, 30 Aug 2022 19:42:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231489AbiH3Tm0 (ORCPT ); Tue, 30 Aug 2022 15:42:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231300AbiH3TmS (ORCPT ); Tue, 30 Aug 2022 15:42:18 -0400 Received: from out1.migadu.com (out1.migadu.com [IPv6:2001:41d0:2:863f::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D9967B7AC; Tue, 30 Aug 2022 12:42:11 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888530; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OM9zxSJJuTKTuFSKGRv27d7vX1yV4xi18G4afH0SBng=; b=xVd55y2ntnEMM6i/pBXAEUbvLk9n9/Xp1W3X2aFoztrKb5WjPyoibjUuU7AjudHvyb58FU UxrnI3KP0KNlEWOrDD7iaQKuIadCBJKsIhFQNCbOUcc0BbDaPb7agTa3yUrWAsW4hR3Dzw XaNwFLeQglXkmROtM7gbsywFkgDY0LY= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 05/14] KVM: arm64: Split init and set for table PTE Date: Tue, 30 Aug 2022 19:41:23 +0000 Message-Id: <20220830194132.962932-6-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Create a helper to initialize a stage-2 table and directly call smp_store_release() to install it. A subsequent change to KVM will tweak the way we traverse the page tables, requiring that the visitor callbacks steer the walker down a newly installed table. Furthermore, when stage-2 faults are serviced in parallel the PTE must be considered volatile, so walkers will need to stash a pointer to the new table. Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 430753fbb727..331f6e3b2c20 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -142,16 +142,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep) WRITE_ONCE(*ptep, 0); } -static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp, - struct kvm_pgtable_mm_ops *mm_ops) +static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops) { - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp)); + kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp)); pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE); pte |= KVM_PTE_VALID; - - WARN_ON(kvm_pte_valid(old)); - smp_store_release(ptep, pte); + return pte; } static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level) @@ -413,7 +410,7 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *pte static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, void * const arg) { - kvm_pte_t *childp; + kvm_pte_t *childp, new; struct hyp_map_data *data = arg; struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; @@ -427,8 +424,10 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte if (!childp) return -ENOMEM; - kvm_set_table_pte(ptep, childp, mm_ops); + new = kvm_init_table_pte(childp, mm_ops); mm_ops->get_page(ptep); + smp_store_release(ptep, new); + return 0; } @@ -804,7 +803,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, struct stage2_map_data *data) { struct 
kvm_pgtable_mm_ops *mm_ops = data->mm_ops; - kvm_pte_t *childp, pte = *old; + kvm_pte_t *childp, pte = *old, new; int ret; ret = stage2_map_walker_try_leaf(addr, end, level, ptep, pte, data); @@ -830,8 +829,9 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, if (stage2_pte_is_counted(pte)) stage2_put_pte(ptep, data->mmu, addr, level, mm_ops); - kvm_set_table_pte(ptep, childp, mm_ops); + new = kvm_init_table_pte(childp, mm_ops); mm_ops->get_page(ptep); + smp_store_release(ptep, new); return 0; } From patchwork Tue Aug 30 19:41:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959812 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9ED93ECAAD5 for ; Tue, 30 Aug 2022 19:42:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231599AbiH3Tmk (ORCPT ); Tue, 30 Aug 2022 15:42:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231477AbiH3TmW (ORCPT ); Tue, 30 Aug 2022 15:42:22 -0400 Received: from out1.migadu.com (out1.migadu.com [91.121.223.63]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6685C7C30F; Tue, 30 Aug 2022 12:42:16 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888534; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yispcYoWHwz6WUZiyFc9UJ/2hA2X4vwfQ9rtB9AplkE=; b=QKKwh0mZHI3VDDYssNZqIJUh8qzZEqcGK2T3WlXJOjZzzlmkUOBcLFJw1Cstm2IhK1GFIz 0JosO5MJyt2ok6BmAv66t0Zcb9QGbJ1ROXv1n/pZjIFhAB7qdZSTbe5HfblFcESL/uykDN qbUixnxPOgyPrKiBk/pc54SO8HjqpOI= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 06/14] KVM: arm64: Return next table from map callbacks Date: Tue, 30 Aug 2022 19:41:24 +0000 Message-Id: <20220830194132.962932-7-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The map walkers install new page tables during their traversal. Return the newly-installed table PTE from the map callbacks to point the walker at the new table w/o rereading the ptep. 
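Condensed, the pattern for a callback that installs a new table now looks like this (a sketch with allocation and error handling elided, not a verbatim excerpt):

	new = kvm_init_table_pte(childp, mm_ops);
	mm_ops->get_page(ptep);
	smp_store_release(ptep, new);

	/*
	 * Report the installed PTE back through @old; __kvm_pgtable_visit()
	 * re-evaluates kvm_pte_table() on this value and descends into
	 * childp without dereferencing ptep a second time.
	 */
	*old = new;
	return 0;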
Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 331f6e3b2c20..f911509e6512 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -202,13 +202,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) { ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte, KVM_PGTABLE_WALK_LEAF); - pte = *ptep; - table = kvm_pte_table(pte, level); } if (ret) goto out; + table = kvm_pte_table(pte, level); if (!table) { data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level)); data->addr += kvm_granule_size(level); @@ -427,6 +426,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte new = kvm_init_table_pte(childp, mm_ops); mm_ops->get_page(ptep); smp_store_release(ptep, new); + *old = new; return 0; } @@ -768,7 +768,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, } static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, - struct stage2_map_data *data); + kvm_pte_t *old, struct stage2_map_data *data); static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, @@ -791,7 +791,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, */ kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu); - ret = stage2_map_walk_leaf(addr, end, level, ptep, data); + ret = stage2_map_walk_leaf(addr, end, level, ptep, old, data); mm_ops->put_page(ptep); mm_ops->free_removed_table(childp, level + 1, pgt); @@ -832,6 +832,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, new = kvm_init_table_pte(childp, mm_ops); mm_ops->get_page(ptep); smp_store_release(ptep, new); + *old = new; return 0; } From patchwork Tue Aug 30 19:41:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959814 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41FB2ECAAD5 for ; Tue, 30 Aug 2022 19:42:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231667AbiH3Tmz (ORCPT ); Tue, 30 Aug 2022 15:42:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37622 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231564AbiH3Tmi (ORCPT ); Tue, 30 Aug 2022 15:42:38 -0400 Received: from out1.migadu.com (out1.migadu.com [IPv6:2001:41d0:2:863f::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 558BB7AC32; Tue, 30 Aug 2022 12:42:19 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888538; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fQeI6zpamXufaG6UtIBPMTAxGRkDzaVjmSqbL7fqmA4=; b=D9h9JwBXgp4fk3oXEgNrHS+gL2418x+oOQGDIhu15jN7Y5D9JRtyE3eEsyuBeRj9CrBaD2 A90lb39Q6XzojxcGyb2Go/11MI9HEgqpfAGScUJOzE/yMG2o6XrcZg782/2VeGVGKzCubS V8ZWLLP9SU9I0vP7yG914m7jhzl30+0= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 07/14] KVM: arm64: Document behavior of pgtable visitor callback Date: Tue, 30 Aug 2022 19:41:25 +0000 Message-Id: <20220830194132.962932-8-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The argument list to kvm_pgtable_visitor_fn_t has gotten rather long. Additionally, @old serves as both an input and output parameter, which isn't easily discerned from the declaration alone. Document the meaning of the visitor callback arguments and the conditions under which @old is written to. Signed-off-by: Oliver Upton --- arch/arm64/include/asm/kvm_pgtable.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 47920ae3f7e7..78fbb7be1af6 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -194,6 +194,23 @@ enum kvm_pgtable_walk_flags { KVM_PGTABLE_WALK_TABLE_POST = BIT(2), }; +/** + * kvm_pgtable_visitor_fn_t - Page table traversal callback for visiting a PTE. + * @addr: Input address (IA) mapped by the PTE. + * @end: IA corresponding to the end of the page table traversal range. + * @level: Level of the PTE within the page table. + * @ptep: Pointer to the PTE. + * @old: Value of the PTE observed by the visitor. Also used as an output + * parameter for returning the new PTE value. + * @flag: Flag identifying the entry type visited. + * @arg: Argument passed to the callback function. + * + * Callback function signature invoked during page table traversal. Optionally + * returns the new value of the PTE via @old if the new value requires further + * traversal (i.e. installing a new table). + * + * Return: 0 on success, negative error code on failure.
+ */ typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old, enum kvm_pgtable_walk_flags flag, From patchwork Tue Aug 30 19:41:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959813 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB5E5ECAAA1 for ; Tue, 30 Aug 2022 19:42:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231641AbiH3Tmy (ORCPT ); Tue, 30 Aug 2022 15:42:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231555AbiH3Tmi (ORCPT ); Tue, 30 Aug 2022 15:42:38 -0400 Received: from out1.migadu.com (out1.migadu.com [IPv6:2001:41d0:2:863f::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 598047B7AF; Tue, 30 Aug 2022 12:42:24 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888542; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PAa+cBkl9mtwNYK8/w+PcWpN4aBBuzKZQpA5735hA3U=; b=ZLs5MEMDu72sPbf1oddAbPXFBK5HqyrH0LLuSMgJABWcVJhbffI94ChKh6fxlG4PUKb7Vt fp76TAzFIbxRDAKaCkX+hrkh4G3NnP0ojf+1rETVa0eO6qJqWT0Ozz+ovLennkecVRUXjy TzWk0l+KpUzy6uyX60KuJjkrZ/YIBkM= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU Date: Tue, 30 Aug 2022 19:41:26 +0000 Message-Id: <20220830194132.962932-9-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The use of RCU is necessary to change the paging structures in parallel. Acquire and release an RCU read lock when traversing the page tables. 
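Roughly, the resulting reader/updater protocol (a sketch; the RCU-deferred free lands in a subsequent patch):

	/* Reader: any page table walk. */
	kvm_pgtable_walk_begin();		/* rcu_read_lock(); no-op at hyp */
	pte = kvm_pte_read(ptep);		/* rcu_dereference() + READ_ONCE() */
	/* ... visit the subtree that pte points at ... */
	kvm_pgtable_walk_end();			/* rcu_read_unlock() */

	/* Updater: unhook a subtree, then defer its teardown. */
	kvm_clear_pte(ptep);
	kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
	mm_ops->free_removed_table(childp, level + 1, pgt);	/* via call_rcu() */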
Signed-off-by: Oliver Upton --- arch/arm64/include/asm/kvm_pgtable.h | 19 ++++++++++++++++++- arch/arm64/kvm/hyp/pgtable.c | 7 ++++++- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 78fbb7be1af6..7d2de0a98ccb 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -578,9 +578,26 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte); */ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte); +#if defined(__KVM_NVHE_HYPERVISOR__) + +static inline void kvm_pgtable_walk_begin(void) {} +static inline void kvm_pgtable_walk_end(void) {} + +#define kvm_dereference_ptep rcu_dereference_raw + +#else /* !defined(__KVM_NVHE_HYPERVISOR__) */ + +#define kvm_pgtable_walk_begin rcu_read_lock +#define kvm_pgtable_walk_end rcu_read_unlock +#define kvm_dereference_ptep rcu_dereference + +#endif /* defined(__KVM_NVHE_HYPERVISOR__) */ + static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep) { - return READ_ONCE(*ptep); + kvm_pte_t __rcu *p = (kvm_pte_t __rcu *)ptep; + + return READ_ONCE(*kvm_dereference_ptep(p)); } #endif /* __ARM64_KVM_PGTABLE_H__ */ diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index f911509e6512..215a14c434ed 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -284,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size, .end = PAGE_ALIGN(walk_data.addr + size), .walker = walker, }; + int r; - return _kvm_pgtable_walk(&walk_data); + kvm_pgtable_walk_begin(); + r = _kvm_pgtable_walk(&walk_data); + kvm_pgtable_walk_end(); + + return r; } struct leaf_walk_data { From patchwork Tue Aug 30 19:41:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959815 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47753ECAAD5 for ; Tue, 30 Aug 2022 19:43:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231403AbiH3Tm6 (ORCPT ); Tue, 30 Aug 2022 15:42:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231310AbiH3Tmk (ORCPT ); Tue, 30 Aug 2022 15:42:40 -0400 Received: from out1.migadu.com (out1.migadu.com [IPv6:2001:41d0:2:863f::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3EF57C753; Tue, 30 Aug 2022 12:42:27 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661888545; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NFwSFftgtou/xmW0mHXjm84oZeXHuVM1GyZ5T83/2Wg=; b=uI1s1ijlRaMGjo0jD+nYZDxRHNF+xtPjYfu05N6LNtXIkm3dIie5gtYAIPsiOyuJ6ClfhV ybPxb7Q1Jo+wBTKNu8gdTSY/P1/46syt8d1oBiKjfi4BdrV1c4EXFGM6kN02too/RLt5YY AYtX+JXtuvSVgqwwikvtaPajlrG+7rA= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Catalin Marinas , Will Deacon , Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , Oliver Upton Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback Date: Tue, 30 Aug 2022 19:41:27 +0000 Message-Id: <20220830194132.962932-10-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org There is no real urgency to free a stage-2 subtree that was pruned. Nonetheless, KVM does the tear down in the stage-2 fault path while holding the MMU lock. Free removed stage-2 subtrees after an RCU grace period. To guarantee all stage-2 table pages are freed before killing a VM, add an rcu_barrier() to the flush path. Signed-off-by: Oliver Upton --- arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 91521f4aab97..265951c05879 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -97,6 +97,38 @@ static void *stage2_memcache_zalloc_page(void *arg) return kvm_mmu_memory_cache_alloc(mc); } +#define STAGE2_PAGE_PRIVATE_LEVEL_MASK GENMASK_ULL(2, 0) + +static inline unsigned long stage2_page_private(u32 level, void *arg) +{ + unsigned long pvt = (unsigned long)arg; + + BUILD_BUG_ON(KVM_PGTABLE_MAX_LEVELS > STAGE2_PAGE_PRIVATE_LEVEL_MASK); + WARN_ON_ONCE(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK); + + return pvt | level; +} + +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head) +{ + struct page *page = container_of(head, struct page, rcu_head); + unsigned long pvt = page_private(page); + void *arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK); + u32 level = (u32)(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK); + void *pgtable = page_to_virt(page); + + kvm_pgtable_stage2_free_removed(pgtable, level, arg); +} + +static void stage2_free_removed_table(void *pgtable, u32 level, void *arg) +{ + unsigned long pvt = stage2_page_private(level, arg); + struct page *page = virt_to_page(pgtable); + + set_page_private(page, (unsigned long)pvt); + call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb); +} + static void *kvm_host_zalloc_pages_exact(size_t size) { return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO); @@ -627,7 +659,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = { .zalloc_page = stage2_memcache_zalloc_page, .zalloc_pages_exact = kvm_host_zalloc_pages_exact, .free_pages_exact = free_pages_exact, - .free_removed_table = kvm_pgtable_stage2_free_removed, + .free_removed_table = stage2_free_removed_table, 
.get_page = kvm_host_get_page, .put_page = kvm_host_put_page, .page_count = kvm_host_page_count, @@ -770,6 +802,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu) if (pgt) { kvm_pgtable_stage2_destroy(pgt); kfree(pgt); + rcu_barrier(); } } From patchwork Tue Aug 30 19:50:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959818 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7AEFECAAD5 for ; Tue, 30 Aug 2022 19:50:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231470AbiH3Tuz (ORCPT ); Tue, 30 Aug 2022 15:50:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229923AbiH3Tux (ORCPT ); Tue, 30 Aug 2022 15:50:53 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB490422E3; Tue, 30 Aug 2022 12:50:51 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661889050; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wmlaPuCg5CtRxJvAdYOU8R7EmSZufHl35iEQ4YqU9EM=; b=gZwcKwarj6vtKm4xaDadIqmvqzkEqNnWTT5UVK4aQjMzPEiAmWQm88cHA06R3FZQbXgM2G dV6QOVzhnU9Hwbxs1sI9upcn5KUz6nFNtrrD05GTTrR32L6nCWig4wtILoZlwCYlRviJF8 zNj6lMvzeENPIAPn2eR/zKb59oUuyXo= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Catalin Marinas , Will Deacon Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , linux-kernel@vger.kernel.org Subject: [PATCH 10/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks Date: Tue, 30 Aug 2022 19:50:36 +0000 Message-Id: <20220830195036.964607-1-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The stage2 attr walker is already used for parallel walks. Since commit f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging"), KVM acquires the read lock when write-unprotecting a PTE. However, the walker only uses a simple store to update the PTE. This is safe as the only possible race is with hardware updates to the access flag, which is benign. However, a subsequent change to KVM will allow more changes to the stage 2 page tables to be done in parallel. Prepare the stage 2 attribute walker by performing atomic updates to the PTE when walking in parallel. 
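For illustration, the lost-update window that a plain store leaves open (a sketch using the attribute bits already defined in pgtable.c):

	/*
	 * CPU0 write-unprotects a page (sets KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W)
	 * while CPU1 marks it young (sets KVM_PTE_LEAF_ATTR_LO_S2_AF). With
	 * a read-modify-plain-write, one of the updates can be lost:
	 */
	kvm_pte_t pte = READ_ONCE(*ptep);

	WRITE_ONCE(*ptep, pte | KVM_PTE_LEAF_ATTR_LO_S2_AF);	/* may clobber S2AP_W */

	/*
	 * With cmpxchg(), the losing updater sees that the PTE changed under
	 * it, and the walker surfaces the race as -EAGAIN:
	 */
	if (cmpxchg(ptep, pte, pte | KVM_PTE_LEAF_ATTR_LO_S2_AF) != pte)
		return -EAGAIN;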
Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 215a14c434ed..61a4437c8c16 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -691,6 +691,16 @@ static bool stage2_pte_is_counted(kvm_pte_t pte) return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte); } +static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bool shared) +{ + if (!shared) { + WRITE_ONCE(*ptep, new); + return true; + } + + return cmpxchg(ptep, old, new) == old; +} + static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr, u32 level, struct kvm_pgtable_mm_ops *mm_ops) { @@ -985,6 +995,7 @@ struct stage2_attr_data { kvm_pte_t pte; u32 level; struct kvm_pgtable_mm_ops *mm_ops; + bool shared; }; static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, @@ -1017,7 +1028,9 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, stage2_pte_executable(pte) && !stage2_pte_executable(data->pte)) mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops), kvm_granule_size(level)); - WRITE_ONCE(*ptep, pte); + + if (!stage2_try_set_pte(ptep, data->pte, pte, data->shared)) + return -EAGAIN; } return 0; @@ -1026,7 +1039,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr, u64 size, kvm_pte_t attr_set, kvm_pte_t attr_clr, kvm_pte_t *orig_pte, - u32 *level) + u32 *level, bool shared) { int ret; kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI; @@ -1034,6 +1047,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr, .attr_set = attr_set & attr_mask, .attr_clr = attr_clr & attr_mask, .mm_ops = pgt->mm_ops, + .shared = shared, }; struct kvm_pgtable_walker walker = { .cb = stage2_attr_walker, @@ -1057,14 +1071,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size) { return stage2_update_leaf_attrs(pgt, addr, size, 0, KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W, - NULL, NULL); + NULL, NULL, false); } kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr) { kvm_pte_t pte = 0; stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0, - &pte, NULL); + &pte, NULL, false); dsb(ishst); return pte; } @@ -1073,7 +1087,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr) { kvm_pte_t pte = 0; stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF, - &pte, NULL); + &pte, NULL, false); /* * "But where's the TLBI?!", you scream. * "Over in the core code", I sigh. 
@@ -1086,7 +1100,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr) bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr) { kvm_pte_t pte = 0; - stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL); + stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, false); return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF; } @@ -1109,7 +1123,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, if (prot & KVM_PGTABLE_PROT_X) clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN; - ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level); + ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level, true); if (!ret) kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level); return ret; From patchwork Tue Aug 30 19:51:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959819 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB56CECAAD5 for ; Tue, 30 Aug 2022 19:51:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231650AbiH3Tv2 (ORCPT ); Tue, 30 Aug 2022 15:51:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231522AbiH3TvX (ORCPT ); Tue, 30 Aug 2022 15:51:23 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8B8772B5E; Tue, 30 Aug 2022 12:51:21 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661889080; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jyA52n5PXIfZG6yhWBQm3SaPH6T9tI0OWXKlc6krVCE=; b=LUuBibphlFTocfvHxEYwjPEhEbf34nsdxlCj+c68189duPaL4Z3ld/zR4FVqL4zRYfqU/c Ymt4M1PCH0aBBAwlq2co7SkEZPXrLewGCiTfKB4c5BIadgvKQ9jawXUG13aMoTW/tFH4no licZLjpIvirfUDEBqZ8h7/Mu/owfqH4= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Catalin Marinas , Will Deacon Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , linux-kernel@vger.kernel.org Subject: [PATCH 11/14] KVM: arm64: Make block->table PTE changes parallel-aware Date: Tue, 30 Aug 2022 19:51:01 +0000 Message-Id: <20220830195102.964724-1-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to service stage-2 faults in parallel, stage-2 table walkers must take exclusive ownership of the PTE being worked on. An additional requirement of the architecture is that software must perform a 'break-before-make' operation when changing the block size used for mapping memory.
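Ahead of the helpers introduced below, here is a compilable sketch of how these two requirements compose. The constants and names are illustrative analogues of the ones this patch defines (PTE_LOCKED mirrors KVM_INVALID_PTE_LOCKED), with C11 atomics standing in for the kernel's cmpxchg() and smp_store_release().

/*
 * Illustrative model of the break-before-make flow: claim the PTE with a
 * locked-invalid marker, then publish the new entry. There is no hardware
 * walker here, so TLB invalidation is only noted in a comment.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t kvm_pte_t;

#define PTE_VALID	(1ULL << 0)
#define PTE_LOCKED	(1ULL << 10)	/* analogue of KVM_INVALID_PTE_LOCKED */

static bool pte_is_locked(kvm_pte_t pte)
{
	return !(pte & PTE_VALID) && (pte & PTE_LOCKED);
}

static bool try_break_pte(_Atomic kvm_pte_t *ptep, kvm_pte_t old)
{
	/* Another software walker owns the PTE; back off and retry. */
	if (pte_is_locked(old))
		return false;

	/* 'Break': atomically install the locked-invalid marker. */
	if (!atomic_compare_exchange_strong(ptep, &old, PTE_LOCKED))
		return false;

	/* The kernel would perform TLB invalidation for a valid 'old' here. */
	return true;
}

static void make_pte(_Atomic kvm_pte_t *ptep, kvm_pte_t new)
{
	/* 'Make': publish the replacement with release semantics. */
	atomic_store_explicit(ptep, new, memory_order_release);
}

int main(void)
{
	_Atomic kvm_pte_t pte = PTE_VALID;

	if (try_break_pte(&pte, PTE_VALID))
		make_pte(&pte, PTE_VALID | (0xabcULL << 12));

	printf("pte: %#llx\n", (unsigned long long)atomic_load(&pte));
	return 0;
}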
Roll these two concepts together into helpers for performing a 'break-before-make' sequence. Use a special PTE value to indicate a PTE has been locked by a software walker. Additionally, use an atomic compare-exchange to 'break' the PTE when the stage-2 page tables are possibly shared with another software walker. Elide the DSB + TLBI if the evicted PTE was invalid (and thus not subject to break-before-make). All of the atomics do nothing for now, as the stage-2 walker isn't fully ready to perform parallel walks. Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 87 +++++++++++++++++++++++++++++++++--- 1 file changed, 82 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 61a4437c8c16..71ae96608752 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -49,6 +49,12 @@ #define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2) #define KVM_MAX_OWNER_ID 1 +/* + * Used to indicate a pte for which a 'break-before-make' sequence is in + * progress. + */ +#define KVM_INVALID_PTE_LOCKED BIT(10) + struct kvm_pgtable_walk_data { struct kvm_pgtable *pgt; struct kvm_pgtable_walker *walker; @@ -586,6 +592,8 @@ struct stage2_map_data { /* Force mappings to page granularity */ bool force_pte; + + bool shared; }; u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift) @@ -691,6 +699,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte) { return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte); } +static bool stage2_pte_is_locked(kvm_pte_t pte) +{ + return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED); +} + static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bool shared) { if (!shared) { @@ -701,6 +714,69 @@ static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bo return cmpxchg(ptep, old, new) == old; } +/** + * stage2_try_break_pte() - Invalidates a pte according to the + * 'break-before-make' requirements of the + * architecture. + * + * @ptep: Pointer to the pte to break + * @old: The previously observed value of the pte + * @addr: IPA corresponding to the pte + * @level: Table level of the pte + * @data: Map walker data for this walk; @data->shared is true if the stage-2 + * page tables could be shared by multiple software walkers + * + * Returns: true if the pte was successfully broken. + * + * If the removed pte was valid, performs the necessary serialization and TLB + * invalidation for the old value. For counted ptes, drops the reference count + * on the containing table page. + */ +static bool stage2_try_break_pte(kvm_pte_t *ptep, kvm_pte_t old, u64 addr, u32 level, + struct stage2_map_data *data) +{ + struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; + + if (stage2_pte_is_locked(old)) { + /* + * Should never occur if this walker has exclusive access to the + * page tables. + */ + WARN_ON(!data->shared); + return false; + } + + if (!stage2_try_set_pte(ptep, old, KVM_INVALID_PTE_LOCKED, data->shared)) + return false; + + /* + * Perform the appropriate TLB invalidation based on the evicted pte + * value (if any).
+ */ + if (kvm_pte_table(old, level)) + kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu); + else if (kvm_pte_valid(old)) + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level); + + if (stage2_pte_is_counted(old)) + mm_ops->put_page(ptep); + + return true; +} + +static void stage2_make_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, + struct stage2_map_data *data) +{ + struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops; + + WARN_ON(!stage2_pte_is_locked(*ptep)); + + if (stage2_pte_is_counted(new)) + mm_ops->get_page(ptep); + + smp_store_release(ptep, new); +} + static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr, u32 level, struct kvm_pgtable_mm_ops *mm_ops) { @@ -836,17 +912,18 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, if (!childp) return -ENOMEM; + if (!stage2_try_break_pte(ptep, *old, addr, level, data)) { + mm_ops->put_page(childp); + return -EAGAIN; + } + /* * If we've run into an existing block mapping then replace it with * a table. Accesses beyond 'end' that fall within the new table * will be mapped lazily. */ - if (stage2_pte_is_counted(pte)) - stage2_put_pte(ptep, data->mmu, addr, level, mm_ops); - new = kvm_init_table_pte(childp, mm_ops); - mm_ops->get_page(ptep); - smp_store_release(ptep, new); + stage2_make_pte(ptep, *old, new, data); *old = new; return 0; From patchwork Tue Aug 30 19:51:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959820 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7D81ECAAD5 for ; Tue, 30 Aug 2022 19:51:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231722AbiH3Tvt (ORCPT ); Tue, 30 Aug 2022 15:51:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231587AbiH3Tvp (ORCPT ); Tue, 30 Aug 2022 15:51:45 -0400 Received: from out0.migadu.com (out0.migadu.com [94.23.1.103]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45C757E316; Tue, 30 Aug 2022 12:51:44 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661889102; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BmHJR6N3rI3NqVKRQnsaZiBFli5RmIY4C8X+Uix5thU=; b=eBaq5OcegB50cLDgGEiduR4Tgk5pLMtHfEruHLpo036HKXFf0YTXH2bWlsFFUmU/6mHlgb 4V4yxrZtBF7lSii3YCwzkHciBjKh7H1kMwVqh8gok5JQ6GRUH7NmjinsnU6ud9D2trTwEX A/qAC0OBx34lJZlbrfIE1UPaqv+pv/k= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Catalin Marinas , Will Deacon Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , linux-kernel@vger.kernel.org Subject: [PATCH 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware Date: Tue, 30 Aug 2022 19:51:32 +0000 Message-Id: <20220830195132.964800-1-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Convert stage2_map_walker_try_leaf() to use the new break-before-make helpers, thereby making the handler parallel-aware. As before, avoid the break-before-make if recreating the existing mapping. Additionally, retry execution if another vCPU thread is modifying the same PTE. Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 71ae96608752..de1d352657d0 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -829,18 +829,17 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, else new = kvm_init_invalid_leaf_owner(data->owner_id); - if (stage2_pte_is_counted(old)) { - /* - * Skip updating the PTE if we are trying to recreate the exact - * same mapping or only change the access permissions. Instead, - * the vCPU will exit one more time from guest if still needed - * and then go through the path of relaxing permissions. - */ - if (!stage2_pte_needs_update(old, new)) - return -EAGAIN; + /* + * Skip updating the PTE if we are trying to recreate the exact + * same mapping or only change the access permissions. Instead, + * the vCPU will exit one more time from guest if still needed + * and then go through the path of relaxing permissions. 
+ */ + if (!stage2_pte_needs_update(old, new)) + return -EAGAIN; - stage2_put_pte(ptep, data->mmu, addr, level, mm_ops); - } + if (!stage2_try_break_pte(ptep, old, addr, level, data)) + return -EAGAIN; /* Perform CMOs before installation of the guest stage-2 PTE */ if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new)) @@ -850,9 +849,8 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, if (mm_ops->icache_inval_pou && stage2_pte_executable(new)) mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule); - smp_store_release(ptep, new); - if (stage2_pte_is_counted(new)) - mm_ops->get_page(ptep); + stage2_make_pte(ptep, old, new, data); + if (kvm_phys_is_valid(phys)) data->phys += granule; return 0; From patchwork Tue Aug 30 19:51:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959827 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93CFFECAAD5 for ; Tue, 30 Aug 2022 19:52:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231770AbiH3TwU (ORCPT ); Tue, 30 Aug 2022 15:52:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231742AbiH3TwM (ORCPT ); Tue, 30 Aug 2022 15:52:12 -0400 Received: from out1.migadu.com (out1.migadu.com [IPv6:2001:41d0:2:863f::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 664CC7F121; Tue, 30 Aug 2022 12:52:08 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661889126; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ODGgNehZ2DBuO47qbixsRPCIopuXBTJM37fQBZmKNGY=; b=p3adtOImKuuHlKbTTqLcKwCBydEl0j/c+w/vbd+ofPgw9Mhwk6riSV9oOPI2/zM7kmCPKQ 7kv+idgOio54jiwFe4raiZ5ZbxER28J66uBOxZyU27rNQffLERWmfEyMxVD5FK4YCEDQn4 nAPzOJ8qXbftpvkL9G+bg9Jsd0/3z8o= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Catalin Marinas , Will Deacon Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , linux-kernel@vger.kernel.org Subject: [PATCH 13/14] KVM: arm64: Make table->block changes parallel-aware Date: Tue, 30 Aug 2022 19:51:51 +0000 Message-Id: <20220830195151.964912-1-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org stage2_map_walk_leaf() and friends now handle stage-2 PTEs generically, and perform the correct flush when a table PTE is removed. Additionally, they've been made parallel-aware, using an atomic break to take ownership of the PTE. Stop clearing the PTE in the pre-order callback and instead let stage2_map_walk_leaf() deal with it. 
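Since free_removed_table() now carries the whole teardown, it is worth recalling how the deferred-free path earlier in the series smuggles the table level to its RCU callback: the level rides in the low bits of the pointer-sized private word. Below is a stand-alone model of that encoding with simplified names; the kernel's version uses GENMASK_ULL(2, 0) and the struct page private field.

/*
 * Model of the level-in-low-bits encoding used by the deferred-free path.
 * LEVEL_MASK is an analogue of the kernel's GENMASK_ULL(2, 0); the 'arg'
 * pointer stands in for the struct kvm_pgtable handed to the RCU callback.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define LEVEL_MASK	0x7UL	/* 3 bits: enough for any stage-2 level */

static unsigned long pack(void *arg, uint32_t level)
{
	unsigned long pvt = (unsigned long)arg;

	/* The object is at least 8-byte aligned, so the low bits are free. */
	assert(!(pvt & LEVEL_MASK));
	return pvt | level;
}

static void unpack(unsigned long pvt, void **arg, uint32_t *level)
{
	*arg = (void *)(pvt & ~LEVEL_MASK);
	*level = (uint32_t)(pvt & LEVEL_MASK);
}

int main(void)
{
	static _Alignas(8) uint64_t pgt;	/* stand-in for the pgtable */
	void *arg;
	uint32_t level;

	unpack(pack(&pgt, 2), &arg, &level);
	printf("arg matches: %d, level: %u\n", arg == (void *)&pgt, level);
	return 0;
}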
Signed-off-by: Oliver Upton --- arch/arm64/kvm/hyp/pgtable.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index de1d352657d0..92e230e7bf3a 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -871,21 +871,12 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, if (!stage2_leaf_mapping_allowed(addr, end, level, data)) return 0; - kvm_clear_pte(ptep); - - /* - * Invalidate the whole stage-2, as we may have numerous leaf - * entries below us which would otherwise need invalidating - * individually. - */ - kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu); - ret = stage2_map_walk_leaf(addr, end, level, ptep, old, data); + if (ret) + return ret; - mm_ops->put_page(ptep); mm_ops->free_removed_table(childp, level + 1, pgt); - - return ret; + return 0; } static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, From patchwork Tue Aug 30 19:52:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oliver Upton X-Patchwork-Id: 12959828 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18B46ECAAD5 for ; Tue, 30 Aug 2022 19:52:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231587AbiH3Twi (ORCPT ); Tue, 30 Aug 2022 15:52:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231734AbiH3Twd (ORCPT ); Tue, 30 Aug 2022 15:52:33 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72CA07F09B; Tue, 30 Aug 2022 12:52:29 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1661889147; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Aaj6aduqJg0qPG7z7QEPBrvjXDSrf093O5MldGzyJW0=; b=GETBblIBmkxAJ4C0aV0FyXIWdykr0NSb97rmSxC7cm9TB+hS6KWX4O93m4eZGwVKCKxaa/ H8m7yaCmeOAuKCdirUXCROdaWetwUGunlab4cdY927z1X2UYORkwuny0Nhby80fYNaW11D 3zsdb9kIDVw4nAvyUC9po0SNDjNSkJU= From: Oliver Upton To: Marc Zyngier , James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Catalin Marinas , Will Deacon Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Quentin Perret , Ricardo Koller , Reiji Watanabe , David Matlack , Ben Gardon , Paolo Bonzini , Gavin Shan , Peter Xu , Sean Christopherson , linux-kernel@vger.kernel.org Subject: [PATCH 14/14] KVM: arm64: Handle stage-2 faults in parallel Date: Tue, 30 Aug 2022 19:52:15 +0000 Message-Id: <20220830195216.964988-1-oliver.upton@linux.dev> In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev> References: <20220830194132.962932-1-oliver.upton@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: linux.dev Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The stage-2 map walker has been made parallel-aware, and as such can be called while only holding the read side of the MMU lock. 
Rip out the conditional locking in user_mem_abort() and instead grab the read lock. Continue to take the write lock from other callsites to kvm_pgtable_stage2_map(). Signed-off-by: Oliver Upton --- arch/arm64/include/asm/kvm_pgtable.h | 4 +++- arch/arm64/kvm/hyp/nvhe/mem_protect.c | 2 +- arch/arm64/kvm/hyp/pgtable.c | 3 ++- arch/arm64/kvm/mmu.c | 31 ++++++--------------------- 4 files changed, 13 insertions(+), 27 deletions(-) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 7d2de0a98ccb..dc839db86a1a 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -355,6 +355,8 @@ void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg); * @prot: Permissions and attributes for the mapping. * @mc: Cache of pre-allocated and zeroed memory from which to allocate * page-table pages. + * @shared: true if multiple software walkers could be traversing the tables + * in parallel * * The offset of @addr within a page is ignored, @size is rounded-up to * the next page boundary and @phys is rounded-down to the previous page @@ -376,7 +378,7 @@ void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg); */ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys, enum kvm_pgtable_prot prot, - void *mc); + void *mc, bool shared); /** * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c index 61cf223e0796..924d028af447 100644 --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c @@ -252,7 +252,7 @@ static inline int __host_stage2_idmap(u64 start, u64 end, enum kvm_pgtable_prot prot) { return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start, - prot, &host_s2_pool); + prot, &host_s2_pool, false); } /* diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 92e230e7bf3a..52ecaaa84b22 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -944,7 +944,7 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys, enum kvm_pgtable_prot prot, - void *mc) + void *mc, bool shared) { int ret; struct stage2_map_data map_data = { @@ -953,6 +953,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, .memcache = mc, .mm_ops = pgt->mm_ops, .force_pte = pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot), + .shared = shared, }; struct kvm_pgtable_walker walker = { .cb = stage2_map_walker, diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 265951c05879..a73adc35cf41 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -840,7 +840,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, write_lock(&kvm->mmu_lock); ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot, - &cache); + &cache, false); write_unlock(&kvm->mmu_lock); if (ret) break; @@ -1135,7 +1135,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, gfn_t gfn; kvm_pfn_t pfn; bool logging_active = memslot_is_logging(memslot); - bool use_read_lock = false; unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu); unsigned long vma_pagesize, fault_granule; enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; @@ -1170,8 +1169,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if 
(logging_active) { force_pte = true; vma_shift = PAGE_SHIFT; - use_read_lock = (fault_status == FSC_PERM && write_fault && - fault_granule == PAGE_SIZE); } else { vma_shift = get_vma_page_shift(vma, hva); } @@ -1270,15 +1267,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (exec_fault && device) return -ENOEXEC; - /* - * To reduce MMU contentions and enhance concurrency during dirty - * logging dirty logging, only acquire read lock for permission - * relaxation. - */ - if (use_read_lock) - read_lock(&kvm->mmu_lock); - else - write_lock(&kvm->mmu_lock); + read_lock(&kvm->mmu_lock); pgt = vcpu->arch.hw_mmu->pgt; if (mmu_invalidate_retry(kvm, mmu_seq)) goto out_unlock; @@ -1322,15 +1311,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * permissions only if vma_pagesize equals fault_granule. Otherwise, * kvm_pgtable_stage2_map() should be called to change block size. */ - if (fault_status == FSC_PERM && vma_pagesize == fault_granule) { + if (fault_status == FSC_PERM && vma_pagesize == fault_granule) ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot); - } else { - WARN_ONCE(use_read_lock, "Attempted stage-2 map outside of write lock\n"); - + else ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize, __pfn_to_phys(pfn), prot, - memcache); - } + memcache, true); /* Mark the page dirty only if the fault is handled successfully */ if (writable && !ret) { @@ -1339,10 +1325,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, } out_unlock: - if (use_read_lock) - read_unlock(&kvm->mmu_lock); - else - write_unlock(&kvm->mmu_lock); + read_unlock(&kvm->mmu_lock); kvm_set_pfn_accessed(pfn); kvm_release_pfn_clean(pfn); return ret != -EAGAIN ? ret : 0; @@ -1548,7 +1531,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range) */ kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT, PAGE_SIZE, __pfn_to_phys(pfn), - KVM_PGTABLE_PROT_R, NULL); + KVM_PGTABLE_PROT_R, NULL, false); return false; }
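Taken together, the series leaves the fault path with a simple contract: map walks run under the read side of the MMU lock, may lose a race and return -EAGAIN, and the handler treats that as a clean retry rather than an error. The following toy model illustrates that contract; the pthread rwlock and the two function names are illustrative stand-ins, not the kernel's primitives.

/*
 * Toy model of the post-series fault-path contract: walks under the read
 * lock may fail with -EAGAIN on contention, which the handler swallows so
 * the vCPU simply takes the fault again and retries.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t mmu_lock = PTHREAD_RWLOCK_INITIALIZER;

static int stage2_map(int contended)
{
	/* A contended break-before-make bails out instead of spinning. */
	return contended ? -EAGAIN : 0;
}

static int user_mem_abort(int contended)
{
	int ret;

	pthread_rwlock_rdlock(&mmu_lock);	/* read lock only */
	ret = stage2_map(contended);
	pthread_rwlock_unlock(&mmu_lock);

	/* -EAGAIN is not an error: the guest re-faults and retries. */
	return ret != -EAGAIN ? ret : 0;
}

int main(void)
{
	printf("uncontended: %d\n", user_mem_abort(0));
	printf("contended:   %d\n", user_mem_abort(1));
	return 0;
}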