From patchwork Thu Sep 26 23:17:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163461 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9579217D4 for ; Thu, 26 Sep 2019 23:18:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 74779207E0 for ; Thu, 26 Sep 2019 23:18:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="rRxWbiIG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728882AbfIZXSd (ORCPT ); Thu, 26 Sep 2019 19:18:33 -0400 Received: from mail-pg1-f202.google.com ([209.85.215.202]:52515 "EHLO mail-pg1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728712AbfIZXSd (ORCPT ); Thu, 26 Sep 2019 19:18:33 -0400 Received: by mail-pg1-f202.google.com with SMTP id e15so2326678pgh.19 for ; Thu, 26 Sep 2019 16:18:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=alYv85ii0NvD8Sm9RlADJhYZlkGEwPi7fJVO0bq7mE4=; b=rRxWbiIGWuKCDnOnrQgNuGDQ1dAML0d2NmjR5iKWK7B93mEcRc1Tj+yJCGbD6CvUIj kgMTjzIoqTppKcHJoGi1XjLCw+Qj5RM7H9uCOj2g1jgYW6OZjp/TYreghsYizY4df3qh 8HRRGQjCiJ6i0RY1hNkC2VdOuUX7NFn2hNTtoWb6GaTxVZmiWqvlvkf+scXpdJVnMQBB X1fQvoIC6P0hO2gYxAgq23X+pVJRPLfaX/HxCO+aP8p608LB3FRTApL4UQsyrrDCi1Lt KJEXRxGpeo1aNX/d6dVHu7dq35eXaGl6v4ek3mcVyTG47USG9j4RQNlfRajts/ZarqCk Ihog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=alYv85ii0NvD8Sm9RlADJhYZlkGEwPi7fJVO0bq7mE4=; b=URk3pKsZ+R7h7VQagwnLoC2oyBUX3abSLdJ69/aY821aPBxbDyfU7+Nj5m6rIBDBe7 cgaUszQ9qwOTf3cUkqxRq9c6yLGijmJgi6qLyft/ZOoaX9oi5+Ns6miHjsilSqH5wNKt s7XAlWjcwCOsPEc4VgvSyLFGnVhC4tzm7ynYcTkGqh1EQBt0luRcwAhcHbcqXOJUnYPH y9pGnzM5ThhFAHtsjDaFHhu/tVFF7TJspF1JVbS6JxGpFjXMJB5jeip/9UStGUNY9iNh gRBdSX+Cbeyp7FkaVwM1q0yICDQZKBrDCSZLt6R5rIZS2hlD9f4t1FazpI9fba2/JJoe ThBQ== X-Gm-Message-State: APjAAAVnimH8gN4TlaOrRtrmkFsgiPAObGq5je/muL+JBjYLdLIRIhrl li/hA8+7XC9O4USdebAdtgBJVFcFZ/wiQyY+b0endvrBZJ6rv6wbRjmWD6IdB4pGk5tby+s6Kxb 3gwyrp+K1d0pozNkk/pSnFdEnMl8EK5E2Cr1Z1FwsmS2nnTfbmXEv3f4LIqYr X-Google-Smtp-Source: APXvYqz4p2Iqb5w9GNKy/8Bg8ZBnbKuEpeDJF9FdRqkqMLI+OO5b7Fey5zsyzazLhJW9r/pjWhp2uXBbQGA3 X-Received: by 2002:a65:678a:: with SMTP id e10mr6134185pgr.184.1569539910571; Thu, 26 Sep 2019 16:18:30 -0700 (PDT) Date: Thu, 26 Sep 2019 16:17:57 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-2-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 01/28] kvm: mmu: Separate generating and setting mmio ptes From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Separate the functions for generating MMIO page table entries from the function that inserts them into the paging structure. 
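To sketch why the split is useful, the function below (example_set_mmio_spte_atomic() is a made-up name, not part of this patch) shows how a future caller could pair the new generate_mmio_pte() helper with a compare/exchange that is allowed to fail, instead of writing the SPTE while holding an exclusive lock:

static bool example_set_mmio_spte_atomic(struct kvm_vcpu *vcpu, u64 *sptep,
					 u64 gfn, unsigned access, u64 old_spte)
{
	u64 new_spte = generate_mmio_pte(vcpu, gfn, access);

	/*
	 * The exchange succeeds only if *sptep still holds old_spte; a racing
	 * update makes it fail, and the caller retries or bails out.
	 */
	return cmpxchg64(sptep, old_spte, new_spte) == old_spte;
}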
This refactoring will allow changes to the MMU sychronization model to use atomic compare / exchanges (which are not guaranteed to succeed) instead of a monolithic MMU lock. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 5269aa057dfa6..781c2ca7455e3 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -390,8 +390,7 @@ static u64 get_mmio_spte_generation(u64 spte) return gen; } -static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn, - unsigned access) +static u64 generate_mmio_pte(struct kvm_vcpu *vcpu, u64 gfn, unsigned access) { u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK; u64 mask = generation_mmio_spte_mask(gen); @@ -403,6 +402,17 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn, mask |= (gpa & shadow_nonpresent_or_rsvd_mask) << shadow_nonpresent_or_rsvd_mask_len; + return mask; +} + +static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn, + unsigned access) +{ + u64 mask = generate_mmio_pte(vcpu, gfn, access); + unsigned int gen = get_mmio_spte_generation(mask); + + access = mask & ACC_ALL; + trace_mark_mmio_spte(sptep, gfn, access, gen); mmu_spte_set(sptep, mask); } From patchwork Thu Sep 26 23:17:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163463 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D373A14ED for ; Thu, 26 Sep 2019 23:18:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A8B66207E0 for ; Thu, 26 Sep 2019 23:18:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="dmvNTfFR" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728908AbfIZXSf (ORCPT ); Thu, 26 Sep 2019 19:18:35 -0400 Received: from mail-pg1-f201.google.com ([209.85.215.201]:46968 "EHLO mail-pg1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728712AbfIZXSe (ORCPT ); Thu, 26 Sep 2019 19:18:34 -0400 Received: by mail-pg1-f201.google.com with SMTP id f11so2341662pgn.13 for ; Thu, 26 Sep 2019 16:18:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=MsYuIFS52UvWa5jHwwGrflQKqhDfm/npA1PP2h6zMVw=; b=dmvNTfFRAfmgqjRIow3+4tr3+T3CR/V1ZpL0e/6Z+3eYra/ze8Fg1MejzLzKoBk5n5 5imZ3mD0ncRykvBifGIR05f1TkEmXl57G8EOEuLXpCHICAdjc0DLDkm1Wy80OgHlKKNg 4adRmJUvmC1zgHAGOwz6gJqQbVQFrmyJ+YPZ32KzS/bbG7hQPPf6HnmnctwZGTr1hf7r K0/sulvUUZOD6KcmOMe4T1snYHLCgTZ+7qRL06S5YUUk0WbaHRL1Ih5a2ejAdMqXpDOL x8xi0A/6ifJ7A+My9p7gHmKl4bkWAIyMKRIQ4OhRU4j4W6f9GVhcXJO00CD4SJNu/Bwh o2IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=MsYuIFS52UvWa5jHwwGrflQKqhDfm/npA1PP2h6zMVw=; b=o/9/sHUaOephE0Kbo/cvHbQLchBunKAEuKwSJ/AQ3D6SAGP2kZf6ZcjtZXzdPPPQlN FRUuB42ZCU3lE2oCOlgtCR/H8xxnbGVhAXI76tr1gL7lVtAZDHVQUw1Rf8tfdGWJ5oWR o2vJITMS/UQH4VpZgEkqiZ36iNC3Rq3mdKjipl95UUQEgO1lYYlSQTnFfIDoB5Dy4C71 3vaMLjb2uiF5Jvl78WGQ1dPIe8bWg53kUC79F0pt+Z0TaUJ7JSl/Bun8QhuWxS7eLri4 
gY0iQqbXL/m2WkAPnhjxSB2xc2GRnVca7RM7TS4m3WkdUVq1PB7qE4leWHJyXk9ZbYVs b8dA== X-Gm-Message-State: APjAAAXrH6alRUkst6OEI+GmKhLroSPuuyi+iXmwU9/7PmCH76qu+Zm0 6HIxGw7BlqijHSwIRs4o+KWxmUYZCSVkv/KW4e40gxcFFoa9/fgdeUNg7x0ZoKZrDGMzRhls0EU O0hrwvXsbtfgwwsEN0+ddDCCwN57QH+toeEcT95GSf0DeQAwl3OgCcdEsGc9b X-Google-Smtp-Source: APXvYqwkVYPjKqGlcFvLB8KxIMEEpxNz37P80wxlXoS6/iSadDfZ3hRlM4rj6Prr34mh/lvB+VyyaQWoduxh X-Received: by 2002:a63:5050:: with SMTP id q16mr5862350pgl.451.1569539912730; Thu, 26 Sep 2019 16:18:32 -0700 (PDT) Date: Thu, 26 Sep 2019 16:17:58 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-3-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 02/28] kvm: mmu: Separate pte generation from set_spte From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Separate the functions for generating leaf page table entries from the function that inserts them into the paging structure. This refactoring will allow changes to the MMU sychronization model to use atomic compare / exchanges (which are not guaranteed to succeed) instead of a monolithic MMU lock. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 93 ++++++++++++++++++++++++++++------------------ 1 file changed, 57 insertions(+), 36 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 781c2ca7455e3..7e5ab9c6e2b09 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2964,21 +2964,15 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) #define SET_SPTE_WRITE_PROTECTED_PT BIT(0) #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1) -static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, - unsigned pte_access, int level, - gfn_t gfn, kvm_pfn_t pfn, bool speculative, - bool can_unsync, bool host_writable) +static int generate_pte(struct kvm_vcpu *vcpu, unsigned pte_access, int level, + gfn_t gfn, kvm_pfn_t pfn, u64 old_pte, bool speculative, + bool can_unsync, bool host_writable, bool ad_disabled, + u64 *ptep) { - u64 spte = 0; + u64 pte; int ret = 0; - struct kvm_mmu_page *sp; - - if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access)) - return 0; - sp = page_header(__pa(sptep)); - if (sp_ad_disabled(sp)) - spte |= shadow_acc_track_value; + *ptep = 0; /* * For the EPT case, shadow_present_mask is 0 if hardware @@ -2986,36 +2980,39 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, * ACC_USER_MASK and shadow_user_mask are used to represent * read access. See FNAME(gpte_access) in paging_tmpl.h. 
*/ - spte |= shadow_present_mask; + pte = shadow_present_mask; + + if (ad_disabled) + pte |= shadow_acc_track_value; + if (!speculative) - spte |= spte_shadow_accessed_mask(spte); + pte |= spte_shadow_accessed_mask(pte); if (pte_access & ACC_EXEC_MASK) - spte |= shadow_x_mask; + pte |= shadow_x_mask; else - spte |= shadow_nx_mask; + pte |= shadow_nx_mask; if (pte_access & ACC_USER_MASK) - spte |= shadow_user_mask; + pte |= shadow_user_mask; if (level > PT_PAGE_TABLE_LEVEL) - spte |= PT_PAGE_SIZE_MASK; + pte |= PT_PAGE_SIZE_MASK; if (tdp_enabled) - spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, + pte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, kvm_is_mmio_pfn(pfn)); if (host_writable) - spte |= SPTE_HOST_WRITEABLE; + pte |= SPTE_HOST_WRITEABLE; else pte_access &= ~ACC_WRITE_MASK; if (!kvm_is_mmio_pfn(pfn)) - spte |= shadow_me_mask; + pte |= shadow_me_mask; - spte |= (u64)pfn << PAGE_SHIFT; + pte |= (u64)pfn << PAGE_SHIFT; if (pte_access & ACC_WRITE_MASK) { - /* * Other vcpu creates new sp in the window between * mapping_level() and acquiring mmu-lock. We can @@ -3024,9 +3021,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, */ if (level > PT_PAGE_TABLE_LEVEL && mmu_gfn_lpage_is_disallowed(vcpu, gfn, level)) - goto done; + return 0; - spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE; + pte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE; /* * Optimization: for pte sync, if spte was writable the hash @@ -3034,30 +3031,54 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, * is responsibility of mmu_get_page / kvm_sync_page. * Same reasoning can be applied to dirty page accounting. */ - if (!can_unsync && is_writable_pte(*sptep)) - goto set_pte; + if (!can_unsync && is_writable_pte(old_pte)) { + *ptep = pte; + return 0; + } if (mmu_need_write_protect(vcpu, gfn, can_unsync)) { pgprintk("%s: found shadow page for %llx, marking ro\n", __func__, gfn); - ret |= SET_SPTE_WRITE_PROTECTED_PT; + ret = SET_SPTE_WRITE_PROTECTED_PT; pte_access &= ~ACC_WRITE_MASK; - spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); + pte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); } } - if (pte_access & ACC_WRITE_MASK) { - kvm_vcpu_mark_page_dirty(vcpu, gfn); - spte |= spte_shadow_dirty_mask(spte); - } + if (pte_access & ACC_WRITE_MASK) + pte |= spte_shadow_dirty_mask(pte); if (speculative) - spte = mark_spte_for_access_track(spte); + pte = mark_spte_for_access_track(pte); + + *ptep = pte; + return ret; +} + +static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, + int level, gfn_t gfn, kvm_pfn_t pfn, bool speculative, + bool can_unsync, bool host_writable) +{ + u64 spte; + int ret; + struct kvm_mmu_page *sp; + + if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access)) + return 0; + + sp = page_header(__pa(sptep)); + + ret = generate_pte(vcpu, pte_access, level, gfn, pfn, *sptep, + speculative, can_unsync, host_writable, + sp_ad_disabled(sp), &spte); + if (!spte) + return 0; + + if (spte & PT_WRITABLE_MASK) + kvm_vcpu_mark_page_dirty(vcpu, gfn); -set_pte: if (mmu_spte_update(sptep, spte)) ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH; -done: return ret; } From patchwork Thu Sep 26 23:17:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163465 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4F00A912 for ; Thu, 26 Sep 2019 23:18:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
[209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2C6CF207E0 for ; Thu, 26 Sep 2019 23:18:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="hQMEeARt" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728913AbfIZXSi (ORCPT ); Thu, 26 Sep 2019 19:18:38 -0400 Received: from mail-qk1-f201.google.com ([209.85.222.201]:47831 "EHLO mail-qk1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728924AbfIZXSh (ORCPT ); Thu, 26 Sep 2019 19:18:37 -0400 Received: by mail-qk1-f201.google.com with SMTP id y189so819729qkb.14 for ; Thu, 26 Sep 2019 16:18:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=u7wms7r+9Uw3K98byh8e1nIf7jClvzwZJK4DCYsuOWs=; b=hQMEeARtS7m2tyjEcnwjGolTLGYDGgRW6YGgt4tN9iXnbREIABKj7hSIhoWMmjIEuH PKy4DEcpln7oVvUXLRQjcbhSwoBWl8yfjskUWarkRHThN7Nbv/8ZdWWsgbo0xdunUVsg EmY13pJ7iQmwfBPt9Cl+qdqtdbjJTA11ToDKlfBryGDbD5Bskq1uqg3lhULNV8fZaTnn neCYPUjW0OGVvMtVES97rL6EG43S7XVW2dGlv4jWM+v/EUMk3zbaWfvSFLceJlqjir+y SBJYpLhSeCfyk6Dp76tQ36SWSpWnidqEVPllkoM3hK1GASci/gciwGdd7+ryZXE+OpEi oQgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=u7wms7r+9Uw3K98byh8e1nIf7jClvzwZJK4DCYsuOWs=; b=Ysu3biM/nwR1Hn0nccCdVUi7UTngorO6BKS8mY5thqKeXI8ud8odEFgfi+hjZ7b0ZQ YFVV0fEtlYFV05FyFGwCe1xgPx2XbKBPCuWYnJCRCCfX1NJIF+ybLRRvX4PTNiHo4J4/ BhverEaJatFb1VfCvFAlS19VWUtQZ6wWCDHDvIEZLnxqrI1Mli8KDRX12RjpcYwOkRWd HQb5MfOk0CwO31dZ/Ux8VYf1w2ni2mUIqEU5+1ARleW8w/g4NDblhJYIv8vu1dYdlZ+T nu0RSWS2tA4vlLlMLqiEnDy2IhMX8Yy/CR3E9AkEuul7e0EZME+HM2JcZUnhPYFf8fxl ER7g== X-Gm-Message-State: APjAAAWDqAL/cuqwmpNl2itWoZ3ubW8ciRhwQIC0HTouzeAcu6iGKzlR /NzPM8bThIqeXZTOPoEI6DeKw4gmYDa7nLY5bxGLy2Jqba+HmTNdvWZPNBunvX+xnFDapXfxoRM 4F/b80k1hvoNztEDwZrrsdQKG3/CxL//ellCAbXfW83OybPIHsV6JOsn4v/wu X-Google-Smtp-Source: APXvYqzahzWsTxIn7OQmRfL+FE0xiEYyBoRHz2sC7TQTznm0XQTQzLHHr8VhT0ozV28mLEYsbfpRkX5u+I3D X-Received: by 2002:a0c:b999:: with SMTP id v25mr5361221qvf.80.1569539915106; Thu, 26 Sep 2019 16:18:35 -0700 (PDT) Date: Thu, 26 Sep 2019 16:17:59 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-4-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 03/28] kvm: mmu: Zero page cache memory at allocation time From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org
Simplify use of the MMU page cache by allocating pages pre-zeroed. This ensures that future code does not accidentally add non-zeroed memory to the paging structure and moves the work of zeroing pages out from under the MMU lock.
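As a rough userspace analogue of the pattern (calloc() stands in for __get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); the cache layout and all names below are illustrative only), the zeroing cost is paid when the cache is topped up, before the lock is taken, so the consumer running under the lock no longer needs a clear_page() step:

#include <stdlib.h>

#define CACHE_MAX  8
#define PAGE_BYTES 4096

struct page_cache {
	void *pages[CACHE_MAX];
	int nobjs;
};

/* Called before taking the MMU-style lock: refill with pre-zeroed pages. */
static int cache_topup(struct page_cache *c, int min)
{
	while (c->nobjs < CACHE_MAX) {
		void *page = calloc(1, PAGE_BYTES); /* zeroed at allocation */

		if (!page)
			return c->nobjs >= min ? 0 : -1;
		c->pages[c->nobjs++] = page;
	}
	return 0;
}

/* Called with the lock held: the page is already clear, just hand it out. */
static void *cache_alloc(struct page_cache *c)
{
	return c->nobjs ? c->pages[--c->nobjs] : NULL;
}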
Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 7e5ab9c6e2b09..1ecd6d51c0ee0 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1037,7 +1037,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache, if (cache->nobjs >= min) return 0; while (cache->nobjs < ARRAY_SIZE(cache->objects)) { - page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT); + page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!page) return cache->nobjs >= min ? 0 : -ENOMEM; cache->objects[cache->nobjs++] = page; @@ -2548,7 +2548,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, if (level > PT_PAGE_TABLE_LEVEL && need_sync) flush |= kvm_sync_pages(vcpu, gfn, &invalid_list); } - clear_page(sp->spt); trace_kvm_mmu_get_page(sp, true); kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush); From patchwork Thu Sep 26 23:18:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163467 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E27C1800 for ; Thu, 26 Sep 2019 23:18:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6CA45207E0 for ; Thu, 26 Sep 2019 23:18:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="YCq6sGeY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728924AbfIZXSj (ORCPT ); Thu, 26 Sep 2019 19:18:39 -0400 Received: from mail-pf1-f202.google.com ([209.85.210.202]:33586 "EHLO mail-pf1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728929AbfIZXSi (ORCPT ); Thu, 26 Sep 2019 19:18:38 -0400 Received: by mail-pf1-f202.google.com with SMTP id z4so512666pfn.0 for ; Thu, 26 Sep 2019 16:18:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=eNTPd+DTCd3OebIMAQPAzFyuA2Zn5iBi3EOKt04jDxU=; b=YCq6sGeYsBo0Fs5pjpNrSPiXOqpiSKi6WQqPcN0vHF6UirDvcO/HYn0NJ3EFRIt30p 70y3QvbvZAuojow+ZxeJ6b0Sgh4TBwBKzliiMxVdBWWp3jT3PUKrCd+NbeiwZnQc92Ii 8yoszQAjrwweAD+O4kZIj1r44Gxj7wWBXaAdfX2jgA5lRY31NMowhdNrk/MttNVtgL7A 2W6M48RdFFziw3m4Q3Ln1zgk0LCkLutCmrUoqI6XD49nS08H5xjZ+bejpBUtNJ1tTc20 a6ntVpu908FjJ4hUyDumnJjVzzInPYiimcs1CdNWFPu6bdGJx9gSs+TiBKWHixe8JlCv dNVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=eNTPd+DTCd3OebIMAQPAzFyuA2Zn5iBi3EOKt04jDxU=; b=UG3u1mejHxy98tEj/JyPfUFxHmWtsJjJTn//5c7rLCw+64BAwySdqMPutOfsgnxxIb 4tqrpKJPSI6AaT1WcrJ/KITnDYhta72eKznV9mvprOkt06+VgdIhU74aPulTgfPJywBi LFIAnXbLpN4ipZDou/4rlw3iwzjdr9p3nKqrS6+6eats5ziR1TPgRSoInQ2xRHgXepZM LT7D2HcbSyboRR7UGwmvfKmOWcwANS7r8cL4ZYjfkBXzzVNvaHnoqO8wgTQqNZZZJpcO hh/fbRyItZpnSyn29rraLuE8uwu0qTJ011V2sWMuDNipoypjgoOrGimZAP3EazbP8Wnf Y0Iw== X-Gm-Message-State: APjAAAWuZtzUcSDGPXpkgSv+Yxe83UdysJXxtGmt2gP66UzQq3Ksxj0J 0b5TNXrg5ZLOXoFz+tiDLRSkggBhsKwuP6OOVj2Rg8ilt91LU0UzMJMmFGXoIDUqz9OmmVvKcFc PYhg2Qc9VhfeDRXhIzHsisNDEeoN7uRm4ITtApGaciZUQ7E7JUfzVEfIywjBb X-Google-Smtp-Source: 
APXvYqxvO5MrPDXceVx9VzMqc0dmKVX6z29ss/HOncVR1XzdETw0B8MMoJ6xsI6HupVbEUGjymKn5P96iP/c X-Received: by 2002:a63:79c4:: with SMTP id u187mr6039593pgc.152.1569539917219; Thu, 26 Sep 2019 16:18:37 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:00 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-5-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 04/28] kvm: mmu: Update the lpages stat atomically From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In order to pave the way for more concurrent MMU operations, updates to VM-global stats need to be done atomically. Change updates to the lpages stat to be atomic in preparation for the introduction of parallel page fault handling. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 1ecd6d51c0ee0..56587655aecb9 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1532,7 +1532,7 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep) WARN_ON(page_header(__pa(sptep))->role.level == PT_PAGE_TABLE_LEVEL); drop_spte(kvm, sptep); - --kvm->stat.lpages; + xadd(&kvm->stat.lpages, -1); return true; } @@ -2676,7 +2676,7 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, if (is_last_spte(pte, sp->role.level)) { drop_spte(kvm, spte); if (is_large_pte(pte)) - --kvm->stat.lpages; + xadd(&kvm->stat.lpages, -1); } else { child = page_header(pte & PT64_BASE_ADDR_MASK); drop_parent_pte(child, spte); @@ -3134,7 +3134,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, pgprintk("%s: setting spte %llx\n", __func__, *sptep); trace_kvm_mmu_set_spte(level, gfn, sptep); if (!was_rmapped && is_large_pte(*sptep)) - ++vcpu->kvm->stat.lpages; + xadd(&vcpu->kvm->stat.lpages, 1); if (is_shadow_present_pte(*sptep)) { if (!was_rmapped) { From patchwork Thu Sep 26 23:18:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163469 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39C78912 for ; Thu, 26 Sep 2019 23:18:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 179E52086A for ; Thu, 26 Sep 2019 23:18:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="pw5S9DMr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728929AbfIZXSl (ORCPT ); Thu, 26 Sep 2019 19:18:41 -0400 Received: from mail-ua1-f73.google.com ([209.85.222.73]:40396 "EHLO mail-ua1-f73.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728911AbfIZXSk (ORCPT ); Thu, 26 Sep 2019 19:18:40 -0400 Received: by mail-ua1-f73.google.com with SMTP id i7so440589uak.7 for ; Thu, 26 Sep 2019 16:18:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=h1eRr/18Kk3DwWOhz7ufs1Dfy78b771ZeplWcQlEuy0=; 
b=pw5S9DMrlL3FxkxmKKXdeSlEbtW4+3d8ad9H3DRIteICUVHF+DH9uuOuzI6kkd1YdN LQ8d9olQUNKNHHuImEDck6FIrBJm2B9NK2CDaS4hFlVKiMXVSSYBhwyMgOMNQShI9FAD E3VepwWTQiT7zTWd1uwXSvfefKHyTECdZv9ww0NE3xQaHAHPRAiLEsfpORHyvGmDAWHR S5aGmko5heJ06u0EXYsUNzxC/Ei4n6lpGtbTxRh1k5z2dj4QgF4HyMuYwp7LcPCYisz8 k8jd5h3Hh6gb+zPB9RBiPVPM3RdNdTyl2xntIcrBvl967LbCIAo9JIUapQqlz+nk5i4R T7cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=h1eRr/18Kk3DwWOhz7ufs1Dfy78b771ZeplWcQlEuy0=; b=q0SoCFeh+ONFVIT8RIulUxO6OVYxHzCj0xQfJ+oGiIfA67xUks1Tc0kc6WPELHs1Sv 4PrjhZfZm1MoRcWi3Aytq6xRXmfKBsY1NoK2mwcbEVlVNJSRBXC++3pQnQxcLzdLQX5k WNHm17EnDeAzypne18+mdFYE+yASkIIn9T/0qqJ69eMrPfNEKRjXYzL8CpES5016wnVd 3U2ObZDB+sPEl5ZkxN+Rx+lj+xInIRMvgp8zHaInOHgYYjoG8+GvFwMirDNw/ucZOBeT 2LPcw1NPWwqMWKQypSq1QCTO6ZaPtiOm0Cwu6nEES82S04Jf1VL8MIHK5T8135Z6Qm86 4RMA== X-Gm-Message-State: APjAAAUqZSJNHAzrBUNl94gj763WZif5wFTwlgaf1CtSIArGsiKjCwS6 tXNQliiypU5yUnlMOxSgcCYYJV7mN/v55QBy19ZX4S6jkqHhFjoPwxQzBz0UaDAW4mHYbeGxSap ni3lkwKFy8c7tLS22X4VYaSI4r7SAg1CIpNzCT6S5GxYVRjwsd+2vRznpC425 X-Google-Smtp-Source: APXvYqye69RKA0oX4duAzprh6uQ/W780TnSiSOD7EaE7bJJFuaj5qi4FSLICr4SLFWg9uVkhhwZsjG8dT/7z X-Received: by 2002:a05:6102:224b:: with SMTP id e11mr858013vsb.232.1569539919449; Thu, 26 Sep 2019 16:18:39 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:01 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-6-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 05/28] sched: Add cond_resched_rwlock From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Rescheduling while holding a spin lock is essential for keeping long running kernel operations running smoothly. Add the facility to cond_resched read/write spin locks. RFC_NOTE: The current implementation of this patch set uses a read/write lock to replace the existing MMU spin lock. See the next patch in this series for more on why a read/write lock was chosen, and possible alternatives. 
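A hypothetical caller of the new helper could look like the sketch below (walk_all_roots() and NR_ROOTS are placeholder names, and the sketch assumes mmu_lock has already been converted to an rwlock, which only happens in the next patch). The point being illustrated is that the helper may drop and re-take the lock, so anything derived from the protected state before the call must be revalidated afterwards:

static void walk_all_roots(struct kvm *kvm)
{
	int i;

	write_lock(&kvm->mmu_lock);
	for (i = 0; i < NR_ROOTS; i++) {
		/* ... process root i under the lock ... */

		/*
		 * May unlock, schedule, and re-take the lock in write mode;
		 * state observed before this call can change across it.
		 */
		cond_resched_rwlock_write(&kvm->mmu_lock);
	}
	write_unlock(&kvm->mmu_lock);
}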
Signed-off-by: Ben Gardon --- include/linux/sched.h | 11 +++++++++++ kernel/sched/core.c | 23 +++++++++++++++++++++++ 2 files changed, 34 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 70db597d6fd4f..4d1fd96693d9b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1767,12 +1767,23 @@ static inline int _cond_resched(void) { return 0; } }) extern int __cond_resched_lock(spinlock_t *lock); +extern int __cond_resched_rwlock(rwlock_t *lock, bool write_lock); #define cond_resched_lock(lock) ({ \ ___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\ __cond_resched_lock(lock); \ }) +#define cond_resched_rwlock_read(lock) ({ \ + __might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET); \ + __cond_resched_rwlock(lock, false); \ +}) + +#define cond_resched_rwlock_write(lock) ({ \ + __might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET); \ + __cond_resched_rwlock(lock, true); \ +}) + static inline void cond_resched_rcu(void) { #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f9a1346a5fa95..ba7ed4bed5036 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5663,6 +5663,29 @@ int __cond_resched_lock(spinlock_t *lock) } EXPORT_SYMBOL(__cond_resched_lock); +int __cond_resched_rwlock(rwlock_t *lock, bool write_lock) +{ + int ret = 0; + + lockdep_assert_held(lock); + if (should_resched(PREEMPT_LOCK_OFFSET)) { + if (write_lock) { + write_unlock(lock); + preempt_schedule_common(); + write_lock(lock); + } else { + read_unlock(lock); + preempt_schedule_common(); + read_lock(lock); + } + + ret = 1; + } + + return ret; +} +EXPORT_SYMBOL(__cond_resched_rwlock); + /** * yield - yield the current processor to other threads. * From patchwork Thu Sep 26 23:18:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163471 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58001912 for ; Thu, 26 Sep 2019 23:18:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1A5962086A for ; Thu, 26 Sep 2019 23:18:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="N9A4UBio" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728953AbfIZXSo (ORCPT ); Thu, 26 Sep 2019 19:18:44 -0400 Received: from mail-qt1-f202.google.com ([209.85.160.202]:36810 "EHLO mail-qt1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728923AbfIZXSn (ORCPT ); Thu, 26 Sep 2019 19:18:43 -0400 Received: by mail-qt1-f202.google.com with SMTP id i10so4045440qtq.3 for ; Thu, 26 Sep 2019 16:18:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=PwnyhLkA6u7J7PG1BVddDEJ6Txr1VsKczG8arla6GfI=; b=N9A4UBioedxIVi3yP6mx0HQykLcAWDHh7FVzyTwf7rDTA0n6mwIjbKFjAaJ2JdDdi/ lv4pWqShSL4aHMMzU7K+quePauYCGRFIp+bTqM1nEQqQwuwYiTx+QwW79N3pzIfz+KJC oyryd3kzhZM4qRLKm10DudQYSrLPkLmVIVMey3Z/rChnB/elV3XbN1pmcjxKSNMyfINc 2hUgEZEsxRrJSRyzHESxbjkOzOYsCQ2jLlhhezhRAAqysu2RYOFtz2cWAbGlCVS0c6Tr WjQjAe5h2kbi8EY/LZwx0GzvQhfF4+9XXprvsf3ooCZ6h8uCuvTkbGenYKOqiDs2qnte uRWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=PwnyhLkA6u7J7PG1BVddDEJ6Txr1VsKczG8arla6GfI=; b=Glp0od9AzYSVi1Mi82JzEOPdDkTTv1ymPproYqvafQyy80Gzh7sHQeaxTLEzJ8GR1Z +Z9b6GvzhxCi8v+Fxg51w3p7oL5YEO3fHclK8NeBgAsKOzmJCA9ic7BavmyjuwU7p0lG Nz75tMBCnLR9RNmOmUSmBqZ0XYAPlZocUwB01qWhL9POI3AzwFD6hu9fy1uFf74kTTBO uq13uMCJsg1PmfEmC8xX/stzL42Km528fdJ05+aes6qh1DC9sTFWVTVTMK0B+zmHKUBL YFwisrcQ+8KAkXJ0tgadA+Jr/SyCVGjh1/b50zVM/odpq4+BwA3R+9gEA2eoNBnvJE7I Y+kg== X-Gm-Message-State: APjAAAXBVnxkxAVFv1sppFNUPmtDrrOEpX+gxVnj8LWtJuyxM7jg7INs 07kuwmnYCg/6NxAeeTVe7uHAv4zfGbTDbvdWxQ7Pg5PcRfIWIc624YOrd2pTRzB3GYKFs1ChjTw TIcfE95A3D8zZr80lmhyFN2k/iKZVrQiWnXCWcgC+n1+mTtO6/nnteMc6lItf X-Google-Smtp-Source: APXvYqwTo4zQcJI8UygdkVON6I9fIXV5H0L9NvdB/yJxNXAT7f+xk3F2exv83KlVILSHZmqWPahJeMuvoEb9 X-Received: by 2002:aed:3fe9:: with SMTP id w38mr7103979qth.180.1569539921785; Thu, 26 Sep 2019 16:18:41 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:02 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-7-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 06/28] kvm: mmu: Replace mmu_lock with a read/write lock From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org
Replace the KVM MMU spinlock with a read/write lock so that some parts of the MMU can be made more concurrent in future commits by switching some write mode acquisitions to read mode. A read/write lock was chosen over other synchronization options because it has minimal initial impact: this change simply converts all uses of the MMU spin lock to an MMU read/write lock, taken in write mode. This change has no effect on the logic of the code and imposes only a small performance penalty.
Other, more invasive options were considered for synchronizing access to the paging structures. Sharding the MMU lock to protect 2MB chunks of addresses, as the main MM does, would also work; however, it makes acquiring locks for operations on large regions of memory expensive. Further, the parallel page fault handling algorithm introduced later in this series does not require exclusive access to the region of memory for which it is handling a fault.
There are several disadvantages to the read/write lock approach:
1. The reader/writer terminology does not apply well to MMU operations.
2. Many operations require exclusive access to a region of memory (often a memslot), but not all of memory. The read/write lock does not facilitate this.
3. Contention between readers and writers can still create problems in the face of long running MMU operations.
Despite these issues, the use of a read/write lock facilitates substantial improvements over the monolithic locking scheme.
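As an illustration of where this is intended to lead (neither function below appears in this patch, and handle_fault_concurrently() is a made-up placeholder), a fault path that does not need exclusive access could eventually take the lock in read mode, while invasive operations keep taking it in write mode:

static int example_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	int r;

	/* Shared mode: multiple faults could hold this concurrently. */
	read_lock(&vcpu->kvm->mmu_lock);
	r = handle_fault_concurrently(vcpu, gpa);
	read_unlock(&vcpu->kvm->mmu_lock);

	return r;
}

static void example_zap_range(struct kvm *kvm)
{
	/* Exclusive mode: invasive operations still lock out everyone else. */
	write_lock(&kvm->mmu_lock);
	/* ... zap the range ... */
	write_unlock(&kvm->mmu_lock);
}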
Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 106 +++++++++++++++++++------------------ arch/x86/kvm/page_track.c | 8 +-- arch/x86/kvm/paging_tmpl.h | 8 +-- arch/x86/kvm/x86.c | 4 +- include/linux/kvm_host.h | 3 +- virt/kvm/kvm_main.c | 34 ++++++------ 6 files changed, 83 insertions(+), 80 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 56587655aecb9..0311d18d9a995 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2446,9 +2446,9 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu, flush |= kvm_sync_page(vcpu, sp, &invalid_list); mmu_pages_clear_parents(&parents); } - if (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock)) { + if (need_resched()) { kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush); - cond_resched_lock(&vcpu->kvm->mmu_lock); + cond_resched_rwlock_write(&vcpu->kvm->mmu_lock); flush = false; } } @@ -2829,7 +2829,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages) { LIST_HEAD(invalid_list); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) { /* Need to free some mmu pages to achieve the goal. */ @@ -2843,7 +2843,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages) kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages; - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) @@ -2854,7 +2854,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) pgprintk("%s: looking for gfn %llx\n", __func__, gfn); r = 0; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); for_each_gfn_indirect_valid_sp(kvm, sp, gfn) { pgprintk("%s: gfn %llx role %x\n", __func__, gfn, sp->role.word); @@ -2862,7 +2862,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn) kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); } kvm_mmu_commit_zap_page(kvm, &invalid_list); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); return r; } @@ -3578,7 +3578,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, return r; r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) goto out_unlock; if (make_mmu_pages_available(vcpu) < 0) @@ -3586,8 +3586,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); r = __direct_map(vcpu, v, write, map_writable, level, pfn, prefault); + out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); kvm_release_pfn_clean(pfn); return r; } @@ -3629,7 +3630,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, return; } - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) if (roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) @@ -3653,7 +3654,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, } kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); } EXPORT_SYMBOL_GPL(kvm_mmu_free_roots); @@ -3675,31 +3676,31 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) unsigned i; if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) { - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if(make_mmu_pages_available(vcpu) < 0) { - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); return -ENOSPC; } sp 
= kvm_mmu_get_page(vcpu, 0, 0, vcpu->arch.mmu->shadow_root_level, 1, ACC_ALL); ++sp->root_count; - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu->root_hpa = __pa(sp->spt); } else if (vcpu->arch.mmu->shadow_root_level == PT32E_ROOT_LEVEL) { for (i = 0; i < 4; ++i) { hpa_t root = vcpu->arch.mmu->pae_root[i]; MMU_WARN_ON(VALID_PAGE(root)); - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if (make_mmu_pages_available(vcpu) < 0) { - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT), i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK; } vcpu->arch.mmu->root_hpa = __pa(vcpu->arch.mmu->pae_root); @@ -3732,16 +3733,16 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) MMU_WARN_ON(VALID_PAGE(root)); - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if (make_mmu_pages_available(vcpu) < 0) { - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, root_gfn, 0, vcpu->arch.mmu->shadow_root_level, 0, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu->root_hpa = root; goto set_root_cr3; } @@ -3769,16 +3770,16 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) if (mmu_check_root(vcpu, root_gfn)) return 1; } - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if (make_mmu_pages_available(vcpu) < 0) { - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL, 0, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu->pae_root[i] = root | pm_mask; } @@ -3854,17 +3855,17 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) !smp_load_acquire(&sp->unsync_children)) return; - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC); mmu_sync_children(vcpu, sp); kvm_mmu_audit(vcpu, AUDIT_POST_SYNC); - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); return; } - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC); for (i = 0; i < 4; ++i) { @@ -3878,7 +3879,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) } kvm_mmu_audit(vcpu, AUDIT_POST_SYNC); - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); } EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots); @@ -4204,7 +4205,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, return r; r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) goto out_unlock; if (make_mmu_pages_available(vcpu) < 0) @@ -4212,8 +4213,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault); + out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); kvm_release_pfn_clean(pfn); return r; } @@ -5338,7 +5340,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, 
gpa_t gpa, */ mmu_topup_memory_caches(vcpu); - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); gentry = mmu_pte_write_fetch_gpte(vcpu, &gpa, &bytes); @@ -5374,7 +5376,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, } kvm_mmu_flush_or_zap(vcpu, &invalid_list, remote_flush, local_flush); kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE); - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); } int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) @@ -5581,14 +5583,14 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot, if (iterator.rmap) flush |= fn(kvm, iterator.rmap); - if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { + if (need_resched()) { if (flush && lock_flush_tlb) { kvm_flush_remote_tlbs_with_address(kvm, start_gfn, iterator.gfn - start_gfn + 1); flush = false; } - cond_resched_lock(&kvm->mmu_lock); + cond_resched_rwlock_write(&kvm->mmu_lock); } } @@ -5738,7 +5740,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm) * be in active use by the guest. */ if (batch >= BATCH_ZAP_PAGES && - cond_resched_lock(&kvm->mmu_lock)) { + cond_resched_rwlock_write(&kvm->mmu_lock)) { batch = 0; goto restart; } @@ -5771,7 +5773,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) { lockdep_assert_held(&kvm->slots_lock); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); trace_kvm_mmu_zap_all_fast(kvm); /* @@ -5794,7 +5796,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) kvm_reload_remote_mmus(kvm); kvm_zap_obsolete_pages(kvm); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm) @@ -5831,7 +5833,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) struct kvm_memory_slot *memslot; int i; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { slots = __kvm_memslots(kvm, i); kvm_for_each_memslot(memslot, slots) { @@ -5848,7 +5850,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) } } - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } static bool slot_rmap_write_protect(struct kvm *kvm, @@ -5862,10 +5864,10 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, { bool flush; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect, false); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); /* * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log() @@ -5933,10 +5935,10 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, const struct kvm_memory_slot *memslot) { /* FIXME: const-ify all uses of struct kvm_memory_slot. 
*/ - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot, kvm_mmu_zap_collapsible_spte, true); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, @@ -5944,9 +5946,9 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, { bool flush; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); lockdep_assert_held(&kvm->slots_lock); @@ -5967,10 +5969,10 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm, { bool flush; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect, false); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); /* see kvm_mmu_slot_remove_write_access */ lockdep_assert_held(&kvm->slots_lock); @@ -5986,9 +5988,9 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm, { bool flush; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); lockdep_assert_held(&kvm->slots_lock); @@ -6005,19 +6007,19 @@ void kvm_mmu_zap_all(struct kvm *kvm) LIST_HEAD(invalid_list); int ign; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); restart: list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) { if (sp->role.invalid && sp->root_count) continue; if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign)) goto restart; - if (cond_resched_lock(&kvm->mmu_lock)) + if (cond_resched_rwlock_write(&kvm->mmu_lock)) goto restart; } kvm_mmu_commit_zap_page(kvm, &invalid_list); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) @@ -6077,7 +6079,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) continue; idx = srcu_read_lock(&kvm->srcu); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); if (kvm_has_zapped_obsolete_pages(kvm)) { kvm_mmu_commit_zap_page(kvm, @@ -6090,7 +6092,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) kvm_mmu_commit_zap_page(kvm, &invalid_list); unlock: - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); /* diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c index 3521e2d176f2f..a43f4fa020db2 100644 --- a/arch/x86/kvm/page_track.c +++ b/arch/x86/kvm/page_track.c @@ -188,9 +188,9 @@ kvm_page_track_register_notifier(struct kvm *kvm, head = &kvm->arch.track_notifier_head; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); hlist_add_head_rcu(&n->node, &head->track_notifier_list); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier); @@ -206,9 +206,9 @@ kvm_page_track_unregister_notifier(struct kvm *kvm, head = &kvm->arch.track_notifier_head; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); hlist_del_rcu(&n->node); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); synchronize_srcu(&head->track_srcu); } EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier); diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 7d5cdb3af5943..97903c8dcad16 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -841,7 +841,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, 
gva_t addr, u32 error_code, } r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) goto out_unlock; @@ -855,7 +855,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT); out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); kvm_release_pfn_clean(pfn); return r; } @@ -892,7 +892,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa) return; } - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) { level = iterator.level; sptep = iterator.sptep; @@ -925,7 +925,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa) if (!is_shadow_present_pte(*sptep) || !sp->unsync_children) break; } - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); } static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ed07d8d2caa0..9ecf83da396c9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6376,9 +6376,9 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2, if (vcpu->arch.mmu->direct_map) { unsigned int indirect_shadow_pages; - spin_lock(&vcpu->kvm->mmu_lock); + write_lock(&vcpu->kvm->mmu_lock); indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages; - spin_unlock(&vcpu->kvm->mmu_lock); + write_unlock(&vcpu->kvm->mmu_lock); if (indirect_shadow_pages) kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index fcb46b3374c60..baed80f8a7f00 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -441,7 +441,8 @@ struct kvm_memslots { }; struct kvm { - spinlock_t mmu_lock; + rwlock_t mmu_lock; + struct mutex slots_lock; struct mm_struct *mm; /* userspace tied to this vm */ struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM]; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e6de3159e682f..9ce067b6882b7 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -356,13 +356,13 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, int idx; idx = srcu_read_lock(&kvm->srcu); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); kvm->mmu_notifier_seq++; if (kvm_set_spte_hva(kvm, address, pte)) kvm_flush_remote_tlbs(kvm); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); } @@ -374,7 +374,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, int ret; idx = srcu_read_lock(&kvm->srcu); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); /* * The count increase must become visible at unlock time as no * spte can be established without taking the mmu_lock and @@ -387,7 +387,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, if (need_tlb_flush) kvm_flush_remote_tlbs(kvm); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); ret = kvm_arch_mmu_notifier_invalidate_range(kvm, range->start, range->end, @@ -403,7 +403,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, { struct kvm *kvm = mmu_notifier_to_kvm(mn); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); /* * This sequence increase will notify the kvm page fault that * the page that is going to be mapped in the spte could have @@ -417,7 +417,7 @@ static 
void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, * in conjunction with the smp_rmb in mmu_notifier_retry(). */ kvm->mmu_notifier_count--; - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); BUG_ON(kvm->mmu_notifier_count < 0); } @@ -431,13 +431,13 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, int young, idx; idx = srcu_read_lock(&kvm->srcu); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); young = kvm_age_hva(kvm, start, end); if (young) kvm_flush_remote_tlbs(kvm); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); return young; @@ -452,7 +452,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, int young, idx; idx = srcu_read_lock(&kvm->srcu); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); /* * Even though we do not flush TLB, this will still adversely * affect performance on pre-Haswell Intel EPT, where there is @@ -467,7 +467,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, * more sophisticated heuristic later. */ young = kvm_age_hva(kvm, start, end); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); return young; @@ -481,9 +481,9 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, int young, idx; idx = srcu_read_lock(&kvm->srcu); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); young = kvm_test_age_hva(kvm, address); - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); return young; @@ -632,7 +632,7 @@ static struct kvm *kvm_create_vm(unsigned long type) if (!kvm) return ERR_PTR(-ENOMEM); - spin_lock_init(&kvm->mmu_lock); + rwlock_init(&kvm->mmu_lock); mmgrab(current->mm); kvm->mm = current->mm; kvm_eventfd_init(kvm); @@ -1193,7 +1193,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm, dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot); memset(dirty_bitmap_buffer, 0, n); - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); for (i = 0; i < n / sizeof(long); i++) { unsigned long mask; gfn_t offset; @@ -1209,7 +1209,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm, kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask); } - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); } if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n)) @@ -1263,7 +1263,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n)) return -EFAULT; - spin_lock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); for (offset = log->first_page, i = offset / BITS_PER_LONG, n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--; i++, offset += BITS_PER_LONG) { @@ -1286,7 +1286,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, offset, mask); } } - spin_unlock(&kvm->mmu_lock); + write_unlock(&kvm->mmu_lock); return 0; } From patchwork Thu Sep 26 23:18:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163473 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5FB0912 for ; Thu, 26 Sep 2019 23:18:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7C6DC2086A for ; Thu, 26 Sep 2019 23:18:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) 
header.d=google.com header.i=@google.com header.b="eB2CtsZP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728957AbfIZXSr (ORCPT ); Thu, 26 Sep 2019 19:18:47 -0400 Received: from mail-pg1-f201.google.com ([209.85.215.201]:38129 "EHLO mail-pg1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728911AbfIZXSq (ORCPT ); Thu, 26 Sep 2019 19:18:46 -0400 Received: by mail-pg1-f201.google.com with SMTP id m1so2350676pgq.5 for ; Thu, 26 Sep 2019 16:18:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=qc2FIJjoMRMnM9iD3YvkjCVVKh2fl8gDEeddE5ycrwI=; b=eB2CtsZP8+WWfjzK2Mpl4c8nXWmwbmKfextssWXG3u4+b6UmMQQVRdkv8OFvtOEKLe Hb/KssEm3NqyDfTSt3x22L4Kr3Ex1lxrIKiYg1Tr37bl8ycteODynwqzhRD36hEHAxpn QWrVh+TJf7RBnSDcesCvpN/CsjlJhY2Brw08Eke5jIWxvfO4n0OC4xfijf1OX/Hf4KrP te9oXyH3bAlUl5Dcev5g1oCoPByBjm6f6heUhHRnlviBoCfH1gMDylnBn7xlMkacn5CL ALTadmCjeBPmIIUa+Bdo++ZiV4a6j4ZbyITRZEScG75hX7c1vQ/TnrxGEXasyEWWC2VL zGyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=qc2FIJjoMRMnM9iD3YvkjCVVKh2fl8gDEeddE5ycrwI=; b=JuvO8Esq5GdAAmmZemrkBfO2HERfDOtU4ACTIUz26VpUe5/cm6VvFdJuztgi+fl4ZN lzE/9rEXifKQb2G+uAP8elpUN0ZNRRJtzaNeoLbZfLxL0+8wxwto4n2yJ1qAz2DExgr/ rtnEjOBF3EpTZE8cdQx8b3APW8dDd7KmzFw8IkUnL335HhKRGQtP7XPnAClC0E5jn7Is lb81c4pRVa/4NnlHKURQbYJ3NX0ERE2pmdvTwTH5awRcVF71uXCnk12UUDDAtB7TUCTL rtTPOdO7OQ2XrOLuD8x0JXP73AnRMLTobduKAV2cqN2CoyGESI+HeofEMRrTrg6s5JbJ CSXA== X-Gm-Message-State: APjAAAXFxrxjCWMFSUafce5Q4ciVbmWIgw3tBFZPf8siaYDfX+apO+5w 3/00wE4qFvjRKEYBYuZeFnFixdRinDN3vvQ9DsfnAUIz3PYLIxfIH4EqPfczhBi/PJ7ca37I3Jj WyVLuyWP2Fyq9yxKnyf+l5DiRvM5llyUQIgIKLLAvhN6bYICLAGhd238w5Lrk X-Google-Smtp-Source: APXvYqz4nMCIPOZJL7l6N5+EC9ZP9HANzvmB+9r4PD6oVlBdu6/3M5JM4WDO99SehT32DscVGIh1NhEqzKiE X-Received: by 2002:a65:68c9:: with SMTP id k9mr6156369pgt.49.1569539923984; Thu, 26 Sep 2019 16:18:43 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:03 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-8-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 07/28] kvm: mmu: Add functions for handling changed PTEs From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The existing bookkeeping done by KVM when a PTE is changed is spread around several functions. This makes it difficult to remember all the stats, bitmaps, and other subsystems that need to be updated whenever a PTE is modified. When a non-leaf PTE is marked non-present or becomes a leaf PTE, page table memory must also be freed. Further, most of the bookkeeping is done before the PTE is actually set. This works well with a monolithic MMU lock, however if changes use atomic compare/exchanges, the bookkeeping cannot be done before the change is made. In either case, there is a short window in which some statistics, e.g. the dirty bitmap will be inconsistent, however consistency is still restored before the MMU lock is released. 
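The ordering constraint described above can be sketched as follows (the wrapper example_change_pte_atomic() is hypothetical; handle_changed_pte() is the helper introduced by this patch): the PTE is changed first with a compare/exchange, and only if that succeeds is the bookkeeping for the old-to-new transition applied, which is why the short window of inconsistency is unavoidable:

static bool example_change_pte_atomic(struct kvm *kvm, int as_id, gfn_t gfn,
				      u64 *ptep, u64 old_pte, u64 new_pte,
				      int level)
{
	/* The hardware-visible change happens first... */
	if (cmpxchg64(ptep, old_pte, new_pte) != old_pte)
		return false;	/* lost a race with another thread */

	/* ...and stats, dirty logging, and page table freeing catch up after. */
	handle_changed_pte(kvm, as_id, gfn, old_pte, new_pte, level);

	return true;
}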
To simplify the MMU and facilitate the use of atomic operations on PTEs, create functions to handle some of the bookkeeping required as a result of the change. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 145 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 0311d18d9a995..50413f17c7cd0 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -143,6 +143,18 @@ module_param(dbg, bool, 0644); #define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) #define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1)) +/* + * PTEs in a disconnected page table can be set to DISCONNECTED_PTE to indicate + * to other threads that the page table in which the pte resides is no longer + * connected to the root of a paging structure. + * + * This constant works because it is considered non-present on both AMD and + * Intel CPUs and does not create a L1TF vulnerability because the pfn section + * is zeroed out. PTE bit 57 is available to software, per vol 3, figure 28-1 + * of the Intel SDM and vol 2, figures 5-18 to 5-21 of the AMD APM. + */ +#define DISCONNECTED_PTE (1ull << 57) + #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) /* make pte_list_desc fit well in cache line */ @@ -555,6 +567,16 @@ static int is_shadow_present_pte(u64 pte) return (pte != 0) && !is_mmio_spte(pte); } +static inline int is_disconnected_pte(u64 pte) +{ + return pte == DISCONNECTED_PTE; +} + +static int is_present_direct_pte(u64 pte) +{ + return is_shadow_present_pte(pte) && !is_disconnected_pte(pte); +} + static int is_large_pte(u64 pte) { return pte & PT_PAGE_SIZE_MASK; @@ -1659,6 +1681,129 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) return flush; } +static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_pte, u64 new_pte, int level); + +/** + * mark_pte_disconnected - Mark a PTE as part of a disconnected PT + * @kvm: kvm instance + * @as_id: the address space of the paging structure the PTE was a part of + * @gfn: the base GFN that was mapped by the PTE + * @ptep: a pointer to the PTE to be marked disconnected + * @level: the level of the PT this PTE was a part of, when it was part of the + * paging structure + */ +static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, + u64 *ptep, int level) +{ + u64 old_pte; + + old_pte = xchg(ptep, DISCONNECTED_PTE); + BUG_ON(old_pte == DISCONNECTED_PTE); + + handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level); +} + +/** + * handle_disconnected_pt - Mark a PT as disconnected and handle associated + * bookkeeping and freeing + * @kvm: kvm instance + * @as_id: the address space of the paging structure the PT was a part of + * @pt_base_gfn: the base GFN that was mapped by the first PTE in the PT + * @pfn: The physical frame number of the disconnected PT page + * @level: the level of the PT, when it was part of the paging structure + * + * Given a pointer to a page table that has been removed from the paging + * structure and its level, recursively free child page tables and mark their + * entries as disconnected. 
+ */ +static void handle_disconnected_pt(struct kvm *kvm, int as_id, + gfn_t pt_base_gfn, kvm_pfn_t pfn, int level) +{ + int i; + gfn_t gfn = pt_base_gfn; + u64 *pt = pfn_to_kaddr(pfn); + + for (i = 0; i < PT64_ENT_PER_PAGE; i++) { + /* + * Mark the PTE as disconnected so that no other thread will + * try to map in an entry there or try to free any child page + * table the entry might have pointed to. + */ + mark_pte_disconnected(kvm, as_id, gfn, &pt[i], level); + + gfn += KVM_PAGES_PER_HPAGE(level); + } + + free_page((unsigned long)pt); +} + +/** + * handle_changed_pte - handle bookkeeping associated with a PTE change + * @kvm: kvm instance + * @as_id: the address space of the paging structure the PTE was a part of + * @gfn: the base GFN that was mapped by the PTE + * @old_pte: The value of the PTE before the atomic compare / exchange + * @new_pte: The value of the PTE after the atomic compare / exchange + * @level: the level of the PT the PTE is part of in the paging structure + * + * Handle bookkeeping that might result from the modification of a PTE. + * This function should be called in the same RCU read critical section as the + * atomic cmpxchg on the pte. This function must be called for all direct pte + * modifications except those which strictly emulate hardware, for example + * setting the dirty bit on a pte. + */ +static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_pte, u64 new_pte, int level) +{ + bool was_present = is_present_direct_pte(old_pte); + bool is_present = is_present_direct_pte(new_pte); + bool was_leaf = was_present && is_last_spte(old_pte, level); + bool pfn_changed = spte_to_pfn(old_pte) != spte_to_pfn(new_pte); + int child_level; + + BUG_ON(level > PT64_ROOT_MAX_LEVEL); + BUG_ON(level < PT_PAGE_TABLE_LEVEL); + BUG_ON(gfn % KVM_PAGES_PER_HPAGE(level)); + + /* + * The only times a pte should be changed from a non-present to + * non-present state is when an entry in an unlinked page table is + * marked as a disconnected PTE as part of freeing the page table, + * or an MMIO entry is installed/modified. In these cases there is + * nothing to do. + */ + if (!was_present && !is_present) { + /* + * If this change is not on an MMIO PTE and not setting a PTE + * as disconnected, then it is unexpected. Log the change, + * though it should not impact the guest since both the former + * and current PTEs are nonpresent. + */ + WARN_ON((new_pte != DISCONNECTED_PTE) && + !is_mmio_spte(new_pte)); + return; + } + + if (was_present && !was_leaf && (pfn_changed || !is_present)) { + /* + * The level of the page table being freed is one level lower + * than the level at which it is mapped. + */ + child_level = level - 1; + + /* + * If there was a present non-leaf entry before, and now the + * entry points elsewhere, the lpage stats and dirty logging / + * access tracking status for all the entries the old pte + * pointed to must be updated and the page table pages it + * pointed to must be freed. 
+ */ + handle_disconnected_pt(kvm, as_id, gfn, spte_to_pfn(old_pte), + child_level); + } +} + /** * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages * @kvm: kvm instance From patchwork Thu Sep 26 23:18:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163475 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9C8D214ED for ; Thu, 26 Sep 2019 23:18:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 73C682086A for ; Thu, 26 Sep 2019 23:18:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="dv/hu5RS" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727869AbfIZXSs (ORCPT ); Thu, 26 Sep 2019 19:18:48 -0400 Received: from mail-pg1-f202.google.com ([209.85.215.202]:37256 "EHLO mail-pg1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728941AbfIZXSr (ORCPT ); Thu, 26 Sep 2019 19:18:47 -0400 Received: by mail-pg1-f202.google.com with SMTP id h189so2348410pgc.4 for ; Thu, 26 Sep 2019 16:18:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=k2Dc/bYQQNYo+zhNw9+XsgIjDidPtlvntzaPZkQuDU8=; b=dv/hu5RS3U8yfyOn8gQ5HEIZx9BjA4IB7i370xSi6K0Of0jKZZfX9w1s03R3y00X9x qv60GgMvxfLsF+B+QVrQCBsG+un0NhMFEVjgjR7T6SZkk6Co+mMcDUHSlNdamW90kRRu hjbvNZDYDSiuCfhU2FsUIdW4L63zCvDYF/z/6pHziKsqvFT1p2T4ouKOrxHKsXBbE1Ny pPJgxdLUdfhaQjclSvynS+HASXQmsXFX2mqaV0PRMEOl2kevxuvXzlfeAa9lwiAKZZqA DOYXRTqTxFUubW5o8dgNDr0P+GJvIXalwFliishx/tXBDEc+ph30nnprURg5ZrF26a8E qaeg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=k2Dc/bYQQNYo+zhNw9+XsgIjDidPtlvntzaPZkQuDU8=; b=Qy8CZ1fShkX3TEIpN5V5y0Lg4VHll9AwJXOyyiiQvj4kKKQbQmdqmLsUY6Owq9GvWC bMJvbll18ofU4hpN7DxBtdy1nRJpSOBJMTDQuW8l6txoz6WpHRrJh1yVdnjleEf1nKwh s45VngJylQMmJE0/zSHjU8bnOqGMqz/d9MsSXozuRvaCTdeHCyL/QGrbSX8FthwYAtKf Js+ooq6KMa72VF4ZgnIXHzCnmsy6NTpF3ogBvEixRFIGO7Mk7xPOoADySXBzYPqHA/wT VIppP2xDISl43eEh5sVkZN8/ghq+3b3IqNgL5zqwY7+25jD/vrf52Of2eE6etrFJnxHx MF8w== X-Gm-Message-State: APjAAAWI27Hqj6ECe6Ducc3m2mP9hhhDrPkhcDl8e2+VQYQHmhWWQkiN n/L92IfgNYUigJxAQufZmm7Lf94YuRID2wBhgOMG29zUEFpkuc4Soco14Z43IXeJDfICGXGCr4W oZBdFRspdAXB9PFWwR8hGMUybjIKQ57ICPWuFMlQ0pkMrAARmTEksOpp2iJtC X-Google-Smtp-Source: APXvYqyXMt9mVj8RdDe6RNZpP6A9qiv5gcO7SqK8vKWcn6m346ItE0pHlZeqIbkKTpnf9Z/7gwzFW5hJM12M X-Received: by 2002:a63:5f09:: with SMTP id t9mr5994025pgb.51.1569539926203; Thu, 26 Sep 2019 16:18:46 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:04 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-9-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 08/28] kvm: mmu: Init / Uninit the direct MMU From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The direct MMU introduces several new fields that 
need to be initialized and torn down. Add functions to do that initialization / cleanup. Signed-off-by: Ben Gardon --- arch/x86/include/asm/kvm_host.h | 51 ++++++++---- arch/x86/kvm/mmu.c | 132 +++++++++++++++++++++++++++++--- arch/x86/kvm/x86.c | 16 +++- 3 files changed, 169 insertions(+), 30 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 23edf56cf577c..1f8164c577d50 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -236,6 +236,22 @@ enum { */ #define KVM_APIC_PV_EOI_PENDING 1 +#define HF_GIF_MASK (1 << 0) +#define HF_HIF_MASK (1 << 1) +#define HF_VINTR_MASK (1 << 2) +#define HF_NMI_MASK (1 << 3) +#define HF_IRET_MASK (1 << 4) +#define HF_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */ +#define HF_SMM_MASK (1 << 6) +#define HF_SMM_INSIDE_NMI_MASK (1 << 7) + +#define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE +#define KVM_ADDRESS_SPACE_NUM 2 + +#define kvm_arch_vcpu_memslots_id(vcpu) \ + ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0) +#define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm) + struct kvm_kernel_irq_routing_entry; /* @@ -940,6 +956,24 @@ struct kvm_arch { bool exception_payload_enabled; struct kvm_pmu_event_filter *pmu_event_filter; + + /* + * Whether the direct MMU is enabled for this VM. This contains a + * snapshot of the direct MMU module parameter from when the VM was + * created and remains unchanged for the life of the VM. If this is + * true, direct MMU handler functions will run for various MMU + * operations. + */ + bool direct_mmu_enabled; + /* + * Indicates that the paging structure built by the direct MMU is + * currently the only one in use. If nesting is used, prompting the + * creation of shadow page tables for L2, this will be set to false. + * While this is true, only direct MMU handlers will be run for many + * MMU functions. Ignored if !direct_mmu_enabled. + */ + bool pure_direct_mmu; + hpa_t direct_root_hpa[KVM_ADDRESS_SPACE_NUM]; }; struct kvm_vm_stat { @@ -1255,7 +1289,7 @@ void kvm_mmu_module_exit(void); void kvm_mmu_destroy(struct kvm_vcpu *vcpu); int kvm_mmu_create(struct kvm_vcpu *vcpu); -void kvm_mmu_init_vm(struct kvm *kvm); +int kvm_mmu_init_vm(struct kvm *kvm); void kvm_mmu_uninit_vm(struct kvm *kvm); void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask, @@ -1519,21 +1553,6 @@ enum { TASK_SWITCH_GATE = 3, }; -#define HF_GIF_MASK (1 << 0) -#define HF_HIF_MASK (1 << 1) -#define HF_VINTR_MASK (1 << 2) -#define HF_NMI_MASK (1 << 3) -#define HF_IRET_MASK (1 << 4) -#define HF_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */ -#define HF_SMM_MASK (1 << 6) -#define HF_SMM_INSIDE_NMI_MASK (1 << 7) - -#define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE -#define KVM_ADDRESS_SPACE_NUM 2 - -#define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 
1 : 0) -#define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm) - asmlinkage void kvm_spurious_fault(void); /* diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 50413f17c7cd0..788edbda02f69 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -47,6 +47,10 @@ #include #include "trace.h" +static bool __read_mostly direct_mmu_enabled; +module_param_named(enable_direct_mmu, direct_mmu_enabled, bool, + S_IRUGO | S_IWUSR); + /* * When setting this variable to true it enables Two-Dimensional-Paging * where the hardware walks 2 page tables: @@ -3754,27 +3758,56 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa, *root_hpa = INVALID_PAGE; } +static bool is_direct_mmu_root(struct kvm *kvm, hpa_t root) +{ + int as_id; + + for (as_id = 0; as_id < KVM_ADDRESS_SPACE_NUM; as_id++) + if (root == kvm->arch.direct_root_hpa[as_id]) + return true; + + return false; +} + /* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, ulong roots_to_free) { int i; LIST_HEAD(invalid_list); - bool free_active_root = roots_to_free & KVM_MMU_ROOT_CURRENT; BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG); - /* Before acquiring the MMU lock, see if we need to do any real work. */ - if (!(free_active_root && VALID_PAGE(mmu->root_hpa))) { - for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) - if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) && - VALID_PAGE(mmu->prev_roots[i].hpa)) - break; + /* + * Direct MMU paging structures follow the life of the VM, so instead of + * destroying direct MMU paging structure root, simply mark the root + * HPA pointing to it as invalid. + */ + if (vcpu->kvm->arch.direct_mmu_enabled && + roots_to_free & KVM_MMU_ROOT_CURRENT && + is_direct_mmu_root(vcpu->kvm, mmu->root_hpa)) + mmu->root_hpa = INVALID_PAGE; - if (i == KVM_MMU_NUM_PREV_ROOTS) - return; + if (!VALID_PAGE(mmu->root_hpa)) + roots_to_free &= ~KVM_MMU_ROOT_CURRENT; + + for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { + if (roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) { + if (is_direct_mmu_root(vcpu->kvm, + mmu->prev_roots[i].hpa)) + mmu->prev_roots[i].hpa = INVALID_PAGE; + if (!VALID_PAGE(mmu->prev_roots[i].hpa)) + roots_to_free &= ~KVM_MMU_ROOT_PREVIOUS(i); + } } + /* + * If there are no valid roots that need freeing at this point, avoid + * acquiring the MMU lock and return. 
+ */ + if (!roots_to_free) + return; + write_lock(&vcpu->kvm->mmu_lock); for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) @@ -3782,7 +3815,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, mmu_free_root_page(vcpu->kvm, &mmu->prev_roots[i].hpa, &invalid_list); - if (free_active_root) { + if (roots_to_free & KVM_MMU_ROOT_CURRENT) { if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL && (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) { mmu_free_root_page(vcpu->kvm, &mmu->root_hpa, @@ -3820,7 +3853,12 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) struct kvm_mmu_page *sp; unsigned i; - if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) { + if (vcpu->kvm->arch.direct_mmu_enabled) { + // TODO: Support 5 level paging in the direct MMU + BUG_ON(vcpu->arch.mmu->shadow_root_level > PT64_ROOT_4LEVEL); + vcpu->arch.mmu->root_hpa = vcpu->kvm->arch.direct_root_hpa[ + kvm_arch_vcpu_memslots_id(vcpu)]; + } else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) { write_lock(&vcpu->kvm->mmu_lock); if(make_mmu_pages_available(vcpu) < 0) { write_unlock(&vcpu->kvm->mmu_lock); @@ -3863,6 +3901,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) gfn_t root_gfn, root_cr3; int i; + write_lock(&vcpu->kvm->mmu_lock); + vcpu->kvm->arch.pure_direct_mmu = false; + write_unlock(&vcpu->kvm->mmu_lock); + root_cr3 = vcpu->arch.mmu->get_cr3(vcpu); root_gfn = root_cr3 >> PAGE_SHIFT; @@ -5710,6 +5752,64 @@ void kvm_disable_tdp(void) } EXPORT_SYMBOL_GPL(kvm_disable_tdp); +static bool is_direct_mmu_enabled(void) +{ + if (!READ_ONCE(direct_mmu_enabled)) + return false; + + if (WARN_ONCE(!tdp_enabled, + "Creating a VM with direct MMU enabled requires TDP.")) + return false; + + return true; +} + +static int kvm_mmu_init_direct_mmu(struct kvm *kvm) +{ + struct page *page; + int i; + + if (!is_direct_mmu_enabled()) + return 0; + + /* + * Allocate the direct MMU root pages. These pages follow the life of + * the VM. + */ + for (i = 0; i < ARRAY_SIZE(kvm->arch.direct_root_hpa); i++) { + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!page) + goto err; + kvm->arch.direct_root_hpa[i] = page_to_phys(page); + } + + /* This should not be changed for the lifetime of the VM. */ + kvm->arch.direct_mmu_enabled = true; + + kvm->arch.pure_direct_mmu = true; + return 0; +err: + for (i = 0; i < ARRAY_SIZE(kvm->arch.direct_root_hpa); i++) { + if (kvm->arch.direct_root_hpa[i] && + VALID_PAGE(kvm->arch.direct_root_hpa[i])) + free_page((unsigned long)kvm->arch.direct_root_hpa[i]); + kvm->arch.direct_root_hpa[i] = INVALID_PAGE; + } + return -ENOMEM; +} + +static void kvm_mmu_uninit_direct_mmu(struct kvm *kvm) +{ + int i; + + if (!kvm->arch.direct_mmu_enabled) + return; + + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) + handle_disconnected_pt(kvm, i, 0, + (kvm_pfn_t)(kvm->arch.direct_root_hpa[i] >> PAGE_SHIFT), + PT64_ROOT_4LEVEL); +} /* The return value indicates if tlb flush on all vcpus is needed. 
*/ typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head); @@ -5956,13 +6056,19 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, kvm_mmu_zap_all_fast(kvm); } -void kvm_mmu_init_vm(struct kvm *kvm) +int kvm_mmu_init_vm(struct kvm *kvm) { struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker; + int r; + + r = kvm_mmu_init_direct_mmu(kvm); + if (r) + return r; node->track_write = kvm_mmu_pte_write; node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); + return 0; } void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -5970,6 +6076,8 @@ void kvm_mmu_uninit_vm(struct kvm *kvm) struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker; kvm_page_track_unregister_notifier(kvm, node); + + kvm_mmu_uninit_direct_mmu(kvm); } void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9ecf83da396c9..2972b6c6029fb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9421,6 +9421,8 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { + int err; + if (type) return -EINVAL; @@ -9450,9 +9452,19 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) kvm_hv_init_vm(kvm); kvm_page_track_init(kvm); - kvm_mmu_init_vm(kvm); + err = kvm_mmu_init_vm(kvm); + if (err) + return err; + + err = kvm_x86_ops->vm_init(kvm); + if (err) + goto error; + + return 0; - return kvm_x86_ops->vm_init(kvm); +error: + kvm_mmu_uninit_vm(kvm); + return err; } static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu) From patchwork Thu Sep 26 23:18:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163477 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9803914ED for ; Thu, 26 Sep 2019 23:18:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 76D2D2086A for ; Thu, 26 Sep 2019 23:18:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="c+cGJoGg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729011AbfIZXSu (ORCPT ); Thu, 26 Sep 2019 19:18:50 -0400 Received: from mail-pg1-f201.google.com ([209.85.215.201]:36010 "EHLO mail-pg1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728941AbfIZXSt (ORCPT ); Thu, 26 Sep 2019 19:18:49 -0400 Received: by mail-pg1-f201.google.com with SMTP id h36so2354599pgb.3 for ; Thu, 26 Sep 2019 16:18:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=FlkPCmyBxMafc7xPNdBrX6KkTlj7nxb6S27SSYwjwEc=; b=c+cGJoGgC2dOXIcRCQFrjIuxggFV83guIJW6qgAo97v92wA+CvCmDAs5eDSnZuGEKk k15C90v18WhD3T0baKjZ77x35RsqmWx3F9zLFv4AZn/435i44OnvnZSNqP5uYF6QXqNb myLfoxD4QK7krQhhL5Nq4ixBLlkFfh+fNscq/w5RAavmJ0KLBB+MAEMb8YM7u64vhyfN 0QvyyAVUfTjqk7zxfFymmbXnk4phyg55UFVghVzTSnPUYiS5MEn4X1jd9JIpEDsogAn7 HUN1GyQinfK5yR0lGfsrZRgl7FvAWhz9lOuJxjovtEpepfVJgGod9hLMtSP499fD9HHi +Uog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version 
:references:subject:from:to:cc; bh=FlkPCmyBxMafc7xPNdBrX6KkTlj7nxb6S27SSYwjwEc=; b=nZHyvndvgB9dpP/xxGk3j2Og37N99ZLHp+xCxiKFpNzayRICu7h8CvXgHjFJ9ypV2F y1ouSHwzvtvpNaqXhFtw8nYUUwnHjIXJVEV4NJqqPlDg3OJ+wCi1EGeIm0dWd5FQ1NDU IXy7XXaz2hfcmfQs5JUWip2oYCw9sZwocPLJCz3rc9QK73zgNSf4OzmHCWP4Cela7umO 2Y/TEQBOVwt/7h1Td63C4Hy9g023hrl/jRw9tYUsUOWzQ4pVlzMxJGjYqopXT/8dimoI Yu1oIiRdyZs2Sv23H23B/pOqR+E9hXnH71iu8dwYSfrd+eZjhk96Bquj2zCS2+ihFE1y uoGA== X-Gm-Message-State: APjAAAVQpQd3aWyoLyjs9M6A8dpFCOi0EClnuwF0+5OaSB3g3kPtYuKI Rs0+t4UBn7d5vtD8UhLy5f/rLI4yz1C+fm6dRivY3WK4M+mk9+J3i675xmdlXNOVa4U81YN4RS4 96mitPl/nscBFvvUQGZ293l3CmOj91ZnQLcAZzXKdX8xCgx8LCuu6/L654Py+ X-Google-Smtp-Source: APXvYqzRlf0AIBCSNJ8RIZfGoFaBxuKo3SRU4e5x+Rk1ompUil1TXLdH5M09pV6dQaK0yGMUYQNe/QK+rQpb X-Received: by 2002:a63:2808:: with SMTP id o8mr6057944pgo.118.1569539928434; Thu, 26 Sep 2019 16:18:48 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:05 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-10-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 09/28] kvm: mmu: Free direct MMU page table memory in an RCU callback From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The direct walk iterator, introduced in a later commit in this series, uses RCU to ensure that its concurrent access to paging structure memory is safe. This requires that page table memory not be freed until an RCU grace period has elapsed. In order to keep the threads removing page table memory from the paging structure from blocking, free the disonnected page table memory in an RCU callback. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 788edbda02f69..9fe57ef7baa29 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1685,6 +1685,21 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) return flush; } +/* + * This function is called through call_rcu in order to free direct page table + * memory safely, with resepct to other KVM MMU threads that might be operating + * on it. By only accessing direct page table memory in a RCU read critical + * section, and freeing it after a grace period, lockless access to that memory + * won't use it after it is freed. + */ +static void free_pt_rcu_callback(struct rcu_head *rp) +{ + struct page *req = container_of(rp, struct page, rcu_head); + u64 *disconnected_pt = page_address(req); + + free_page((unsigned long)disconnected_pt); +} + static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_pte, u64 new_pte, int level); @@ -1720,6 +1735,11 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, * Given a pointer to a page table that has been removed from the paging * structure and its level, recursively free child page tables and mark their * entries as disconnected. + * + * RCU dereferences are not necessary to protect access to the disconnected + * page table or its children because it has been atomically removed from the + * root of the paging structure, so no other thread will be trying to free the + * memory. 
*/ static void handle_disconnected_pt(struct kvm *kvm, int as_id, gfn_t pt_base_gfn, kvm_pfn_t pfn, int level) @@ -1727,6 +1747,7 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, int i; gfn_t gfn = pt_base_gfn; u64 *pt = pfn_to_kaddr(pfn); + struct page *page; for (i = 0; i < PT64_ENT_PER_PAGE; i++) { /* @@ -1739,7 +1760,12 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, gfn += KVM_PAGES_PER_HPAGE(level); } - free_page((unsigned long)pt); + /* + * Free the pt page in an RCU callback, once it's safe to do + * so. + */ + page = pfn_to_page(pfn); + call_rcu(&page->rcu_head, free_pt_rcu_callback); } /** From patchwork Thu Sep 26 23:18:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163479 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7B837912 for ; Thu, 26 Sep 2019 23:18:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4E2C320835 for ; Thu, 26 Sep 2019 23:18:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="SW7G1l0E" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729024AbfIZXSy (ORCPT ); Thu, 26 Sep 2019 19:18:54 -0400 Received: from mail-yw1-f74.google.com ([209.85.161.74]:34332 "EHLO mail-yw1-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728947AbfIZXSx (ORCPT ); Thu, 26 Sep 2019 19:18:53 -0400 Received: by mail-yw1-f74.google.com with SMTP id u131so682903ywa.1 for ; Thu, 26 Sep 2019 16:18:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=YTeM/yG0z9GHNFu6KUcb+DaUxituIo1MCdiiAiyqIs0=; b=SW7G1l0E3/jxIMely99rYls+zRqK3Gbmtow7AR/A6/Gz7NMa5fst+XyG1d3R1kLLan i8wWmTcNr8puqVl4nqSIAlQ6jOmK28E9CgeBDwtYt5tSX4xRRt4faQ69p2iRk51HFRW1 HBkkMCqLyHMEeQmU7TGTDhtRsTjptmhxOxV2gsnimMHOrH/M4VInJ8w5UX0IgXdOwjd8 XjVyCsPfY5Vjhy+DWpmv3uQBodnjJv1nNXSb0oo3mwMVnqhdoo1pA2M9Xui9ZsRl/9nf fR+/hsR1PmeGlH3MTS7xzYIYvoTd4BQztoFUqt5wMvfATht3509GIAkQwpUWbCSaqdH8 ZOeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=YTeM/yG0z9GHNFu6KUcb+DaUxituIo1MCdiiAiyqIs0=; b=HsV67cxAdmP5mElv/Y1+eoxFWUSjnSfHEDygvefGnSwvlUbOEzAqzRLjB129ReCOUp vB8u43rGRM5zIGnawe+G8i+09yuslAfmiZWrG4Bh2xn7Us6CvfHoTXMr5ASOv2e+AzmY YorBUj7npkXvwdoTaBR0jWlmkygzejEXTOTOsJnmy/pntmCMgUoAYQ3fYSYYbsJZd5iq Oy4FuNvHf7Y6wH1wCmXmWObwI6kCL8QixpGgJrdJj0J/3O+6uMpR1j0rRxY6j24R5dcE Twjb0CAk2ylSPgZ1qSybZpy52U2AiFkSNWMS7GcyehP7CgYzeC/MpzN/DBee5CYIs0Ac O4UQ== X-Gm-Message-State: APjAAAUAI7jV4aoplE0e5Fyw1eZ5FNdoHPj8z/wzPm2Xf4giSXDS+3G5 MesZAtfoBGsQCmqL1u4Mh7d7Fuqg8HBgpj2DNj1ricLSHtAAg3wpSQDDppJkjuIXBwcA1IVNAHL PSgOHqSiwkJNamxCO/hzoyFWedWlkb1lINRec2wGY8zyDik3h/97WO3Ahbjwg X-Google-Smtp-Source: APXvYqx21m+7OPniJOumsOcNtmlKWsjH5VaiFr1fi0Y7z9PWchrV+fsZu2Fghy6Rn4CRvsQxCQSeB/wywP3l X-Received: by 2002:a81:ca43:: with SMTP id y3mr762810ywk.432.1569539931047; Thu, 26 Sep 2019 16:18:51 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:06 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-11-bgardon@google.com> Mime-Version: 1.0 
References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 10/28] kvm: mmu: Flush TLBs before freeing direct MMU page table memory From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org If page table memory is freed before a TLB flush, it can result in improper guest access to memory through paging structure caches. Specifically, until a TLB flush, memory that was part of the paging structure could be used by the hardware for address translation if a partial walk leading to it is stored in the paging structure cache. Ensure that there is a TLB flush before page table memory is freed by transferring disconnected pages to a disconnected list, and on a flush transferring a snapshot of the disconnected list to a free list. The free list is processed asynchronously to avoid slowing TLB flushes. Signed-off-by: Ben Gardon --- arch/x86/include/asm/kvm_host.h | 5 ++ arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu.c | 127 ++++++++++++++++++++++++++++++-- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 9 ++- 5 files changed, 136 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1f8164c577d50..9bf149dce146d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -974,6 +974,11 @@ struct kvm_arch { */ bool pure_direct_mmu; hpa_t direct_root_hpa[KVM_ADDRESS_SPACE_NUM]; + spinlock_t direct_mmu_disconnected_pts_lock; + struct list_head direct_mmu_disconnected_pts; + spinlock_t direct_mmu_pt_free_list_lock; + struct list_head direct_mmu_pt_free_list; + struct work_struct direct_mmu_free_work; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 840e12583b85b..7c615f3cebf8f 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -45,6 +45,7 @@ config KVM select KVM_GENERIC_DIRTYLOG_READ_PROTECT select KVM_VFIO select SRCU + select HAVE_KVM_ARCH_TLB_FLUSH_ALL ---help--- Support hosting fully virtualized guest machines using hardware virtualization extensions. You will need a fairly recent diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9fe57ef7baa29..317e9238f17b2 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1700,6 +1700,100 @@ static void free_pt_rcu_callback(struct rcu_head *rp) free_page((unsigned long)disconnected_pt); } +/* + * Takes a snapshot of, and clears, the direct MMU disconnected pt list. Once + * TLBs have been flushed, this snapshot can be transferred to the direct MMU + * PT free list to be freed. + */ +static void direct_mmu_cut_disconnected_pt_list(struct kvm *kvm, + struct list_head *snapshot) +{ + spin_lock(&kvm->arch.direct_mmu_disconnected_pts_lock); + list_splice_tail_init(&kvm->arch.direct_mmu_disconnected_pts, snapshot); + spin_unlock(&kvm->arch.direct_mmu_disconnected_pts_lock); +} + +/* + * Takes a snapshot of, and clears, the direct MMU PT free list and then sets + * each page in the snapshot to be freed after an RCU grace period. 
+ */ +static void direct_mmu_process_pt_free_list(struct kvm *kvm) +{ + LIST_HEAD(free_list); + struct page *page; + struct page *next; + + spin_lock(&kvm->arch.direct_mmu_pt_free_list_lock); + list_splice_tail_init(&kvm->arch.direct_mmu_pt_free_list, &free_list); + spin_unlock(&kvm->arch.direct_mmu_pt_free_list_lock); + + list_for_each_entry_safe(page, next, &free_list, lru) { + list_del(&page->lru); + /* + * Free the pt page in an RCU callback, once it's safe to do + * so. + */ + call_rcu(&page->rcu_head, free_pt_rcu_callback); + } +} + +static void direct_mmu_free_work_fn(struct work_struct *work) +{ + struct kvm *kvm = container_of(work, struct kvm, + arch.direct_mmu_free_work); + + direct_mmu_process_pt_free_list(kvm); +} + +/* + * Propagate a snapshot of the direct MMU disonnected pt list to the direct MMU + * PT free list, after TLBs have been flushed. Schedule work to free the pages + * in the direct MMU PT free list. + */ +static void direct_mmu_process_free_list_async(struct kvm *kvm, + struct list_head *snapshot) +{ + spin_lock(&kvm->arch.direct_mmu_pt_free_list_lock); + list_splice_tail_init(snapshot, &kvm->arch.direct_mmu_pt_free_list); + spin_unlock(&kvm->arch.direct_mmu_pt_free_list_lock); + + schedule_work(&kvm->arch.direct_mmu_free_work); +} + +/* + * To be used during teardown once all VCPUs are paused. Ensures that the + * direct MMU disconnected PT and PT free lists are emptied and outstanding + * page table memory freed. + */ +static void direct_mmu_process_pt_free_list_sync(struct kvm *kvm) +{ + LIST_HEAD(snapshot); + + cancel_work_sync(&kvm->arch.direct_mmu_free_work); + direct_mmu_cut_disconnected_pt_list(kvm, &snapshot); + + spin_lock(&kvm->arch.direct_mmu_pt_free_list_lock); + list_splice_tail_init(&snapshot, &kvm->arch.direct_mmu_pt_free_list); + spin_unlock(&kvm->arch.direct_mmu_pt_free_list_lock); + + direct_mmu_process_pt_free_list(kvm); +} + +/* + * Add a page of memory that has been disconnected from the paging structure to + * a queue to be freed. This is a two step process: after a page has been + * disconnected, the TLBs must be flushed, and an RCU grace period must elapse + * before the memory can be freed. + */ +static void direct_mmu_disconnected_pt_list_add(struct kvm *kvm, + struct page *page) +{ + spin_lock(&kvm->arch.direct_mmu_disconnected_pts_lock); + list_add_tail(&page->lru, &kvm->arch.direct_mmu_disconnected_pts); + spin_unlock(&kvm->arch.direct_mmu_disconnected_pts_lock); +} + + static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_pte, u64 new_pte, int level); @@ -1760,12 +1854,8 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, gfn += KVM_PAGES_PER_HPAGE(level); } - /* - * Free the pt page in an RCU callback, once it's safe to do - * so. 
- */ page = pfn_to_page(pfn); - call_rcu(&page->rcu_head, free_pt_rcu_callback); + direct_mmu_disconnected_pt_list_add(kvm, page); } /** @@ -5813,6 +5903,12 @@ static int kvm_mmu_init_direct_mmu(struct kvm *kvm) kvm->arch.direct_mmu_enabled = true; kvm->arch.pure_direct_mmu = true; + spin_lock_init(&kvm->arch.direct_mmu_disconnected_pts_lock); + INIT_LIST_HEAD(&kvm->arch.direct_mmu_disconnected_pts); + spin_lock_init(&kvm->arch.direct_mmu_pt_free_list_lock); + INIT_LIST_HEAD(&kvm->arch.direct_mmu_pt_free_list); + INIT_WORK(&kvm->arch.direct_mmu_free_work, direct_mmu_free_work_fn); + return 0; err: for (i = 0; i < ARRAY_SIZE(kvm->arch.direct_root_hpa); i++) { @@ -5831,6 +5927,8 @@ static void kvm_mmu_uninit_direct_mmu(struct kvm *kvm) if (!kvm->arch.direct_mmu_enabled) return; + direct_mmu_process_pt_free_list_sync(kvm); + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) handle_disconnected_pt(kvm, i, 0, (kvm_pfn_t)(kvm->arch.direct_root_hpa[i] >> PAGE_SHIFT), @@ -6516,3 +6614,22 @@ void kvm_mmu_module_exit(void) unregister_shrinker(&mmu_shrinker); mmu_audit_disable(); } + +void kvm_flush_remote_tlbs(struct kvm *kvm) +{ + LIST_HEAD(disconnected_snapshot); + + if (kvm->arch.direct_mmu_enabled) + direct_mmu_cut_disconnected_pt_list(kvm, + &disconnected_snapshot); + + /* + * Synchronously flush the TLBs before processing the direct MMU free + * list. + */ + __kvm_flush_remote_tlbs(kvm); + + if (kvm->arch.direct_mmu_enabled) + direct_mmu_process_free_list_async(kvm, &disconnected_snapshot); +} +EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index baed80f8a7f00..350a3b79cc8d1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -786,6 +786,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu); int kvm_vcpu_yield_to(struct kvm_vcpu *target); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible); +void __kvm_flush_remote_tlbs(struct kvm *kvm); void kvm_flush_remote_tlbs(struct kvm *kvm); void kvm_reload_remote_mmus(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9ce067b6882b7..c8559a86625ce 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -255,8 +255,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req) return called; } -#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL -void kvm_flush_remote_tlbs(struct kvm *kvm) +void __kvm_flush_remote_tlbs(struct kvm *kvm) { /* * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in @@ -280,6 +279,12 @@ void kvm_flush_remote_tlbs(struct kvm *kvm) ++kvm->stat.remote_tlb_flush; cmpxchg(&kvm->tlbs_dirty, dirty_count, 0); } + +#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL +void kvm_flush_remote_tlbs(struct kvm *kvm) +{ + __kvm_flush_remote_tlbs(kvm); +} EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs); #endif From patchwork Thu Sep 26 23:18:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163481 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 220E414ED for ; Thu, 26 Sep 2019 23:18:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0140D2086A for ; Thu, 26 Sep 2019 23:18:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="rGDN1/dP" Received: (majordomo@vger.kernel.org) 
by vger.kernel.org via listexpand id S1729032AbfIZXSz (ORCPT ); Thu, 26 Sep 2019 19:18:55 -0400 Received: from mail-qk1-f202.google.com ([209.85.222.202]:44959 "EHLO mail-qk1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729019AbfIZXSy (ORCPT ); Thu, 26 Sep 2019 19:18:54 -0400 Received: by mail-qk1-f202.google.com with SMTP id x77so824222qka.11 for ; Thu, 26 Sep 2019 16:18:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=HYTTeYbK/aVfFSgOVXcFV9VVH02ZEYoiUG/MoI0UcY0=; b=rGDN1/dPr08EUOP55QKYL85lqDyr1fTQif9OW5w6aspEMjXO5EzFlqbVMFjLCjtsxH kWxKu/lRL1bXJ90jHLIyGRowzWQF0Pfg69OK5rUg7x1euh9U7q43Lmd7oIafXTdkByFd GonlM/nQnoOKgseOi6R47QsXESaHWdV0b/tFPtFPGg03cKPoeZNd9hBTo0v0alGfHQgO KgqctMZ7FrXcWhKghrXsHCIqStK98A4ItMPEMLggzS7RE/FwjsLd059Z7BOqEqnyms7F o0qoG62TZpiEVWylfB91mhvjIGpdq99WyHr99o6P4DIDwQLFlZwfaUlLLu0ljydl4NZF oHiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=HYTTeYbK/aVfFSgOVXcFV9VVH02ZEYoiUG/MoI0UcY0=; b=KZl6eNAi3aL7KzEdYn+P/45hKlDQMspseJxNNpNncpZfDdx+drkGmzoQOtH3Cq+EPU yqznKht4rpuXw1Rr89qmd/FdOomTrGXXMPNAmxOa+CEPkwbvBLlcNNGHAJvyUMOo1Og3 rkh83BxKx8+7uYKiafW2d3RI4+gBmdDKRDUC+HaaD5eqXhEqOtrH/aIpB2s+FPG9mnbZ Nnxn/3lMDg78hvJYS8nz1y4DTtMF4qxk1+HmJTlhpOhO4rlUZqfN367G9RxmP1SZQ20a GZXaIEAoyaWdDmngtca7zrtJQCskBKVJa+YWGPmdQKgLo9TVv9RJh8UPy4PHKYwAB9E8 OXxw== X-Gm-Message-State: APjAAAWuqEnENIDPrPXDnY6r83OgR9HsY7X4c+ar6UKDqR0WVFjeq3/L c5/kHeGYKv1i3r1AcxYlEKuCsC767PUW9o1IHbrTwoD9vQ8gPD0WAAbsprb312HauZCcKyoEaO+ N1v97yKGGRs8OCpCuiFP5YAlU58bqG6Yz/tDtRlXapTIEMyV4v1xatUzC7gjT X-Google-Smtp-Source: APXvYqwwM08fU8vK7ZnfwpAp/teTxfnJH2gKHmczwiSyWthxs6MU/ERqwWYIyhoittx4Ow3E6Xnsv7V2nCLt X-Received: by 2002:ac8:2d2c:: with SMTP id n41mr6867280qta.335.1569539933282; Thu, 26 Sep 2019 16:18:53 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:07 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-12-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 11/28] kvm: mmu: Optimize for freeing direct MMU PTs on teardown From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Waiting for a TLB flush and an RCU grace priod before freeing page table memory grants safety in steady state operation, however these protections are not always necessary. On VM teardown, only one thread is operating on the paging structures and no vCPUs are running. As a result a fast path can be added to the disconnected page table handler which frees the memory immediately. Add the fast path and use it when tearing down VMs. 
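Before the diff, a condensed model of the two freeing paths may help: in steady state a disconnected page table page is queued and only freed after a TLB flush and an RCU grace period, while on teardown it can be freed on the spot. The sketch below is a stand-alone user-space approximation; the structure and function names are invented for illustration, and the real change is the vm_teardown handling added to handle_disconnected_pt() in the patch.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Models one disconnected page table page awaiting freeing. */
struct pt_page {
	struct pt_page *next;
	void *mem;
};

/* Stand-in for kvm->arch.direct_mmu_disconnected_pts. */
static struct pt_page *deferred_free_list;

static void queue_deferred_free(struct pt_page *page)
{
	/*
	 * Real code: list_add_tail() under a spinlock; the page is freed only
	 * after a TLB flush and an RCU grace period.
	 */
	page->next = deferred_free_list;
	deferred_free_list = page;
}

static void free_disconnected_pt(struct pt_page *page, bool vm_teardown)
{
	if (vm_teardown) {
		/*
		 * Fast path: no vCPUs are running and no other thread walks
		 * the paging structure, so the page can be freed immediately.
		 */
		free(page->mem);
		free(page);
	} else {
		/* Steady state: defer until it is provably safe. */
		queue_deferred_free(page);
	}
}

int main(void)
{
	struct pt_page *page = malloc(sizeof(*page));

	if (!page)
		return 1;
	page->mem = malloc(4096);
	if (!page->mem) {
		free(page);
		return 1;
	}
	free_disconnected_pt(page, true);	/* teardown path */
	return 0;
}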
Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 44 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 34 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 317e9238f17b2..263718d49f730 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1795,7 +1795,8 @@ static void direct_mmu_disconnected_pt_list_add(struct kvm *kvm, static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_pte, u64 new_pte, int level); + u64 old_pte, u64 new_pte, int level, + bool vm_teardown); /** * mark_pte_disconnected - Mark a PTE as part of a disconnected PT @@ -1805,16 +1806,19 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, * @ptep: a pointer to the PTE to be marked disconnected * @level: the level of the PT this PTE was a part of, when it was part of the * paging structure + * @vm_teardown: all vCPUs are paused and the VM is being torn down. Yield and + * free child page table memory immediately. */ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, - u64 *ptep, int level) + u64 *ptep, int level, bool vm_teardown) { u64 old_pte; old_pte = xchg(ptep, DISCONNECTED_PTE); BUG_ON(old_pte == DISCONNECTED_PTE); - handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level); + handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level, + vm_teardown); } /** @@ -1825,6 +1829,8 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, * @pt_base_gfn: the base GFN that was mapped by the first PTE in the PT * @pfn: The physical frame number of the disconnected PT page * @level: the level of the PT, when it was part of the paging structure + * @vm_teardown: all vCPUs are paused and the VM is being torn down. Yield and + * free child page table memory immediately. * * Given a pointer to a page table that has been removed from the paging * structure and its level, recursively free child page tables and mark their @@ -1834,9 +1840,17 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, * page table or its children because it has been atomically removed from the * root of the paging structure, so no other thread will be trying to free the * memory. + * + * If vm_teardown=true, this function will yield while handling the + * disconnected page tables and will free memory immediately. This option + * should only be used during VM teardown when no other CPUs are accessing the + * direct paging structures. Yielding is necessary because the paging structure + * could be quite large, and freeing it without yielding would induce + * soft-lockups or scheduler warnings. */ static void handle_disconnected_pt(struct kvm *kvm, int as_id, - gfn_t pt_base_gfn, kvm_pfn_t pfn, int level) + gfn_t pt_base_gfn, kvm_pfn_t pfn, int level, + bool vm_teardown) { int i; gfn_t gfn = pt_base_gfn; @@ -1849,13 +1863,20 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, * try to map in an entry there or try to free any child page * table the entry might have pointed to. 
*/ - mark_pte_disconnected(kvm, as_id, gfn, &pt[i], level); + mark_pte_disconnected(kvm, as_id, gfn, &pt[i], level, + vm_teardown); gfn += KVM_PAGES_PER_HPAGE(level); } - page = pfn_to_page(pfn); - direct_mmu_disconnected_pt_list_add(kvm, page); + if (vm_teardown) { + BUG_ON(atomic_read(&kvm->online_vcpus) != 0); + cond_resched(); + free_page((unsigned long)pt); + } else { + page = pfn_to_page(pfn); + direct_mmu_disconnected_pt_list_add(kvm, page); + } } /** @@ -1866,6 +1887,8 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, * @old_pte: The value of the PTE before the atomic compare / exchange * @new_pte: The value of the PTE after the atomic compare / exchange * @level: the level of the PT the PTE is part of in the paging structure + * @vm_teardown: all vCPUs are paused and the VM is being torn down. Yield and + * free child page table memory immediately. * * Handle bookkeeping that might result from the modification of a PTE. * This function should be called in the same RCU read critical section as the @@ -1874,7 +1897,8 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, * setting the dirty bit on a pte. */ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_pte, u64 new_pte, int level) + u64 old_pte, u64 new_pte, int level, + bool vm_teardown) { bool was_present = is_present_direct_pte(old_pte); bool is_present = is_present_direct_pte(new_pte); @@ -1920,7 +1944,7 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, * pointed to must be freed. */ handle_disconnected_pt(kvm, as_id, gfn, spte_to_pfn(old_pte), - child_level); + child_level, vm_teardown); } } @@ -5932,7 +5956,7 @@ static void kvm_mmu_uninit_direct_mmu(struct kvm *kvm) for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) handle_disconnected_pt(kvm, i, 0, (kvm_pfn_t)(kvm->arch.direct_root_hpa[i] >> PAGE_SHIFT), - PT64_ROOT_4LEVEL); + PT64_ROOT_4LEVEL, true); } /* The return value indicates if tlb flush on all vcpus is needed. 
*/ From patchwork Thu Sep 26 23:18:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163483 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 816CC912 for ; Thu, 26 Sep 2019 23:18:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 608F520835 for ; Thu, 26 Sep 2019 23:18:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Ift8WiOE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729005AbfIZXS6 (ORCPT ); Thu, 26 Sep 2019 19:18:58 -0400 Received: from mail-qt1-f202.google.com ([209.85.160.202]:50779 "EHLO mail-qt1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729000AbfIZXS6 (ORCPT ); Thu, 26 Sep 2019 19:18:58 -0400 Received: by mail-qt1-f202.google.com with SMTP id d24so3175079qtn.17 for ; Thu, 26 Sep 2019 16:18:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Ss4PksOshRYZKrEYLVpamWjUtlHgN82Ss6+WzEoWR/I=; b=Ift8WiOEhGWektL5HivQyZFE6GUOYG6bYHJxB+QK/zvwyAy5RgaxcTKPJkAgYkbC9o VgJGgZwBMHwgTx2T+Z8bf6gmS0it8Za8RTAKS6tave4CMaRDgoE0FkFuJ4VIojLLd91t qIoYGO6We3qZnJOK4j1vKdmAEnkVfNgom1E1aQSNVPIScq2EkqvLF9azhnHMdEZqs+rQ vhFx8dWvyga0G8a4YocPl6fcDSb9xnjDkha0vJOqpRVIf2G4QOgtorMumVqpJ3uNJQHN dtzypsDQL4BsuJnE8G5lcQN4fUgaoJ0bS8JawYbkOkTgwBcVm4K8WxwDysYDWoqb22ff l8lw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Ss4PksOshRYZKrEYLVpamWjUtlHgN82Ss6+WzEoWR/I=; b=f+QjFp3htt0cbjur80lsiAOa2KoxLV20vhy2d4UdA1BaVAKluVM48j11VopJLf3Ldx U1FYhH+HbM+NaDIPlwclYnge6sW0XLo1JbaNmC4NNHjPcaHUdljnHvkDq2XvF7tG8utE XwICeYHop0FdZzig8Uu+QRIcsKfvFkPdSqoxzUFZIZalvXrZ6Y8MQkmRR1rcRsZUXcKP ucax4X1irLXhZzsGewYzxuO24upJ1X9LkUGfMPOg4mvfypXLrDUoVyBwDY6G2u0cRCXC QxGU/yJVbYoqCbKntjSBt11FO5nMF3f3o9IrooVOXVJS7Y79q0JaA9jcDrpn5UsFtO0E AHlA== X-Gm-Message-State: APjAAAXePOthbsR8n7f49cjWaKYhC79AxyZpOInSCVMcWvHAs549MeU9 jnoH8JwE9BbEUv0zqTf3gb5AVqp/+Ei+onVbaMBBXPhqRWWL9igMSbUJMF5uue6QfP/DUFNQqlw zJ0zKn+l068x6a7iSl43rRNbNuK+YXJH79k5zEhVaC2ejpYqVyrK2v6f6lbWK X-Google-Smtp-Source: APXvYqy1uVlE4u8nD9WPQhr+cMF/pU4j6Tjbp9qq0W2tzRVbJsleNJ0RaZVKbErMoAEuHuUaLSXsKuqffwww X-Received: by 2002:ac8:5399:: with SMTP id x25mr6959278qtp.144.1569539935556; Thu, 26 Sep 2019 16:18:55 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:08 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-13-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 12/28] kvm: mmu: Set tlbs_dirty atomically From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The tlbs_dirty mechanism for deferring flushes can be expanded beyond its current use case. 
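A small sketch of what "atomically" means here: rather than issuing plain, non-atomic increments on the shared counter (safe only while holding the MMU lock exclusively), a thread accumulates its deferred-flush count locally and publishes it with a single atomic add. The code below is a stand-alone C11 model with an invented sync_shadow_page() loop; in the patch itself the publish step is an xadd() on kvm->tlbs_dirty.

#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for the shared kvm->tlbs_dirty counter. */
static _Atomic long tlbs_dirty;

static void sync_shadow_page(void)
{
	long local_dirty = 0;
	int i;

	for (i = 0; i < 512; i++) {
		/* ... decide whether entry i left a stale TLB entry ... */
		if (i % 7 == 0)
			local_dirty++;	/* was: vcpu->kvm->tlbs_dirty++ */
	}

	/* Publish the whole batch with one atomic add. */
	atomic_fetch_add(&tlbs_dirty, local_dirty);
}

int main(void)
{
	sync_shadow_page();
	printf("tlbs_dirty = %ld\n", atomic_load(&tlbs_dirty));
	return 0;
}

Batching also keeps the hot loop free of atomic operations; only one contended update is issued per call.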
This allows MMU operations which do not themselves require TLB flushes to notify other threads that there are unflushed modifications to the paging structure. In order to use this mechanism concurrently, the updates to the global tlbs_dirty must be made atomically. Signed-off-by: Ben Gardon --- arch/x86/kvm/paging_tmpl.h | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 97903c8dcad16..cc3630c8bd3ea 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -986,6 +986,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) bool host_writable; gpa_t first_pte_gpa; int set_spte_ret = 0; + int ret; + int tlbs_dirty = 0; /* direct kvm_mmu_page can not be unsync. */ BUG_ON(sp->role.direct); @@ -1004,17 +1006,13 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte, - sizeof(pt_element_t))) - return 0; + sizeof(pt_element_t))) { + ret = 0; + goto out; + } if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) { - /* - * Update spte before increasing tlbs_dirty to make - * sure no tlb flush is lost after spte is zapped; see - * the comments in kvm_flush_remote_tlbs(). - */ - smp_wmb(); - vcpu->kvm->tlbs_dirty++; + tlbs_dirty++; continue; } @@ -1029,12 +1027,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) if (gfn != sp->gfns[i]) { drop_spte(vcpu->kvm, &sp->spt[i]); - /* - * The same as above where we are doing - * prefetch_invalid_gpte(). - */ - smp_wmb(); - vcpu->kvm->tlbs_dirty++; + tlbs_dirty++; continue; } @@ -1051,7 +1044,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH) kvm_flush_remote_tlbs(vcpu->kvm); - return nr_present; + ret = nr_present; + +out: + xadd(&vcpu->kvm->tlbs_dirty, tlbs_dirty); + return ret; } #undef pt_element_t From patchwork Thu Sep 26 23:18:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163485 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56644912 for ; Thu, 26 Sep 2019 23:19:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 22C6A2086A for ; Thu, 26 Sep 2019 23:19:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="jcCv9Gms" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728939AbfIZXS7 (ORCPT ); Thu, 26 Sep 2019 19:18:59 -0400 Received: from mail-pf1-f201.google.com ([209.85.210.201]:47674 "EHLO mail-pf1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729000AbfIZXS7 (ORCPT ); Thu, 26 Sep 2019 19:18:59 -0400 Received: by mail-pf1-f201.google.com with SMTP id t65so490632pfd.14 for ; Thu, 26 Sep 2019 16:18:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=x+VzNiLJHjNZuhYKHwySsWShXnowbtgPFa+qtIsFFJY=; b=jcCv9GmsNfyrMTWIoGIXXAx3XRrSLUhhpyMMdimXIhPWyh5w5mjfAJZ7CklyVFeWV3 
ryzq6XzPPoo4C5rYKTgMA5aboTcp19K1M0RHEBVPnMx01j3t18Biac4FTrakKcWQ+DZR B9r/kAhZG/p2LgxNHn0zrmVMwExtQY6snxq9GsO/3DgMcYgZornZV3IigWTFptg5sNBn fmqZCtJonlk5QHvUhm5R+J79t4syxdpBqOjn8SGNfjR13lvQQrELYQsqZavzdEX383rJ evHAYkLFkJy54DONF4YDVGS/6be0V6gFa3RDgBU0PWs7hdmKclV+Q2PJ0adlb2VQAQhY +fJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=x+VzNiLJHjNZuhYKHwySsWShXnowbtgPFa+qtIsFFJY=; b=hvOHBW24utFRp6zeUsUYF8qinuupvKYqeQyYtJ7mtFzh68ZtWpJOsfaNwHxuRG7NNT ZZA05JRhE8udT2KAGa6ldsBX90jL6Rv4zMECXA9Dby6ODzaz5plEz8xbsqJsFvToJ3ci F1LTyas8bkr/jy2YQaEwvW4VMBx/8PxxXrlnEWZDknQNZfXYAfmhj0QRdijo27UUPu3A aVw/tCiXnQsB8aDVc8V2ZhSzPsyTP/hAMZwRqh02pUSBBHHfRypKSYcL2hUs6wqBr8uB X6a73cuXVtDDNjWOBCd0Or1yF7rvIa0wcJcUj2mX83W5TVcpiI700d0qsWI7oIXwY3fs 9fNg== X-Gm-Message-State: APjAAAV0MwITXXr4ZwEcjFjH+zPoWIUCski5EdvCWMm0Tmt9DDgq9tm7 QQakKPDSzW001rvY/cpa86HcsPwjaCkPSZRl8zSY06weBQS45rrYx6CqDT6VEqm5sQTnhGCVahv 2glOA7JZyjOMqXTszSOnViULKfQZEreqvQ+O9wwHpux6wKDggBGn+2RABGJ0M X-Google-Smtp-Source: APXvYqzD11vasZoAfJTdFcItsEwIjGHLbYchuPXaUVpHfnaW7z0twc3Cl202wjq9UtfWqKIaWxgI6oo2vfvu X-Received: by 2002:a65:5648:: with SMTP id m8mr6141682pgs.37.1569539937612; Thu, 26 Sep 2019 16:18:57 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:09 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-14-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 13/28] kvm: mmu: Add an iterator for concurrent paging structure walks From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add a utility for concurrent paging structure traversals. This iterator uses several mechanisms to ensure that its accesses to paging structure memory are safe, and that memory can be freed safely in the face of lockless access. The purpose of the iterator is to create a unified pattern for concurrent paging structure traversals and simplify the implementation of other MMU functions. This iterator implements a pre-order traversal of PTEs for a given GFN range within a given address space. The iterator abstracts away bookkeeping on successful changes to PTEs, retrying on failed PTE modifications, TLB flushing, and yielding during long operations. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 455 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmutrace.h | 50 +++++ 2 files changed, 505 insertions(+) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 263718d49f730..59d1866398c42 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1948,6 +1948,461 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, } } +/* + * Given a host page table entry and its level, returns a pointer containing + * the host virtual address of the child page table referenced by the page table + * entry. Returns null if there is no such entry. + */ +static u64 *pte_to_child_pt(u64 pte, int level) +{ + u64 *pt; + /* There's no child entry if this entry isn't present */ + if (!is_present_direct_pte(pte)) + return NULL; + + /* There is no child page table if this is a leaf entry. 
*/ + if (is_last_spte(pte, level)) + return NULL; + + pt = (u64 *)__va(pte & PT64_BASE_ADDR_MASK); + return pt; +} + +enum mmu_lock_mode { + MMU_NO_LOCK = 0, + MMU_READ_LOCK = 1, + MMU_WRITE_LOCK = 2, + MMU_LOCK_MAY_RESCHED = 4 +}; + +/* + * A direct walk iterator encapsulates a walk through a direct paging structure. + * It handles ensuring that the walk uses RCU to safely access page table + * memory. + */ +struct direct_walk_iterator { + /* Internal */ + gfn_t walk_start; + gfn_t walk_end; + gfn_t target_gfn; + long tlbs_dirty; + + /* the address space id. */ + int as_id; + u64 *pt_path[PT64_ROOT_4LEVEL]; + bool walk_in_progress; + + /* + * If set, the next call to direct_walk_iterator_next_pte_raw will + * simply reread the current pte and return. This is useful in cases + * where a thread misses a race to set a pte and wants to retry. This + * should be set with a call to direct_walk_iterator_retry_pte. + */ + bool retry_pte; + + /* + * If set, the next call to direct_walk_iterator_next_pte_raw will not + * step down to a lower level on its next step, even if it is at a + * present, non-leaf pte. This is useful when, for example, splitting + * pages, since we know that the entries below the now split page don't + * need to be handled again. + */ + bool skip_step_down; + + enum mmu_lock_mode lock_mode; + struct kvm *kvm; + + /* Output */ + + /* The iterator's current level within the paging structure */ + int level; + /* A pointer to the current PTE */ + u64 *ptep; + /* The a snapshot of the PTE pointed to by ptep */ + u64 old_pte; + /* The lowest GFN mapped by the current PTE */ + gfn_t pte_gfn_start; + /* The highest GFN mapped by the current PTE, + 1 */ + gfn_t pte_gfn_end; +}; + +static void direct_walk_iterator_start_traversal( + struct direct_walk_iterator *iter) +{ + int level; + + /* + * Only clear the levels below the root. The root level page table is + * allocated at VM creation time and will never change for the life of + * the VM. + */ + for (level = PT_PAGE_TABLE_LEVEL; level < PT64_ROOT_4LEVEL; level++) + iter->pt_path[level - 1] = NULL; + iter->level = 0; + iter->ptep = NULL; + iter->old_pte = 0; + iter->pte_gfn_start = 0; + iter->pte_gfn_end = 0; + iter->walk_in_progress = false; + iter->retry_pte = false; + iter->skip_step_down = false; +} + +static bool direct_walk_iterator_flush_needed(struct direct_walk_iterator *iter) +{ + long tlbs_dirty; + + if (iter->tlbs_dirty) { + tlbs_dirty = xadd(&iter->kvm->tlbs_dirty, iter->tlbs_dirty) + + iter->tlbs_dirty; + iter->tlbs_dirty = 0; + } else { + tlbs_dirty = READ_ONCE(iter->kvm->tlbs_dirty); + } + + return (iter->lock_mode & MMU_WRITE_LOCK) && tlbs_dirty; +} + +static bool direct_walk_iterator_end_traversal( + struct direct_walk_iterator *iter) +{ + if (iter->walk_in_progress) + rcu_read_unlock(); + return direct_walk_iterator_flush_needed(iter); +} + +/* + * Resets a direct walk iterator to the root of the paging structure and RCU + * unlocks. After calling this function, the traversal can be reattempted. + */ +static void direct_walk_iterator_reset_traversal( + struct direct_walk_iterator *iter) +{ + /* + * It's okay it ignore the return value, indicating whether a TLB flush + * is needed here because we are ending and then restarting the + * traversal without releasing the MMU lock. At this point the + * iterator tlbs_dirty will have been flushed to the kvm tlbs_dirty, so + * the next end_traversal will return that a flush is needed, if there's + * not an intervening flush for some other reason. 
+ */ + direct_walk_iterator_end_traversal(iter); + direct_walk_iterator_start_traversal(iter); +} + +/* + * Sets a direct walk iterator to seek the gfn range [start, end). + * If end is greater than the maximum possible GFN, it will be changed to the + * maximum possible gfn + 1. (Note that start/end is and inclusive/exclusive + * range, so the last gfn to be interated over would be the largest possible + * GFN, in this scenario.) + */ +__attribute__((unused)) +static void direct_walk_iterator_setup_walk(struct direct_walk_iterator *iter, + struct kvm *kvm, int as_id, gfn_t start, gfn_t end, + enum mmu_lock_mode lock_mode) +{ + BUG_ON(!kvm->arch.direct_mmu_enabled); + BUG_ON((lock_mode & MMU_WRITE_LOCK) && (lock_mode & MMU_READ_LOCK)); + BUG_ON(as_id < 0); + BUG_ON(as_id >= KVM_ADDRESS_SPACE_NUM); + BUG_ON(!VALID_PAGE(kvm->arch.direct_root_hpa[as_id])); + + /* End cannot be greater than the maximum possible gfn. */ + end = min(end, 1ULL << (PT64_ROOT_4LEVEL * PT64_PT_BITS)); + + iter->as_id = as_id; + iter->pt_path[PT64_ROOT_4LEVEL - 1] = + (u64 *)__va(kvm->arch.direct_root_hpa[as_id]); + + iter->walk_start = start; + iter->walk_end = end; + iter->target_gfn = start; + + iter->lock_mode = lock_mode; + iter->kvm = kvm; + iter->tlbs_dirty = 0; + + direct_walk_iterator_start_traversal(iter); +} + +__attribute__((unused)) +static void direct_walk_iterator_retry_pte(struct direct_walk_iterator *iter) +{ + BUG_ON(!iter->walk_in_progress); + iter->retry_pte = true; +} + +__attribute__((unused)) +static void direct_walk_iterator_skip_step_down( + struct direct_walk_iterator *iter) +{ + BUG_ON(!iter->walk_in_progress); + iter->skip_step_down = true; +} + +/* + * Steps down one level in the paging structure towards the previously set + * target gfn. Returns true if the iterator was able to step down a level, + * false otherwise. + */ +static bool direct_walk_iterator_try_step_down( + struct direct_walk_iterator *iter) +{ + u64 *child_pt; + + /* + * Reread the pte before stepping down to avoid traversing into page + * tables that are no longer linked from this entry. This is not + * needed for correctness - just a small optimization. + */ + iter->old_pte = READ_ONCE(*iter->ptep); + + child_pt = pte_to_child_pt(iter->old_pte, iter->level); + if (child_pt == NULL) + return false; + child_pt = rcu_dereference(child_pt); + + iter->level--; + iter->pt_path[iter->level - 1] = child_pt; + return true; +} + +/* + * Steps to the next entry in the current page table, at the current page table + * level. The next entry could map a page of guest memory, another page table, + * or it could be non-present or invalid. Returns true if the iterator was able + * to step to the next entry in the page table, false otherwise. + */ +static bool direct_walk_iterator_try_step_side( + struct direct_walk_iterator *iter) +{ + /* + * If the current gfn maps past the target gfn range, the next entry in + * the current page table will be outside the target range. + */ + if (iter->pte_gfn_end >= iter->walk_end) + return false; + + /* + * Check if the iterator is already at the end of the current page + * table. + */ + if (!(iter->pte_gfn_end % KVM_PAGES_PER_HPAGE(iter->level + 1))) + return false; + + iter->target_gfn = iter->pte_gfn_end; + return true; +} + +/* + * Tries to back up a level in the paging structure so that the walk can + * continue from the next entry in the parent page table. Returns true on a + * successful step up, false otherwise. 
+ */ +static bool direct_walk_iterator_try_step_up(struct direct_walk_iterator *iter) +{ + if (iter->level == PT64_ROOT_4LEVEL) + return false; + + iter->level++; + return true; +} + +/* + * Step to the next pte in a pre-order traversal of the target gfn range. + * To get to the next pte, the iterator either steps down towards the current + * target gfn, if at a present, non-leaf pte, or over to a pte mapping a + * higher gfn, if there's room in the gfn range. If there is no step within + * the target gfn range, returns false. + */ +static bool direct_walk_iterator_next_pte_raw(struct direct_walk_iterator *iter) +{ + bool retry_pte = iter->retry_pte; + bool skip_step_down = iter->skip_step_down; + + iter->retry_pte = false; + iter->skip_step_down = false; + + if (iter->target_gfn >= iter->walk_end) + return false; + + /* If the walk is just starting, set up initial values. */ + if (!iter->walk_in_progress) { + rcu_read_lock(); + + iter->level = PT64_ROOT_4LEVEL; + iter->walk_in_progress = true; + return true; + } + + if (retry_pte) + return true; + + if (!skip_step_down && direct_walk_iterator_try_step_down(iter)) + return true; + + while (!direct_walk_iterator_try_step_side(iter)) + if (!direct_walk_iterator_try_step_up(iter)) + return false; + return true; +} + +static void direct_walk_iterator_recalculate_output_fields( + struct direct_walk_iterator *iter) +{ + iter->ptep = iter->pt_path[iter->level - 1] + + PT64_INDEX(iter->target_gfn << PAGE_SHIFT, iter->level); + iter->old_pte = READ_ONCE(*iter->ptep); + iter->pte_gfn_start = ALIGN_DOWN(iter->target_gfn, + KVM_PAGES_PER_HPAGE(iter->level)); + iter->pte_gfn_end = iter->pte_gfn_start + + KVM_PAGES_PER_HPAGE(iter->level); +} + +static void direct_walk_iterator_prepare_cond_resched( + struct direct_walk_iterator *iter) +{ + if (direct_walk_iterator_end_traversal(iter)) + kvm_flush_remote_tlbs(iter->kvm); + + if (iter->lock_mode & MMU_WRITE_LOCK) + write_unlock(&iter->kvm->mmu_lock); + else if (iter->lock_mode & MMU_READ_LOCK) + read_unlock(&iter->kvm->mmu_lock); + +} + +static void direct_walk_iterator_finish_cond_resched( + struct direct_walk_iterator *iter) +{ + if (iter->lock_mode & MMU_WRITE_LOCK) + write_lock(&iter->kvm->mmu_lock); + else if (iter->lock_mode & MMU_READ_LOCK) + read_lock(&iter->kvm->mmu_lock); + + direct_walk_iterator_start_traversal(iter); +} + +static void direct_walk_iterator_cond_resched(struct direct_walk_iterator *iter) +{ + if (!(iter->lock_mode & MMU_LOCK_MAY_RESCHED) || !need_resched()) + return; + + direct_walk_iterator_prepare_cond_resched(iter); + cond_resched(); + direct_walk_iterator_finish_cond_resched(iter); +} + +static bool direct_walk_iterator_next_pte(struct direct_walk_iterator *iter) +{ + /* + * This iterator could be iterating over a large number of PTEs, such + * that if this thread did not yield, it would cause scheduler + * problems. To avoid this, yield if needed. Note the check on + * MMU_LOCK_MAY_RESCHED in direct_walk_iterator_cond_resched. This + * iterator will not yield unless that flag is set in its lock_mode. + */ + direct_walk_iterator_cond_resched(iter); + + while (true) { + if (!direct_walk_iterator_next_pte_raw(iter)) + return false; + + direct_walk_iterator_recalculate_output_fields(iter); + if (iter->old_pte != DISCONNECTED_PTE) + break; + + /* + * The iterator has encountered a disconnected pte, so it is in + * a page that has been disconnected from the root. Restart the + * traversal from the root in this case.
+ */ + direct_walk_iterator_reset_traversal(iter); + } + + trace_kvm_mmu_direct_walk_iterator_step(iter->walk_start, + iter->walk_end, iter->pte_gfn_start, + iter->level, iter->old_pte); + + return true; +} + +/* + * As direct_walk_iterator_next_pte but skips over non-present ptes. + * (i.e. ptes that are 0 or invalidated). + */ +static bool direct_walk_iterator_next_present_pte( + struct direct_walk_iterator *iter) +{ + while (direct_walk_iterator_next_pte(iter)) + if (is_present_direct_pte(iter->old_pte)) + return true; + + return false; +} + +/* + * As direct_walk_iterator_next_present_pte but skips over non-leaf ptes. + */ +__attribute__((unused)) +static bool direct_walk_iterator_next_present_leaf_pte( + struct direct_walk_iterator *iter) +{ + while (direct_walk_iterator_next_present_pte(iter)) + if (is_last_spte(iter->old_pte, iter->level)) + return true; + + return false; +} + +/* + * Performs an atomic compare / exchange of ptes. + * Returns true if the pte was successfully set to the new value, false if + * there was a race and the compare exchange needs to be retried. + */ +static bool cmpxchg_pte(u64 *ptep, u64 old_pte, u64 new_pte, int level, u64 gfn) +{ + u64 r; + + r = cmpxchg64(ptep, old_pte, new_pte); + if (r == old_pte) + trace_kvm_mmu_set_pte_atomic(gfn, level, old_pte, new_pte); + + return r == old_pte; +} + +__attribute__((unused)) +static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, + u64 new_pte) +{ + bool r; + + if (!(iter->lock_mode & (MMU_READ_LOCK | MMU_WRITE_LOCK))) { + BUG_ON(is_present_direct_pte(iter->old_pte) != + is_present_direct_pte(new_pte)); + BUG_ON(spte_to_pfn(iter->old_pte) != spte_to_pfn(new_pte)); + BUG_ON(is_last_spte(iter->old_pte, iter->level) != + is_last_spte(new_pte, iter->level)); + } + + if (iter->old_pte == new_pte) + return true; + + r = cmpxchg_pte(iter->ptep, iter->old_pte, new_pte, iter->level, + iter->pte_gfn_start); + if (r) { + handle_changed_pte(iter->kvm, iter->as_id, iter->pte_gfn_start, + iter->old_pte, new_pte, iter->level, false); + + if (iter->lock_mode & (MMU_WRITE_LOCK | MMU_READ_LOCK)) + iter->tlbs_dirty++; + } else + direct_walk_iterator_retry_pte(iter); + + return r; +} + /** * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages * @kvm: kvm instance diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 7ca8831c7d1a2..530723038296a 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -166,6 +166,56 @@ TRACE_EVENT( __entry->created ?
"new" : "existing") ); +TRACE_EVENT( + kvm_mmu_direct_walk_iterator_step, + TP_PROTO(u64 walk_start, u64 walk_end, u64 base_gfn, int level, + u64 pte), + TP_ARGS(walk_start, walk_end, base_gfn, level, pte), + + TP_STRUCT__entry( + __field(u64, walk_start) + __field(u64, walk_end) + __field(u64, base_gfn) + __field(int, level) + __field(u64, pte) + ), + + TP_fast_assign( + __entry->walk_start = walk_start; + __entry->walk_end = walk_end; + __entry->base_gfn = base_gfn; + __entry->level = level; + __entry->pte = pte; + ), + + TP_printk("walk_start=%llx walk_end=%llx base_gfn=%llx lvl=%d pte=%llx", + __entry->walk_start, __entry->walk_end, __entry->base_gfn, + __entry->level, __entry->pte) +); + +TRACE_EVENT( + kvm_mmu_set_pte_atomic, + TP_PROTO(u64 gfn, int level, u64 old_pte, u64 new_pte), + TP_ARGS(gfn, level, old_pte, new_pte), + + TP_STRUCT__entry( + __field(u64, gfn) + __field(int, level) + __field(u64, old_pte) + __field(u64, new_pte) + ), + + TP_fast_assign( + __entry->gfn = gfn; + __entry->level = level; + __entry->old_pte = old_pte; + __entry->new_pte = new_pte; + ), + + TP_printk("gfn=%llx level=%d old_pte=%llx new_pte=%llx", __entry->gfn, + __entry->level, __entry->old_pte, __entry->new_pte) +); + DECLARE_EVENT_CLASS(kvm_mmu_page_class, TP_PROTO(struct kvm_mmu_page *sp), From patchwork Thu Sep 26 23:18:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163487 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C5B0912 for ; Thu, 26 Sep 2019 23:19:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D3BC620835 for ; Thu, 26 Sep 2019 23:19:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="HJHliQkf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728958AbfIZXTC (ORCPT ); Thu, 26 Sep 2019 19:19:02 -0400 Received: from mail-qk1-f201.google.com ([209.85.222.201]:43851 "EHLO mail-qk1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728911AbfIZXTC (ORCPT ); Thu, 26 Sep 2019 19:19:02 -0400 Received: by mail-qk1-f201.google.com with SMTP id w7so831553qkf.10 for ; Thu, 26 Sep 2019 16:19:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=yiHBZbfaEUL6oyk2w4l/q3M9S9S7wHBEf2BUGmUMZAk=; b=HJHliQkfXMwQIhVrOTEHtbvMNtpAMna2v7l+x0CWiUOxK9mRha5cnvY7jEkQYuE6U7 1Lb4DJeenmtFy0pNgNp78fUkq1DJM4DC3EXZe1OTa2tUij/yW3HcLAxbz4LnRMMOdzzg pEhLFaLs5BYCw2SHYLaBr5Gl3q5uVSgYT3zGs26ZOkrZo/jPkVNWKfRzoq/tTnhBeD2W D31qeFjpPr2NEK4z9wYfjP0YZYzg10xh/PP0dc0FfWLufSnEyIYldsjQRJfLyMbK4VrE DkAyiLMvZJUbdIVQ5lfE1iUkFClTxCiotYNzPznyVTrSbcWl61VKcc7wAJH0iCXPcgnJ 5nIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=yiHBZbfaEUL6oyk2w4l/q3M9S9S7wHBEf2BUGmUMZAk=; b=lZn4hZC9pSSs3aa/yq+yephlRHHuTGz8F1YbcvkaGVXKeDY0WgvSeQe7F3jSrunnD9 d63SIOBuaNWQc9UmZYsjhooWBvTCCoO/OT+l+kc5QYT6iAetrq1IKytFh+pM9JbGXAXD M2tzAvF0BDBVoMQrjX57sJH3kYLLUXbS+d0XrnZy7mj/qRU4/Qx2WGmGPhSpms5S3izm JZUAEpvKkj3Jw82irmKpkR6gAXlat5LkrgBpt62OBttNgb1e2Ja0nB50CcvD+2HmO9Wt 
kFFbkRxchmZ3lHWjhgYcKkT36UWqIQkWfDfA05RGrhPSjMdS3VjwUzLkZOTkOw+pocuB 36Ag== X-Gm-Message-State: APjAAAV9vkXNqjqi2TKplnXqzLFloj4cNpEfd0bnrjcgJKBOLqnl24cH AL0mse5flxl6ywBdtuYW/ZgHmKum8QLqEVW53sgvmiDjQNFnMvD0968jlW5SMRVgR2SSwZUaeD7 Euke654cXld3xl7aJQ9n7nVa73hb7+berJ9maRXdA7PSW0po6O4bCZ2V/rNjW X-Google-Smtp-Source: APXvYqz3G+QtwY8PigT47G1XYVHqOsxOqS/gSpdiRo/iX+OLD+a+98fjwm/1rVjGOnvxLhCDiR95byKE1j6X X-Received: by 2002:a0c:ad01:: with SMTP id u1mr5337710qvc.137.1569539939976; Thu, 26 Sep 2019 16:18:59 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:10 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-15-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 14/28] kvm: mmu: Batch updates to the direct mmu disconnected list From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When many threads are removing pages of page table memory from the paging structures, the number of list operations on the disconnected page table list can be quite high. Since a spin lock protects the disconnected list, the high rate of list additions can lead to contention. Instead, queue disconnected pages in the paging structure walk iterator and add them to the global list when updating tlbs_dirty, right before releasing the MMU lock. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 54 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 40 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 59d1866398c42..234db5f4246a4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1780,23 +1780,30 @@ static void direct_mmu_process_pt_free_list_sync(struct kvm *kvm) } /* - * Add a page of memory that has been disconnected from the paging structure to + * Add pages of memory that have been disconnected from the paging structure to * a queue to be freed. This is a two step process: after a page has been * disconnected, the TLBs must be flushed, and an RCU grace period must elapse * before the memory can be freed. */ static void direct_mmu_disconnected_pt_list_add(struct kvm *kvm, - struct page *page) + struct list_head *list) { + /* + * No need to acquire the disconnected pts lock if we're adding an + * empty list. + */ + if (list_empty(list)) + return; + spin_lock(&kvm->arch.direct_mmu_disconnected_pts_lock); - list_add_tail(&page->lru, &kvm->arch.direct_mmu_disconnected_pts); + list_splice_tail_init(list, &kvm->arch.direct_mmu_disconnected_pts); spin_unlock(&kvm->arch.direct_mmu_disconnected_pts_lock); } - static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_pte, u64 new_pte, int level, - bool vm_teardown); + bool vm_teardown, + struct list_head *disconnected_pts); /** * mark_pte_disconnected - Mark a PTE as part of a disconnected PT @@ -1808,9 +1815,12 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, * paging structure * @vm_teardown: all vCPUs are paused and the VM is being torn down. Yield and * free child page table memory immediately. + * @disconnected_pts: a local list of page table pages that need to be freed. + * Used to batch updates to the disconnected pts list.
*/ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, - u64 *ptep, int level, bool vm_teardown) + u64 *ptep, int level, bool vm_teardown, + struct list_head *disconnected_pts) { u64 old_pte; @@ -1818,7 +1828,7 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, BUG_ON(old_pte == DISCONNECTED_PTE); handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level, - vm_teardown); + vm_teardown, disconnected_pts); } /** @@ -1831,6 +1841,8 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, * @level: the level of the PT, when it was part of the paging structure * @vm_teardown: all vCPUs are paused and the VM is being torn down. Yield and * free child page table memory immediately. + * @disconnected_pts: a local list of page table pages that need to be freed. + * Used to batch updates to the disconnected pts list. * * Given a pointer to a page table that has been removed from the paging * structure and its level, recursively free child page tables and mark their @@ -1850,7 +1862,8 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, */ static void handle_disconnected_pt(struct kvm *kvm, int as_id, gfn_t pt_base_gfn, kvm_pfn_t pfn, int level, - bool vm_teardown) + bool vm_teardown, + struct list_head *disconnected_pts) { int i; gfn_t gfn = pt_base_gfn; @@ -1864,7 +1877,7 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, * table the entry might have pointed to. */ mark_pte_disconnected(kvm, as_id, gfn, &pt[i], level, - vm_teardown); + vm_teardown, disconnected_pts); gfn += KVM_PAGES_PER_HPAGE(level); } @@ -1875,7 +1888,8 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, free_page((unsigned long)pt); } else { page = pfn_to_page(pfn); - direct_mmu_disconnected_pt_list_add(kvm, page); + BUG_ON(!disconnected_pts); + list_add_tail(&page->lru, disconnected_pts); } } @@ -1889,6 +1903,8 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, * @level: the level of the PT the PTE is part of in the paging structure * @vm_teardown: all vCPUs are paused and the VM is being torn down. Yield and * free child page table memory immediately. + * @disconnected_pts: a local list of page table pages that need to be freed. + * Used to batch updates to the disconnected pts list. * * Handle bookkeeping that might result from the modification of a PTE. * This function should be called in the same RCU read critical section as the @@ -1898,7 +1914,8 @@ static void handle_disconnected_pt(struct kvm *kvm, int as_id, */ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_pte, u64 new_pte, int level, - bool vm_teardown) + bool vm_teardown, + struct list_head *disconnected_pts) { bool was_present = is_present_direct_pte(old_pte); bool is_present = is_present_direct_pte(new_pte); @@ -1944,7 +1961,8 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, * pointed to must be freed. */ handle_disconnected_pt(kvm, as_id, gfn, spte_to_pfn(old_pte), - child_level, vm_teardown); + child_level, vm_teardown, + disconnected_pts); } } @@ -1987,6 +2005,8 @@ struct direct_walk_iterator { gfn_t target_gfn; long tlbs_dirty; + struct list_head disconnected_pts; + /* the address space id.
*/ int as_id; u64 *pt_path[PT64_ROOT_4LEVEL]; @@ -2056,6 +2076,9 @@ static bool direct_walk_iterator_flush_needed(struct direct_walk_iterator *iter) tlbs_dirty = xadd(&iter->kvm->tlbs_dirty, iter->tlbs_dirty) + iter->tlbs_dirty; iter->tlbs_dirty = 0; + + direct_mmu_disconnected_pt_list_add(iter->kvm, + &iter->disconnected_pts); } else { tlbs_dirty = READ_ONCE(iter->kvm->tlbs_dirty); } @@ -2115,6 +2138,8 @@ static void direct_walk_iterator_setup_walk(struct direct_walk_iterator *iter, iter->pt_path[PT64_ROOT_4LEVEL - 1] = (u64 *)__va(kvm->arch.direct_root_hpa[as_id]); + INIT_LIST_HEAD(&iter->disconnected_pts); + iter->walk_start = start; iter->walk_end = end; iter->target_gfn = start; @@ -2393,7 +2418,8 @@ static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, iter->pte_gfn_start); if (r) { handle_changed_pte(iter->kvm, iter->as_id, iter->pte_gfn_start, - iter->old_pte, new_pte, iter->level, false); + iter->old_pte, new_pte, iter->level, false, + &iter->disconnected_pts); if (iter->lock_mode & (MMU_WRITE_LOCK | MMU_READ_LOCK)) iter->tlbs_dirty++; @@ -6411,7 +6437,7 @@ static void kvm_mmu_uninit_direct_mmu(struct kvm *kvm) for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) handle_disconnected_pt(kvm, i, 0, (kvm_pfn_t)(kvm->arch.direct_root_hpa[i] >> PAGE_SHIFT), - PT64_ROOT_4LEVEL, true); + PT64_ROOT_4LEVEL, true, NULL); } /* The return value indicates if tlb flush on all vcpus is needed. */ From patchwork Thu Sep 26 23:18:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2894014ED for ; Thu, 26 Sep 2019 23:19:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0809E20835 for ; Thu, 26 Sep 2019 23:19:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="XpQjmlV4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728947AbfIZXTD (ORCPT ); Thu, 26 Sep 2019 19:19:03 -0400 Received: from mail-pl1-f201.google.com ([209.85.214.201]:34947 "EHLO mail-pl1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729020AbfIZXTD (ORCPT ); Thu, 26 Sep 2019 19:19:03 -0400 Received: by mail-pl1-f201.google.com with SMTP id o12so481117pll.2 for ; Thu, 26 Sep 2019 16:19:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=jOKwZN3XEhzdi2oUrC3zhPSHel7/C8L4AM2NV+iLdm8=; b=XpQjmlV4NN0PNUXJ9uWgfEqDa42W8EQEQUC99DPTihxftPgFebsAFBM7/pK6hQ63Er 9o6gTAF4jRM2weCgsxsb2NaraUJ9b6Rg5VPcUUl3H6IF7QdT7oKEFv6vhlq278x+46cI 5WzXv67hbLUHF2rd6/YsNQtEa3mfOGsmRYN3iHxH0+jb3eLMDmzEiPYRkZWFOTq5K+LX 5KObRlmFhjR2/B7VUzxxIJaqHXjBSp7jtqpV9fSzdRdzH/bxpnXUNXx/f4opflQURDhK wf+S+2djRkDr9QACyRmMPOAnbnlpku+98P2JaDte37JF/xlfsGEmLBhqe51Gm0oD9Q3n saGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=jOKwZN3XEhzdi2oUrC3zhPSHel7/C8L4AM2NV+iLdm8=; b=Z+i+MIjhdAVxiJSUvo1FmCQkiOslar4yOmtqGHQ4xlgNcgzHa2syAGcSlVfmEfEyvn tTw0vtzVH9YnDu2eFr/H2svubzzjNB0tWnDcNEIJem9OYVmit1WN8sS7SdOZcBOowQ+Q 
BNwWSh8W1WWOYH5RZ3Zi6J4KelgxQOVx4tAQ6fPueHs2scxxrIHTRVqjjHVwThhVjjdC EnbThHQZvlOWyWCywF0gLXmQVSmMY+8snU+4+OpojIEDPAZwyFlmFEJZGRHPA0taePI/ ihoWJzpccWb0G/AmOewWnDF2N9t5WPRgrC0GFrclV6pOz4Z9Mc+yCPnDfwV9JPsZv3ru asBg== X-Gm-Message-State: APjAAAWIiscwCdQp62LSdvfI68ky5oZFJhRHGbuqle2992PgOvv5f5N6 VQKpbGZAqzJ7oaONdaIN0qo8ExXyrnhIUYG0hFX0fIeW6NlHjN0pOhh8zsL79m61keBYIqSYJXR nOKfuqkRWmguQzr7bcpUa3DS80yT2HVCRe7XctQVNZJaLtslsG7gxJYlr1yPg X-Google-Smtp-Source: APXvYqxuL271toK8U8WGwGmj4KV/b8ktfcX5u08/t3ZVQnvxQ3tYLhIErb8VmQ4MHV/wCDzUGYosm6xkA83t X-Received: by 2002:a65:6557:: with SMTP id a23mr6036986pgw.439.1569539942167; Thu, 26 Sep 2019 16:19:02 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:11 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-16-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 15/28] kvm: mmu: Support invalidate_zap_all_pages From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Adds a function for zapping ranges of GFNs in an address space which uses the paging structure iterator and uses the function to support invalidate_zap_all_pages for the direct MMU. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 66 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 234db5f4246a4..f0696658b527c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2120,7 +2120,6 @@ static void direct_walk_iterator_reset_traversal( * range, so the last gfn to be interated over would be the largest possible * GFN, in this scenario.) */ -__attribute__((unused)) static void direct_walk_iterator_setup_walk(struct direct_walk_iterator *iter, struct kvm *kvm, int as_id, gfn_t start, gfn_t end, enum mmu_lock_mode lock_mode) @@ -2151,7 +2150,6 @@ static void direct_walk_iterator_setup_walk(struct direct_walk_iterator *iter, direct_walk_iterator_start_traversal(iter); } -__attribute__((unused)) static void direct_walk_iterator_retry_pte(struct direct_walk_iterator *iter) { BUG_ON(!iter->walk_in_progress); @@ -2397,7 +2395,6 @@ static bool cmpxchg_pte(u64 *ptep, u64 old_pte, u64 new_pte, int level, u64 gfn) return r == old_pte; } -__attribute__((unused)) static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, u64 new_pte) { @@ -2725,6 +2722,44 @@ static int kvm_handle_hva_range(struct kvm *kvm, return ret; } +/* + * Marks the range of gfns, [start, end), non-present. + */ +static bool zap_direct_gfn_range(struct kvm *kvm, int as_id, gfn_t start, + gfn_t end, enum mmu_lock_mode lock_mode) +{ + struct direct_walk_iterator iter; + + direct_walk_iterator_setup_walk(&iter, kvm, as_id, start, end, + lock_mode); + while (direct_walk_iterator_next_present_pte(&iter)) { + /* + * The gfn range should be handled at the largest granularity + * possible, however since the functions which handle changed + * PTEs (and freeing child PTs) will not yield, zapping an + * entry with too many child PTEs can lead to scheduler + * problems. In order to avoid scheduler problems, only zap + * PTEs at PDPE level and lower. The root level entries will be + * zapped and the high level page table pages freed on VM + * teardown. 
+ */ + if ((iter.pte_gfn_start < start || + iter.pte_gfn_end > end || + iter.level > PT_PDPE_LEVEL) && + !is_last_spte(iter.old_pte, iter.level)) + continue; + + /* + * If the compare / exchange succeeds, then we will continue on + * to the next pte. If it fails, the next iteration will repeat + * the current pte. We'll handle both cases in the same way, so + * we don't need to check the result here. + */ + direct_walk_iterator_set_pte(&iter, 0); + } + return direct_walk_iterator_end_traversal(&iter); +} + static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, unsigned long data, int (*handler)(struct kvm *kvm, @@ -6645,11 +6680,26 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm) */ static void kvm_mmu_zap_all_fast(struct kvm *kvm) { + int i; + lockdep_assert_held(&kvm->slots_lock); write_lock(&kvm->mmu_lock); trace_kvm_mmu_zap_all_fast(kvm); + /* Zap all direct MMU PTEs slowly */ + if (kvm->arch.direct_mmu_enabled) { + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) + zap_direct_gfn_range(kvm, i, 0, ~0ULL, + MMU_WRITE_LOCK | MMU_LOCK_MAY_RESCHED); + } + + if (kvm->arch.pure_direct_mmu) { + kvm_flush_remote_tlbs(kvm); + write_unlock(&kvm->mmu_lock); + return; + } + /* * Toggle mmu_valid_gen between '0' and '1'. Because slots_lock is * held for the entire duration of zapping obsolete pages, it's @@ -6888,8 +6938,21 @@ void kvm_mmu_zap_all(struct kvm *kvm) struct kvm_mmu_page *sp, *node; LIST_HEAD(invalid_list); int ign; + int i; write_lock(&kvm->mmu_lock); + if (kvm->arch.direct_mmu_enabled) { + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) + zap_direct_gfn_range(kvm, i, 0, ~0ULL, + MMU_WRITE_LOCK | MMU_LOCK_MAY_RESCHED); + kvm_flush_remote_tlbs(kvm); + } + + if (kvm->arch.pure_direct_mmu) { + write_unlock(&kvm->mmu_lock); + return; + } + restart: list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) { if (sp->role.invalid && sp->root_count) From patchwork Thu Sep 26 23:18:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3377D14ED for ; Thu, 26 Sep 2019 23:19:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 07F9020835 for ; Thu, 26 Sep 2019 23:19:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="q8U3azyE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729043AbfIZXTG (ORCPT ); Thu, 26 Sep 2019 19:19:06 -0400 Received: from mail-pf1-f202.google.com ([209.85.210.202]:45657 "EHLO mail-pf1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729022AbfIZXTF (ORCPT ); Thu, 26 Sep 2019 19:19:05 -0400 Received: by mail-pf1-f202.google.com with SMTP id a2so495902pfo.12 for ; Thu, 26 Sep 2019 16:19:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=rbCHVX8S6Xv1F39QkuhFsIHrv/52XI+U8N8+e+hW8gw=; b=q8U3azyEov9GNtxrO8ip4/DIsryx6EgcujtMFmFSX0+OfjGrK+uNy8V5mCeW6s8eKW XtFWG4IoXLdm4Vpqk3abJ404vCOFR9GPFwMrbYs0oWsQF2SMBMLSvnJnLs3qzDaXLTXv G2ESGTx5i7t4Jdc+YwKF11J+YWAuY9iTOqOzUXGcnil7Q9YBsKwxNcmxjQsoaTdAlFq9 ME+hxmwzghij5DB9dRdgw+L/jJ91VlQLj7+bMQ74KMaqGoHYwyY5YMfGuWh2DAdUWCjY 
ZyVvBX8RO3Hdef92cUo0kLDEM0+TB59pJa7MDPnmhfjhupUzCbVEA6lpqwD803w9e2T0 soWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=rbCHVX8S6Xv1F39QkuhFsIHrv/52XI+U8N8+e+hW8gw=; b=Va0DqQRC0L4httkdlrjJ/vuR7Jsn8qvhji+RIleFzg19C2t4MVnkyaQZr4Z5UND8lP jr0mPIEwWOxKfCYgIlc60brcDY5jCcDlgaNKqyCt5PMowvBFEG5bZgL5lOlWlycOqBF6 UVJM68/RRvXGi31KMsngMn6h7MQ+sZgr5Uij5vfyRRQPUhk2opP9QpGO7sCKdvZTGSbE 4rIQc++P+co8TkOINwHPCfGIEpMR8OqRApoT8fs0y1SZGrviDoMGcWdNx6wHprx777lT I6AQGuHC/szQsutYYZxwzbT2VzJ3VQLuXgSnCRpjQHXVveYtBtIiilyqeE/lUNWp9ONY aFFg== X-Gm-Message-State: APjAAAWhPzfxFY4V7r7aVz1aacUH9xiPBV8jGruQ+Zk42oxk4WN0XuAP x96uwVpwGrzJsCZJGEFw3zxhwVXF1RuixMH4nk4Igac+ns3HEp78Wl1IDe2DXpY/CPzSnEYsL3g sllopuvVH7zL6Ll8pvoIF9iNMNF93bEwwykD7faXQDkPFznWLIV9VPIEnfWEy X-Google-Smtp-Source: APXvYqyzLY70mk2MDnm2hHzEF95H4ysEgJkbwxekRuBokTH7fm+rfbx8OVdb6AEmuIT9fv5ORsZIR1N86B40 X-Received: by 2002:a63:5745:: with SMTP id h5mr6179668pgm.268.1569539944225; Thu, 26 Sep 2019 16:19:04 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:12 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-17-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 16/28] kvm: mmu: Add direct MMU page fault handler From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Adds handler functions to replace __direct_map in handling direct page faults. These functions, unlike __direct_map, can handle page faults on multiple VCPUs simultaneously. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 192 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 179 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f0696658b527c..f3a26a32c8174 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1117,6 +1117,24 @@ static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu) return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache); } +/* + * Return an unused object to the specified cache. The object's memory should + * be zeroed before being returned if that memory was modified after allocation + * from the cache. + */ +static void mmu_memory_cache_return(struct kvm_mmu_memory_cache *mc, + void *obj) +{ + /* + * Since this object was allocated from the cache, the cache should + * have at least one free slot to put the object back into.
+ */ + BUG_ON(mc->nobjs >= ARRAY_SIZE(mc->objects)); + + mc->objects[mc->nobjs] = obj; + mc->nobjs++; +} + static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc) { kmem_cache_free(pte_list_desc_cache, pte_list_desc); @@ -2426,6 +2444,21 @@ static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, return r; } +static u64 generate_nonleaf_pte(u64 *child_pt, bool ad_disabled) +{ + u64 pte; + + pte = __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK | + shadow_user_mask | shadow_x_mask | shadow_me_mask; + + if (ad_disabled) + pte |= shadow_acc_track_value; + else + pte |= shadow_accessed_mask; + + return pte; +} + /** * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages * @kvm: kvm instance @@ -3432,13 +3465,7 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK); - spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK | - shadow_user_mask | shadow_x_mask | shadow_me_mask; - - if (sp_ad_disabled(sp)) - spte |= shadow_acc_track_value; - else - spte |= shadow_accessed_mask; + spte = generate_nonleaf_pte(sp->spt, sp_ad_disabled(sp)); mmu_spte_set(sptep, spte); @@ -4071,6 +4098,126 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, return ret; } +static int direct_page_fault_handle_target_level(struct kvm_vcpu *vcpu, + int write, int map_writable, struct direct_walk_iterator *iter, + kvm_pfn_t pfn, bool prefault) +{ + u64 new_pte; + int ret = 0; + int generate_pte_ret = 0; + + if (unlikely(is_noslot_pfn(pfn))) + new_pte = generate_mmio_pte(vcpu, iter->pte_gfn_start, ACC_ALL); + else { + generate_pte_ret = generate_pte(vcpu, ACC_ALL, iter->level, + iter->pte_gfn_start, pfn, + iter->old_pte, prefault, false, + map_writable, false, &new_pte); + /* Failed to construct a PTE. Retry the page fault. */ + if (!new_pte) + return RET_PF_RETRY; + } + + /* + * If the page fault was caused by a write but the page is write + * protected, emulation is needed. If the emulation was skipped, + * the vcpu would have the same fault again. + */ + if ((generate_pte_ret & SET_SPTE_WRITE_PROTECTED_PT) && write) + ret = RET_PF_EMULATE; + + /* If an MMIO PTE was installed, the MMIO will need to be emulated. */ + if (unlikely(is_mmio_spte(new_pte))) + ret = RET_PF_EMULATE; + + /* + * If this would not change the PTE then some other thread must have + * already fixed the page fault and there's no need to proceed. + */ + if (iter->old_pte == new_pte) + return ret; + + /* + * If this warning were to trigger, it would indicate that there was a + * missing MMU notifier or this thread raced with some notifier + * handler. The page fault handler should never change a present, leaf + * PTE to point to a different PFN. A notifier handler should have + * zapped the PTE before the main MM's page table was changed. + */ + WARN_ON(is_present_direct_pte(iter->old_pte) && + is_present_direct_pte(new_pte) && + is_last_spte(iter->old_pte, iter->level) && + is_last_spte(new_pte, iter->level) && + spte_to_pfn(iter->old_pte) != spte_to_pfn(new_pte)); + + /* + * If the page fault handler lost the race to set the PTE, retry the + * page fault. + */ + if (!direct_walk_iterator_set_pte(iter, new_pte)) + return RET_PF_RETRY; + + /* + * Update some stats for this page fault, if the page + * fault was not speculative.
+ */ + if (!prefault) + vcpu->stat.pf_fixed++; + + return ret; + +} + +static int handle_direct_page_fault(struct kvm_vcpu *vcpu, + unsigned long mmu_seq, int write, int map_writable, int level, + gpa_t gpa, gfn_t gfn, kvm_pfn_t pfn, bool prefault) +{ + struct direct_walk_iterator iter; + struct kvm_mmu_memory_cache *pf_pt_cache = &vcpu->arch.mmu_page_cache; + u64 *child_pt; + u64 new_pte; + int ret = RET_PF_RETRY; + + direct_walk_iterator_setup_walk(&iter, vcpu->kvm, + kvm_arch_vcpu_memslots_id(vcpu), gpa >> PAGE_SHIFT, + (gpa >> PAGE_SHIFT) + 1, MMU_READ_LOCK); + while (direct_walk_iterator_next_pte(&iter)) { + if (iter.level == level) { + ret = direct_page_fault_handle_target_level(vcpu, + write, map_writable, &iter, pfn, + prefault); + + break; + } else if (!is_present_direct_pte(iter.old_pte) || + is_large_pte(iter.old_pte)) { + /* + * The leaf PTE for this fault must be mapped at a + * lower level, so a non-leaf PTE must be inserted into + * the paging structure. If the assignment below + * succeeds, it will add the non-leaf PTE and a new + * page of page table memory. Then the iterator can + * traverse into that new page. If the atomic compare/ + * exchange fails, the iterator will repeat the current + * PTE, so the only thing this function must do + * differently is return the page table memory to the + * vCPU's fault cache. + */ + child_pt = mmu_memory_cache_alloc(pf_pt_cache); + new_pte = generate_nonleaf_pte(child_pt, false); + + if (!direct_walk_iterator_set_pte(&iter, new_pte)) + mmu_memory_cache_return(pf_pt_cache, child_pt); + } + } + direct_walk_iterator_end_traversal(&iter); + + /* If emulating, flush this vcpu's TLB. */ + if (ret == RET_PF_EMULATE) + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); + + return ret; +} + static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk) { send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk); @@ -5014,7 +5161,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, gfn_t gfn = gpa >> PAGE_SHIFT; unsigned long mmu_seq; int write = error_code & PFERR_WRITE_MASK; - bool map_writable; + bool map_writable = false; MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); @@ -5035,8 +5182,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); } - if (fast_page_fault(vcpu, gpa, level, error_code)) - return RET_PF_RETRY; + if (!vcpu->kvm->arch.direct_mmu_enabled) + if (fast_page_fault(vcpu, gpa, level, error_code)) + return RET_PF_RETRY; mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); @@ -5048,17 +5196,31 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, return r; r = RET_PF_RETRY; - write_lock(&vcpu->kvm->mmu_lock); + if (vcpu->kvm->arch.direct_mmu_enabled) + read_lock(&vcpu->kvm->mmu_lock); + else + write_lock(&vcpu->kvm->mmu_lock); + if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) goto out_unlock; if (make_mmu_pages_available(vcpu) < 0) goto out_unlock; if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault); + + if (vcpu->kvm->arch.direct_mmu_enabled) + r = handle_direct_page_fault(vcpu, mmu_seq, write, map_writable, + level, gpa, gfn, pfn, prefault); + else + r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, + prefault); out_unlock: - write_unlock(&vcpu->kvm->mmu_lock); + if (vcpu->kvm->arch.direct_mmu_enabled) + read_unlock(&vcpu->kvm->mmu_lock); + else + 
write_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); return r; } @@ -6242,6 +6404,10 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu) { LIST_HEAD(invalid_list); + if (vcpu->arch.mmu->direct_map && vcpu->kvm->arch.direct_mmu_enabled) + /* Reclaim is a todo. */ + return true; + if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES)) return 0; From patchwork Thu Sep 26 23:18:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6460D912 for ; Thu, 26 Sep 2019 23:19:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4303520835 for ; Thu, 26 Sep 2019 23:19:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="NXrdMUZL" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729050AbfIZXTI (ORCPT ); Thu, 26 Sep 2019 19:19:08 -0400 Received: from mail-pg1-f202.google.com ([209.85.215.202]:48379 "EHLO mail-pg1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729025AbfIZXTH (ORCPT ); Thu, 26 Sep 2019 19:19:07 -0400 Received: by mail-pg1-f202.google.com with SMTP id w13so2335249pge.15 for ; Thu, 26 Sep 2019 16:19:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=6mS1NA7D3Okqq6Iz6AwG0BJcrsajspJtR6LdvWfB7Lo=; b=NXrdMUZLcOYiwUyyAvHAkgcbbBICcwIV4StGAmANm1xk2TkvXzY/SQKiAT92scZ4xE 0IIwF/tzMQDMS/IIWImbctunYnzeEywBp+oQ46OFICzzRrD7L0EKqLcwRoWN2oAYpNE/ gDnLrYDBL9ZN4ZbwDsGO/sL4UrJmgtfniBL1tL9vLwumWn4KbR2YISGkg3emJHiSUxB+ pkKwM8qtd9isz6B51cCyCra1FMwLaG9HpKpBQ/ag0baBJ8e1O5/KIZvlJHoFZkklLylE Wmb0GCxYFfM7mn7xaqcf5cNV8W56bb8SxfcquXR+lgXDU/7FcApJSrOvsXcyAg0AQJlN W0Ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=6mS1NA7D3Okqq6Iz6AwG0BJcrsajspJtR6LdvWfB7Lo=; b=LawUQeD040lpnwQdDtVKHhqPMEgvYb2qJjXdyYXmhU8bZPklnlcg5vYKFvnJs2Igp1 uNgK5dUHmRfqTMXIHc1ttK8+HzEnKeOLu0la8GhJEW7Seyu1yWsv9GOtwBC5uZ+9DNp1 7ePqz9B/0M6qcEoBzgFWHSh9fqDsRjVgazGhXMurdccmfx7LE3+E9omcmtduYbXQEhsJ OJVChyVwtpkjXiJTUZNo3Jar0gVPl0OsooDmvfyQMtaSe4lFAR08jvKGyvTW01CjOGke bFF33TGvaSe18iG54XJIU+ACrqOPNTxdLX/a3XdslFTcsgg1F2V6PQoPvLaUBnVvlSGZ E5rA== X-Gm-Message-State: APjAAAXu8hIW/6nnZvE/Dzh+1ncaZUv56qP1FTfRaEnBRZUlXFkqh8Tb YHqB74P5GY4H0PDiiwYbrnHVFAmPOj3B0KIHNnqbmOBP3PbgHF3tI4Hnf7cdp+1z58YGuiYm4B7 u22M7tQrTxKnTJEo8eFzySt+t4ehjQIePXIPjqrlAaKONVli7thYVzjo+um0p X-Google-Smtp-Source: APXvYqx/Xado94oyi45O1zbfaUB6hKPsTb/DmCxwDWimZIiRiwG+fLEo+LMV0XrgDzGOQ0JdHVx5PDaaj3wf X-Received: by 2002:a63:3f46:: with SMTP id m67mr6125465pga.146.1569539946408; Thu, 26 Sep 2019 16:19:06 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:13 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-18-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 17/28] kvm: mmu: Add direct MMU fast page fault handler From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter 
Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org While the direct MMU can handle page faults much faster than the existing implementation, it cannot handle faults caused by write protection or access tracking as quickly. Add a fast path similar to the existing fast path to handle these cases without the MMU read lock or calls to get_user_pages. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 93 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 92 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f3a26a32c8174..3d4a78f2461a9 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -4490,6 +4490,93 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, return fault_handled; } +/* + * Attempt to handle a page fault without the use of get_user_pages, or + * acquiring the MMU lock. This function can handle page faults resulting from + * missing permissions on a PTE, set up by KVM for dirty logging or access + * tracking. + * + * Return value: + * - true: The page fault may have been fixed by this function. Let the vCPU + * access on the same address again. + * - false: This function cannot handle the page fault. Let the full page fault + * path fix it. + */ +static bool fast_direct_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, int level, + u32 error_code) +{ + struct direct_walk_iterator iter; + bool fault_handled = false; + bool remove_write_prot; + bool remove_acc_track; + u64 new_pte; + + if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + return false; + + if (!page_fault_can_be_fast(error_code)) + return false; + + direct_walk_iterator_setup_walk(&iter, vcpu->kvm, + kvm_arch_vcpu_memslots_id(vcpu), gpa >> PAGE_SHIFT, + (gpa >> PAGE_SHIFT) + 1, MMU_NO_LOCK); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + remove_write_prot = (error_code & PFERR_WRITE_MASK); + remove_write_prot &= !(iter.old_pte & PT_WRITABLE_MASK); + remove_write_prot &= spte_can_locklessly_be_made_writable( + iter.old_pte); + + remove_acc_track = is_access_track_spte(iter.old_pte); + + /* Verify that the fault can be handled in the fast path */ + if (!remove_acc_track && !remove_write_prot) + break; + + /* + * If dirty logging is enabled: + * + * Do not fix write-permission on the large spte since we only + * dirty the first page into the dirty-bitmap in + * fast_pf_fix_direct_spte() that means other pages are missed + * if its slot is dirty-logged. + * + * Instead, we let the slow page fault path create a normal spte + * to fix the access. + * + * See the comments in kvm_arch_commit_memory_region(). 
+ */ + if (remove_write_prot && + iter.level > PT_PAGE_TABLE_LEVEL) + break; + + new_pte = iter.old_pte; + if (remove_acc_track) + new_pte = restore_acc_track_spte(iter.old_pte); + if (remove_write_prot) + new_pte |= PT_WRITABLE_MASK; + + if (new_pte == iter.old_pte) { + fault_handled = true; + break; + } + + if (!direct_walk_iterator_set_pte(&iter, new_pte)) + continue; + + if (remove_write_prot) + kvm_vcpu_mark_page_dirty(vcpu, iter.pte_gfn_start); + + fault_handled = true; + break; + } + direct_walk_iterator_end_traversal(&iter); + + trace_fast_page_fault(vcpu, gpa, error_code, iter.ptep, + iter.old_pte, fault_handled); + + return fault_handled; +} + static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, gva_t gva, kvm_pfn_t *pfn, bool write, bool *writable); static int make_mmu_pages_available(struct kvm_vcpu *vcpu); @@ -5182,9 +5269,13 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); } - if (!vcpu->kvm->arch.direct_mmu_enabled) + if (vcpu->kvm->arch.direct_mmu_enabled) { + if (fast_direct_page_fault(vcpu, gpa, level, error_code)) + return RET_PF_RETRY; + } else { if (fast_page_fault(vcpu, gpa, level, error_code)) return RET_PF_RETRY; + } mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); From patchwork Thu Sep 26 23:18:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CF0C917D4 for ; Thu, 26 Sep 2019 23:19:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AE2C82086A for ; Thu, 26 Sep 2019 23:19:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="LlBwOHHw" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729057AbfIZXTL (ORCPT ); Thu, 26 Sep 2019 19:19:11 -0400 Received: from mail-pf1-f201.google.com ([209.85.210.201]:49651 "EHLO mail-pf1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729000AbfIZXTL (ORCPT ); Thu, 26 Sep 2019 19:19:11 -0400 Received: by mail-pf1-f201.google.com with SMTP id i28so486179pfq.16 for ; Thu, 26 Sep 2019 16:19:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ydNs+dGf9qH5fCzvJX67P4i4jDJmbvKjCUwxSiEESq0=; b=LlBwOHHwLzBiYwqRvte9RURUBOoy58oicZAPFN1wWHQ5tsOQCEXpQn+i5ctu0S6vbA 6qmg2NxKxYX6PsCa+5+zg+yXDvolOL1fhhqGS0MXjm1lBJ/Val/kaeRIcF+4dHb+MdrS BdcOaBJLKJDaKUtD5RHT9iwtyIRkuSBQQ859Av5oVoukzj7KWwUtibuXbDx24hpWL6+c 0YyZE9vuMvnktNw3mxUHuCsAEAfJKH0n3rCsWIKQ/mSeHDLuf45MrRkaK2CSL/e/bTiw BaglwqxiqZk9FT831KNBCgTjgxjw95nmf17IDsc2inDSset2thj7YS0H4Xyeiv5wrBVc FwAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ydNs+dGf9qH5fCzvJX67P4i4jDJmbvKjCUwxSiEESq0=; b=k8Ul0vpxPt5JpQtAIZHEpOyTKxSTGqBGbdDpj7z7nnEBKdfa319EWm6Mi27rwBfFBC RniqaflzEWLBdYjXHWUP411xuAt744wWPixqTX1A9sZcmTN/GGOoP0nHpnY75niml9sw KeJZuwjIUE/Ih7R8Mj0yg4IYArDlbllVSfudJ1Yx1YssLoeYcTI6SLovH2rr1P61kAsv 8a4JFsPzxkNNWIek3v4doNXvptS1FcVQGTE059zyjrvt8ixOuDIsQCDaonOUvdqPeU5T 
ktLSVVsJYQ9yJdbOyAmHX851Co46LkYr86r3n3v+Ek7xLyAo3uf4B7l06f7OjrSGFDcR vT7w== X-Gm-Message-State: APjAAAXUvAwMDwXKtXPAB8e2niSe+Mh35+Z64m5Qvi+ZzvmrjHEzaX+9 OxG6eBr4P1DwRMKkWezfaiJXF1PgfZmR6oPPr+SkJu/Pi55NxTkia+kHYwZqMWhs12SnrU5TcdI Ffdq3kHgHhSPCd5Al8/XAbmoJUvlGqUexEXMQbuhXyIxouQSKDA5J/aioPNcP X-Google-Smtp-Source: APXvYqyAlEecQbVjGiS3Xcto6KF+YkoUGm+S/RkP3N+1VO0UF5KDPE/OSm0pGqVIwmN+Nu72wdHuYw89yN2S X-Received: by 2002:a63:3585:: with SMTP id c127mr5805957pga.93.1569539948500; Thu, 26 Sep 2019 16:19:08 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:14 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-19-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 18/28] kvm: mmu: Add an hva range iterator for memslot GFNs From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Factors out a utility for iterating over host virtual address ranges to get the gfn ranges they map from kvm_handle_hva_range. This moves the rmap-reliant HVA iterator approach used for shadow paging to a wrapper around an HVA range to GFN range iterator. Since the direct MMU only maps each GFN to one physical address, and does not use the rmap, it can use the GFN ranges directly. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 96 +++++++++++++++++++++++++++++++--------------- 1 file changed, 66 insertions(+), 30 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 3d4a78f2461a9..32426536723c6 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2701,27 +2701,14 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator) rmap_walk_init_level(iterator, iterator->level); } -#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \ - _start_gfn, _end_gfn, _iter_) \ - for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \ - _end_level_, _start_gfn, _end_gfn); \ - slot_rmap_walk_okay(_iter_); \ - slot_rmap_walk_next(_iter_)) - -static int kvm_handle_hva_range(struct kvm *kvm, - unsigned long start, - unsigned long end, - unsigned long data, - int (*handler)(struct kvm *kvm, - struct kvm_rmap_head *rmap_head, - struct kvm_memory_slot *slot, - gfn_t gfn, - int level, - unsigned long data)) +static int kvm_handle_direct_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end, unsigned long data, + int (*handler)(struct kvm *kvm, struct kvm_memory_slot *memslot, + gfn_t gfn_start, gfn_t gfn_end, + unsigned long data)) { struct kvm_memslots *slots; struct kvm_memory_slot *memslot; - struct slot_rmap_walk_iterator iterator; int ret = 0; int i; @@ -2736,25 +2723,74 @@ static int kvm_handle_hva_range(struct kvm *kvm, (memslot->npages << PAGE_SHIFT)); if (hva_start >= hva_end) continue; - /* - * {gfn(page) | page intersects with [hva_start, hva_end)} = - * {gfn_start, gfn_start+1, ..., gfn_end-1}. 
- */ gfn_start = hva_to_gfn_memslot(hva_start, memslot); - gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot); - - for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL, - PT_MAX_HUGEPAGE_LEVEL, - gfn_start, gfn_end - 1, - &iterator) - ret |= handler(kvm, iterator.rmap, memslot, - iterator.gfn, iterator.level, data); + gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, + memslot); + + ret |= handler(kvm, memslot, gfn_start, gfn_end, data); } } return ret; } +#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \ + _start_gfn, _end_gfn, _iter_) \ + for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \ + _end_level_, _start_gfn, _end_gfn); \ + slot_rmap_walk_okay(_iter_); \ + slot_rmap_walk_next(_iter_)) + + +struct handle_hva_range_shadow_data { + unsigned long data; + int (*handler)(struct kvm *kvm, struct kvm_rmap_head *rmap_head, + struct kvm_memory_slot *slot, gfn_t gfn, int level, + unsigned long data); +}; + +static int handle_hva_range_shadow_handler(struct kvm *kvm, + struct kvm_memory_slot *memslot, + gfn_t gfn_start, gfn_t gfn_end, + unsigned long data) +{ + int ret = 0; + struct slot_rmap_walk_iterator iterator; + struct handle_hva_range_shadow_data *shadow_data = + (struct handle_hva_range_shadow_data *)data; + + for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL, + PT_MAX_HUGEPAGE_LEVEL, + gfn_start, gfn_end - 1, &iterator) { + BUG_ON(!iterator.rmap); + ret |= shadow_data->handler(kvm, iterator.rmap, memslot, + iterator.gfn, iterator.level, shadow_data->data); + } + + return ret; +} + +static int kvm_handle_hva_range(struct kvm *kvm, + unsigned long start, + unsigned long end, + unsigned long data, + int (*handler)(struct kvm *kvm, + struct kvm_rmap_head *rmap_head, + struct kvm_memory_slot *slot, + gfn_t gfn, + int level, + unsigned long data)) +{ + struct handle_hva_range_shadow_data shadow_data; + + shadow_data.data = data; + shadow_data.handler = handler; + + return kvm_handle_direct_hva_range(kvm, start, end, + (unsigned long)&shadow_data, + handle_hva_range_shadow_handler); +} + /* * Marks the range of gfns, [start, end), non-present. 
*/ From patchwork Thu Sep 26 23:18:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163495 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6151314ED for ; Thu, 26 Sep 2019 23:19:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 40A0B20835 for ; Thu, 26 Sep 2019 23:19:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="e53IdVL8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729051AbfIZXTM (ORCPT ); Thu, 26 Sep 2019 19:19:12 -0400 Received: from mail-pl1-f202.google.com ([209.85.214.202]:53441 "EHLO mail-pl1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729000AbfIZXTL (ORCPT ); Thu, 26 Sep 2019 19:19:11 -0400 Received: by mail-pl1-f202.google.com with SMTP id g13so453099plq.20 for ; Thu, 26 Sep 2019 16:19:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Ab8tif6jeEAXLrD8vhHhma3+C4AYyNv8PtxKZt4emaM=; b=e53IdVL8NTngyayc4GJItm3h4ccqyNk3lgOTOg2ql9hwB79nK+cf1dzrPCf77XY+za aybsJMJRSkPWruhRfYT6IAOq1m7zE17wzLoOqKxyNCGERGpnlgle9lgfJgcSIkpzGqgO A6cXOKfAQLOSiwIJWf2zOUjiC5ArueiQ4fuucX4ek7fn3CaXY6vfwFt0BSHRH7WLZNsB wl/yi2Xqa5Lv9YYNeAoLOFvTuC8MPuCDj6RuSc8qTUG7og72kLKWvaO/fE0AJwafsGT7 +/RJaVe/vJzIdkbMvP7yXM2GVDA486yG1tv+QPco5qYCd9wSJoN9Yomnet5zbbHTgvHH 8Neg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Ab8tif6jeEAXLrD8vhHhma3+C4AYyNv8PtxKZt4emaM=; b=ib7ybfDThjEnOqXayhvlBZo3Hqf/lolAFs+wGXURQX5l9G5CzmwkqmnWKDHrEUxwg4 A42frEcz5fDlOFqNBsIe6nUStvqCSsEPUW2h/UdfP2MFFcn2KfVGYnRywKu+PuT+hdV1 y1rFFD0s1l63DmM+l+YV/exP8WCrwBKKoQInmgTFCS0jQY6CkrcnOy/win4rDcDn6Y+q qAb/XN5aSIX37u5QlReEdtMrMLToWlMMTNNE4TvCU7B1+NiP/FId6k3PkxeVV706TXsQ +reNTHyv3Fb633YNEgLrFUcIGFnegtU8waY5g9wh5BLcLxdDDIXsVHl4fINSMTvm+wpr E9qQ== X-Gm-Message-State: APjAAAVTMx9O8ppa/pZh+AoHx75srpyuHPDNllbgjBQaBpVa+OHnzu5N N2/qU2AejMfr1bUBA9/PJaSjV1zEZuRn+gihY79rMKMcTS36RuTvtYdwQzTMfgktAKTZcph2rKK UWohfJ5bvWrdxMPAoxcJvxOxRXMb2skE5idhUyLvWVK7ajgaviLOkMDD7oYAG X-Google-Smtp-Source: APXvYqylVdBUNUgHe0+IWIEUyxS1qY4EPWeTmVx0dw6QaQf8R1MRboN82DojvVJi/vR/jB3boVabAeDe/sBm X-Received: by 2002:a63:f401:: with SMTP id g1mr6224858pgi.314.1569539950737; Thu, 26 Sep 2019 16:19:10 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:15 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-20-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 19/28] kvm: mmu: Make address space ID a property of memslots From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Save address space ID as a field in each memslot so that functions that do not use rmaps (which implicitly encode the address space id) can handle multiple address spaces correctly. 
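For illustration only: a hypothetical GFN-range handler (the name and shape below are an example sketch, not part of this patch) can now pick the address space ID up straight from the memslot instead of inferring it from an rmap:

static int example_direct_gfn_handler(struct kvm *kvm,
				      struct kvm_memory_slot *slot,
				      gfn_t start, gfn_t end,
				      unsigned long data)
{
	/* Example only: slot->as_id selects which direct MMU root to walk. */
	return zap_direct_gfn_range(kvm, slot->as_id, start, end,
				    MMU_WRITE_LOCK);
}

A later patch in this series wires the invalidation MMU notifiers to the direct MMU with a handler of exactly this shape.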
Signed-off-by: Ben Gardon --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 1 + 2 files changed, 2 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 350a3b79cc8d1..ce6b22fcb90f3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -347,6 +347,7 @@ struct kvm_memory_slot { struct kvm_arch_memory_slot arch; unsigned long userspace_addr; u32 flags; + int as_id; short id; }; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c8559a86625ce..d494044104270 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -969,6 +969,7 @@ int __kvm_set_memory_region(struct kvm *kvm, new.base_gfn = base_gfn; new.npages = npages; new.flags = mem->flags; + new.as_id = as_id; if (npages) { if (!old.npages) From patchwork Thu Sep 26 23:18:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7F8B9912 for ; Thu, 26 Sep 2019 23:19:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5D5F7207E0 for ; Thu, 26 Sep 2019 23:19:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="M1Z2oAMk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729029AbfIZXTQ (ORCPT ); Thu, 26 Sep 2019 19:19:16 -0400 Received: from mail-pg1-f202.google.com ([209.85.215.202]:45742 "EHLO mail-pg1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729000AbfIZXTQ (ORCPT ); Thu, 26 Sep 2019 19:19:16 -0400 Received: by mail-pg1-f202.google.com with SMTP id x31so2342209pgl.12 for ; Thu, 26 Sep 2019 16:19:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=pjsieQrVqzRo5zN4EmfN5iO0IfPCBNYoWVGBuyUs8Qg=; b=M1Z2oAMkPk6JfY68Hizv1Zgi11SrxKxsM+aC/SXiywqyr2sRFyDTGTvblWAwG1X/iM W2YMHZcx1SXFyy/ul87lZr+jBxxLaqNPBgGDaWr6dcY96W2gztU/u/fgfmHxZkf0BOXv RKI7SjnTfA51x9DDO5HBVg9aZRVM57i6urq3dgbEkKUYXE5Ev4gTo++4Ys1Zf8tevaSh v0sRnuJBFepPKbZO4YLmaIrBP/xh2v6DylcF/sKDHi0c7+B/QjXUxv78TqoI734EcLhk nn7jvgIkR+yXIL82hQ/F4m3jflNTo4XPX3m3iP1+K9UMGWFnqrK3KP0S5ppyx3kzsHMD lquw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=pjsieQrVqzRo5zN4EmfN5iO0IfPCBNYoWVGBuyUs8Qg=; b=n8IWgllDJ6inpBkP5GXhdTBHpTo+pkh1CidjLFP80ORoBcRVQ7yPAJ3mPmHVjFJwLW yzkzqVydTMOdhgegpZ3po8DMlA9Bwn2uTyZHFaxdFUEvGhjIbFDV1LJO6BGynRUMWUHP JJUDZvXVGz1u6KHIjv34TYx9xtjDW/felB9gjU3l9wck8xsl6kRCZOQZyzgqsLMibbro Sr9jjx3EehenkAl7RTvBsfV7L8TtEkEApWYZ6Vf8f7lwmbDKJQ1ku4rMp/I7SlYTR18Z PtMKqKzwtzFodxAbQFSvR6HfZIAW68P03hTgiax8xzprhJIRDN7G1jNlOHhvboxjHWJ+ bgpg== X-Gm-Message-State: APjAAAW8YyeVn3GdLLFRdApoRi+bdBEx/eLKomgLAKPAanOLvt0Wpa+l aGKti/RegdEZo3CurfscE7wee/RKFgk4ZQms9gfXQ/z2FgyHNL9b7JWxQ3zQuN92/hYFpF8EibD YY9I+Kd6Xa/de3B8B0Cm6iJsn5eh0cCzfS4IfgNI6NQP5ubS/eiLloolAYMxv X-Google-Smtp-Source: APXvYqx7Faea5HKGs73dumpZ1D1T3gyHtvbcnqAcoCEIHi9dw7VbN5oB+yBP0Olfz1k48+/iGdL66ygWiojR X-Received: by 2002:a63:ca06:: with SMTP id n6mr5863067pgi.17.1569539953221; Thu, 26 Sep 2019 16:19:13 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:16 -0700 
In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-21-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 20/28] kvm: mmu: Implement the invalidation MMU notifiers for the direct MMU From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Implements arch specific handler functions for the invalidation MMU notifiers, using a paging structure iterator. These handlers are responsible for zapping paging structure entries to enable the main MM to safely remap memory that was used to back guest memory. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 32426536723c6..ca9b3f574f401 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2829,6 +2829,22 @@ static bool zap_direct_gfn_range(struct kvm *kvm, int as_id, gfn_t start, return direct_walk_iterator_end_traversal(&iter); } +static int zap_direct_gfn_range_handler(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t start, gfn_t end, + unsigned long data) +{ + return zap_direct_gfn_range(kvm, slot->as_id, start, end, + MMU_WRITE_LOCK); +} + +static bool zap_direct_hva_range(struct kvm *kvm, unsigned long start, + unsigned long end) +{ + return kvm_handle_direct_hva_range(kvm, start, end, 0, + zap_direct_gfn_range_handler); +} + static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, unsigned long data, int (*handler)(struct kvm *kvm, @@ -2842,7 +2858,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end) { - return kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp); + int r = 0; + + if (kvm->arch.direct_mmu_enabled) + r |= zap_direct_hva_range(kvm, start, end); + if (!kvm->arch.pure_direct_mmu) + r |= kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp); + return r; } int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) From patchwork Thu Sep 26 23:18:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1026414ED for ; Thu, 26 Sep 2019 23:19:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E34D920835 for ; Thu, 26 Sep 2019 23:19:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="iXWnTMR7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729066AbfIZXTS (ORCPT ); Thu, 26 Sep 2019 19:19:18 -0400 Received: from mail-pf1-f202.google.com ([209.85.210.202]:38234 "EHLO mail-pf1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728923AbfIZXTS (ORCPT ); Thu, 26 Sep 2019 19:19:18 -0400 Received: by mail-pf1-f202.google.com with SMTP id o73so506139pfg.5 for ; Thu, 26 Sep 2019 16:19:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; 
h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=F99sLOt6ycqNpVSwRazWYbJGaaY9sNjDqxSMusz0x3Q=; b=iXWnTMR7sovsTNb7RwuU1sO/9obsyk5x2c2g8W9SWQLW5CMgjoQ99yNcU9qR8qCVlK aogC9AjD8eGueFb6ve4S/u8IFwUDAWSzaTBLC/3Vb8+x9MvLk/thfUlQZ8gbitCNF89H WpvCPJ0UInLXUaWLRRFMVZZXfAz4eoLaZWYpWmkBEzOumY9RmaZpa6+ID5MquLpp14ry 3CuCYiu9nqpXcu4r2svqE1gwd3KVihlU4RDjT3mQUNxnIgc1iKdhIO/VBoAVjCUhmdot EJ/79gZEQdYIKC78FPBOxmCQ8+/nRvt05ivM+xFuzpJkNVmAH/6WtGZ1qX4B24p7nvmQ aiKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=F99sLOt6ycqNpVSwRazWYbJGaaY9sNjDqxSMusz0x3Q=; b=meq2vgkw63N2AZSlUWHZJXcJGQuWqnefgYeKOYPA9VCVzfuYgC+m+/dUsrAOmJAJf2 bXr1Ob178GWEiwq1/lxwMKP7xZxj3IydhQYI1griYJe0lmohUmr/Q+n/z9MI0GXZzPgq 7YoqzwrBJk+owNLFOrY1M4FNdUwWMMXNxvaXvd/u8k/N0GGBL/PgA0hfSuPfEHU457CE FJIZjgzOsSlEyCHtBbyZF9AdhMUnyNqS61ZiYwv02keszchrQ+jPU2Zxyx18GtSvwYB5 nf2j9vOYBNRMlNrjMIrw8rjBL1Z6rwYEJ974N1dyICvqOYwCBgWWqRAQP8L+B/bLIVRo NhZw== X-Gm-Message-State: APjAAAU8s0gH270N2j0ZjirXVK3KIdRkTIAPyU0/ZuqaQLGxsERyU9yj XEyFvWrejM3iNb+sV5TcMYaJAnj4T8Ud1eruGQxWgTBHfVUOf/wG9UR7mXAk2U2Ok4PKu3B44LP LOo6Rjlbcw0JRtOA1nRRdqPWZMD2jpSaRtEvhy8KsnyoG3oEb21ojmoReegPg X-Google-Smtp-Source: APXvYqxy4rEPW7fSq0jEXq3PQ3dro8OT2SKGdp0doBqa2qiOp3znwJyslw+3RPUpUKvQvBBrmk0rzSd8U74y X-Received: by 2002:a63:78cf:: with SMTP id t198mr6065109pgc.227.1569539955863; Thu, 26 Sep 2019 16:19:15 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:17 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-22-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 21/28] kvm: mmu: Integrate the direct mmu with the changed pte notifier From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Implements arch specific handler functions for the changed pte MMU notifier. This handler uses the paging structure walk iterator and is needed to allow the main MM to update page permissions safely on pages backing guest memory. Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 53 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 51 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ca9b3f574f401..b144c803c36d2 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2386,7 +2386,6 @@ static bool direct_walk_iterator_next_present_pte( /* * As direct_walk_iterator_next_present_pte but skips over non-leaf ptes. 
*/ -__attribute__((unused)) static bool direct_walk_iterator_next_present_leaf_pte( struct direct_walk_iterator *iter) { @@ -2867,9 +2866,59 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end) return r; } +static int set_direct_pte_gfn(struct kvm *kvm, struct kvm_memory_slot *slot, + gfn_t start, gfn_t end, unsigned long pte) +{ + struct direct_walk_iterator iter; + pte_t host_pte; + kvm_pfn_t new_pfn; + u64 new_pte; + + host_pte.pte = pte; + new_pfn = pte_pfn(host_pte); + + direct_walk_iterator_setup_walk(&iter, kvm, slot->as_id, start, end, + MMU_WRITE_LOCK); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + BUG_ON(iter.level != PT_PAGE_TABLE_LEVEL); + + if (pte_write(host_pte)) + new_pte = 0; + else { + new_pte = iter.old_pte & ~PT64_BASE_ADDR_MASK; + new_pte |= new_pfn << PAGE_SHIFT; + new_pte &= ~PT_WRITABLE_MASK; + new_pte &= ~SPTE_HOST_WRITEABLE; + new_pte &= ~shadow_dirty_mask; + new_pte &= ~shadow_accessed_mask; + new_pte = mark_spte_for_access_track(new_pte); + } + + if (!direct_walk_iterator_set_pte(&iter, new_pte)) + continue; + } + return direct_walk_iterator_end_traversal(&iter); +} + +static int set_direct_pte_hva(struct kvm *kvm, unsigned long address, + pte_t host_pte) +{ + return kvm_handle_direct_hva_range(kvm, address, address + 1, + host_pte.pte, set_direct_pte_gfn); +} + int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) { - return kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp); + int need_flush = 0; + + WARN_ON(pte_huge(pte)); + + if (kvm->arch.direct_mmu_enabled) + need_flush |= set_direct_pte_hva(kvm, hva, pte); + if (!kvm->arch.pure_direct_mmu) + need_flush |= kvm_handle_hva(kvm, hva, (unsigned long)&pte, + kvm_set_pte_rmapp); + return need_flush; } static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head, From patchwork Thu Sep 26 23:18:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3470C14ED for ; Thu, 26 Sep 2019 23:19:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 084FB207E0 for ; Thu, 26 Sep 2019 23:19:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="AVaY56eN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729034AbfIZXTV (ORCPT ); Thu, 26 Sep 2019 19:19:21 -0400 Received: from mail-qk1-f201.google.com ([209.85.222.201]:46448 "EHLO mail-qk1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729060AbfIZXTV (ORCPT ); Thu, 26 Sep 2019 19:19:21 -0400 Received: by mail-qk1-f201.google.com with SMTP id x186so817032qke.13 for ; Thu, 26 Sep 2019 16:19:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=zUj93KTO3Bj0DuO7C1ywcJksue4QOJ1xOkJ8pFn7qX0=; b=AVaY56eN6ZkokF6BsY3KMUTU6xzOaYVEl9pcF61NUz4f1pT1/b4ep7zvdd6D6lkTO2 4xzLhnMeJDa9PpGARIlh5LvKCjijWxDEODkzLaBF5bvIKVXkv9w2mG/FLFNxCB+rmI1x PFf6PFyQBl374jFYbVcLBGecHy2KQwEskmkSQ6BuWhm3vFCYHTKiuZaIKcPAFP8GYzqD sqeLbvWg3dWC6Iu4yHAGj9U3r+gj9zRjQfe+EpaZAaJmhSQ+dTsfAlNDqmUiw67+jFfL 
OuVNRG4yGW46busQtaK1JaLByJoer3plqIEmiTishskOBVXvS6pmGuR4eBgZN0ZVKXKQ Ws+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=zUj93KTO3Bj0DuO7C1ywcJksue4QOJ1xOkJ8pFn7qX0=; b=BhualqJ9knGxP1ngXqwbqOY4L9ZlQW1+t2GMX1n56Y5fxbtQyfmpDq9IOwQFdhXiSi drbGoIUD8iMZt/rzh6mrRse/vBT3XnsQntlw9ky6zq8diHQ1apz20ZRlAPPbp9fvCUZo H8tjtn4HFew4Ihvp4uReXyxGicUpTCYcUXdO01KCqtLbpPrtuWrXtPNzqPYoaMU1tLmr QZVn3KRY0ddrfgwnY4qfnudpq+1WrEAelWHJtggKYnVD2cGFiw+KkLuZQZuD7suMV6sc s0Ur/zxsDfUL8IEUNyKU6VM19ce7CLKci+/ylqcaOI/Bg0DjCExvU3UTdO7jyuZpEGGu e5Lw== X-Gm-Message-State: APjAAAXYlFXpNZhMwuTSB1z4LH/dan0zwYGARdJlYSp6GPlkVyzvkN+a asmTE6nje5l7gH/67L3j9plQFWH3LQiPIiQkPFqPnR055qk5t+s0FqeS25UYocC/Vavh/X2+LjF mFzzWkSIwM/U3XbR44ygj9ZYB1zcUoCpcrH+aNpOsbuWGxrDV4y34wcDKYbLe X-Google-Smtp-Source: APXvYqwkjfOAxyVkFOdXaNuE2POb89kbOdFWBlXXqCMIcvwhdKkiX71CUbjWrn/i4wo5z3LEQDjhuvVDEbqq X-Received: by 2002:ac8:75cd:: with SMTP id z13mr6748518qtq.87.1569539958233; Thu, 26 Sep 2019 16:19:18 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:18 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-23-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 22/28] kvm: mmu: Implement access tracking for the direct MMU From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Adds functions for dealing with the accessed state of PTEs which can operate with the direct MMU. 
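The accessed-state handling added below boils down to two aging strategies: clear the hardware accessed bit when the CPU provides one, otherwise save the permission bits and make the PTE non-present so the next access faults and reports the page as young. The following minimal userspace model (not kernel code) illustrates the idea; the bit layout and all names in it (PTE_ACCESSED, SAVED_SHIFT, age_pte, pte_was_accessed) are invented for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy model of aging a PTE: with a hardware A bit, just clear it; without
 * one, stash the permission bits in a spare part of the PTE and drop the
 * present bit so the next access faults.  The bit layout is invented.
 */
#define PTE_PRESENT     (1ull << 0)
#define PTE_WRITABLE    (1ull << 1)
#define PTE_ACCESSED    (1ull << 5)
#define SAVED_SHIFT     52
#define SAVED_MASK      (3ull << SAVED_SHIFT)

static bool have_hw_accessed_bit = true;

static uint64_t age_pte(uint64_t pte)
{
        if (have_hw_accessed_bit)
                return pte & ~PTE_ACCESSED;

        /* Save present + writable, then drop them so the next use faults. */
        pte &= ~SAVED_MASK;
        pte |= (pte & (PTE_PRESENT | PTE_WRITABLE)) << SAVED_SHIFT;
        return pte & ~(PTE_PRESENT | PTE_WRITABLE);
}

static bool pte_was_accessed(uint64_t pte)
{
        /* Without an A bit, "still present" means it was touched again. */
        return have_hw_accessed_bit ? !!(pte & PTE_ACCESSED)
                                    : !!(pte & PTE_PRESENT);
}

int main(void)
{
        uint64_t pte = PTE_PRESENT | PTE_WRITABLE | PTE_ACCESSED;

        printf("young before aging: %d\n", pte_was_accessed(pte));
        pte = age_pte(pte);
        printf("young after aging:  %d\n", pte_was_accessed(pte));
        return 0;
}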
Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 153 +++++++++++++++++++++++++++++++++++++++++--- virt/kvm/kvm_main.c | 7 +- 2 files changed, 150 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index b144c803c36d2..cc81ba5ee46d6 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -779,6 +779,17 @@ static bool spte_has_volatile_bits(u64 spte) return false; } +static bool is_accessed_direct_pte(u64 pte, int level) +{ + if (!is_last_spte(pte, level)) + return false; + + if (shadow_accessed_mask) + return pte & shadow_accessed_mask; + + return pte & shadow_acc_track_mask; +} + static bool is_accessed_spte(u64 spte) { u64 accessed_mask = spte_shadow_accessed_mask(spte); @@ -929,6 +940,14 @@ static u64 mmu_spte_get_lockless(u64 *sptep) return __get_spte_lockless(sptep); } +static u64 save_pte_permissions_for_access_track(u64 pte) +{ + pte |= (pte & shadow_acc_track_saved_bits_mask) << + shadow_acc_track_saved_bits_shift; + pte &= ~shadow_acc_track_mask; + return pte; +} + static u64 mark_spte_for_access_track(u64 spte) { if (spte_ad_enabled(spte)) @@ -944,16 +963,13 @@ static u64 mark_spte_for_access_track(u64 spte) */ WARN_ONCE((spte & PT_WRITABLE_MASK) && !spte_can_locklessly_be_made_writable(spte), - "kvm: Writable SPTE is not locklessly dirty-trackable\n"); + "kvm: Writable PTE is not locklessly dirty-trackable\n"); WARN_ONCE(spte & (shadow_acc_track_saved_bits_mask << shadow_acc_track_saved_bits_shift), "kvm: Access Tracking saved bit locations are not zero\n"); - spte |= (spte & shadow_acc_track_saved_bits_mask) << - shadow_acc_track_saved_bits_shift; - spte &= ~shadow_acc_track_mask; - + spte = save_pte_permissions_for_access_track(spte); return spte; } @@ -1718,6 +1734,15 @@ static void free_pt_rcu_callback(struct rcu_head *rp) free_page((unsigned long)disconnected_pt); } +static void handle_changed_pte_acc_track(u64 old_pte, u64 new_pte, int level) +{ + bool pfn_changed = spte_to_pfn(old_pte) != spte_to_pfn(new_pte); + + if (is_accessed_direct_pte(old_pte, level) && + (!is_accessed_direct_pte(new_pte, level) || pfn_changed)) + kvm_set_pfn_accessed(spte_to_pfn(old_pte)); +} + /* * Takes a snapshot of, and clears, the direct MMU disconnected pt list. 
Once * TLBs have been flushed, this snapshot can be transferred to the direct MMU @@ -1847,6 +1872,7 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level, vm_teardown, disconnected_pts); + handle_changed_pte_acc_track(old_pte, DISCONNECTED_PTE, level); } /** @@ -2412,8 +2438,8 @@ static bool cmpxchg_pte(u64 *ptep, u64 old_pte, u64 new_pte, int level, u64 gfn) return r == old_pte; } -static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, - u64 new_pte) +static bool direct_walk_iterator_set_pte_raw(struct direct_walk_iterator *iter, + u64 new_pte, bool handle_acc_track) { bool r; @@ -2435,6 +2461,10 @@ static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, iter->old_pte, new_pte, iter->level, false, &iter->disconnected_pts); + if (handle_acc_track) + handle_changed_pte_acc_track(iter->old_pte, new_pte, + iter->level); + if (iter->lock_mode & (MMU_WRITE_LOCK | MMU_READ_LOCK)) iter->tlbs_dirty++; } else @@ -2443,6 +2473,18 @@ static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, return r; } +static bool direct_walk_iterator_set_pte_no_acc_track( + struct direct_walk_iterator *iter, u64 new_pte) +{ + return direct_walk_iterator_set_pte_raw(iter, new_pte, false); +} + +static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, + u64 new_pte) +{ + return direct_walk_iterator_set_pte_raw(iter, new_pte, true); +} + static u64 generate_nonleaf_pte(u64 *child_pt, bool ad_disabled) { u64 pte; @@ -2965,14 +3007,107 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) KVM_PAGES_PER_HPAGE(sp->role.level)); } +static int age_direct_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, + gfn_t start, gfn_t end, unsigned long ignored) +{ + struct direct_walk_iterator iter; + int young = 0; + u64 new_pte = 0; + + direct_walk_iterator_setup_walk(&iter, kvm, slot->as_id, start, end, + MMU_WRITE_LOCK); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + /* + * If we have a non-accessed entry we don't need to change the + * pte. + */ + if (!is_accessed_direct_pte(iter.old_pte, iter.level)) + continue; + + if (shadow_accessed_mask) + new_pte = iter.old_pte & ~shadow_accessed_mask; + else { + new_pte = save_pte_permissions_for_access_track( + iter.old_pte); + new_pte |= shadow_acc_track_value; + } + + /* + * We've created a new pte with the accessed state cleared. + * Warn if we're about to put in a pte that still looks + * accessed. 
+ */ + WARN_ON(is_accessed_direct_pte(new_pte, iter.level)); + + if (!direct_walk_iterator_set_pte_no_acc_track(&iter, new_pte)) + continue; + + young = true; + + if (shadow_accessed_mask) + trace_kvm_age_page(iter.pte_gfn_start, iter.level, slot, + young); + } + direct_walk_iterator_end_traversal(&iter); + + return young; +} + int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) { - return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp); + int young = 0; + + if (kvm->arch.direct_mmu_enabled) + young |= kvm_handle_direct_hva_range(kvm, start, end, 0, + age_direct_gfn_range); + + if (!kvm->arch.pure_direct_mmu) + young |= kvm_handle_hva_range(kvm, start, end, 0, + kvm_age_rmapp); + return young; +} + +static int test_age_direct_gfn_range(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t start, gfn_t end, + unsigned long ignored) +{ + struct direct_walk_iterator iter; + int young = 0; + + direct_walk_iterator_setup_walk(&iter, kvm, slot->as_id, start, end, + MMU_WRITE_LOCK); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + if (is_accessed_direct_pte(iter.old_pte, iter.level)) { + young = true; + break; + } + } + direct_walk_iterator_end_traversal(&iter); + + return young; } int kvm_test_age_hva(struct kvm *kvm, unsigned long hva) { - return kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp); + int young = 0; + + /* + * If there's no access bit in the secondary pte set by the + * hardware it's up to gup-fast/gup to set the access bit in + * the primary pte or in the page structure. + */ + if (!shadow_accessed_mask) + return young; + + if (kvm->arch.direct_mmu_enabled) + young |= kvm_handle_direct_hva_range(kvm, hva, hva + 1, 0, + test_age_direct_gfn_range); + + if (!kvm->arch.pure_direct_mmu) + young |= kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp); + + return young; } #ifdef MMU_DEBUG diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d494044104270..771e159d6bea9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -439,7 +439,12 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, write_lock(&kvm->mmu_lock); young = kvm_age_hva(kvm, start, end); - if (young) + + /* + * If there was an accessed page in the provided range, or there are + * un-flushed paging structure changes, flush the TLBs. 
+ */ + if (young || kvm->tlbs_dirty) kvm_flush_remote_tlbs(kvm); write_unlock(&kvm->mmu_lock); From patchwork Thu Sep 26 23:18:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B6497912 for ; Thu, 26 Sep 2019 23:19:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 93FCB2086A for ; Thu, 26 Sep 2019 23:19:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="GMFf7MPW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729071AbfIZXTV (ORCPT ); Thu, 26 Sep 2019 19:19:21 -0400 Received: from mail-pg1-f201.google.com ([209.85.215.201]:54436 "EHLO mail-pg1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729048AbfIZXTV (ORCPT ); Thu, 26 Sep 2019 19:19:21 -0400 Received: by mail-pg1-f201.google.com with SMTP id m17so2321066pgh.21 for ; Thu, 26 Sep 2019 16:19:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=p5p3XErJyYwp3TttwsaCpTh9txi10KL45ED980LPgg0=; b=GMFf7MPW942Con3/Rc2iJn8yCbxcqfm+v7lFZxdo86jyIW8VfsL43BcRgFVD2Ah715 dyQGfyf1ZOEXSobzfHyG3xt2MARUQtM8o3xx6SH4UEUYg1H88kmCjneD0dUv+FTJqZc9 O4jYISq6cfpSPzboZgcckHlbvtsAiF8hJ50qa25imoQ5eIIWjeUuWn2vtorO2p+J1Yv9 fgkmMgS5FArtQ0/C2VjpGlw5EdWNVUYOGq/LBa5+Gsq1elhsuqw50+qfzLnVaBP0Lduq HSaFT/UOixGHKnYuBwU3YjfFdnXz0/DUkUywcCatgkeQmNiKiP5eYtRntErqOvI2Vsm3 xDqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=p5p3XErJyYwp3TttwsaCpTh9txi10KL45ED980LPgg0=; b=nG5fdv0ru2EUS3gR/qXj4ZQEXEwaqF9IqwIQ2nYgxd23+OWkMhjqk9Qd0djbbfYuJ6 cHX1ELTY4/u4xOF3QyoQu1rquPyhJZ0E3UiuPxg7YCBAuQBmCPFwq/2NnEZP2NJo2N/X cCXMeOuHLK+ms22Lg2End4HjPwH1Xl3y3M/8taaYsaG39+aN1c+qRkeB4FAqp4TqBryZ s2O4dkiEU/3+YIK3Cosvj79Q/dZQzKlnzHtOIYxSUpBtFI4CaZ7rYUtZWeBMpZt38e/K 5CL4HY5k6zpsdu2AEMjqUFTd2AhINRe3daHI9ng9fgURdtkVJgYSwFFxp5T75B4FyqEm 6skg== X-Gm-Message-State: APjAAAXeiFjJ/0EPGCJuKwJdwz1wCVhj6XxdBxqGAAYe/+zDAUABx7EV VbnaiAO/FyTuh8rL4kVnVMAy+wgn3tvmE3nQYp05NHGp6BU1UJk3zqQIqjDTjCJYb+L0sRpm0iR OYNXhzXHKAyht5lP2Y6XR1klgCrH30j5WrHDKRAHO7ktT5gUTdNzyfP9jotBi X-Google-Smtp-Source: APXvYqx/tRFyJsYEGQZ+9leVtfPKHme5XskASyrx5EnjXQ6VY6WPrRUn5jYo8an8xbCeop9PnyfltsgTBxkc X-Received: by 2002:a63:6d0:: with SMTP id 199mr5758315pgg.299.1569539960545; Thu, 26 Sep 2019 16:19:20 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:19 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-24-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 23/28] kvm: mmu: Make mark_page_dirty_in_slot usable from outside kvm_main From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When operating on PTEs within a memslot, the dirty status of the page must be recorded for dirty logging. 
Currently the only mechanism for marking pages dirty in mmu.c is mark_page_dirty, which assumes address space 0. This means that dirty pages in other address spaces will be lost. Signed-off-by: Ben Gardon --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 6 ++---- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ce6b22fcb90f3..1212d5c8a3f6d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -753,6 +753,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len); struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn); +void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn); void mark_page_dirty(struct kvm *kvm, gfn_t gfn); struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 771e159d6bea9..ffc6951f2bc93 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -130,8 +130,6 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); -static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn); - __visible bool kvm_rebooting; EXPORT_SYMBOL_GPL(kvm_rebooting); @@ -2214,8 +2212,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) } EXPORT_SYMBOL_GPL(kvm_clear_guest); -static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, - gfn_t gfn) +void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn) { if (memslot && memslot->dirty_bitmap) { unsigned long rel_gfn = gfn - memslot->base_gfn; @@ -2223,6 +2220,7 @@ static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, set_bit_le(rel_gfn, memslot->dirty_bitmap); } } +EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot); void mark_page_dirty(struct kvm *kvm, gfn_t gfn) { From patchwork Thu Sep 26 23:18:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163507 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5871F912 for ; Thu, 26 Sep 2019 23:19:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2566720835 for ; Thu, 26 Sep 2019 23:19:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="pKnvT6Hg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729052AbfIZXTY (ORCPT ); Thu, 26 Sep 2019 19:19:24 -0400 Received: from mail-pf1-f201.google.com ([209.85.210.201]:36834 "EHLO mail-pf1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728923AbfIZXTY (ORCPT ); Thu, 26 Sep 2019 19:19:24 -0400 Received: by mail-pf1-f201.google.com with SMTP id 194so510486pfu.3 for ; Thu, 26 Sep 2019 16:19:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=8HD0TRmSoGCa91TLgT7Wl5W5ZqHPYC6XlL7tI1HTTBk=; b=pKnvT6Hg6lFBsOGhCWWH8YQLnZ7hZd8006IBg53fuek5jytJO8M9pZzBLthK/pXQgb FC3i487C9svDvCXo766VGD6NtNWDUKBjQp1fYIpe77yCBzwn4Duw5fc0gVPMNZtCJthG 2J3IlhFOjizeXDX/Cto4beAb42G6VgQCUaYAmjvg1nJobAgCBFZGF+o7iDl34J0L2a54 
KvCyjTEEShRnCGEMnNLODzKMPUueJhBelQm3t1HKskSykVfJea3e1uF/rlIongwQ2tVr xY9icHDH6klF9IeVfsPrg5YMGBa6fuuXxEV8ciF3ehJvepqRhAUXGraV47ajCK7qQEJk jSEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=8HD0TRmSoGCa91TLgT7Wl5W5ZqHPYC6XlL7tI1HTTBk=; b=rYusKNQ2NkqHjULmTyDXvuK5+ctItX5snR7LWVwRWviZzbZZQvSMugJCz4ZnTLytKh TRRaLCZtmsMIfkX2Mj2BrJfitXS2fTSEdSTc//8PG/WxOEktg/rhwIGTpvkUxt98HSPn pP3qhgJg+kf8coxhB02HxR9dkUtiwi+NbbtLK/b7WH8PD/jZxiCb5ilvz4FuB3RzRAoE TSiGBwMr+m3TpuGwMumZwXd074eI6oCsggAtG1Q7wgjsx/i2zTQ7ZyujanKxtl8fy2fz MSdww/o+W19CoySwl75Z0GUdIohI261vNkhUZ72kLclNTrCS1p+/EUHfc+9MAW3XpsDl d9fw== X-Gm-Message-State: APjAAAVtd+AfiemgkhdzCtRPhc4FMe6Y1kb5FaJzjta9h7xO3R6jlKGB +aoZHNnA8yE3yQBRXo5GV4NFR7PLOAW+Ox2Op9AleYtS0XgRoSQcJ+CZFksHqF6d0LtElfHuacz beZkB1TM85HH/E6lrV+xWu8xvwlhVpchxJFMRAMfGlbLpihK4xO96lL9tFlkz X-Google-Smtp-Source: APXvYqxumecpzpya7qCvjMzVto+bHggKvYwVfcY8JrtTnDd4T+/S0jxzqXnKcldOGIxfmMrcf+21ABDmL1h8 X-Received: by 2002:a63:741a:: with SMTP id p26mr5923543pgc.177.1569539962803; Thu, 26 Sep 2019 16:19:22 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:20 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-25-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 24/28] kvm: mmu: Support dirty logging in the direct MMU From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Adds functions for handling changes to the dirty state of PTEs and functions for enabling / resetting dirty logging which use a paging structure iterator. Signed-off-by: Ben Gardon --- arch/x86/include/asm/kvm_host.h | 10 ++ arch/x86/kvm/mmu.c | 259 ++++++++++++++++++++++++++++++-- arch/x86/kvm/vmx/vmx.c | 10 +- arch/x86/kvm/x86.c | 4 +- 4 files changed, 269 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9bf149dce146d..b6a3380e66d44 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1305,6 +1305,16 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, struct kvm_memory_slot *memslot); void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, const struct kvm_memory_slot *memslot); + +#define KVM_DIRTY_LOG_MODE_WRPROT 1 +#define KVM_DIRTY_LOG_MODE_PML 2 + +void kvm_mmu_zap_collapsible_direct_ptes(struct kvm *kvm, + const struct kvm_memory_slot *memslot); +void reset_direct_mmu_dirty_logging(struct kvm *kvm, + struct kvm_memory_slot *slot, + int dirty_log_mode, + bool record_dirty_pages); void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot); void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm, diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index cc81ba5ee46d6..ca58b27a17c52 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -790,6 +790,18 @@ static bool is_accessed_direct_pte(u64 pte, int level) return pte & shadow_acc_track_mask; } +static bool is_dirty_direct_pte(u64 pte, int dlog_mode) +{ + /* If the pte is non-present, the entry cannot have been dirtied. 
*/ + if (!is_present_direct_pte(pte)) + return false; + + if (dlog_mode == KVM_DIRTY_LOG_MODE_WRPROT) + return pte & PT_WRITABLE_MASK; + + return pte & shadow_dirty_mask; +} + static bool is_accessed_spte(u64 spte) { u64 accessed_mask = spte_shadow_accessed_mask(spte); @@ -1743,6 +1755,38 @@ static void handle_changed_pte_acc_track(u64 old_pte, u64 new_pte, int level) kvm_set_pfn_accessed(spte_to_pfn(old_pte)); } +static void handle_changed_pte_dlog(struct kvm *kvm, int as_id, gfn_t gfn, + u64 old_pte, u64 new_pte, int level) +{ + bool pfn_changed = spte_to_pfn(old_pte) != spte_to_pfn(new_pte); + bool was_wrprot_dirty = is_dirty_direct_pte(old_pte, + KVM_DIRTY_LOG_MODE_WRPROT); + bool is_wrprot_dirty = is_dirty_direct_pte(new_pte, + KVM_DIRTY_LOG_MODE_WRPROT); + bool wrprot_dirty = (!was_wrprot_dirty || pfn_changed) && + is_wrprot_dirty; + struct kvm_memory_slot *slot; + + if (level > PT_PAGE_TABLE_LEVEL) + return; + + /* + * Only mark pages dirty if they are becoming writable or no longer have + * the dbit set and dbit dirty logging is enabled. + * If pages are marked dirty when unsetting the dbit when dbit + * dirty logging isn't on, it can cause spurious dirty pages, e.g. from + * zapping PTEs during VM teardown. + * If, on the other hand, pages were only marked dirty when becoming + * writable when in wrprot dirty logging, that would also cause problems + * because dirty pages could be lost when switching from dbit to wrprot + * dirty logging. + */ + if (wrprot_dirty) { + slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn); + mark_page_dirty_in_slot(slot, gfn); + } +} + /* * Takes a snapshot of, and clears, the direct MMU disconnected pt list. Once * TLBs have been flushed, this snapshot can be transferred to the direct MMU @@ -1873,6 +1917,8 @@ static void mark_pte_disconnected(struct kvm *kvm, int as_id, gfn_t gfn, handle_changed_pte(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, level, vm_teardown, disconnected_pts); handle_changed_pte_acc_track(old_pte, DISCONNECTED_PTE, level); + handle_changed_pte_dlog(kvm, as_id, gfn, old_pte, DISCONNECTED_PTE, + level); } /** @@ -1964,6 +2010,14 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, bool was_present = is_present_direct_pte(old_pte); bool is_present = is_present_direct_pte(new_pte); bool was_leaf = was_present && is_last_spte(old_pte, level); + bool was_dirty = is_dirty_direct_pte(old_pte, + KVM_DIRTY_LOG_MODE_WRPROT) || + is_dirty_direct_pte(old_pte, + KVM_DIRTY_LOG_MODE_PML); + bool is_dirty = is_dirty_direct_pte(new_pte, + KVM_DIRTY_LOG_MODE_WRPROT) || + is_dirty_direct_pte(new_pte, + KVM_DIRTY_LOG_MODE_PML); bool pfn_changed = spte_to_pfn(old_pte) != spte_to_pfn(new_pte); int child_level; @@ -1990,6 +2044,9 @@ static void handle_changed_pte(struct kvm *kvm, int as_id, gfn_t gfn, return; } + if (((was_dirty && !is_dirty) || pfn_changed) && was_leaf) + kvm_set_pfn_dirty(spte_to_pfn(old_pte)); + if (was_present && !was_leaf && (pfn_changed || !is_present)) { /* * The level of the page table being freed is one level lower @@ -2439,7 +2496,8 @@ static bool cmpxchg_pte(u64 *ptep, u64 old_pte, u64 new_pte, int level, u64 gfn) } static bool direct_walk_iterator_set_pte_raw(struct direct_walk_iterator *iter, - u64 new_pte, bool handle_acc_track) + u64 new_pte, bool handle_acc_track, + bool handle_dlog) { bool r; @@ -2464,6 +2522,11 @@ static bool direct_walk_iterator_set_pte_raw(struct direct_walk_iterator *iter, if (handle_acc_track) handle_changed_pte_acc_track(iter->old_pte, new_pte, iter->level); + if 
(handle_dlog) + handle_changed_pte_dlog(iter->kvm, iter->as_id, + iter->pte_gfn_start, + iter->old_pte, new_pte, + iter->level); if (iter->lock_mode & (MMU_WRITE_LOCK | MMU_READ_LOCK)) iter->tlbs_dirty++; @@ -2476,13 +2539,19 @@ static bool direct_walk_iterator_set_pte_raw(struct direct_walk_iterator *iter, static bool direct_walk_iterator_set_pte_no_acc_track( struct direct_walk_iterator *iter, u64 new_pte) { - return direct_walk_iterator_set_pte_raw(iter, new_pte, false); + return direct_walk_iterator_set_pte_raw(iter, new_pte, false, true); +} + +static bool direct_walk_iterator_set_pte_no_dlog( + struct direct_walk_iterator *iter, u64 new_pte) +{ + return direct_walk_iterator_set_pte_raw(iter, new_pte, true, false); } static bool direct_walk_iterator_set_pte(struct direct_walk_iterator *iter, u64 new_pte) { - return direct_walk_iterator_set_pte_raw(iter, new_pte, true); + return direct_walk_iterator_set_pte_raw(iter, new_pte, true, true); } static u64 generate_nonleaf_pte(u64 *child_pt, bool ad_disabled) @@ -2500,6 +2569,83 @@ static u64 generate_nonleaf_pte(u64 *child_pt, bool ad_disabled) return pte; } +static u64 mark_direct_pte_for_dirty_track(u64 pte, int dlog_mode) +{ + if (dlog_mode == KVM_DIRTY_LOG_MODE_WRPROT) + pte &= ~PT_WRITABLE_MASK; + else + pte &= ~shadow_dirty_mask; + + return pte; +} + +void reset_direct_mmu_dirty_logging(struct kvm *kvm, + struct kvm_memory_slot *slot, + int dirty_log_mode, bool record_dirty_pages) +{ + struct direct_walk_iterator iter; + u64 new_pte; + bool pte_set; + + write_lock(&kvm->mmu_lock); + + direct_walk_iterator_setup_walk(&iter, kvm, slot->as_id, slot->base_gfn, + slot->base_gfn + slot->npages, + MMU_WRITE_LOCK); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + if (iter.level == PT_PAGE_TABLE_LEVEL && + !is_dirty_direct_pte(iter.old_pte, dirty_log_mode)) + continue; + + new_pte = mark_direct_pte_for_dirty_track(iter.old_pte, + dirty_log_mode); + + if (record_dirty_pages) + pte_set = direct_walk_iterator_set_pte(&iter, new_pte); + else + pte_set = direct_walk_iterator_set_pte_no_dlog(&iter, + new_pte); + if (!pte_set) + continue; + } + if (direct_walk_iterator_end_traversal(&iter)) + kvm_flush_remote_tlbs(kvm); + write_unlock(&kvm->mmu_lock); +} +EXPORT_SYMBOL_GPL(reset_direct_mmu_dirty_logging); + +static bool clear_direct_dirty_log_gfn_masked(struct kvm *kvm, + struct kvm_memory_slot *slot, gfn_t gfn, unsigned long mask, + int dirty_log_mode, enum mmu_lock_mode lock_mode) +{ + struct direct_walk_iterator iter; + u64 new_pte; + + direct_walk_iterator_setup_walk(&iter, kvm, slot->as_id, + gfn + __ffs(mask), gfn + BITS_PER_LONG, lock_mode); + while (mask && direct_walk_iterator_next_present_leaf_pte(&iter)) { + if (iter.level > PT_PAGE_TABLE_LEVEL) { + BUG_ON(iter.old_pte & PT_WRITABLE_MASK); + continue; + } + + if (!is_dirty_direct_pte(iter.old_pte, dirty_log_mode)) + continue; + + if (!(mask & (1UL << (iter.pte_gfn_start - gfn)))) + continue; + + new_pte = mark_direct_pte_for_dirty_track(iter.old_pte, + dirty_log_mode); + + if (!direct_walk_iterator_set_pte_no_dlog(&iter, new_pte)) + continue; + + mask &= ~(1UL << (iter.pte_gfn_start - gfn)); + } + return direct_walk_iterator_end_traversal(&iter); +} + /** * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages * @kvm: kvm instance @@ -2509,12 +2655,24 @@ static u64 generate_nonleaf_pte(u64 *child_pt, bool ad_disabled) * * Used when we do not need to care about huge page mappings: e.g. during dirty * logging we do not have any such mappings. 
+ * + * We don't need to worry about flushing tlbs here as they are flushed + * unconditionally at a higher level. See the comments on + * kvm_vm_ioctl_get_dirty_log and kvm_mmu_slot_remove_write_access. */ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask) { struct kvm_rmap_head *rmap_head; + gfn_t gfn = slot->base_gfn + gfn_offset; + + if (kvm->arch.direct_mmu_enabled) + clear_direct_dirty_log_gfn_masked(kvm, slot, gfn, mask, + KVM_DIRTY_LOG_MODE_WRPROT, + MMU_WRITE_LOCK); + if (kvm->arch.pure_direct_mmu) + return; while (mask) { rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask), @@ -2541,6 +2699,16 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm, gfn_t gfn_offset, unsigned long mask) { struct kvm_rmap_head *rmap_head; + gfn_t gfn = slot->base_gfn + gfn_offset; + + if (!mask) + return; + + if (kvm->arch.direct_mmu_enabled) + clear_direct_dirty_log_gfn_masked(kvm, slot, gfn, mask, + KVM_DIRTY_LOG_MODE_PML, MMU_WRITE_LOCK); + if (kvm->arch.pure_direct_mmu) + return; while (mask) { rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask), @@ -3031,6 +3199,7 @@ static int age_direct_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, iter.old_pte); new_pte |= shadow_acc_track_value; } + new_pte &= ~shadow_dirty_mask; /* * We've created a new pte with the accessed state cleared. @@ -7293,11 +7462,17 @@ static bool slot_rmap_write_protect(struct kvm *kvm, void kvm_mmu_slot_remove_write_access(struct kvm *kvm, struct kvm_memory_slot *memslot) { - bool flush; + bool flush = false; + + if (kvm->arch.direct_mmu_enabled) + reset_direct_mmu_dirty_logging(kvm, memslot, + KVM_DIRTY_LOG_MODE_WRPROT, false); write_lock(&kvm->mmu_lock); - flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect, - false); + if (!kvm->arch.pure_direct_mmu) + flush = slot_handle_all_level(kvm, memslot, + slot_rmap_write_protect, + false); write_unlock(&kvm->mmu_lock); /* @@ -7367,8 +7542,42 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, { /* FIXME: const-ify all uses of struct kvm_memory_slot. */ write_lock(&kvm->mmu_lock); - slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot, - kvm_mmu_zap_collapsible_spte, true); + if (!kvm->arch.pure_direct_mmu) + slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot, + kvm_mmu_zap_collapsible_spte, true); + write_unlock(&kvm->mmu_lock); +} + +void kvm_mmu_zap_collapsible_direct_ptes(struct kvm *kvm, + const struct kvm_memory_slot *memslot) +{ + struct direct_walk_iterator iter; + kvm_pfn_t pfn; + + if (!kvm->arch.direct_mmu_enabled) + return; + + write_lock(&kvm->mmu_lock); + + direct_walk_iterator_setup_walk(&iter, kvm, memslot->as_id, + memslot->base_gfn, + memslot->base_gfn + memslot->npages, + MMU_READ_LOCK | MMU_LOCK_MAY_RESCHED); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + pfn = spte_to_pfn(iter.old_pte); + if (kvm_is_reserved_pfn(pfn) || + !PageTransCompoundMap(pfn_to_page(pfn))) + continue; + /* + * If the compare / exchange succeeds, then we will continue on + * to the next pte. If it fails, the next iteration will repeat + * the current pte. We'll handle both cases in the same way, so + * we don't need to check the result here. 
+ */ + direct_walk_iterator_set_pte(&iter, 0); + } + direct_walk_iterator_end_traversal(&iter); + write_unlock(&kvm->mmu_lock); } @@ -7414,18 +7623,46 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm, } EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access); +static bool slot_set_dirty_direct(struct kvm *kvm, + struct kvm_memory_slot *memslot) +{ + struct direct_walk_iterator iter; + u64 new_pte; + + direct_walk_iterator_setup_walk(&iter, kvm, memslot->as_id, + memslot->base_gfn, memslot->base_gfn + memslot->npages, + MMU_WRITE_LOCK | MMU_LOCK_MAY_RESCHED); + while (direct_walk_iterator_next_present_pte(&iter)) { + new_pte = iter.old_pte | shadow_dirty_mask; + + if (!direct_walk_iterator_set_pte(&iter, new_pte)) + continue; + } + return direct_walk_iterator_end_traversal(&iter); +} + void kvm_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot) { - bool flush; + bool flush = false; write_lock(&kvm->mmu_lock); - flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false); + if (kvm->arch.direct_mmu_enabled) + flush |= slot_set_dirty_direct(kvm, memslot); + + if (!kvm->arch.pure_direct_mmu) + flush |= slot_handle_all_level(kvm, memslot, __rmap_set_dirty, + false); write_unlock(&kvm->mmu_lock); lockdep_assert_held(&kvm->slots_lock); - /* see kvm_mmu_slot_leaf_clear_dirty */ + /* + * It's also safe to flush TLBs out of mmu lock here as currently this + * function is only used for dirty logging, in which case flushing TLB + * out of mmu lock also guarantees no dirty pages will be lost in + * dirty_bitmap. + */ if (flush) kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn, memslot->npages); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d4575ffb3cec7..aab8f3ab456ec 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7221,8 +7221,14 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu) static void vmx_slot_enable_log_dirty(struct kvm *kvm, struct kvm_memory_slot *slot) { - kvm_mmu_slot_leaf_clear_dirty(kvm, slot); - kvm_mmu_slot_largepage_remove_write_access(kvm, slot); + if (kvm->arch.direct_mmu_enabled) + reset_direct_mmu_dirty_logging(kvm, slot, + KVM_DIRTY_LOG_MODE_PML, false); + + if (!kvm->arch.pure_direct_mmu) { + kvm_mmu_slot_leaf_clear_dirty(kvm, slot); + kvm_mmu_slot_largepage_remove_write_access(kvm, slot); + } } static void vmx_slot_disable_log_dirty(struct kvm *kvm, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2972b6c6029fb..edd7d7bece2fe 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9776,8 +9776,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, */ if (change == KVM_MR_FLAGS_ONLY && (old->flags & KVM_MEM_LOG_DIRTY_PAGES) && - !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) + !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) { kvm_mmu_zap_collapsible_sptes(kvm, new); + kvm_mmu_zap_collapsible_direct_ptes(kvm, new); + } /* * Set up write protection and/or dirty logging for the new slot. 
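The dirty-logging support above distinguishes two reprotection modes: write-protection, where a PTE counts as dirty while it is writable and future writes fault, and D-bit (PML) mode, where the hardware dirty bit is cleared and set again by the CPU on write. The following minimal userspace model (not kernel code) sketches how a 64-GFN dirty mask can drive reprotection in either mode; the constants and helper names are invented and only loosely mirror the masked clear path in the patch, and __builtin_ctzll assumes a GCC/Clang toolchain.

#include <stdint.h>
#include <stdio.h>

/*
 * Toy model: clear the dirty state of the PTEs selected by a 64-GFN
 * bitmask, either by removing write access or by clearing a dirty bit.
 */
#define PTE_WRITABLE    (1ull << 1)
#define PTE_DIRTY       (1ull << 6)

enum dlog_mode { DLOG_WRPROT, DLOG_DBIT };

static int pte_is_dirty(uint64_t pte, enum dlog_mode mode)
{
        return mode == DLOG_WRPROT ? !!(pte & PTE_WRITABLE) : !!(pte & PTE_DIRTY);
}

static uint64_t clear_dirty(uint64_t pte, enum dlog_mode mode)
{
        return mode == DLOG_WRPROT ? pte & ~PTE_WRITABLE : pte & ~PTE_DIRTY;
}

/* Reprotect every GFN whose bit is set in @mask, relative to @base_gfn. */
static void clear_masked(uint64_t *ptes, uint64_t base_gfn, uint64_t mask,
                         enum dlog_mode mode)
{
        while (mask) {
                int idx = __builtin_ctzll(mask);

                mask &= mask - 1;
                if (!pte_is_dirty(ptes[idx], mode))
                        continue;
                ptes[idx] = clear_dirty(ptes[idx], mode);
                printf("gfn %llu reprotected\n",
                       (unsigned long long)(base_gfn + idx));
        }
}

int main(void)
{
        uint64_t ptes[64] = { [3] = PTE_WRITABLE | PTE_DIRTY,
                              [7] = PTE_WRITABLE | PTE_DIRTY };

        clear_masked(ptes, 0x1000, (1ull << 3) | (1ull << 7), DLOG_WRPROT);
        return 0;
}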
From patchwork Thu Sep 26 23:18:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163509 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A18A2912 for ; Thu, 26 Sep 2019 23:19:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8192B20835 for ; Thu, 26 Sep 2019 23:19:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="SZn15o07" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729054AbfIZXT0 (ORCPT ); Thu, 26 Sep 2019 19:19:26 -0400 Received: from mail-pl1-f201.google.com ([209.85.214.201]:34949 "EHLO mail-pl1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729075AbfIZXTZ (ORCPT ); Thu, 26 Sep 2019 19:19:25 -0400 Received: by mail-pl1-f201.google.com with SMTP id o12so481502pll.2 for ; Thu, 26 Sep 2019 16:19:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=rjAV0P/C6E3BGhkruiYDZPulhPa8Gd0ceWhLOWGPyq4=; b=SZn15o0788D3ocpRkzfN4Dzt1Lh1ZZmALbCZM+rU97IjwYgC/ughizxyXYOAu+xCJq OxMFHPWRGHpMQf0bOKGfl9d/JJh3PVHR3+cinysHcY+4HBvf0GLwm38Pen8UBRagX8j/ NL8R7UJBmIFCdWmDMiCNKL9l8NmFjcAn5cfDVZNjIKmJP8KVdF0CX1DBgZ8+/k3iXwHD 0DVYvJvRHAjJHsTG9sFkAZYFi5AcWpCcnC7NR08IW5rlnJVX9Fi/A33mVZ5y0i7bjXXB QImToM9c0oiyCHHSRj0vuaLPhlIs8DVDrUjTw6CoAUub/l6LovTRMho+uTMyHm/0L9qH CePw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=rjAV0P/C6E3BGhkruiYDZPulhPa8Gd0ceWhLOWGPyq4=; b=VFiIah5K4024UzM5hWV6mUF9Oin9QSqTr/bSu8owbMCh6BLdiY1qDIvQ2yNSfyzSM9 nSAkuJeOO6owd3ZBS6GWsxLaKlUFtRaUtfQ3t9pV4Ny8XLiT8bQDWaSlCq+IZSOWVpId GlmOgqleF8QO6eb5lenxVxatlzkc+P0sIO4kddgOuasaODOWvgaRR5C8WILZArQiHUpX CA3O34fujbSxvANfqSDVDbrZnsFLFcLmdEOqSKpftzH8h1xddxxwUQyNi9NgaHJYyefX WGNmiA/MrAUjkDZVHBMWbxyUMRuM6m51uW9kfYqJP8Xam5GDP4aFZcwvw9fAOFJi4QJk XIow== X-Gm-Message-State: APjAAAXcOfyp/UYV2ZOLu9ahB2LAuTepq0znBNFXPg48XtFsV9TLMOKT i0h9mfq4n4YLx383GI+rEQ2K4v29Qm6IyeazBKWQ6Q2VKoNBhdn2q4/56abmKY4GElaGisWhugC RZ4yFlROWxHp5C0r+hoteMEEw7xltkB0nnQRTvK4kTc4jgScsosjIpMqkLrp1 X-Google-Smtp-Source: APXvYqwSd5rwD/xI88cGwMjUAafzG6rKacbAvolJihEMJJHVyDqNvLwvvYkvusHISF1WlBWPtQ+mNPsq7IK/ X-Received: by 2002:a63:e512:: with SMTP id r18mr5892821pgh.117.1569539964969; Thu, 26 Sep 2019 16:19:24 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:21 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-26-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 25/28] kvm: mmu: Support kvm_zap_gfn_range in the direct MMU From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add a function for zapping ranges of GFNs in a memslot to support kvm_zap_gfn_range for the direct MMU. 
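A minimal userspace sketch (not kernel code) of the range handling this enables: the generic zap path clips the requested GFN range against each memslot and hands the per-slot sub-range to a slot-level helper. All types and names below are invented stand-ins for illustration.

#include <stdio.h>

/*
 * Toy model: clip a requested GFN range against each memslot and hand
 * the per-slot sub-range to a slot-level zap helper.
 */
struct toy_slot {
        unsigned long base_gfn;
        unsigned long npages;
};

static void zap_slot_gfn_range(const struct toy_slot *slot,
                               unsigned long start, unsigned long end)
{
        printf("zap gfns [%lu, %lu) in slot at %lu\n",
               start, end, slot->base_gfn);
}

static void zap_gfn_range(const struct toy_slot *slots, int nslots,
                          unsigned long gfn_start, unsigned long gfn_end)
{
        for (int i = 0; i < nslots; i++) {
                unsigned long slot_end = slots[i].base_gfn + slots[i].npages;
                unsigned long start = gfn_start > slots[i].base_gfn ?
                                      gfn_start : slots[i].base_gfn;
                unsigned long end = gfn_end < slot_end ? gfn_end : slot_end;

                if (start >= end)       /* no overlap with this slot */
                        continue;
                zap_slot_gfn_range(&slots[i], start, end);
        }
}

int main(void)
{
        struct toy_slot slots[] = { { 0, 256 }, { 512, 256 } };

        zap_gfn_range(slots, 2, 100, 600);
        return 0;
}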
Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 27 +++++++++++++++++++++------ arch/x86/kvm/mmu.h | 2 ++ 2 files changed, 23 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ca58b27a17c52..a0c5271ae2381 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -7427,13 +7427,32 @@ void kvm_mmu_uninit_vm(struct kvm *kvm) kvm_mmu_uninit_direct_mmu(kvm); } +void kvm_zap_slot_gfn_range(struct kvm *kvm, struct kvm_memory_slot *memslot, + gfn_t start, gfn_t end) +{ + write_lock(&kvm->mmu_lock); + if (kvm->arch.direct_mmu_enabled) { + zap_direct_gfn_range(kvm, memslot->as_id, start, end, + MMU_READ_LOCK); + } + + if (kvm->arch.pure_direct_mmu) { + write_unlock(&kvm->mmu_lock); + return; + } + + slot_handle_level_range(kvm, memslot, kvm_zap_rmapp, + PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL, + start, end - 1, true); + write_unlock(&kvm->mmu_lock); +} + void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) { struct kvm_memslots *slots; struct kvm_memory_slot *memslot; int i; - write_lock(&kvm->mmu_lock); for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { slots = __kvm_memslots(kvm, i); kvm_for_each_memslot(memslot, slots) { @@ -7444,13 +7463,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) if (start >= end) continue; - slot_handle_level_range(kvm, memslot, kvm_zap_rmapp, - PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL, - start, end - 1, true); + kvm_zap_slot_gfn_range(kvm, memslot, start, end); } } - - write_unlock(&kvm->mmu_lock); } static bool slot_rmap_write_protect(struct kvm *kvm, diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 11f8ec89433b6..4ea8a72c8868d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -204,6 +204,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, } void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end); +void kvm_zap_slot_gfn_range(struct kvm *kvm, struct kvm_memory_slot *memslot, + gfn_t start, gfn_t end); void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn); void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn); From patchwork Thu Sep 26 23:18:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163511 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03F37912 for ; Thu, 26 Sep 2019 23:19:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D5E282086A for ; Thu, 26 Sep 2019 23:19:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="irIWD+rb" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729072AbfIZXT2 (ORCPT ); Thu, 26 Sep 2019 19:19:28 -0400 Received: from mail-pl1-f202.google.com ([209.85.214.202]:54810 "EHLO mail-pl1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729073AbfIZXT2 (ORCPT ); Thu, 26 Sep 2019 19:19:28 -0400 Received: by mail-pl1-f202.google.com with SMTP id j9so453609plk.21 for ; Thu, 26 Sep 2019 16:19:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=JfDeLVxckSbuxvoobfrDn0KoCHmXdzom4g1j7usnEf8=; 
b=irIWD+rb7q97rYKemG2B/3ZPawfdoH6L4aaUTkeipkVYAmZGB5pn3p8/VoHyMJHviU d4UKYVfL/oQIqnPIkd+ieV1wk0I+QfATqBVHIrswjG5F1akSR3Z2yKKKRnorGzW//3ee AbooL3rflLP7OORaxz4uJ3D0clgEj4OCwPdDru7j8bdcI3ybiar1xoJ3ev8rcdUf3eFY KjBnJHaXZNeT8aYYSbkvZjkK0p4d7N1MqQjcU1lkNNvltzC+WPJg4XHJBAFjEQC/0rx7 8sS2/ZII9X5Laycf/GqllgxrbhdPofuyTYoEitzDBSsoGbk+MUdovt5ILPLVUrCFEJBL 4ZwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=JfDeLVxckSbuxvoobfrDn0KoCHmXdzom4g1j7usnEf8=; b=bOo+/BlTZlcL3PIoJXvtOqfBXoQtpDxLzs494Ff98tl/reCXPkImX7QN7dKsNZAnyU uIqPo8GMQ2blBOEQD8OV6flgWfk2pzfit95g4Kk07VdnCn1HrqgBKzzM6EH7lNzPQ2N5 VGZRKEJzKRde8dO93H1yncMrsCyENYyFfU0IrhB843sTYQsSackp3URRoOc2YfnY+Ojl nkYzsGNS08ffoa0JpQgrI7HDG76EYRtoi/wsVXgfAWnJ3GfiW1fw+lDmOned745/al6L DxFdkIYDSuXSFgfnZj3xz7Y3AvnIkkHUfuvFBlAXifQDs/egmAETwZ8qc5Ys3u03UO8R vQ5g== X-Gm-Message-State: APjAAAUSCVOskZdqtXeE3kpXUDLkP08SnmPOvaLzB6a5h0nHDJev9cFJ VjEqychIhAdB9Yr1824C8B3Ordm4Cf8NljfsQdz/ZcccxK4syUqK+vqRJmZkTVYMJHmqZN4MEa/ l0xY/OFboVZ45w47wl3BXqqKZ+nGrhXff8JZ9lSfOam6UE+E0dEnfV/GuOHPi X-Google-Smtp-Source: APXvYqycUacQUNFjaesE8Ugk2g3b1EDiUnQse53R0VirEk/9laIBFfgjEKQJxlkUEZ0XZ+n9G3U1buoDFD3l X-Received: by 2002:a63:4857:: with SMTP id x23mr6080652pgk.142.1569539967277; Thu, 26 Sep 2019 16:19:27 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:22 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-27-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 26/28] kvm: mmu: Integrate direct MMU with nesting From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Allows the existing nesting implementation to interoperate with the direct MMU. 
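A minimal userspace sketch (not kernel code) of the cached-root check this patch adjusts: on a CR3 switch, a previously used root is reused if its CR3 matches and it is still valid, and because a direct-MMU root has no shadow-page role to compare, the role check is skipped for it. The structure and names below are invented stand-ins for illustration.

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the cached-root lookup: reuse a previous root on a CR3
 * switch if its CR3 matches and it is still valid; direct roots match
 * on CR3 alone, shadow roots must also have been built with the same role.
 */
#define NUM_PREV_ROOTS  3
#define INVALID_PAGE    0UL

struct cached_root {
        unsigned long cr3;
        unsigned long hpa;
        unsigned int role;      /* ignored for direct roots */
};

static bool cached_root_available(const struct cached_root *prev, int n,
                                  unsigned long new_cr3,
                                  unsigned int new_role, bool direct)
{
        for (int i = 0; i < n; i++) {
                if (prev[i].cr3 != new_cr3 || prev[i].hpa == INVALID_PAGE)
                        continue;
                if (direct || prev[i].role == new_role)
                        return true;
        }
        return false;
}

int main(void)
{
        struct cached_root prev[NUM_PREV_ROOTS] = {
                { .cr3 = 0x1000, .hpa = 0xabc000, .role = 7 },
        };

        printf("direct root hit:  %d\n",
               cached_root_available(prev, NUM_PREV_ROOTS, 0x1000, 0, true));
        printf("shadow role miss: %d\n",
               cached_root_available(prev, NUM_PREV_ROOTS, 0x1000, 9, false));
        return 0;
}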
Signed-off-by: Ben Gardon --- arch/x86/kvm/mmu.c | 51 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 45 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a0c5271ae2381..e0f35da0d1027 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2742,6 +2742,29 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); } +static bool rmap_write_protect_direct_gfn(struct kvm *kvm, + struct kvm_memory_slot *slot, + gfn_t gfn) +{ + struct direct_walk_iterator iter; + u64 new_pte; + + direct_walk_iterator_setup_walk(&iter, kvm, slot->as_id, gfn, gfn + 1, + MMU_WRITE_LOCK); + while (direct_walk_iterator_next_present_leaf_pte(&iter)) { + if (!is_writable_pte(iter.old_pte) && + !spte_can_locklessly_be_made_writable(iter.old_pte)) + break; + + new_pte = iter.old_pte & + ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); + + if (!direct_walk_iterator_set_pte(&iter, new_pte)) + continue; + } + return direct_walk_iterator_end_traversal(&iter); +} + /** * kvm_arch_write_log_dirty - emulate dirty page logging * @vcpu: Guest mode vcpu @@ -2764,6 +2787,10 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, int i; bool write_protected = false; + if (kvm->arch.direct_mmu_enabled) + write_protected |= rmap_write_protect_direct_gfn(kvm, slot, + gfn); + for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) { rmap_head = __gfn_to_rmap(gfn, i, slot); write_protected |= __rmap_write_protect(kvm, rmap_head, true); @@ -5755,6 +5782,8 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_cr3, uint i; struct kvm_mmu_root_info root; struct kvm_mmu *mmu = vcpu->arch.mmu; + bool direct_mmu_root = (vcpu->kvm->arch.direct_mmu_enabled && + new_role.direct); root.cr3 = mmu->root_cr3; root.hpa = mmu->root_hpa; @@ -5762,10 +5791,14 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_cr3, for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { swap(root, mmu->prev_roots[i]); - if (new_cr3 == root.cr3 && VALID_PAGE(root.hpa) && - page_header(root.hpa) != NULL && - new_role.word == page_header(root.hpa)->role.word) - break; + if (new_cr3 == root.cr3 && VALID_PAGE(root.hpa)) { + BUG_ON(direct_mmu_root && + !is_direct_mmu_root(vcpu->kvm, root.hpa)); + + if (direct_mmu_root || (page_header(root.hpa) != NULL && + new_role.word == page_header(root.hpa)->role.word)) + break; + } } mmu->root_hpa = root.hpa; @@ -5813,8 +5846,14 @@ static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3, */ vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); - __clear_sp_write_flooding_count( - page_header(mmu->root_hpa)); + /* + * If this is a direct MMU root page, it doesn't have a + * write flooding count. 
+ */ + if (!(vcpu->kvm->arch.direct_mmu_enabled && + new_role.direct)) + __clear_sp_write_flooding_count( + page_header(mmu->root_hpa)); return true; } From patchwork Thu Sep 26 23:18:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Gardon X-Patchwork-Id: 11163513 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2D365912 for ; Thu, 26 Sep 2019 23:19:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0555B20835 for ; Thu, 26 Sep 2019 23:19:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="UHT5p+/f" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729090AbfIZXTd (ORCPT ); Thu, 26 Sep 2019 19:19:33 -0400 Received: from mail-qt1-f201.google.com ([209.85.160.201]:56643 "EHLO mail-qt1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729070AbfIZXTc (ORCPT ); Thu, 26 Sep 2019 19:19:32 -0400 Received: by mail-qt1-f201.google.com with SMTP id m6so3975547qtk.23 for ; Thu, 26 Sep 2019 16:19:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=LsnPdTp5GaF/lyahKH1vXf0L/46JuHXj66N8OEChvcw=; b=UHT5p+/fYiB2mVGalDD3QamKeFpexD5xPdbGirmdxGCOFk1bHht+I2nMePtIJT4ACs UMfr9YhOfJ7YAPrSnxfyoksMvT6Yf9Etr9DixePSzgZ0zjHle7T3xyz8iK4tI/ksG+aT lfxpZMzNhwQOy9H4kxJAgybMqxirsaQb6N+ocPVv4QThIHl7bRUo+YY31RyCplQp9jCW LynlaxfM2B+buYzKFnQovEcu6kgxw5SJN+cAp0/LSZaDD4aCqfh1/+ft3tSAy+SD18MQ AGP9ZWKzRRGw1p5JCfQ+BGhZ81YZwVPfJKxGPQhEMTlpewBAjLgNi5HNL3ClEwk/eb9h eQng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=LsnPdTp5GaF/lyahKH1vXf0L/46JuHXj66N8OEChvcw=; b=CUyQHlITUuqnXCeTj9PbnzB0Pw+WUKreZMFR5HSfuhYbiY6bTjr06xk3vhlZj5c0HY s9Nf3ibPj4v/ju/yDGIZIeMnJWAi53SjdISv2jWMNeOgvN4taIWbYf6SkW5dDaMW0XxH dERZu/K0DrSzb1r72d88I1C8c6EL4I1rDcMwSsu/BnW7+qFqNAZUehtYJBfVzNevv/nT GMCl3hVxOxg1ptoNp6kcl2UDNP/ypcB7mxp0rNuJbXWRCLgSuzBbSXAZJd33UfpcrBAk VzoKvRsq07aHhuiK52BeLudYfSBgrMD30Gyj+z07U7jTgTOM7peqqwKREBKtOwEoD/nr xKzw== X-Gm-Message-State: APjAAAU8A3jfP9nERsO74tUzDWy4oldOcxbwQF1hnSnNivUQftqi2E7B 1Lfoq4nb0WxQQ+HBHITxxeFfcyNRYJ+zR70c7EU1QrsANXDS4oc10qpbqqlv9P/cmVb7E7Vi/c5 6m6b1PIc1j76g2fHU2X9BfsUESzVX1h5eev0Wy8b9TWwMWlKv69HSEMIwKVXb X-Google-Smtp-Source: APXvYqwg4RzLJ/Y1SXOhSKcCwkH8r+t5GE/AuTwr5grm7ZIMxbjSFivGDBODkaxkwod3AB/GQYOtNclFsa2G X-Received: by 2002:ad4:42d2:: with SMTP id f18mr5314963qvr.52.1569539969639; Thu, 26 Sep 2019 16:19:29 -0700 (PDT) Date: Thu, 26 Sep 2019 16:18:23 -0700 In-Reply-To: <20190926231824.149014-1-bgardon@google.com> Message-Id: <20190926231824.149014-28-bgardon@google.com> Mime-Version: 1.0 References: <20190926231824.149014-1-bgardon@google.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog Subject: [RFC PATCH 27/28] kvm: mmu: Lazily allocate rmap when direct MMU is enabled From: Ben Gardon To: kvm@vger.kernel.org Cc: Paolo Bonzini , Peter Feiner , Peter Shier , Junaid Shahid , Jim Mattson , Ben Gardon Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When the MMU is in pure direct mode, it uses a paging structure walk 
From patchwork Thu Sep 26 23:18:23 2019
X-Patchwork-Submitter: Ben Gardon
X-Patchwork-Id: 11163513
Date: Thu, 26 Sep 2019 16:18:23 -0700
In-Reply-To: <20190926231824.149014-1-bgardon@google.com>
Message-Id: <20190926231824.149014-28-bgardon@google.com>
References: <20190926231824.149014-1-bgardon@google.com>
Subject: [RFC PATCH 27/28] kvm: mmu: Lazily allocate rmap when direct MMU is enabled
From: Ben Gardon
To: kvm@vger.kernel.org
Cc: Paolo Bonzini, Peter Feiner, Peter Shier, Junaid Shahid, Jim Mattson, Ben Gardon

When the MMU is in pure direct mode, it uses a paging structure walk
iterator and does not require the rmap. The rmap requires 8 bytes for
every PTE that could be used to map guest memory. It is an expensive
data structure at ~0.2% of the size of guest memory. Delay allocating
the rmap until the MMU is no longer in pure direct mode, which can
happen, for example, when the guest launches a nested (L2) VM.

Signed-off-by: Ben Gardon
---
 arch/x86/kvm/mmu.c | 15 ++++++++++
 arch/x86/kvm/x86.c | 72 ++++++++++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.h |  2 ++
 3 files changed, 83 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e0f35da0d1027..72c2289132c43 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5228,8 +5228,23 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	u64 pdptr, pm_mask;
 	gfn_t root_gfn, root_cr3;
 	int i;
+	int r;
 
 	write_lock(&vcpu->kvm->mmu_lock);
+	if (vcpu->kvm->arch.pure_direct_mmu) {
+		write_unlock(&vcpu->kvm->mmu_lock);
+		/*
+		 * If this is the first time a VCPU has allocated shadow roots
+		 * and the direct MMU is enabled on this VM, it will need to
+		 * allocate rmaps for all its memslots. If the rmaps are already
+		 * allocated, this call will have no effect.
+		 */
+		r = kvm_allocate_rmaps(vcpu->kvm);
+		if (r < 0)
+			return r;
+		write_lock(&vcpu->kvm->mmu_lock);
+	}
+
 	vcpu->kvm->arch.pure_direct_mmu = false;
 	write_unlock(&vcpu->kvm->mmu_lock);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index edd7d7bece2fe..566521f956425 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9615,14 +9615,21 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	kvm_page_track_free_memslot(free, dont);
 }
 
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages)
+static int allocate_memslot_rmap(struct kvm *kvm,
+				 struct kvm_memory_slot *slot,
+				 unsigned long npages)
 {
 	int i;
 
+	/*
+	 * rmaps are allocated all-or-nothing under the slots
+	 * lock, so we only need to check that the first rmap
+	 * has been allocated.
+	 */
+	if (slot->arch.rmap[0])
+		return 0;
+
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
-		struct kvm_lpage_info *linfo;
-		unsigned long ugfn;
 		int lpages;
 		int level = i + 1;
@@ -9634,8 +9641,61 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 				 GFP_KERNEL_ACCOUNT);
 		if (!slot->arch.rmap[i])
 			goto out_free;
-		if (i == 0)
-			continue;
+	}
+	return 0;
+
+out_free:
+	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+		kvfree(slot->arch.rmap[i]);
+		slot->arch.rmap[i] = NULL;
+	}
+	return -ENOMEM;
+}
+
+int kvm_allocate_rmaps(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	int r = 0;
+	int i;
+
+	mutex_lock(&kvm->slots_lock);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(slot, slots) {
+			r = allocate_memslot_rmap(kvm, slot, slot->npages);
+			if (r < 0)
+				break;
+		}
+	}
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
+int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    unsigned long npages)
+{
+	int i;
+	int r;
+
+	/* Set the rmap pointer for each level to NULL */
+	memset(slot->arch.rmap, 0,
+	       ARRAY_SIZE(slot->arch.rmap) * sizeof(*slot->arch.rmap));
+
+	if (!kvm->arch.pure_direct_mmu) {
+		r = allocate_memslot_rmap(kvm, slot, npages);
+		if (r < 0)
+			return r;
+	}
+
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
+		struct kvm_lpage_info *linfo;
+		unsigned long ugfn;
+		int lpages;
+		int level = i + 1;
+
+		lpages = gfn_to_index(slot->base_gfn + npages - 1,
+				      slot->base_gfn, level) + 1;
 
 		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
 		if (!linfo)

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index dbf7442a822b6..91bfbfd2c58d4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -369,4 +369,6 @@ static inline bool kvm_pat_valid(u64 data)
 void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu);
 void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu);
 
+int kvm_allocate_rmaps(struct kvm *kvm);
+
 #endif
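Eight bytes per potential 4 KiB mapping works out to 8 / 4096 ≈ 0.2% of guest
memory, which is where the commit message's figure comes from. The interesting
part of the patch is the lock ordering in mmu_alloc_shadow_roots(): the rmap
allocation can sleep, so the MMU lock is dropped, kvm_allocate_rmaps() does its
work under slots_lock (making it idempotent if several vCPUs race), and only
then is pure_direct_mmu cleared under the MMU lock again. Here is a condensed,
userspace-style sketch of that dance; struct example_vm, example_allocate_rmaps()
and example_leave_pure_direct_mode() are made-up stand-ins, with pthread locks
standing in for kvm->mmu_lock and kvm->slots_lock.

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for kvm->mmu_lock (non-sleeping) and kvm->slots_lock (sleeping). */
struct example_vm {
	pthread_rwlock_t mmu_lock;
	pthread_mutex_t slots_lock;
	bool pure_direct_mmu;
	void *rmap;			/* NULL until lazily allocated */
};

/* Idempotent: safe to call from several vCPUs; only the first call allocates. */
static int example_allocate_rmaps(struct example_vm *vm)
{
	int r = 0;

	pthread_mutex_lock(&vm->slots_lock);
	if (!vm->rmap) {
		vm->rmap = calloc(1, 4096);	/* placeholder size */
		if (!vm->rmap)
			r = -1;
	}
	pthread_mutex_unlock(&vm->slots_lock);
	return r;
}

static int example_leave_pure_direct_mode(struct example_vm *vm)
{
	pthread_rwlock_wrlock(&vm->mmu_lock);
	if (vm->pure_direct_mmu) {
		/* The allocation can sleep, so drop the MMU lock first. */
		pthread_rwlock_unlock(&vm->mmu_lock);
		if (example_allocate_rmaps(vm) < 0)
			return -1;
		pthread_rwlock_wrlock(&vm->mmu_lock);
	}
	vm->pure_direct_mmu = false;
	pthread_rwlock_unlock(&vm->mmu_lock);
	return 0;
}

int main(void)
{
	struct example_vm vm = { .pure_direct_mmu = true, .rmap = NULL };

	pthread_rwlock_init(&vm.mmu_lock, NULL);
	pthread_mutex_init(&vm.slots_lock, NULL);
	return example_leave_pure_direct_mode(&vm) ? 1 : 0;
}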
From patchwork Thu Sep 26 23:18:24 2019
X-Patchwork-Submitter: Ben Gardon
X-Patchwork-Id: 11163515
Date: Thu, 26 Sep 2019 16:18:24 -0700
In-Reply-To: <20190926231824.149014-1-bgardon@google.com>
Message-Id: <20190926231824.149014-29-bgardon@google.com>
References: <20190926231824.149014-1-bgardon@google.com>
Subject: [RFC PATCH 28/28] kvm: mmu: Support MMIO in the direct MMU
From: Ben Gardon
To: kvm@vger.kernel.org
Cc: Paolo Bonzini, Peter Feiner, Peter Shier, Junaid Shahid, Jim Mattson, Ben Gardon

Add direct MMU handlers to the functions required to support MMIO.

Signed-off-by: Ben Gardon
---
 arch/x86/kvm/mmu.c | 91 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 68 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 72c2289132c43..0a23daea0df50 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5464,49 +5464,94 @@ static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 	return vcpu_match_mmio_gva(vcpu, addr);
 }
 
-/* return true if reserved bit is detected on spte. */
-static bool
-walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
+/*
+ * Return the level of the lowest level pte added to ptes.
+ * That pte may be non-present.
+ */
+static int direct_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *ptes)
 {
-	struct kvm_shadow_walk_iterator iterator;
-	u64 sptes[PT64_ROOT_MAX_LEVEL], spte = 0ull;
-	int root, leaf;
-	bool reserved = false;
+	struct direct_walk_iterator iter;
+	int leaf = vcpu->arch.mmu->root_level;
 
-	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
-		goto exit;
+	direct_walk_iterator_setup_walk(&iter, vcpu->kvm,
+			kvm_arch_vcpu_memslots_id(vcpu), addr >> PAGE_SHIFT,
+			(addr >> PAGE_SHIFT) + 1, MMU_NO_LOCK);
+	while (direct_walk_iterator_next_pte(&iter)) {
+		leaf = iter.level;
+		ptes[leaf - 1] = iter.old_pte;
+		if (!is_shadow_present_pte(iter.old_pte))
+			break;
+	}
+	direct_walk_iterator_end_traversal(&iter);
+
+	return leaf;
+}
+
+/*
+ * Return the level of the lowest level spte added to sptes.
+ * That spte may be non-present.
+ */
+static int shadow_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	int leaf = vcpu->arch.mmu->root_level;
+	u64 spte;
 
 	walk_shadow_page_lockless_begin(vcpu);
 
-	for (shadow_walk_init(&iterator, vcpu, addr),
-	     leaf = root = iterator.level;
+	for (shadow_walk_init(&iterator, vcpu, addr);
 	     shadow_walk_okay(&iterator);
 	     __shadow_walk_next(&iterator, spte)) {
+		leaf = iterator.level;
 		spte = mmu_spte_get_lockless(iterator.sptep);
-
 		sptes[leaf - 1] = spte;
-		leaf--;
 
 		if (!is_shadow_present_pte(spte))
 			break;
-
-		reserved |= is_shadow_zero_bits_set(vcpu->arch.mmu, spte,
-						    iterator.level);
 	}
 
 	walk_shadow_page_lockless_end(vcpu);
 
+	return leaf;
+}
+
+/* return true if reserved bit is detected on spte. */
+static bool get_mmio_pte(struct kvm_vcpu *vcpu, u64 addr, bool direct,
+			 u64 *ptep)
+{
+	u64 ptes[PT64_ROOT_MAX_LEVEL];
+	int root = vcpu->arch.mmu->root_level;
+	int leaf;
+	int level;
+	bool reserved = false;
+
+	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) {
+		*ptep = 0ull;
+		return reserved;
+	}
+
+	if (direct && vcpu->kvm->arch.direct_mmu_enabled)
+		leaf = direct_mmu_get_walk(vcpu, addr, ptes);
+	else
+		leaf = shadow_mmu_get_walk(vcpu, addr, ptes);
+
+	for (level = root; level >= leaf; level--) {
+		if (!is_shadow_present_pte(ptes[level - 1]))
+			break;
+		reserved |= is_shadow_zero_bits_set(vcpu->arch.mmu,
+						    ptes[level - 1], level);
+	}
+
 	if (reserved) {
 		pr_err("%s: detect reserved bits on spte, addr 0x%llx, dump hierarchy:\n",
 		       __func__, addr);
-		while (root > leaf) {
+		for (level = root; level >= leaf; level--)
 			pr_err("------ spte 0x%llx level %d.\n",
-			       sptes[root - 1], root);
-			root--;
-		}
+			       ptes[level - 1], level);
 	}
-exit:
-	*sptep = spte;
+
+	*ptep = ptes[leaf - 1];
 	return reserved;
 }
 
@@ -5518,7 +5563,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 	if (mmio_info_in_cache(vcpu, addr, direct))
 		return RET_PF_EMULATE;
 
-	reserved = get_mmio_pte(vcpu, addr, direct, &spte);
+	reserved = get_mmio_pte(vcpu, addr, direct, &spte);
 	if (WARN_ON(reserved))
 		return -EINVAL;
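Both walkers fill ptes[] indexed by level - 1 and return the lowest level they
reached, so the reserved-bit scan in get_mmio_pte() only has to look at
ptes[root - 1] down to ptes[leaf - 1], stopping at the first non-present entry.
Below is a small stand-alone sketch of that indexing convention (not the
patch's code); EXAMPLE_PRESENT_MASK and example_reserved_bits_set() are trivial
placeholders for is_shadow_present_pte() and is_shadow_zero_bits_set().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_ROOT_MAX_LEVEL	5		/* mirrors PT64_ROOT_MAX_LEVEL */
#define EXAMPLE_PRESENT_MASK	(1ull << 0)	/* placeholder present bit */

/* Placeholder: the real check is is_shadow_zero_bits_set() in mmu.c. */
static bool example_reserved_bits_set(uint64_t pte, int level)
{
	(void)level;
	return pte & (1ull << 51);	/* pretend bit 51 is reserved */
}

/* Scan the walk result from the root level down to the lowest level reached. */
static bool example_check_walk(const uint64_t *ptes, int root, int leaf)
{
	bool reserved = false;
	int level;

	for (level = root; level >= leaf; level--) {
		if (!(ptes[level - 1] & EXAMPLE_PRESENT_MASK))
			break;
		reserved |= example_reserved_bits_set(ptes[level - 1], level);
	}
	return reserved;
}

int main(void)
{
	uint64_t ptes[EXAMPLE_ROOT_MAX_LEVEL] = {
		[3] = EXAMPLE_PRESENT_MASK,			/* level 4: present */
		[2] = EXAMPLE_PRESENT_MASK | (1ull << 51),	/* level 3: reserved bit */
	};

	printf("reserved: %d\n", example_check_walk(ptes, 4, 3));
	return 0;
}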