
[v2,6/8] KVM: Fix multiple races in gfn=>pfn cache refresh

Message ID 20220427014004.1992589-7-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: Fix mmu_notifier vs. pfncache vs. pfncache races

Commit Message

Sean Christopherson April 27, 2022, 1:40 a.m. UTC
Rework the gfn=>pfn cache (gpc) refresh logic to address multiple races
between concurrent users of the cache, and between the cache and
mmu_notifier events.

The existing refresh code attempts to guard against races with the
mmu_notifier by speculatively marking the cache valid, and then marking
it invalid if a mmu_notifier invalidation occurs.  That handles the case
where an invalidation occurs between dropping and re-acquiring gpc->lock,
but it doesn't handle the scenario where the cache is refreshed after the
cache was invalidated by the notifier, but before the notifier elevates
mmu_notifier_count.  The gpc refresh can't use the "retry" helper as its
invalidation occurs _before_ mmu_notifier_count is elevated and before
mmu_notifier_range_start is set/updated.

  CPU0                                    CPU1
  ----                                    ----

  gfn_to_pfn_cache_invalidate_start()
  |
  -> gpc->valid = false;
                                          kvm_gfn_to_pfn_cache_refresh()
                                          |
                                          |-> gpc->valid = true;

                                          hva_to_pfn_retry()
                                          |
                                          -> acquire kvm->mmu_lock
                                             kvm->mmu_notifier_count == 0
                                             mmu_seq == kvm->mmu_notifier_seq
                                             drop kvm->mmu_lock
                                             return pfn 'X'
  acquire kvm->mmu_lock
  kvm_inc_notifier_count()
  drop kvm->mmu_lock()
  kernel frees pfn 'X'
                                          kvm_gfn_to_pfn_cache_check()
                                          |
                                          |-> gpc->valid == true

                                          caller accesses freed pfn 'X'

Key off of mn_active_invalidate_count to detect that a pfncache refresh
needs to wait for an in-progress mmu_notifier invalidation.  While
mn_active_invalidate_count is not guaranteed to be stable, it is
guaranteed to be elevated prior to an invalidation acquiring gpc->lock,
so either the refresh will see an active invalidation and wait, or the
invalidation will run after the refresh completes.
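
Condensed, the ordering the refresh relies on looks roughly like the
sketch below.  This is distilled from the hunks later in this patch; the
surrounding retry loop, locking, and error handling are elided, so it is
not meant to stand alone:

/*
 * Invalidation side (kvm_mmu_notifier_invalidate_range_start(), see the
 * kvm_main.c hunk below): the count is elevated before the pfncache
 * invalidation, which takes gpc->lock and clears gpc->valid, and isn't
 * dropped until after mmu_notifier_seq is bumped.
 */
kvm->mn_active_invalidate_count++;
gfn_to_pfn_cache_invalidate_start(kvm, start, end, may_block);

/*
 * Refresh side (hva_to_pfn_retry(), see the pfncache.c hunk below):
 * after re-acquiring gpc->lock, redo the hva=>pfn resolution if an
 * invalidation is in flight or completed while the lock was dropped.
 */
if (kvm->mn_active_invalidate_count)
	continue;	/* invalidation in flight, go around again */
smp_rmb();		/* pairs with smp_wmb() in invalidate_range_end() */
if (kvm->mmu_notifier_seq == mmu_seq)
	break;		/* no invalidation raced with this refresh */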

Speculatively marking the cache valid is itself flawed, as a concurrent
kvm_gfn_to_pfn_cache_check() would see a valid cache with stale pfn/khva
values.  The KVM Xen use case explicitly allows/wants multiple users;
even though the caches are allocated per vCPU, __kvm_xen_has_interrupt()
can read a different vCPU's cache (or multiple vCPUs' caches).  Address
this race by invalidating the cache prior to dropping gpc->lock (this is
made possible by fixing the above mmu_notifier race).

Finally, the refresh logic doesn't protect against concurrent refreshes
with different GPAs (which may or may not be a desired use case, but it's
allowed in the code), nor does it protect against a false negative on the
memslot generation.  If the first refresh sees a stale memslot generation,
it will refresh the hva and generation before moving on to the hva=>pfn
translation.  If it then drops gpc->lock, a different user can come along,
acquire gpc->lock, see that the memslot generation is fresh, and skip
the hva=>pfn update due to the userspace address also matching (because
it too was updated).  Address this race by adding an "in-progress" flag
so that the refresh that acquires gpc->lock first runs to completion
before other users can start their refresh.

Complicating all of this is the fact that both the hva=>pfn resolution
and mapping of the kernel address can sleep, i.e. must be done outside
of gpc->lock.

Fix the above races in one fell swoop; trying to fix each individual race
in a sane manner is, for all intents and purposes, impossible.

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/kvm_types.h |   1 +
 virt/kvm/kvm_main.c       |   9 ++
 virt/kvm/pfncache.c       | 209 +++++++++++++++++++++++++-------------
 3 files changed, 148 insertions(+), 71 deletions(-)

Comments

Sean Christopherson April 27, 2022, 2:10 p.m. UTC | #1
On Wed, Apr 27, 2022, Sean Christopherson wrote:
> Finally, the refresh logic doesn't protect against concurrent refreshes
> with different GPAs (which may or may not be a desired use case, but it's
> allowed in the code), nor does it protect against a false negative on the
> memslot generation.  If the first refresh sees a stale memslot generation,
> it will refresh the hva and generation before moving on to the hva=>pfn
> translation.  If it then drops gpc->lock, a different user can come along,
> acquire gpc->lock, see that the memslot generation is fresh, and skip
> the hva=>pfn update due to the userspace address also matching (because
> it too was updated).  Address this race by adding an "in-progress" flag
> so that the refresh that acquires gpc->lock first runs to completion
> before other users can start their refresh.

...

> @@ -159,10 +249,23 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>  
>  	write_lock_irq(&gpc->lock);
>  
> +	/*
> +	 * If another task is refreshing the cache, wait for it to complete.
> +	 * There is no guarantee that concurrent refreshes will see the same
> +	 * gpa, memslots generation, etc..., so they must be fully serialized.
> +	 */
> +	while (gpc->refresh_in_progress) {
> +		write_unlock_irq(&gpc->lock);
> +
> +		cond_resched();
> +
> +		write_lock_irq(&gpc->lock);
> +	}
> +	gpc->refresh_in_progress = true;

Adding refresh_in_progress can likely go in a separate patch.  I'll plan on doing
that in a v3 unless it proves to be painful.

> @@ -246,9 +296,26 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>  	}
>  
>   out:
> +	/*
> +	 * Invalidate the cache and purge the pfn/khva if the refresh failed.
> +	 * Some/all of the uhva, gpa, and memslot generation info may still be
> +	 * valid, leave it as is.
> +	 */
> +	if (ret) {
> +		gpc->valid = false;
> +		gpc->pfn = KVM_PFN_ERR_FAULT;
> +		gpc->khva = NULL;
> +	}
> +
> +	gpc->refresh_in_progress = false;
Lai Jiangshan April 28, 2022, 3:39 a.m. UTC | #2
On Wed, Apr 27, 2022 at 7:16 PM Sean Christopherson <seanjc@google.com> wrote:

> @@ -159,10 +249,23 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>

The following refresh_in_progress code behaves somewhat like a mutex.

+ mutex_lock(&gpc->refresh_in_progress); // before write_lock_irq(&gpc->lock);

Would a mutex fit the intent here?

Thanks
Lai

>         write_lock_irq(&gpc->lock);
>
> +       /*
> +        * If another task is refreshing the cache, wait for it to complete.
> +        * There is no guarantee that concurrent refreshes will see the same
> +        * gpa, memslots generation, etc..., so they must be fully serialized.
> +        */
> +       while (gpc->refresh_in_progress) {
> +               write_unlock_irq(&gpc->lock);
> +
> +               cond_resched();
> +
> +               write_lock_irq(&gpc->lock);
> +       }
> +       gpc->refresh_in_progress = true;
> +
>         old_pfn = gpc->pfn;
>         old_khva = gpc->khva - offset_in_page(gpc->khva);
>         old_uhva = gpc->uhva;
> -       old_valid = gpc->valid;
>
Sean Christopherson April 28, 2022, 2:33 p.m. UTC | #3
On Thu, Apr 28, 2022, Lai Jiangshan wrote:
> On Wed, Apr 27, 2022 at 7:16 PM Sean Christopherson <seanjc@google.com> wrote:
> 
> > @@ -159,10 +249,23 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
> >
> 
> The following refresh_in_progress code behaves somewhat like a mutex.
> 
> + mutex_lock(&gpc->refresh_in_progress); // before write_lock_irq(&gpc->lock);
> 
> Would a mutex fit the intent here?

Yeah, a mutex should work.  Not sure why I shied away from a mutex...
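
For reference, a minimal sketch of what the mutex-based alternative being
discussed might look like; the refresh_lock field name and the
gpc_refresh_locked() helper are purely illustrative and not part of the
posted patch:

/*
 * Hypothetical: serialize refreshes with a mutex taken outside gpc->lock,
 * instead of polling a refresh_in_progress flag under gpc->lock.  Refresh
 * may sleep anyway (hva=>pfn resolution, kmap()/memremap()), so sleeping
 * on a mutex here is fine and fully serializes concurrent refreshes.
 */
int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
				 gpa_t gpa, unsigned long len)
{
	int ret;

	mutex_lock(&gpc->refresh_lock);			/* illustrative field */
	ret = gpc_refresh_locked(kvm, gpc, gpa, len);	/* the existing refresh body */
	mutex_unlock(&gpc->refresh_lock);

	return ret;
}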
Paolo Bonzini May 20, 2022, 2:49 p.m. UTC | #4
On 4/27/22 03:40, Sean Christopherson wrote:
> +		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
> +		 * to go away, as the invalidation in the mmu_notifier event
> +		 * occurs _before_ mmu_notifier_count is elevated.
> +		 *
> +		 * Note, mn_active_invalidate_count can change at any time as
> +		 * it's not protected by gpc->lock.  But, it is guaranteed to
> +		 * be elevated before the mmu_notifier acquires gpc->lock, and
> +		 * isn't dropped until after mmu_notifier_seq is updated.  So,
> +		 * this task may get a false positive of sorts, i.e. see an
> +		 * elevated count and wait even though it's technically safe to
> +		 * proceed (because the mmu_notifier will invalidate the cache
> +		 * _after_ it's refreshed here), but the cache will never be
> +		 * refreshed with stale data, i.e. won't get false negatives.

I am all for lavish comments, but I think this is even too detailed.  What about:

                 /*
                  * mn_active_invalidate_count acts for all intents and purposes
                  * like mmu_notifier_count here; but we cannot use the latter
                  * because the invalidation in the mmu_notifier event occurs
                  * _before_ mmu_notifier_count is elevated.
                  *
                  * Note, it does not matter that mn_active_invalidate_count
                  * is not protected by gpc->lock.  It is guaranteed to
                  * be elevated before the mmu_notifier acquires gpc->lock, and
                  * isn't dropped until after mmu_notifier_seq is updated.
                  */

Paolo
Paolo Bonzini May 20, 2022, 2:58 p.m. UTC | #5
On 5/20/22 16:49, Paolo Bonzini wrote:
> On 4/27/22 03:40, Sean Christopherson wrote:
>> +         * Wait for mn_active_invalidate_count, not mmu_notifier_count,
>> +         * to go away, as the invalidation in the mmu_notifier event
>> +         * occurs _before_ mmu_notifier_count is elevated.
>> +         *
>> +         * Note, mn_active_invalidate_count can change at any time as
>> +         * it's not protected by gpc->lock.  But, it is guaranteed to
>> +         * be elevated before the mmu_notifier acquires gpc->lock, and
>> +         * isn't dropped until after mmu_notifier_seq is updated.  So,
>> +         * this task may get a false positive of sorts, i.e. see an
>> +         * elevated count and wait even though it's technically safe to
>> +         * proceed (because the mmu_notifier will invalidate the cache
>> +         * _after_ it's refreshed here), but the cache will never be
>> +         * refreshed with stale data, i.e. won't get false negatives.
> 
> I am all for lavish comments, but I think this is even too detailed.  
> What about:

And in fact this should be moved to a separate function.

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 50ce7b78b42f..321964ff42e1 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,6 +112,36 @@ static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
  	}
  }
  
+
+static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
+{
+	/*
+	 * mn_active_invalidate_count acts for all intents and purposes
+	 * like mmu_notifier_count here; but we cannot use the latter
+	 * because the invalidation in the mmu_notifier event occurs
+	 * _before_ mmu_notifier_count is elevated.
+	 *
+	 * Note, it does not matter that mn_active_invalidate_count
+	 * is not protected by gpc->lock.  It is guaranteed to
+	 * be elevated before the mmu_notifier acquires gpc->lock, and
+	 * isn't dropped until after mmu_notifier_seq is updated.
+	 */
+	if (kvm->mn_active_invalidate_count)
+		return true;
+
+	/*
+	 * Ensure mn_active_invalidate_count is read before
+	 * mmu_notifier_seq.  This pairs with the smp_wmb() in
+	 * mmu_notifier_invalidate_range_end() to guarantee either the
+	 * old (non-zero) value of mn_active_invalidate_count or the
+	 * new (incremented) value of mmu_notifier_seq is observed.
+	 */
+	smp_rmb();
+	if (kvm->mmu_notifier_seq != mmu_seq)
+		return true;
+	return false;
+}
+
  static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
  {
  	/* Note, the new page offset may be different than the old! */
@@ -129,7 +159,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
  	 */
  	gpc->valid = false;
  
-	for (;;) {
+	do {
  		mmu_seq = kvm->mmu_notifier_seq;
  		smp_rmb();
  
@@ -188,32 +218,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
  		 * attempting to refresh.
  		 */
  		WARN_ON_ONCE(gpc->valid);
-
-		/*
-		 * mn_active_invalidate_count acts for all intents and purposes
-		 * like mmu_notifier_count here; but we cannot use the latter
-		 * because the invalidation in the mmu_notifier event occurs
-		 * _before_ mmu_notifier_count is elevated.
-		 *
-		 * Note, it does not matter that mn_active_invalidate_count
-		 * is not protected by gpc->lock.  It is guaranteed to
-		 * be elevated before the mmu_notifier acquires gpc->lock, and
-		 * isn't dropped until after mmu_notifier_seq is updated.
-		 */
-		if (kvm->mn_active_invalidate_count)
-			continue;
-
-		/*
-		 * Ensure mn_active_invalidate_count is read before
-		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
-		 * mmu_notifier_invalidate_range_end() to guarantee either the
-		 * old (non-zero) value of mn_active_invalidate_count or the
-		 * new (incremented) value of mmu_notifier_seq is observed.
-		 */
-		smp_rmb();
-		if (kvm->mmu_notifier_seq == mmu_seq)
-			break;
-	}
+	} while (mmu_notifier_retry_cache(kvm, mmu_seq));
  
  	gpc->valid = true;
  	gpc->pfn = new_pfn;
Sean Christopherson May 20, 2022, 2:59 p.m. UTC | #6
On Fri, May 20, 2022, Paolo Bonzini wrote:
> On 4/27/22 03:40, Sean Christopherson wrote:
> > +		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
> > +		 * to go away, as the invalidation in the mmu_notifier event
> > +		 * occurs _before_ mmu_notifier_count is elevated.
> > +		 *
> > +		 * Note, mn_active_invalidate_count can change at any time as
> > +		 * it's not protected by gpc->lock.  But, it is guaranteed to
> > +		 * be elevated before the mmu_notifier acquires gpc->lock, and
> > +		 * isn't dropped until after mmu_notifier_seq is updated.  So,
> > +		 * this task may get a false positive of sorts, i.e. see an
> > +		 * elevated count and wait even though it's technically safe to
> > +		 * proceed (because the mmu_notifier will invalidate the cache
> > +		 * _after_ it's refreshed here), but the cache will never be
> > +		 * refreshed with stale data, i.e. won't get false negatives.
> 
> I am all for lavish comments, but I think this is even too detailed.

Yeah, the false positive/negative stuff is probably overkill.

> What about:
> 
>                 /*
>                  * mn_active_invalidate_count acts for all intents and purposes
>                  * like mmu_notifier_count here; but we cannot use the latter
>                  * because the invalidation in the mmu_notifier event occurs
>                  * _before_ mmu_notifier_count is elevated.

Looks good, though I'd prefer to avoid the "we", and explicitly call out that it's
the invalidation of the caches.


		/*
		 * mn_active_invalidate_count acts for all intents and purposes
		 * like mmu_notifier_count here; but the latter cannot be used
		 * here because the invalidation of caches in the mmu_notifier
		 * event occurs _before_ mmu_notifier_count is elevated.
		 *
		 * Note, it does not matter that mn_active_invalidate_count
		 * is not protected by gpc->lock.  It is guaranteed to
		 * be elevated before the mmu_notifier acquires gpc->lock, and
		 * isn't dropped until after mmu_notifier_seq is updated.
		 */


Also, you'll definitely want to look at v3 of this series.  I'm 99% certain I didn't
change the comment though :-)

https://lore.kernel.org/all/20220429210025.3293691-1-seanjc@google.com
Sean Christopherson May 20, 2022, 3:02 p.m. UTC | #7
On Fri, May 20, 2022, Paolo Bonzini wrote:
> On 5/20/22 16:49, Paolo Bonzini wrote:
> > On 4/27/22 03:40, Sean Christopherson wrote:
> > > +         * Wait for mn_active_invalidate_count, not mmu_notifier_count,
> > > +         * to go away, as the invalidation in the mmu_notifier event
> > > +         * occurs _before_ mmu_notifier_count is elevated.
> > > +         *
> > > +         * Note, mn_active_invalidate_count can change at any time as
> > > +         * it's not protected by gpc->lock.  But, it is guaranteed to
> > > +         * be elevated before the mmu_notifier acquires gpc->lock, and
> > > +         * isn't dropped until after mmu_notifier_seq is updated.  So,
> > > +         * this task may get a false positive of sorts, i.e. see an
> > > +         * elevated count and wait even though it's technically safe to
> > > +         * proceed (because the mmu_notifier will invalidate the cache
> > > +         * _after_ it's refreshed here), but the cache will never be
> > > +         * refreshed with stale data, i.e. won't get false negatives.
> > 
> > I am all for lavish comments, but I think this is even too detailed.
> > What about:
> 
> And in fact this should be moved to a separate function.
> 
> diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> index 50ce7b78b42f..321964ff42e1 100644
> --- a/virt/kvm/pfncache.c
> +++ b/virt/kvm/pfncache.c
> @@ -112,6 +112,36 @@ static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
>  	}
>  }
> +
> +static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
> +{
> +	/*
> +	 * mn_active_invalidate_count acts for all intents and purposes
> +	 * like mmu_notifier_count here; but we cannot use the latter
> +	 * because the invalidation in the mmu_notifier event occurs
> +	 * _before_ mmu_notifier_count is elevated.
> +	 *
> +	 * Note, it does not matter that mn_active_invalidate_count
> +	 * is not protected by gpc->lock.  It is guaranteed to
> +	 * be elevated before the mmu_notifier acquires gpc->lock, and
> +	 * isn't dropped until after mmu_notifier_seq is updated.
> +	 */
> +	if (kvm->mn_active_invalidate_count)
> +		return true;
> +
> +	/*
> +	 * Ensure mn_active_invalidate_count is read before
> +	 * mmu_notifier_seq.  This pairs with the smp_wmb() in
> +	 * mmu_notifier_invalidate_range_end() to guarantee either the
> +	 * old (non-zero) value of mn_active_invalidate_count or the
> +	 * new (incremented) value of mmu_notifier_seq is observed.
> +	 */
> +	smp_rmb();
> +	if (kvm->mmu_notifier_seq != mmu_seq)
> +		return true;
> +	return false;

This can be

	return kvm->mmu_notifier_seq != mmu_seq;

Looks good otherwise.  It'll probably yield a smaller diff too.
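
Folding the two suggestions together (Paolo's standalone helper, Sean's
reworded comment from #6, and the one-line return above), the helper would
presumably end up looking something like the sketch below; this is an
illustration of the cleanup being discussed, not necessarily the exact
code that lands in a later revision:

static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
{
	/*
	 * mn_active_invalidate_count acts for all intents and purposes
	 * like mmu_notifier_count here; but the latter cannot be used
	 * here because the invalidation of caches in the mmu_notifier
	 * event occurs _before_ mmu_notifier_count is elevated.
	 *
	 * Note, it does not matter that mn_active_invalidate_count
	 * is not protected by gpc->lock.  It is guaranteed to
	 * be elevated before the mmu_notifier acquires gpc->lock, and
	 * isn't dropped until after mmu_notifier_seq is updated.
	 */
	if (kvm->mn_active_invalidate_count)
		return true;

	/*
	 * Ensure mn_active_invalidate_count is read before
	 * mmu_notifier_seq.  This pairs with the smp_wmb() in
	 * mmu_notifier_invalidate_range_end() to guarantee either the
	 * old (non-zero) value of mn_active_invalidate_count or the
	 * new (incremented) value of mmu_notifier_seq is observed.
	 */
	smp_rmb();
	return kvm->mmu_notifier_seq != mmu_seq;
}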

Patch

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index ac1ebb37a0ff..83dcb97dddf1 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -74,6 +74,7 @@  struct gfn_to_pfn_cache {
 	enum pfn_cache_usage usage;
 	bool active;
 	bool valid;
+	bool refresh_in_progress;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfb7dabdbc63..0848430f36c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,6 +705,15 @@  static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
+	/*
+	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
+	 * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
+	 * each cache's lock.  There are relatively few caches in existence at
+	 * any given time, and the caches themselves can check for hva overlap,
+	 * i.e. don't need to rely on memslot overlap checks for performance.
+	 * Because this runs without holding mmu_lock, the pfn caches must use
+	 * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
+	 */
 	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
 					  hva_range.may_block);
 
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 05cb0bcbf662..b1665d0e6c32 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,31 +112,122 @@  static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 	}
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
+	/* Note, the new page offset may be different than the old! */
+	void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+	void *new_khva = NULL;
 	unsigned long mmu_seq;
-	kvm_pfn_t new_pfn;
-	int retry;
 
-	do {
+	lockdep_assert_held_write(&gpc->lock);
+
+	/*
+	 * Invalidate the cache prior to dropping gpc->lock, gpc->uhva has
+	 * already been updated and so a concurrent refresh from a different
+	 * task will not detect that gpa/uhva changed.
+	 */
+	gpc->valid = false;
+
+	for (;;) {
 		mmu_seq = kvm->mmu_notifier_seq;
 		smp_rmb();
 
+		write_unlock_irq(&gpc->lock);
+
+		/*
+		 * If the previous iteration "failed" due to an mmu_notifier
+		 * event, release the pfn and unmap the kernel virtual address
+		 * from the previous attempt.  Unmapping might sleep, so this
+		 * needs to be done after dropping the lock.  Opportunistically
+		 * check for resched while the lock isn't held.
+		 */
+		if (new_pfn != KVM_PFN_ERR_FAULT) {
+			/*
+			 * Keep the mapping if the previous iteration reused
+			 * the existing mapping and didn't create a new one.
+			 */
+			if (new_khva == old_khva)
+				new_khva = NULL;
+
+			gpc_release_pfn_and_khva(kvm, new_pfn, new_khva);
+
+			cond_resched();
+		}
+
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
-			break;
+			goto out_error;
+
+		/*
+		 * Obtain a new kernel mapping if KVM itself will access the
+		 * pfn.  Note, kmap() and memremap() can both sleep, so this
+		 * too must be done outside of gpc->lock!
+		 */
+		if (gpc->usage & KVM_HOST_USES_PFN) {
+			if (new_pfn == gpc->pfn) {
+				new_khva = old_khva;
+			} else if (pfn_valid(new_pfn)) {
+				new_khva = kmap(pfn_to_page(new_pfn));
+#ifdef CONFIG_HAS_IOMEM
+			} else {
+				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
+#endif
+			}
+			if (!new_khva) {
+				kvm_release_pfn_clean(new_pfn);
+				goto out_error;
+			}
+		}
+
+		write_lock_irq(&gpc->lock);
 
-		KVM_MMU_READ_LOCK(kvm);
-		retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
-		KVM_MMU_READ_UNLOCK(kvm);
-		if (!retry)
+		/*
+		 * Other tasks must wait for _this_ refresh to complete before
+		 * attempting to refresh.
+		 */
+		WARN_ON_ONCE(gpc->valid);
+
+		/*
+		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
+		 * to go away, as the invalidation in the mmu_notifier event
+		 * occurs _before_ mmu_notifier_count is elevated.
+		 *
+		 * Note, mn_active_invalidate_count can change at any time as
+		 * it's not protected by gpc->lock.  But, it is guaranteed to
+		 * be elevated before the mmu_notifier acquires gpc->lock, and
+		 * isn't dropped until after mmu_notifier_seq is updated.  So,
+		 * this task may get a false positive of sorts, i.e. see an
+		 * elevated count and wait even though it's technically safe to
+		 * proceed (because the mmu_notifier will invalidate the cache
+		 * _after_ it's refreshed here), but the cache will never be
+		 * refreshed with stale data, i.e. won't get false negatives.
+		 */
+		if (kvm->mn_active_invalidate_count)
+			continue;
+
+		/*
+		 * Ensure mn_active_invalidate_count is read before
+		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
+		 * mmu_notifier_invalidate_range_end() to guarantee either the
+		 * old (non-zero) value of mn_active_invalidate_count or the
+		 * new (incremented) value of mmu_notifier_seq is observed.
+		 */
+		smp_rmb();
+		if (kvm->mmu_notifier_seq == mmu_seq)
 			break;
+	}
+
+	gpc->valid = true;
+	gpc->pfn = new_pfn;
+	gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+	return 0;
 
-		cond_resched();
-	} while (1);
+out_error:
+	write_lock_irq(&gpc->lock);
 
-	return new_pfn;
+	return -EFAULT;
 }
 
 int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
@@ -147,7 +238,6 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	kvm_pfn_t old_pfn, new_pfn;
 	unsigned long old_uhva;
 	void *old_khva;
-	bool old_valid;
 	int ret = 0;
 
 	/*
@@ -159,10 +249,23 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 	write_lock_irq(&gpc->lock);
 
+	/*
+	 * If another task is refreshing the cache, wait for it to complete.
+	 * There is no guarantee that concurrent refreshes will see the same
+	 * gpa, memslots generation, etc..., so they must be fully serialized.
+	 */
+	while (gpc->refresh_in_progress) {
+		write_unlock_irq(&gpc->lock);
+
+		cond_resched();
+
+		write_lock_irq(&gpc->lock);
+	}
+	gpc->refresh_in_progress = true;
+
 	old_pfn = gpc->pfn;
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
 	old_uhva = gpc->uhva;
-	old_valid = gpc->valid;
 
 	/* If the userspace HVA is invalid, refresh that first */
 	if (gpc->gpa != gpa || gpc->generation != slots->generation ||
@@ -175,7 +278,6 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
 
 		if (kvm_is_error_hva(gpc->uhva)) {
-			gpc->pfn = KVM_PFN_ERR_FAULT;
 			ret = -EFAULT;
 			goto out;
 		}
@@ -185,60 +287,8 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	 * If the userspace HVA changed or the PFN was already invalid,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
-	if (!old_valid || old_uhva != gpc->uhva) {
-		unsigned long uhva = gpc->uhva;
-		void *new_khva = NULL;
-
-		/* Placeholders for "hva is valid but not yet mapped" */
-		gpc->pfn = KVM_PFN_ERR_FAULT;
-		gpc->khva = NULL;
-		gpc->valid = true;
-
-		write_unlock_irq(&gpc->lock);
-
-		new_pfn = hva_to_pfn_retry(kvm, uhva);
-		if (is_error_noslot_pfn(new_pfn)) {
-			ret = -EFAULT;
-			goto map_done;
-		}
-
-		if (gpc->usage & KVM_HOST_USES_PFN) {
-			if (new_pfn == old_pfn) {
-				/*
-				 * Reuse the existing pfn and khva, but put the
-				 * reference acquired hva_to_pfn_retry(); the
-				 * cache still holds a reference to the pfn
-				 * from the previous refresh.
-				 */
-				gpc_release_pfn_and_khva(kvm, new_pfn, NULL);
-
-				new_khva = old_khva;
-				old_pfn = KVM_PFN_ERR_FAULT;
-				old_khva = NULL;
-			} else if (pfn_valid(new_pfn)) {
-				new_khva = kmap(pfn_to_page(new_pfn));
-#ifdef CONFIG_HAS_IOMEM
-			} else {
-				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
-			}
-			if (new_khva)
-				new_khva += page_offset;
-			else
-				ret = -EFAULT;
-		}
-
-	map_done:
-		write_lock_irq(&gpc->lock);
-		if (ret) {
-			gpc->valid = false;
-			gpc->pfn = KVM_PFN_ERR_FAULT;
-			gpc->khva = NULL;
-		} else {
-			/* At this point, gpc->valid may already have been cleared */
-			gpc->pfn = new_pfn;
-			gpc->khva = new_khva;
-		}
+	if (!gpc->valid || old_uhva != gpc->uhva) {
+		ret = hva_to_pfn_retry(kvm, gpc);
 	} else {
 		/* If the HVA→PFN mapping was already valid, don't unmap it. */
 		old_pfn = KVM_PFN_ERR_FAULT;
@@ -246,9 +296,26 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	}
 
  out:
+	/*
+	 * Invalidate the cache and purge the pfn/khva if the refresh failed.
+	 * Some/all of the uhva, gpa, and memslot generation info may still be
+	 * valid, leave it as is.
+	 */
+	if (ret) {
+		gpc->valid = false;
+		gpc->pfn = KVM_PFN_ERR_FAULT;
+		gpc->khva = NULL;
+	}
+
+	gpc->refresh_in_progress = false;
+
+	/* Snapshot the new pfn before dropping the lock! */
+	new_pfn = gpc->pfn;
+
 	write_unlock_irq(&gpc->lock);
 
-	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+	if (old_pfn != new_pfn)
+		gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }