[v12,20/20] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

From: David Woodhouse <dwmw@amazon.co.uk>

This function can race with kvm_gpc_deactivate(), which does not take
the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
temporarily dropped its write lock on gpc->lock.

Then if hva_to_pfn_retry() determines that the PFN hasn't changed and
that the original pfn and khva can be reused, they get assigned back to
gpc->pfn and gpc->khva even though the khva was already unmapped by
kvm_gpc_deactivate(). This leaves the cache in an apparently valid state
but with ->khva pointing to an address which has been unmapped. Which in
turn leads to oopses in e.g. __kvm_xen_has_interrupt() and
set_shinfo_evtchn_pending() when they dereference said khva.
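
The interleaving described above can be replayed in a single thread. The
following is only a toy model under stated assumptions, not the kernel code:
the toy_* names, the 'mapped' flag and the while_unlocked hook are all
hypothetical, and the hook merely stands in for the window in which the
write lock on gpc->lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PFN_ERR UINT64_MAX	/* hypothetical "no pfn" marker */

struct toy_mapping {
	bool mapped;		/* false once the mapping is torn down */
};

struct toy_gpc {
	bool active;
	bool valid;
	uint64_t pfn;
	struct toy_mapping *khva;
};

/* Models deactivation: wipe ->pfn and ->khva and unmap the latter. */
static void toy_deactivate(struct toy_gpc *gpc)
{
	gpc->active = false;
	gpc->valid = false;
	if (gpc->khva)
		gpc->khva->mapped = false;
	gpc->khva = NULL;
	gpc->pfn = TOY_PFN_ERR;
}

/*
 * Models the old refresh path: remember the old pfn/khva, "drop the
 * lock" (run the hook), then reuse the old pair if the looked-up pfn
 * is unchanged. The remap path for a changed pfn is not modelled.
 */
static void toy_old_refresh(struct toy_gpc *gpc, uint64_t found_pfn,
			    void (*while_unlocked)(struct toy_gpc *))
{
	uint64_t old_pfn = gpc->pfn;
	struct toy_mapping *old_khva = gpc->khva;

	assert(gpc->active);
	gpc->valid = false;

	if (while_unlocked)
		while_unlocked(gpc);	/* lock is dropped in this window */

	if (found_pfn == old_pfn) {
		gpc->pfn = old_pfn;
		gpc->khva = old_khva;	/* may already be unmapped */
		gpc->valid = true;
	}
}
```

Calling toy_old_refresh() on a valid cache with toy_deactivate() as the
hook leaves ->valid set while ->khva points at a mapping whose 'mapped'
flag is false, which is the apparently valid but stale state described
above.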

It may be possible to fix this just by making kvm_gpc_deactivate() take
the ->refresh_lock, but that still leaves ->refresh_lock being basically
redundant with the write lock on ->lock, which frankly makes my skin
itch, with the way that hva_to_pfn_retry() operates on fields in the gpc
without holding a write lock on ->lock.

Instead, fix it by cleaning up the semantics of hva_to_pfn_retry(). It
now no longer does locking gymnastics because it no longer operates on
the gpc object at all. It's called with a uhva and simply returns the
corresponding pfn (pinned), and a mapped khva for it.

Its caller __kvm_gpc_refresh() now sets gpc->uhva and clears gpc->valid
before dropping ->lock, calling hva_to_pfn_retry() and retaking ->lock
for write.

If hva_to_pfn_retry() fails, *or* if the ->uhva or ->active fields in
the gpc changed while the lock was dropped, the new mapping is discarded
and the gpc is not modified. On success with an unchanged gpc, the new
mapping is installed and the current ->pfn and ->khva are taken into the
local old_pfn and old_khva variables to be unmapped once the locks are
all released.
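
The reworked flow can be sketched as a single-threaded userspace model. This
is an illustration under stated assumptions rather than the patch itself:
the model_* names are hypothetical, model_lock only checks lock discipline
and stands in for the rwlock, model_map() stands in for hva_to_pfn_retry(),
and the while_unlocked hook plays the part of a concurrent writer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PFN_ERR UINT64_MAX	/* stand-in for KVM_PFN_ERR_FAULT */

struct model_lock { bool held; };
static void model_wrlock(struct model_lock *l) { assert(!l->held); l->held = true; }
static void model_unlock(struct model_lock *l) { assert(l->held); l->held = false; }

struct gpc_model {
	struct model_lock lock;
	bool active;
	bool valid;
	uintptr_t uhva;
	uint64_t pfn;
	void *khva;
};

static int model_unmap_count;
static char model_page[4096];

/* Takes only a uhva and touches no cache fields. */
static int model_map(uintptr_t uhva, uint64_t *pfn, void **khva)
{
	if (!uhva)
		return -EFAULT;
	*pfn = uhva >> 12;
	*khva = model_page;
	return 0;
}

static void model_unmap(uint64_t pfn, void *khva)
{
	(void)pfn;
	(void)khva;
	model_unmap_count++;
}

static int model_refresh(struct gpc_model *gpc, uintptr_t uhva,
			 void (*while_unlocked)(struct gpc_model *))
{
	uint64_t new_pfn = MODEL_PFN_ERR, old_pfn = MODEL_PFN_ERR;
	void *new_khva = NULL, *old_khva = NULL;
	int ret;

	model_wrlock(&gpc->lock);
	if (!gpc->active) {
		model_unlock(&gpc->lock);
		return -EINVAL;
	}
	gpc->uhva = uhva;
	gpc->valid = false;
	model_unlock(&gpc->lock);

	ret = model_map(uhva, &new_pfn, &new_khva);
	if (while_unlocked)
		while_unlocked(gpc);	/* a writer sneaks in here */

	model_wrlock(&gpc->lock);
	if (ret || !gpc->active || gpc->uhva != uhva) {
		/* Leave the gpc alone; queue the new mapping for release. */
		old_pfn = new_pfn;
		old_khva = new_khva;
		if (!ret)
			ret = -EAGAIN;
	} else {
		old_pfn = gpc->pfn;
		old_khva = gpc->khva;
		gpc->pfn = new_pfn;
		gpc->khva = new_khva;
		gpc->valid = true;
	}
	model_unlock(&gpc->lock);

	/* Unmap only after the lock has been released. */
	if (old_pfn != MODEL_PFN_ERR)
		model_unmap(old_pfn, old_khva);
	return ret;
}

/* Models deactivation: wipe the fields under the lock, unmap afterwards. */
static void model_deactivate(struct gpc_model *gpc)
{
	uint64_t old_pfn;
	void *old_khva;

	model_wrlock(&gpc->lock);
	gpc->active = false;
	gpc->valid = false;
	old_pfn = gpc->pfn;
	old_khva = gpc->khva;
	gpc->pfn = MODEL_PFN_ERR;
	gpc->khva = NULL;
	model_unlock(&gpc->lock);

	if (old_pfn != MODEL_PFN_ERR)
		model_unmap(old_pfn, old_khva);
}
```

With model_deactivate() passed as the hook, the refresh sees ->active
cleared once it retakes the lock, returns -EAGAIN and releases its own new
mapping instead of installing it, so the cache never ends up valid with a
stale khva. The -EINVAL and -EAGAIN return values are choices made for this
model only.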

This simplification means that ->refresh_lock is no longer needed for
correctness, but it does still provide a minor optimisation because it
will prevent two concurrent __kvm_gpc_refresh() calls from mapping a
given PFN, only for one of them to lose the race and discard its
mapping.

The optimisation in hva_to_pfn_retry() where it attempts to use the old
mapping if the pfn doesn't change is dropped, since it makes the pinning
more complex. It's a pointless optimisation anyway, since the odds of
the pfn ending up the same when the uhva has changed (i.e. the odds of
the two userspace addresses both pointing to the same underlying
physical page) are negligible.

The 'hva_changed' local variable in __kvm_gpc_refresh() is also removed,
since it's simpler just to clear gpc->valid if the uhva changed.
Likewise the unmap_old variable is dropped because it's just as easy to
check the old_pfn variable for KVM_PFN_ERR_FAULT.

I remain slightly confused because although this is clearly a race in
the gfn_to_pfn_cache code, I don't quite know how the Xen support code
actually managed to trigger it. We've seen oopses from dereferencing a
valid-looking ->khva in both __kvm_xen_has_interrupt() (the vcpu_info)
and in set_shinfo_evtchn_pending() (the shared_info). But surely the
race shouldn't happen for the vcpu_info gpc because all calls to both
refresh and deactivate hold the vcpu mutex, and it shouldn't happen
for the shared_info gpc because all calls to both will hold the
kvm->arch.xen.xen_lock mutex.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

v12:
 - New in this version.
---
 virt/kvm/pfncache.c | 184 +++++++++++++++++++++-----------------------
 1 file changed, 88 insertions(+), 96 deletions(-)

Message ID	20240115125707.1183-21-paul@xen.org (mailing list archive)
State	New, archived
Headers	From: Paul Durrant <paul@xen.org>; Date: Mon, 15 Jan 2024 12:57:07 +0000; Subject: [PATCH v12 20/20] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues; List-Id: <kvm.vger.kernel.org>
Series	KVM: xen: update shared_info and vcpu_info handling
