From patchwork Sat Feb 3 00:23:40 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13543633
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240203002343.383056-2-seanjc@google.com>
In-Reply-To: <20240203002343.383056-1-seanjc@google.com>
References: <20240203002343.383056-1-seanjc@google.com>
Date: Fri, 2 Feb 2024 16:23:40 -0800
Subject: [PATCH v2 1/4] KVM: x86/mmu: Don't acquire mmu_lock when using
 indirect_shadow_pages as a heuristic
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson,
 Mingwei Zhang

From: Mingwei Zhang

Drop KVM's completely pointless acquisition of mmu_lock when deciding
whether or not to unprotect any shadow pages residing at the gfn before
resuming the guest to let it retry an instruction that KVM failed to
emulate.

In this case, indirect_shadow_pages is used as a coarse-grained heuristic
to check if there is any chance of there being a relevant shadow page to
unprotect.  But acquiring mmu_lock largely defeats any benefit to the
heuristic, as taking mmu_lock for write is likely far more costly to the
VM as a whole than unnecessarily walking mmu_page_hash.

Furthermore, the current code is already prone to false negatives and
false positives, as it drops mmu_lock before checking the flag and
unprotecting shadow pages.  And as evidenced by the lack of bug reports,
neither false positives nor false negatives are problematic.  A false
positive simply means that KVM will try to unprotect shadow pages that
have already been zapped.  And a false negative means that KVM will
resume the guest without unprotecting the gfn, i.e. if a shadow page was
_just_ created, the vCPU will hit the same page fault and do the whole
dance all over again, and detect and unprotect the shadow page the second
time around (or not, if something else zaps it first).
Reported-by: Jim Mattson
Signed-off-by: Mingwei Zhang
[sean: drop READ_ONCE() and comment change, rewrite changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c339d9f95b4b..2ec3e1851f2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8787,13 +8787,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 	/* The instructions are well-emulated on direct mmu. */
 	if (vcpu->arch.mmu->root_role.direct) {
-		unsigned int indirect_shadow_pages;
-
-		write_lock(&vcpu->kvm->mmu_lock);
-		indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
-		write_unlock(&vcpu->kvm->mmu_lock);
-
-		if (indirect_shadow_pages)
+		if (vcpu->kvm->arch.indirect_shadow_pages)
 			kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
 
 		return true;
From patchwork Sat Feb 3 00:23:41 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13543634
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240203002343.383056-3-seanjc@google.com>
In-Reply-To: <20240203002343.383056-1-seanjc@google.com>
References: <20240203002343.383056-1-seanjc@google.com>
Date: Fri, 2 Feb 2024 16:23:41 -0800
Subject: [PATCH v2 2/4] KVM: x86: Drop dedicated logic for direct MMUs in
 reexecute_instruction()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson,
 Mingwei Zhang

Now that KVM doesn't pointlessly acquire mmu_lock for direct MMUs, drop
the dedicated path entirely and always query indirect_shadow_pages when
deciding whether or not to try unprotecting the gfn.

For indirect, a.k.a. shadow, MMUs, checking indirect_shadow_pages is
harmless; unless *every* shadow page was somehow zapped while KVM was
attempting to emulate the instruction, indirect_shadow_pages is
guaranteed to be non-zero.
Well, unless the instruction used a direct hugepage with 2-level paging
for its code page, but in that case, there's obviously nothing to
unprotect.  And in the extremely unlikely case all shadow pages were
zapped, there's again obviously nothing to unprotect.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ec3e1851f2f..c502121b7bee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8785,27 +8785,27 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 	kvm_release_pfn_clean(pfn);
 
-	/* The instructions are well-emulated on direct mmu. */
-	if (vcpu->arch.mmu->root_role.direct) {
-		if (vcpu->kvm->arch.indirect_shadow_pages)
-			kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
-
-		return true;
-	}
-
 	/*
-	 * if emulation was due to access to shadowed page table
-	 * and it failed try to unshadow page and re-enter the
-	 * guest to let CPU execute the instruction.
+	 * If emulation may have been triggered by a write to a shadowed page
+	 * table, unprotect the gfn (zap any relevant SPTEs) and re-enter the
+	 * guest to let the CPU re-execute the instruction in the hope that the
+	 * CPU can cleanly execute the instruction that KVM failed to emulate.
 	 */
-	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+	if (vcpu->kvm->arch.indirect_shadow_pages)
+		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
 
 	/*
-	 * If the access faults on its page table, it can not
-	 * be fixed by unprotecting shadow page and it should
-	 * be reported to userspace.
+	 * If the failed instruction faulted on an access to page tables that
+	 * are used to translate any part of the instruction, KVM can't resolve
+	 * the issue by unprotecting the gfn, as zapping the shadow page will
+	 * result in the instruction taking a !PRESENT page fault and thus put
+	 * the vCPU into an infinite loop of page faults.  E.g. KVM will create
+	 * a SPTE and write-protect the gfn to resolve the !PRESENT fault, and
+	 * then zap the SPTE to unprotect the gfn, and then do it all over
+	 * again.  Report the error to userspace.
 	 */
-	return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
+	return vcpu->arch.mmu->root_role.direct ||
+	       !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
 }
 
 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
From patchwork Sat Feb 3 00:23:42 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13543635
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240203002343.383056-4-seanjc@google.com>
In-Reply-To: <20240203002343.383056-1-seanjc@google.com>
References: <20240203002343.383056-1-seanjc@google.com>
Date: Fri, 2 Feb 2024 16:23:42 -0800
Subject: [PATCH v2 3/4] KVM: x86: Drop superfluous check on direct MMU vs.
 WRITE_PF_TO_SP flag
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson,
 Mingwei Zhang

Remove reexecute_instruction()'s final check on the MMU being direct, as
EMULTYPE_WRITE_PF_TO_SP is only ever set if the MMU is indirect, i.e. is
a shadow MMU.  Prior to commit 93c05d3ef252 ("KVM: x86: improve
reexecute_instruction"), the flag simply didn't exist (and KVM actually
returned "true" unconditionally for both types of MMUs).  I.e. the
explicit check for a direct MMU is simply a leftover artifact from old
code.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c502121b7bee..5fe94b2de1dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8804,8 +8804,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * then zap the SPTE to unprotect the gfn, and then do it all over
 	 * again.  Report the error to userspace.
 	 */
-	return vcpu->arch.mmu->root_role.direct ||
-	       !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
+	return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
 }
 
 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
From patchwork Sat Feb 3 00:23:43 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13543636
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240203002343.383056-5-seanjc@google.com>
In-Reply-To: <20240203002343.383056-1-seanjc@google.com>
References: <20240203002343.383056-1-seanjc@google.com>
Date: Fri, 2 Feb 2024 16:23:43 -0800
Subject: [PATCH v2 4/4] KVM: x86/mmu: Fix a *very* theoretical race in
 kvm_mmu_track_write()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson,
 Mingwei Zhang

Add full memory barriers in kvm_mmu_track_write() and account_shadowed()
to plug a (very, very theoretical) race where kvm_mmu_track_write() could
miss a 0->1 transition of indirect_shadow_pages and fail to zap relevant,
*stale* SPTEs.

Without the barriers, because modern x86 CPUs allow (per the SDM):

  Reads may be reordered with older writes to different locations but not
  with older writes to the same location.

it's (again, super theoretically) possible that the following could
happen (in terms of values being visible/resolved):

          CPU0                          CPU1
  read memory[gfn] (=Y)
                                memory[gfn] Y=>X
                                read indirect_shadow_pages (=0)
  indirect_shadow_pages 0=>1

or conversely:

          CPU0                          CPU1
  indirect_shadow_pages 0=>1
                                read indirect_shadow_pages (=0)
                                read memory[gfn] (=Y)
  memory[gfn] Y=>X

In practice, this bug is likely benign as both the 0=>1 transition and
reordering of this scope are extremely rare occurrences.

Note, if the cost of the barrier (which is simply a locked ADD, see
commit 450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of
MFENCE")) is problematic, KVM could avoid the barrier by bailing earlier
if kvm_memslots_have_rmaps() is false.  But the odds of the barrier being
problematic are extremely low, *and* the odds of the extra checks being
meaningfully faster overall are also low.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c193b096b45..86b85060534d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -830,6 +830,14 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	struct kvm_memory_slot *slot;
 	gfn_t gfn;
 
+	/*
+	 * Ensure indirect_shadow_pages is elevated prior to re-reading guest
+	 * child PTEs in FNAME(gpte_changed), i.e. guarantee either in-flight
+	 * emulated writes are visible before re-reading guest PTEs, or that
+	 * an emulated write will see the elevated count and acquire mmu_lock
+	 * to update SPTEs.  Pairs with the smp_mb() in kvm_mmu_track_write().
+	 */
+	smp_mb();
 	kvm->arch.indirect_shadow_pages++;
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
@@ -5747,10 +5755,15 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	bool flush = false;
 
 	/*
-	 * If we don't have indirect shadow pages, it means no page is
-	 * write-protected, so we can exit simply.
+	 * When emulating guest writes, ensure the written value is visible to
+	 * any task that is handling page faults before checking whether or not
+	 * KVM is shadowing a guest PTE.  This ensures either KVM will create
+	 * the correct SPTE in the page fault handler, or this task will see
+	 * a non-zero indirect_shadow_pages.  Pairs with the smp_mb() in
+	 * account_shadowed().
 	 */
-	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
+	smp_mb();
+	if (!vcpu->kvm->arch.indirect_shadow_pages)
 		return;
 
 	write_lock(&vcpu->kvm->mmu_lock);