From patchwork Tue Jul 9 13:20:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Roy X-Patchwork-Id: 13727958 Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F32A84A0A; Tue, 9 Jul 2024 13:20:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531262; cv=none; b=JyQEBHSRsBzMofe9rUspCMCIlOvc+Mg6ieKgA5jN6c99CrelfV/l+8iGYmWe+Ow1hTyhCgSLwl7ILuaKXz3KDns3W23bQawlTS1JfUXEA25mgM5TnTIzlFgv62J1Ph6kHhtZ6AG08x21wMLDV/XMZJe4VQNlYPgCWDcMntMO/fY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531262; c=relaxed/simple; bh=UJ4B+Q4x+Ego4l6SJ2nX8aLysQrcdIIiRMf9d2sbkio=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XDJzib2b409ivgk19cd/Osp+R2Er8P56AoIvydj/IOin5qIatooSsavnE2y8cnk4qqxi/Y9S/BwyFGZevp18DWGaX1J3OmI6mXjYaeCOVGIldM0zLfyJU0Wd9Gmx18srxde34d2hgZNysmxCsr5GEnoa0qFWpHhp/r1HzsEREr0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=bEsxCxdN; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b="bEsxCxdN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazon201209; 
t=1720531260; x=1752067260; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=myefINasqoLP8LFmYmmDEFb4CuSNvTYNmrkp8V5RNNQ=; b=bEsxCxdN3ru94aKZb0/1q4Gst3DjyxBRIiFbG8KuVk28wAnM/z/Xpc9s WRlKtr2wazKJVhx5mHreB1pfy2RkrhEmP1VU1iNKLU6wIwVOL5FOontin 1XSKRl2kDsQ+Za/E9B8ZDSBLejoVfWuuusJRbh4twHSnA8Yid6P5Mu4ja 0=; X-IronPort-AV: E=Sophos;i="6.09,195,1716249600"; d="scan'208";a="217222124" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2024 13:20:58 +0000 Received: from EX19MTAUEB001.ant.amazon.com [10.0.0.204:19435] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.13.24:2525] with esmtp (Farcaster) id 18a4efc4-1b3b-4e58-98d8-17e4bb420299; Tue, 9 Jul 2024 13:20:57 +0000 (UTC) X-Farcaster-Flow-ID: 18a4efc4-1b3b-4e58-98d8-17e4bb420299 Received: from EX19D008UEA004.ant.amazon.com (10.252.134.191) by EX19MTAUEB001.ant.amazon.com (10.252.135.108) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:20:57 +0000 Received: from EX19MTAUEC001.ant.amazon.com (10.252.135.222) by EX19D008UEA004.ant.amazon.com (10.252.134.191) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:20:56 +0000 Received: from ua2d7e1a6107c5b.ant.amazon.com (172.19.88.180) by mail-relay.amazon.com (10.252.135.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34 via Frontend Transport; Tue, 9 Jul 2024 13:20:54 +0000 From: Patrick Roy To: , , , , , CC: Patrick Roy , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 1/8] kvm: Allow reading/writing gmem using kvm_{read,write}_guest Date: Tue, 9 Jul 2024 14:20:29 +0100 Message-ID: <20240709132041.3625501-2-roypat@amazon.co.uk> X-Mailer: 
git-send-email 2.45.2 In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk> References: <20240709132041.3625501-1-roypat@amazon.co.uk> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 If KVM can access guest-private memory without causing a host-kernel panic (currently only if the VM type is KVM_X86_SW_PROTECTED_VM), allow `kvm_{read,write}_guest` to access gfns that are set to "private". If KVM cannot access guest-private memory (say, because it is running a TDX VM), prepare a KVM_EXIT_MEMORY_FAULT (if possible) and return -EFAULT. KVM can only prepare the memory fault exit inside the `kvm_vcpu_{read,write}_guest` variant, as it needs a vcpu reference to assign the exit reason to. KVM accesses guest-private memory via kernel virtual addresses/the direct map. In the special case of guest_memfd, it does not have to worry about gfn->pfn mappings being invalidated, since guest_memfd pages are immovable. Signed-off-by: Patrick Roy --- include/linux/kvm_host.h | 5 +++ virt/kvm/kvm_main.c | 85 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 90 insertions(+) base-commit: 771df9ffadb8204e61d3e98f36c5067102aab78f diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 2a6679b46427..8f980aafd5ca 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2407,6 +2407,11 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE; } +static inline bool kvm_can_access_gmem(struct kvm *kvm) +{ + return kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM; +} + #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8c7cbc9ec9ee..b3b3de70a4df 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3286,11 +3286,51 @@ static int __kvm_read_guest_page(struct
kvm_memory_slot *slot, gfn_t gfn, return 0; } +static int __kvm_read_guest_private_page(struct kvm *kvm, + struct kvm_memory_slot *memslot, gfn_t gfn, + void *data, int offset, int len) +{ + kvm_pfn_t pfn; + int r; + struct page *page; + void *kaddr; + + if (!kvm_can_access_gmem(kvm)) + return -EFAULT; + + r = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, NULL); + + if (r < 0) + return -EFAULT; + + page = pfn_to_page(pfn); + lock_page(page); + kaddr = page_address(page) + offset; + memcpy(data, kaddr, len); + unlock_page(page); + put_page(page); + return 0; +} + +static int __kvm_vcpu_read_guest_private_page(struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, gfn_t gfn, + void *data, int offset, int len) +{ + if (!kvm_can_access_gmem(vcpu->kvm)) { + kvm_prepare_memory_fault_exit(vcpu, gfn + offset, len, false, + false, true); + return -EFAULT; + } + return __kvm_read_guest_private_page(vcpu->kvm, memslot, gfn, data, offset, len); +} + int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset, int len) { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); + if (kvm_mem_is_private(kvm, gfn)) + return __kvm_read_guest_private_page(kvm, slot, gfn, data, offset, len); return __kvm_read_guest_page(slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_read_guest_page); @@ -3300,6 +3340,8 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, { struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); + if (kvm_mem_is_private(vcpu->kvm, gfn)) + return __kvm_vcpu_read_guest_private_page(vcpu, slot, gfn, data, offset, len); return __kvm_read_guest_page(slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page); @@ -3390,11 +3432,52 @@ static int __kvm_write_guest_page(struct kvm *kvm, return 0; } +static int __kvm_write_guest_private_page(struct kvm *kvm, + struct kvm_memory_slot *memslot, gfn_t gfn, + const void *data, int offset, int len) +{ + kvm_pfn_t pfn; + int r; + struct page *page; + void 
*kaddr; + + if (!kvm_can_access_gmem(kvm)) + return -EFAULT; + + r = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, NULL); + + if (r < 0) + return -EFAULT; + + page = pfn_to_page(pfn); + lock_page(page); + kaddr = page_address(page) + offset; + memcpy(kaddr, data, len); + unlock_page(page); + put_page(page); + + return 0; +} + +static int __kvm_vcpu_write_guest_private_page(struct kvm_vcpu *vcpu, + struct kvm_memory_slot *memslot, gfn_t gfn, + const void *data, int offset, int len) +{ + if (!kvm_can_access_gmem(vcpu->kvm)) { + kvm_prepare_memory_fault_exit(vcpu, gfn + offset, len, true, + false, true); + return -EFAULT; + } + return __kvm_write_guest_private_page(vcpu->kvm, memslot, gfn, data, offset, len); +} + int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, int offset, int len) { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); + if (kvm_mem_is_private(kvm, gfn)) + return __kvm_write_guest_private_page(kvm, slot, gfn, data, offset, len); return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_write_guest_page); @@ -3404,6 +3487,8 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, { struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); + if (kvm_mem_is_private(vcpu->kvm, gfn)) + return __kvm_vcpu_write_guest_private_page(vcpu, slot, gfn, data, offset, len); return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page); From patchwork Tue Jul 9 13:20:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Roy X-Patchwork-Id: 13727959 Received: from smtp-fw-52002.amazon.com (smtp-fw-52002.amazon.com [52.119.213.150]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18FE52D057; Tue, 9 Jul 2024 13:21:02 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.150 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531265; cv=none; b=kH7p9wLePVE23Nm44VPoMBanTeyLM3e8VSaE+wGuCQeJjW9iU/KKyIuaPavsqJlyNwCngcfK09Dr9fQpEgUfi1uz390euNFNIq7fyKh4iPWjn0NboSWtDwv64lXYTJBJqnjrHkq9GZH+qHim8TP9jwnW04xoi8HW3HYnxOvSHJk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531265; c=relaxed/simple; bh=ofrjlTwjZutvnSbSpWjUvPgpAsV5ivDbnT/GrlE1wMY=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=tkmqlE8QpfHF4pEb3+CGAiyVYYbRcHrlmcr8uefNzVetnfDx7cSeFjNECeE7mzYh41fAEYG5cUdxMPkFvc1OLUrQ6iPyBymLxXn/UbCR39J3fD2MxQf0oeRj2rO11dB8rN1gIjTm5+ocR6zOH8ow8pHvvEHm0B+h8997wvGR25o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=TZ2rp0jy; arc=none smtp.client-ip=52.119.213.150 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b="TZ2rp0jy" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazon201209; t=1720531263; x=1752067263; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/DNQp/z0GUBgLTV7ZRmRKhZQi2WgjcFO0AlHckf358s=; b=TZ2rp0jyaEv7Fhw6699bQXwux3wtJpG9Ssn2sELoX9EhPTOGaUA/JUD3 caz1/WE64tFFZNYZe8FbYXcGHpdTvzTaIoQsJmsFRTsG9MWaEMZqzfGIN GSY83QZ0NuFXcSLBCSbLQFeLTknfc0eNObqnoVf7eINj3RR0XPFTTL6/6 8=; X-IronPort-AV: E=Sophos;i="6.09,195,1716249600"; d="scan'208";a="644664049" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO 
smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52002.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2024 13:21:01 +0000 Received: from EX19MTAUEA002.ant.amazon.com [10.0.0.204:55215] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.17.85:2525] with esmtp (Farcaster) id cb50a819-87b3-4fab-a17a-29e4e0b8d5a4; Tue, 9 Jul 2024 13:21:00 +0000 (UTC) X-Farcaster-Flow-ID: cb50a819-87b3-4fab-a17a-29e4e0b8d5a4 Received: from EX19D008UEC001.ant.amazon.com (10.252.135.232) by EX19MTAUEA002.ant.amazon.com (10.252.134.9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:00 +0000 Received: from EX19MTAUEC001.ant.amazon.com (10.252.135.222) by EX19D008UEC001.ant.amazon.com (10.252.135.232) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:00 +0000 Received: from ua2d7e1a6107c5b.ant.amazon.com (172.19.88.180) by mail-relay.amazon.com (10.252.135.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34 via Frontend Transport; Tue, 9 Jul 2024 13:20:57 +0000 From: Patrick Roy To: , , , , , CC: Patrick Roy , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 2/8] kvm: use slowpath in gfn_to_hva_cache if memory is private Date: Tue, 9 Jul 2024 14:20:30 +0100 Message-ID: <20240709132041.3625501-3-roypat@amazon.co.uk> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk> References: <20240709132041.3625501-1-roypat@amazon.co.uk> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Currently, KVM uses gfn_to_hva_caches to cache gfn->memslot->userspace host virtual address (uhva) translations. 
If a gfn is backed by guest_memfd, however, there is no uhva-equivalent item we could possibly cache, since accesses go through a file descriptor instead of a VMA. Thus, we effectively disable gfn_to_hva_caches in the case where gfns are gmem-backed, and instead do a gfn->pfn translation on the fly by calling `kvm_{read,write}_guest` inside `kvm_{read,write}_guest_cached`. Signed-off-by: Patrick Roy --- virt/kvm/kvm_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b3b3de70a4df..4357f7cdf040 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3603,7 +3603,7 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, if (kvm_is_error_hva(ghc->hva)) return -EFAULT; - if (unlikely(!ghc->memslot)) + if (unlikely(!ghc->memslot || kvm_mem_is_private(kvm, gpa_to_gfn(gpa)))) return kvm_write_guest(kvm, gpa, data, len); r = __copy_to_user((void __user *)ghc->hva + offset, data, len); @@ -3641,7 +3641,7 @@ int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, if (kvm_is_error_hva(ghc->hva)) return -EFAULT; - if (unlikely(!ghc->memslot)) + if (unlikely(!ghc->memslot || kvm_mem_is_private(kvm, gpa_to_gfn(gpa)))) return kvm_read_guest(kvm, gpa, data, len); r = __copy_from_user(data, (void __user *)ghc->hva + offset, len); From patchwork Tue Jul 9 13:20:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Roy X-Patchwork-Id: 13727962 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4C55A15ECC0; Tue, 9 Jul 2024 13:21:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.184.29 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org;
s=arc-20240116; t=1720531276; cv=none; b=drCdk/YsB3PaHBWhqFe25KIBv1UXdbNP6EAETOFkLO6NSTnViaS4meUxTh0KCgCVfFzG88+81pwbUqXJ/O4Z5CVp5E9MZQ0fqkSkKDZMvNOZM2Lnk959TaSI79jEOkgk49vUCLcCNZx6SG/j7MFB/Sxc+qIvzwSHNV5eH4w54/Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531276; c=relaxed/simple; bh=uuopR67Q1J3vX2/gS9DwzERKE3IhuTwl8q4/PTiHb6I=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nPmWT9PyKb1wrI8k/lQK232UZwckuFU0JNs3MwpOuTtR+jr3Ov2KQpyb+M6M0FGeYaJtU8Funb4yVY18fJq8vLg/RxudgmJAKQ9Lsl/q5Je/tNZ3OfTfOxuQ268TiHsfElk/eMV8s+ebQMpN/sCCBUIISvb0+Onyy71AHDbJN6A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=IBGmho88; arc=none smtp.client-ip=207.171.184.29 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b="IBGmho88" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazon201209; t=1720531276; x=1752067276; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=j7xh8Fp0rGL+JNyIcONb4iIMzBYUq25BVxGN/Q+52J4=; b=IBGmho88qkJBK1FznX1Awd2MCCKuFbT+w20JujnLxX1RJto2mAXbmuGX EZkpIakmXASfZfgGtkrDebXGFXaVfAFCeCb+2XKAYdqNt3juF7EzhT5ox s+W4NsgRiO8dKvqsMie6mKdgmNafALOBl0W3+e/Q0m5hrJUk8/DClH3cM I=; X-IronPort-AV: E=Sophos;i="6.09,195,1716249600"; d="scan'208";a="432897833" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2024 13:21:09 +0000 Received: from EX19MTAUEA001.ant.amazon.com [10.0.0.204:54070] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.19.12:2525] with esmtp (Farcaster) id 9adfff71-39c0-4eea-be63-6a4028520ef9; Tue, 9 Jul 2024 13:21:07 +0000 (UTC) X-Farcaster-Flow-ID: 9adfff71-39c0-4eea-be63-6a4028520ef9 Received: from EX19D008UEA004.ant.amazon.com (10.252.134.191) by EX19MTAUEA001.ant.amazon.com (10.252.134.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:04 +0000 Received: from EX19MTAUEC001.ant.amazon.com (10.252.135.222) by EX19D008UEA004.ant.amazon.com (10.252.134.191) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:04 +0000 Received: from ua2d7e1a6107c5b.ant.amazon.com (172.19.88.180) by mail-relay.amazon.com (10.252.135.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34 via Frontend Transport; Tue, 9 Jul 2024 13:21:01 +0000 From: Patrick Roy To: , , , , , CC: Patrick Roy , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 3/8] kvm: pfncache: enlighten about gmem Date: Tue, 9 Jul 2024 14:20:31 +0100 Message-ID: <20240709132041.3625501-4-roypat@amazon.co.uk> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk> References: <20240709132041.3625501-1-roypat@amazon.co.uk> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to the pfn (for example, kvm-clock caches the page used for guest/host communication this way).
Unlike the gfn_to_hva_cache, where no equivalent caching semantics were possible for gmem-backed gfns (see also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is possible to simply cache the pfn returned by `kvm_gmem_get_pfn`. Additionally, gfn_to_pfn_caches are now invalidated whenever a cached gfn's attributes are flipped from shared to private (or vice versa). Signed-off-by: Patrick Roy --- include/linux/kvm_types.h | 1 + virt/kvm/pfncache.c | 41 +++++++++++++++++++++++++++++++++------ 2 files changed, 36 insertions(+), 6 deletions(-) diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 827ecc0b7e10..8f85f01f6bb0 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -70,6 +70,7 @@ struct gfn_to_pfn_cache { kvm_pfn_t pfn; bool active; bool valid; + bool is_private; }; #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index f0039efb9e1e..6430e0a49558 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -90,6 +90,9 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len) if (!kvm_gpc_is_valid_len(gpc->gpa, gpc->uhva, len)) return false; + if (gpc->is_private != kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa))) + return false; + if (!gpc->valid) return false; @@ -159,6 +162,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT; void *new_khva = NULL; unsigned long mmu_seq; + gfn_t gfn; lockdep_assert_held(&gpc->refresh_lock); @@ -173,6 +177,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) do { mmu_seq = gpc->kvm->mmu_invalidate_seq; + gfn = gpa_to_gfn(gpc->gpa); smp_rmb(); write_unlock_irq(&gpc->lock); @@ -197,10 +202,19 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) cond_resched(); } - /* We always request a writeable mapping */ - new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL); - if (is_error_noslot_pfn(new_pfn)) - goto out_error; + if
(gpc->is_private) { + int r = kvm_gmem_get_pfn(gpc->kvm, gfn_to_memslot(gpc->kvm, gfn), gfn, + &new_pfn, NULL); + + if (r) + goto out_error; + } else { + /* We always request a writeable mapping */ + new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, + true, NULL); + if (is_error_noslot_pfn(new_pfn)) + goto out_error; + } /* * Obtain a new kernel mapping if KVM itself will access the @@ -252,6 +266,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l unsigned long old_uhva; kvm_pfn_t old_pfn; bool hva_change = false; + bool old_private; void *old_khva; int ret; @@ -271,8 +286,21 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l old_pfn = gpc->pfn; old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva); old_uhva = PAGE_ALIGN_DOWN(gpc->uhva); + old_private = gpc->is_private; + + gpc->is_private = kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpa)); + + if (gpc->is_private && !kvm_can_access_gmem(gpc->kvm)) { + ret = -EFAULT; + goto out_unlock; + } if (kvm_is_error_gpa(gpa)) { + if (WARN_ON_ONCE(gpc->is_private)) { + ret = -EINVAL; + goto out_unlock; + } + page_offset = offset_in_page(uhva); gpc->gpa = INVALID_GPA; @@ -316,9 +344,10 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l /* * If the userspace HVA changed or the PFN was already invalid, - * drop the lock and do the HVA to PFN lookup again. + * drop the lock and do the HVA to PFN lookup again. Also + * recompute the pfn if the gfn changed from shared to private (or vice-versa). 
*/ - if (!gpc->valid || hva_change) { + if (!gpc->valid || hva_change || gpc->is_private != old_private) { ret = hva_to_pfn_retry(gpc); } else { /* From patchwork Tue Jul 9 13:20:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Roy X-Patchwork-Id: 13727961 Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A47A51662E2; Tue, 9 Jul 2024 13:21:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531276; cv=none; b=hgtPO8i6VXwcecC/i4Q3fuxP+RiKUCNS+t9oDabxsGY7PzgAQSdDT/bMXnBUrrhvLDhJ2WW4/sc4TIUedAG3zKeQbhpHF7+l0ViJZxr3+D1kiKOEdkQfce4NO1xe4G9MX10LOJTQY+Sakwk49ubkCG69RrG652jDKXk6s0dBVdQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531276; c=relaxed/simple; bh=afo9GEYjrpoi5drHYQ5eHUyfY5bIc48pWzRhbwW7tKc=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ODELP12hgb44mJb9m+yQu+qdXkE66vVc+SL8+ifqbhGgMzlXYkRkp9kmUSXouqTgOM1U2YUwXs92wOcG2nyBOr9t8CfHS9dDax+/+ePv5Wf5UWVXMQ26Ush4aBr8NwEbJCjSTiW/sQclc9fJlnrFH2p/bsKesjjFoz2gpR8iitA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=JnpSm0c9; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.uk 
header.i=@amazon.co.uk header.b="JnpSm0c9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazon201209; t=1720531275; x=1752067275; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OgAfYDQpLUE9XUDFTSU5X0VWjGezZLvoUAAPFhPmJtQ=; b=JnpSm0c9VvVKm0hIJyw6c+XsxSw22ZTBfOd50ga4QZo4e4mLCqqUGAH+ P6onjyhCXp5249F8lT84dxbXRYY594hRDCO3jEWOne9XtEy+r026y3xX/ tnfJ+JFg+sZ0imeMb7teVNl9ZEATbBGTjeEl9viB0ByITsFBreUrEfnmS k=; X-IronPort-AV: E=Sophos;i="6.09,195,1716249600"; d="scan'208";a="217222162" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2024 13:21:14 +0000 Received: from EX19MTAUEC002.ant.amazon.com [10.0.0.204:13938] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.50.89:2525] with esmtp (Farcaster) id 57bc0aa8-df5f-4f40-9cd0-d79eb3ced12e; Tue, 9 Jul 2024 13:21:13 +0000 (UTC) X-Farcaster-Flow-ID: 57bc0aa8-df5f-4f40-9cd0-d79eb3ced12e Received: from EX19D008UEA004.ant.amazon.com (10.252.134.191) by EX19MTAUEC002.ant.amazon.com (10.252.135.253) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:07 +0000 Received: from EX19MTAUEC001.ant.amazon.com (10.252.135.222) by EX19D008UEA004.ant.amazon.com (10.252.134.191) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:07 +0000 Received: from ua2d7e1a6107c5b.ant.amazon.com (172.19.88.180) by mail-relay.amazon.com (10.252.135.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34 via Frontend Transport; Tue, 9 Jul 2024 13:21:04 +0000 From: Patrick Roy To: , , , , , CC: Patrick Roy , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 4/8] kvm: x86: 
support walking guest page tables in gmem Date: Tue, 9 Jul 2024 14:20:32 +0100 Message-ID: <20240709132041.3625501-5-roypat@amazon.co.uk> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk> References: <20240709132041.3625501-1-roypat@amazon.co.uk> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Update the logic in paging_tmpl.h to work with guest-private memory. If KVM cannot access gmem and the guest's page tables are in gfns marked as private, then error out. Let the guest page table walker access gmem by making it use gfn_to_pfn_caches, which are already gmem-aware and will later also handle on-demand mapping of gmem once it supports being removed from the direct map. We re-use the gfn_to_pfn_cache here to avoid implementing yet another remapping solution to support the cmpxchg used to set the "accessed" bit on guest PTEs. The only case that now needs some special handling is page tables in read-only memslots, as gfn_to_pfn_caches cannot be used for read-only memory. In this case, use kvm_vcpu_read_guest (which is also gmem-aware): there is no need to cache the gfn->pfn translation, because the walker does not set the accessed bit for read-only PTEs and therefore never needs to do a cmpxchg on them.
Signed-off-by: Patrick Roy --- arch/x86/kvm/mmu/paging_tmpl.h | 94 ++++++++++++++++++++++++++++------ 1 file changed, 77 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 69941cebb3a8..ddf3b4bd479e 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -84,7 +84,7 @@ struct guest_walker { pt_element_t ptes[PT_MAX_FULL_LEVELS]; pt_element_t prefetch_ptes[PTE_PREFETCH_NUM]; gpa_t pte_gpa[PT_MAX_FULL_LEVELS]; - pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS]; + struct gfn_to_pfn_cache ptep_caches[PT_MAX_FULL_LEVELS]; bool pte_writable[PT_MAX_FULL_LEVELS]; unsigned int pt_access[PT_MAX_FULL_LEVELS]; unsigned int pte_access; @@ -201,7 +201,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu, { unsigned level, index; pt_element_t pte, orig_pte; - pt_element_t __user *ptep_user; + struct gfn_to_pfn_cache *pte_cache; gfn_t table_gfn; int ret; @@ -210,10 +210,12 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu, return 0; for (level = walker->max_level; level >= walker->level; --level) { + unsigned long flags; + pte = orig_pte = walker->ptes[level - 1]; table_gfn = walker->table_gfn[level - 1]; - ptep_user = walker->ptep_user[level - 1]; - index = offset_in_page(ptep_user) / sizeof(pt_element_t); + pte_cache = &walker->ptep_caches[level - 1]; + index = offset_in_page(pte_cache->khva) / sizeof(pt_element_t); if (!(pte & PT_GUEST_ACCESSED_MASK)) { trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte)); pte |= PT_GUEST_ACCESSED_MASK; @@ -246,11 +248,26 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu, if (unlikely(!walker->pte_writable[level - 1])) continue; - ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault); + read_lock_irqsave(&pte_cache->lock, flags); + while (!kvm_gpc_check(pte_cache, sizeof(pte))) { + read_unlock_irqrestore(&pte_cache->lock, flags); + + ret = kvm_gpc_refresh(pte_cache, sizeof(pte)); + 
if (ret) + return ret; + + read_lock_irqsave(&pte_cache->lock, flags); + } + ret = __try_cmpxchg((pt_element_t *)pte_cache->khva, &orig_pte, pte, sizeof(pte)); + + if (!ret) + kvm_gpc_mark_dirty_in_slot(pte_cache); + + read_unlock_irqrestore(&pte_cache->lock, flags); + if (ret) return ret; - kvm_vcpu_mark_page_dirty(vcpu, table_gfn); walker->ptes[level - 1] = pte; } return 0; @@ -296,6 +313,12 @@ static inline bool FNAME(is_last_gpte)(struct kvm_mmu *mmu, return gpte & PT_PAGE_SIZE_MASK; } + +static void FNAME(walk_deactivate_gpcs)(struct guest_walker *walker) { + for (unsigned int level = 0; level < PT_MAX_FULL_LEVELS; ++level) + kvm_gpc_deactivate(&walker->ptep_caches[level]); +} + /* * Fetch a guest pte for a guest virtual address, or for an L2's GPA. */ @@ -305,7 +328,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, { int ret; pt_element_t pte; - pt_element_t __user *ptep_user; gfn_t table_gfn; u64 pt_access, pte_access; unsigned index, accessed_dirty, pte_pkey; @@ -320,8 +342,17 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, u16 errcode = 0; gpa_t real_gpa; gfn_t gfn; + struct gfn_to_pfn_cache *pte_cache; trace_kvm_mmu_pagetable_walk(addr, access); + + for (unsigned int level = 0; level < PT_MAX_FULL_LEVELS; ++level) { + pte_cache = &walker->ptep_caches[level]; + + memset(pte_cache, 0, sizeof(*pte_cache)); + kvm_gpc_init(pte_cache, vcpu->kvm); + } + retry_walk: walker->level = mmu->cpu_role.base.level; pte = kvm_mmu_get_guest_pgd(vcpu, mmu); @@ -362,11 +393,13 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, do { struct kvm_memory_slot *slot; - unsigned long host_addr; + unsigned long flags; pt_access = pte_access; --walker->level; + pte_cache = &walker->ptep_caches[walker->level - 1]; + index = PT_INDEX(addr, walker->level); table_gfn = gpte_to_gfn(pte); offset = index * sizeof(pt_element_t); @@ -396,15 +429,36 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, if 
(!kvm_is_visible_memslot(slot)) goto error; - host_addr = gfn_to_hva_memslot_prot(slot, gpa_to_gfn(real_gpa), - &walker->pte_writable[walker->level - 1]); - if (unlikely(kvm_is_error_hva(host_addr))) - goto error; + /* + * gfn_to_pfn_cache expects the memory to be writable. However, + * if the memory is not writable, we do not need caching in the + * first place, as we only need it to later potentially write + * the access bit (which we cannot do anyway if the memory is + * readonly). + */ + if (slot->flags & KVM_MEM_READONLY) { + if (kvm_vcpu_read_guest(vcpu, real_gpa + offset, &pte, sizeof(pte))) + goto error; + } else { + if (kvm_gpc_activate(pte_cache, real_gpa + offset, + sizeof(pte))) + goto error; - ptep_user = (pt_element_t __user *)((void *)host_addr + offset); - if (unlikely(__get_user(pte, ptep_user))) - goto error; - walker->ptep_user[walker->level - 1] = ptep_user; + read_lock_irqsave(&pte_cache->lock, flags); + while (!kvm_gpc_check(pte_cache, sizeof(pte))) { + read_unlock_irqrestore(&pte_cache->lock, flags); + + if (kvm_gpc_refresh(pte_cache, sizeof(pte))) + goto error; + + read_lock_irqsave(&pte_cache->lock, flags); + } + + pte = *(pt_element_t *)pte_cache->khva; + read_unlock_irqrestore(&pte_cache->lock, flags); + + walker->pte_writable[walker->level - 1] = true; + } trace_kvm_mmu_paging_element(pte, walker->level); @@ -467,13 +521,19 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, addr, write_fault); if (unlikely(ret < 0)) goto error; - else if (ret) + else if (ret) { + FNAME(walk_deactivate_gpcs)(walker); goto retry_walk; + } } + FNAME(walk_deactivate_gpcs)(walker); + return 1; error: + FNAME(walk_deactivate_gpcs)(walker); + errcode |= write_fault | user_fault; if (fetch_fault && (is_efer_nx(mmu) || is_cr4_smep(mmu))) errcode |= PFERR_FETCH_MASK; From patchwork Tue Jul 9 13:20:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Roy X-Patchwork-Id: 
13727963
From: Patrick Roy
Subject: [RFC PATCH 5/8] kvm: gmem: add option to remove guest private memory from direct map
Date: Tue, 9 Jul 2024 14:20:33 +0100
Message-ID: <20240709132041.3625501-6-roypat@amazon.co.uk>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk>
References: <20240709132041.3625501-1-roypat@amazon.co.uk>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

While guest_memfd is not available to be mapped by userspace, it is still accessible through the kernel's direct map. This means that in scenarios where guest-private memory is not hardware protected, it can be speculatively read and its contents potentially leaked through hardware side-channels. Removing guest-private memory from the direct map thus mitigates a large class of speculative execution issues [1, Table 1].

This patch adds a flag to the `KVM_CREATE_GUEST_MEMFD` ioctl which, if set, removes the struct pages backing guest-private memory from the direct map. Should `CONFIG_HAVE_KVM_GMEM_{INVALIDATE, PREPARE}` be set, pages are removed after preparation and before invalidation, so that the prepare/invalidate routines do not have to worry about potentially absent direct map entries.

Direct map removal does not reuse the `KVM_GMEM_PREPARE` machinery, since `prepare` can be called multiple times, and it is the responsibility of the preparation routine to not "prepare" the same folio twice [2]. Instead, explicitly check whether `filemap_grab_folio` allocated a new folio, and remove the returned folio from the direct map only if this was the case.

The patch uses release_folio instead of free_folio to reinsert pages into the direct map, as by the time free_folio is called, folio->mapping can already be NULL. This means that a call to folio_inode inside free_folio might dereference a NULL pointer, leaving no way to access the inode which stores the flags that allow determining whether the page was removed from the direct map in the first place.

Lastly, the patch uses set_direct_map_{invalid,default}_noflush instead of `set_memory_[n]p` to avoid expensive flushes of TLBs and the L*-cache hierarchy.
This is especially important once KVM restores direct map entries on demand in later patches, where simple FIO benchmarks of a virtio-blk device have shown that TLB flushes on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz resulted in an 80% degradation in throughput compared to a non-flushing solution. Not flushing the TLB means that until TLB entries for temporarily restored direct map entries get naturally evicted, they can be used during speculative execution, and effectively "unhide" the memory for longer than intended. We consider this acceptable, as the only pages that are temporarily reinserted into the direct map like this either hold PV data structures (kvm-clock, asyncpf, etc.) or contain privileged instructions inside the guest kernel image (in the MMIO emulation case). [1]: https://download.vusec.net/papers/quarantine_raid23.pdf Signed-off-by: Patrick Roy --- include/uapi/linux/kvm.h | 2 ++ virt/kvm/guest_memfd.c | 52 ++++++++++++++++++++++++++++++++++------ 2 files changed, 47 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index e065d9fe7ab2..409116aa23c9 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1563,4 +1563,6 @@ struct kvm_create_guest_memfd { __u64 reserved[6]; }; +#define KVM_GMEM_NO_DIRECT_MAP (1ULL << 0) + #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 9148b9679bb1..dc9b0c2d0b0e 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -4,6 +4,7 @@ #include #include #include +#include #include "kvm_mm.h" @@ -49,9 +50,16 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol return 0; } +static bool kvm_gmem_not_present(struct inode *inode) +{ + return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) != 0; +} + static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare) { struct folio *folio; + bool zap_direct_map = false; + int r;
/* TODO: Support huge pages. */ folio = filemap_grab_folio(inode->i_mapping, index); @@ -74,16 +82,30 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool for (i = 0; i < nr_pages; i++) clear_highpage(folio_page(folio, i)); + // We need to clear the folio before calling kvm_gmem_prepare_folio, + // but can only remove it from the direct map _after_ preparation is done. + zap_direct_map = kvm_gmem_not_present(inode); + folio_mark_uptodate(folio); } if (prepare) { - int r = kvm_gmem_prepare_folio(inode, index, folio); - if (r < 0) { - folio_unlock(folio); - folio_put(folio); - return ERR_PTR(r); - } + r = kvm_gmem_prepare_folio(inode, index, folio); + if (r < 0) + goto out_err; + } + + if (zap_direct_map) { + r = set_direct_map_invalid_noflush(&folio->page); + if (r < 0) + goto out_err; + + // We use the private flag to track whether the folio has been removed + // from the direct map. This is because inside of ->free_folio, + // we do not have access to the address_space anymore, meaning we + // cannot check folio_inode(folio)->i_private to determine whether + // KVM_GMEM_NO_DIRECT_MAP was set. + folio_set_private(folio); } /* @@ -91,6 +113,10 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool * unevictable and there is no storage to write back to. */ return folio; +out_err: + folio_unlock(folio); + folio_put(folio); + return ERR_PTR(r); } static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start, @@ -354,10 +380,22 @@ static void kvm_gmem_free_folio(struct folio *folio) } #endif +static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t end) +{ + if (start == 0 && end == PAGE_SIZE) { + // We only get here if PG_private is set, which only happens if kvm_gmem_not_present + // returned true in kvm_gmem_get_folio. Thus no need to do that check again. 
+ BUG_ON(set_direct_map_default_noflush(&folio->page)); + + folio_clear_private(folio); + } +} + static const struct address_space_operations kvm_gmem_aops = { .dirty_folio = noop_dirty_folio, .migrate_folio = kvm_gmem_migrate_folio, .error_remove_folio = kvm_gmem_error_folio, + .invalidate_folio = kvm_gmem_invalidate_folio, #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE .free_folio = kvm_gmem_free_folio, #endif @@ -443,7 +481,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) { loff_t size = args->size; u64 flags = args->flags; - u64 valid_flags = 0; + u64 valid_flags = KVM_GMEM_NO_DIRECT_MAP; if (flags & ~valid_flags) return -EINVAL;
From patchwork Tue Jul 9 13:20:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Roy
X-Patchwork-Id: 13727964
From: Patrick Roy
Subject: [RFC PATCH 6/8] kvm: gmem: Temporarily restore direct map entries when needed
Date: Tue, 9 Jul 2024 14:20:34 +0100
Message-ID: <20240709132041.3625501-7-roypat@amazon.co.uk>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk>
References: <20240709132041.3625501-1-roypat@amazon.co.uk>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

If KVM_GMEM_NO_DIRECT_MAP is set, and KVM tries to internally access guest-private memory inside kvm_{read,write}_guest, or via a gfn_to_pfn_cache, temporarily restore the direct map entry.

To avoid race conditions between two threads restoring or zapping direct map entries for the same page and potentially interfering with each other (e.g. unfortunate interleavings of map->read->unmap in the form of map(A)->map(B)->read(A)->unmap(A)->read(B) [BOOM]), the following invariant is upheld in this patch:

- Only a single gfn_to_pfn_cache can exist for any given pfn, and
- All non-gfn_to_pfn_cache code paths that temporarily restore direct map entries complete the entire map->access->unmap critical section while holding the folio lock.

To remember whether a given folio currently has a direct map entry, use the PG_private flag. If this flag is set, then the folio has been removed from the direct map, otherwise it is present in the direct map. Modifications of this flag, together with the corresponding direct map manipulations, must happen while holding the folio's lock.
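As a rough illustration of the second half of the invariant, here is a single-threaded userspace model of the map->access->unmap critical section. It is only a sketch: struct folio_model, model_lock, model_unlock and model_read_guest are hypothetical names invented for this note, the "lock" is a plain boolean, and no kernel API is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a guest_memfd folio; not a kernel type. */
struct folio_model {
	bool locked;        /* models the folio lock */
	bool pg_private;    /* set => folio is zapped from the direct map */
	bool in_direct_map; /* models the direct map entry itself */
	char data[16];      /* models the page contents */
};

static void model_lock(struct folio_model *f)
{
	assert(!f->locked);
	f->locked = true;
}

static void model_unlock(struct folio_model *f)
{
	assert(f->locked);
	f->locked = false;
}

/*
 * Models the map->access->unmap critical section of a guest read.
 * Returns 0 on success, -1 if the requested range is out of bounds.
 */
static int model_read_guest(struct folio_model *f, void *dst,
			    size_t off, size_t len)
{
	bool zapped;

	if (off > sizeof(f->data) || len > sizeof(f->data) - off)
		return -1;

	model_lock(f);
	zapped = f->pg_private;
	if (zapped)
		f->in_direct_map = true;   /* temporarily restore the entry */
	assert(f->in_direct_map);          /* the copy needs a mapping */
	memcpy(dst, f->data + off, len);
	if (zapped)
		f->in_direct_map = false;  /* zap it again before unlocking */
	model_unlock(f);
	return 0;
}
```

Note that the model never changes pg_private on this path; the flag is only read to decide whether the direct map entry has to be restored and zapped around the copy.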
A gfn_to_pfn_cache cannot hold the folio lock for its entire lifetime, so it operates as follows: In gpc_map, under folio lock, restore the direct map entry and set PG_private to 0. In gpc_unmap, zap the direct map entry again and set PG_private back to 1. If inside gpc_map the cache finds a folio that has PG_private set to 0, it knows that another gfn_to_pfn_cache is currently active for the given pfn (as this is the only scenario in which PG_private can be 0 without the folio lock being held), and so it returns -EINVAL. The only other interesting scenario is then if kvm_{read,write}_guest is called for a gfn whose translation is currently cached inside a gfn_to_pfn_cache. In this case, kvm_{read,write}_guest notices that PG_private is 0 and skips all direct map manipulations. Since it is holding the folio lock, it can be sure that gpc_unmap cannot concurrently zap the direct map entries while kvm_{read,write}_guest still needs them. Note that this implementation is slightly too restrictive, as sometimes multiple gfn_to_pfn_caches need to be active for the same gfn (for example, each vCPU has its own kvm-clock structure, which they all try to put into the same gfn). 
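The gfn_to_pfn_cache rules described above can be summarised as a small state machine. The following single-threaded userspace model is a sketch under stated assumptions: struct gpc_folio_model and the gpc_model_* functions are hypothetical names made up for this note, the lock is a plain boolean, and the real code additionally manipulates the direct map via set_direct_map_{invalid,default}_noflush.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a guest_memfd folio; not a kernel type. */
struct gpc_folio_model {
	bool locked;        /* models the folio lock */
	bool pg_private;    /* set => folio is zapped from the direct map */
	bool in_direct_map; /* models the direct map entry itself */
};

static void gpc_model_lock(struct gpc_folio_model *f)
{
	assert(!f->locked);
	f->locked = true;
}

static void gpc_model_unlock(struct gpc_folio_model *f)
{
	assert(f->locked);
	f->locked = false;
}

/* Models gpc_map: restore the entry, or refuse if another cache holds it. */
static int gpc_model_map(struct gpc_folio_model *f)
{
	int r = 0;

	gpc_model_lock(f);
	if (f->pg_private) {
		f->in_direct_map = true;
		f->pg_private = false;
	} else {
		r = -EINVAL; /* PG_private clear: a cache is already active */
	}
	gpc_model_unlock(f);
	return r;
}

/* Models gpc_unmap: zap the entry again and set PG_private back. */
static void gpc_model_unmap(struct gpc_folio_model *f)
{
	gpc_model_lock(f);
	assert(!f->pg_private);
	f->in_direct_map = false;
	f->pg_private = true;
	gpc_model_unlock(f);
}

/*
 * Models kvm_{read,write}_guest on the same folio. Returns true if it had
 * to touch the direct map itself, false if an active cache kept it mapped.
 */
static bool gpc_model_guest_access(struct gpc_folio_model *f)
{
	bool touched;

	gpc_model_lock(f);
	touched = f->pg_private;
	if (touched)
		f->in_direct_map = true;
	assert(f->in_direct_map); /* the access itself would happen here */
	if (touched)
		f->in_direct_map = false;
	gpc_model_unlock(f);
	return touched;
}
```

In this model, a second gpc_model_map on the same folio fails with -EINVAL, and gpc_model_guest_access leaves the direct map entry alone while a cache is active, which mirrors the two scenarios discussed above.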
Signed-off-by: Patrick Roy --- virt/kvm/kvm_main.c | 59 +++++++++++++++++++++--------- virt/kvm/pfncache.c | 89 +++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 123 insertions(+), 25 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4357f7cdf040..f968f1f3d7f7 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -52,6 +52,7 @@ #include #include +#include #include #include "coalesced_mmio.h" @@ -3291,8 +3292,8 @@ static int __kvm_read_guest_private_page(struct kvm *kvm, void *data, int offset, int len) { kvm_pfn_t pfn; - int r; - struct page *page; + int r = 0; + struct folio *folio; void *kaddr; if (!kvm_can_access_gmem(kvm)) @@ -3303,13 +3304,24 @@ static int __kvm_read_guest_private_page(struct kvm *kvm, if (r < 0) return -EFAULT; - page = pfn_to_page(pfn); - lock_page(page); - kaddr = page_address(page) + offset; - memcpy(data, kaddr, len); - unlock_page(page); - put_page(page); - return 0; + folio = pfn_folio(pfn); + folio_lock(folio); + kaddr = folio_address(folio); + if (folio_test_private(folio)) { + r = set_direct_map_default_noflush(&folio->page); + if (r) + goto out_unlock; + } + memcpy(data, kaddr + offset, len); + if (folio_test_private(folio)) { + r = set_direct_map_invalid_noflush(&folio->page); + if (r) + goto out_unlock; + } +out_unlock: + folio_unlock(folio); + folio_put(folio); + return r; } static int __kvm_vcpu_read_guest_private_page(struct kvm_vcpu *vcpu, @@ -3437,8 +3449,8 @@ static int __kvm_write_guest_private_page(struct kvm *kvm, const void *data, int offset, int len) { kvm_pfn_t pfn; - int r; - struct page *page; + int r = 0; + struct folio *folio; void *kaddr; if (!kvm_can_access_gmem(kvm)) @@ -3449,14 +3461,25 @@ static int __kvm_write_guest_private_page(struct kvm *kvm, if (r < 0) return -EFAULT; - page = pfn_to_page(pfn); - lock_page(page); - kaddr = page_address(page) + offset; - memcpy(kaddr, data, len); - unlock_page(page); - put_page(page); + folio = pfn_folio(pfn); + 
folio_lock(folio); + kaddr = folio_address(folio); + if (folio_test_private(folio)) { + r = set_direct_map_default_noflush(&folio->page); + if (r) + goto out_unlock; + } + memcpy(kaddr + offset, data, len); + if (folio_test_private(folio)) { + r = set_direct_map_invalid_noflush(&folio->page); + if (r) + goto out_unlock; + } - return 0; +out_unlock: + folio_unlock(folio); + folio_put(folio); + return r; } static int __kvm_vcpu_write_guest_private_page(struct kvm_vcpu *vcpu, diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index 6430e0a49558..95d2d5cdaa12 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -16,6 +16,9 @@ #include #include #include +#include + +#include #include "kvm_mm.h" @@ -99,10 +102,68 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len) return true; } -static void *gpc_map(kvm_pfn_t pfn) +static int gpc_map_gmem(kvm_pfn_t pfn) { - if (pfn_valid(pfn)) + int r = 0; + struct folio *folio = pfn_folio(pfn); + struct inode *inode = folio_inode(folio); + + if (((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) == 0) + goto out; + + /* We need to avoid race conditions where set_memory_np is called for + * pages that other parts of KVM still try to access. We use the + * PG_private bit for this. If it is set, then the page is removed from + * the direct map. If it is cleared, the page is present in the direct + * map. All changes to this bit, and all modifications of the direct + * map entries for the page happen under the page lock. The _only_ + * place where a page will be in the direct map while the page lock is + * _not_ held is if it is inside a gpc. All other parts of KVM that + * temporarily re-insert gmem pages into the direct map (currently only + * guest_{read,write}_page) take the page lock before the direct map + * entry is restored, and hold it until it is zapped again. 
This means + * - If we reach gpc_map while, say, guest_read_page is operating on + * this page, we block on acquiring the page lock until + * guest_read_page is done. + * - If we reach gpc_map while another gpc is already caching this + * page, the page is present in the direct map and the PG_private + * flag is cleared. In this case, we return -EINVAL below to avoid + * two gpcs caching the same page (since we do not ref-count + * insertions back into the direct map, when the first cache gets + * invalidated it would "break" the second cache that assumes the + * page is present in the direct map until the second cache itself + * gets invalidated). + * - Lastly, if guest_read_page is called for a page inside of a gpc, + * it will see that the PG_private flag is cleared, and thus assume + * it is present in the direct map (and leave the direct map entry + * untouched). Since it will be holding the page lock, it cannot race + * with gpc_unmap. + */ + folio_lock(folio); + if (folio_test_private(folio)) { + r = set_direct_map_default_noflush(&folio->page); + if (r) + goto out_unlock; + + folio_clear_private(folio); + } else { + r = -EINVAL; + } +out_unlock: + folio_unlock(folio); +out: + return r; +} + +static void *gpc_map(kvm_pfn_t pfn, bool private) +{ + if (pfn_valid(pfn)) { + if (private) { + if (gpc_map_gmem(pfn)) + return NULL; + } return kmap(pfn_to_page(pfn)); + } #ifdef CONFIG_HAS_IOMEM return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB); @@ -111,13 +172,27 @@ static void *gpc_map(kvm_pfn_t pfn) #endif } -static void gpc_unmap(kvm_pfn_t pfn, void *khva) +static void gpc_unmap(kvm_pfn_t pfn, void *khva, bool private) { /* Unmap the old pfn/page if it was mapped before.
*/ if (is_error_noslot_pfn(pfn) || !khva) return; if (pfn_valid(pfn)) { + if (private) { + struct folio *folio = pfn_folio(pfn); + struct inode *inode = folio_inode(folio); + + if ((unsigned long)inode->i_private & + KVM_GMEM_NO_DIRECT_MAP) { + folio_lock(folio); + BUG_ON(folio_test_private(folio)); + BUG_ON(set_direct_map_invalid_noflush( + &folio->page)); + folio_set_private(folio); + folio_unlock(folio); + } + } kunmap(pfn_to_page(pfn)); return; } @@ -195,7 +270,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) * the existing mapping and didn't create a new one. */ if (new_khva != old_khva) - gpc_unmap(new_pfn, new_khva); + gpc_unmap(new_pfn, new_khva, gpc->is_private); kvm_release_pfn_clean(new_pfn); @@ -224,7 +299,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) if (new_pfn == gpc->pfn) new_khva = old_khva; else - new_khva = gpc_map(new_pfn); + new_khva = gpc_map(new_pfn, gpc->is_private); if (!new_khva) { kvm_release_pfn_clean(new_pfn); @@ -379,7 +454,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l write_unlock_irq(&gpc->lock); if (unmap_old) - gpc_unmap(old_pfn, old_khva); + gpc_unmap(old_pfn, old_khva, old_private); return ret; } @@ -500,6 +575,6 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc) list_del(&gpc->list); spin_unlock(&kvm->gpc_lock); - gpc_unmap(old_pfn, old_khva); + gpc_unmap(old_pfn, old_khva, gpc->is_private); } }
From patchwork Tue Jul 9 13:20:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Roy
X-Patchwork-Id: 13727966
From: Patrick Roy
Subject: [RFC PATCH 7/8] mm: secretmem: use AS_INACCESSIBLE to prohibit GUP
Date: Tue, 9 Jul 2024 14:20:35 +0100
Message-ID: <20240709132041.3625501-8-roypat@amazon.co.uk>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk>
References: <20240709132041.3625501-1-roypat@amazon.co.uk>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

Inside of vma_is_secretmem and secretmem_mapping, instead of checking whether a vm_area_struct/address_space has the secretmem ops structure attached to it, check whether the address_space has the AS_INACCESSIBLE bit set.
Then set the AS_INACCESSIBLE flag for secretmem's address_space. This means that get_user_pages and friends are disabled for all address_spaces that set AS_INACCESSIBLE. The AS_INACCESSIBLE flag was introduced in commit c72ceafbd12c ("mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory") specifically for guest_memfd to indicate that no reads and writes should ever be done to guest_memfd address_spaces. Disallowing gup seems like a reasonable semantic extension, and means that potential future mmaps of guest_memfd cannot be GUP'd. Signed-off-by: Patrick Roy --- include/linux/secretmem.h | 13 +++++++++++-- mm/secretmem.c | 6 +----- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h index e918f96881f5..886c8f7eb63e 100644 --- a/include/linux/secretmem.h +++ b/include/linux/secretmem.h @@ -8,10 +8,19 @@ extern const struct address_space_operations secretmem_aops; static inline bool secretmem_mapping(struct address_space *mapping) { - return mapping->a_ops == &secretmem_aops; + return mapping->flags & AS_INACCESSIBLE; +} + +static inline bool vma_is_secretmem(struct vm_area_struct *vma) +{ + struct file *file = vma->vm_file; + + if (!file) + return false; + + return secretmem_mapping(file->f_inode->i_mapping); } -bool vma_is_secretmem(struct vm_area_struct *vma); bool secretmem_active(void); #else diff --git a/mm/secretmem.c b/mm/secretmem.c index 3afb5ad701e1..fd03a84a1cb5 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -136,11 +136,6 @@ static int secretmem_mmap(struct file *file, struct vm_area_struct *vma) return 0; } -bool vma_is_secretmem(struct vm_area_struct *vma) -{ - return vma->vm_ops == &secretmem_vm_ops; -} - static const struct file_operations secretmem_fops = { .release = secretmem_release, .mmap = secretmem_mmap, @@ -218,6 +213,7 @@ static struct file *secretmem_file_create(unsigned long flags) inode->i_op = &secretmem_iops; inode->i_mapping->a_ops = &secretmem_aops; +
inode->i_mapping->flags |= AS_INACCESSIBLE; /* pretend we are a normal file with zero size */ inode->i_mode |= S_IFREG; From patchwork Tue Jul 9 13:20:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Roy X-Patchwork-Id: 13727965 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE2CA15F33A; Tue, 9 Jul 2024 13:21:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.184.29 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531286; cv=none; b=PNuLUJ7BxJElqCK+XQcirnsm27RyMp+nzq7s5eHAMB2bDxyCSblNxXNK2TMWC//4X74JYbySztLGn7258WaT6l6YZvw3LnBEgaYMUUYaG95FwclGrlSAF5NIunDgvYvb/76knlvsfnMvbdvO8GhnucR1A0QgiXisMigo1JcHiV0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720531286; c=relaxed/simple; bh=hGiQdDYqbo2pnH+iABJVi6Yv+2CSx9CFiwRk70YCz60=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=LE8OMo0PGseq1JmVFufXSrKi4bURMY4imqzCqvqoETqs/zy91W2g3uPJEOKpuRdJYCOK9E/SMqiylyq9yb5qPI1jzdE+6it9a4lyknAjgVE2+mQNEMMGnVA+bvAevSjQSajYiCyhene2qbhPd7AuzjFFGasBl1qz4GljIFk1Ah8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=CJ+Anwjz; arc=none smtp.client-ip=207.171.184.29 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk 
header.b="CJ+Anwjz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazon201209; t=1720531285; x=1752067285; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OYFwwPMzkwoeUqGjPsHTCVZ/9U32HO+9FB3G59spnNM=; b=CJ+AnwjzBwoA7XOYlde3nQzi7kdKSPWklNPOUP37LAJ2cyKi8VNma2nt LevIY6ua4s+ARwd85vaHvxKA2rZNt/SbcQGbTwr5JjzEu6/un7x88i1il TtjfMmrxV8Hz00TRiwqWo+dSZ9661XwUCpMexh1gKXXImoSqgjhVs7UYJ 8=; X-IronPort-AV: E=Sophos;i="6.09,195,1716249600"; d="scan'208";a="432897857" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2024 13:21:23 +0000 Received: from EX19MTAUEB001.ant.amazon.com [10.0.0.204:11016] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.59.218:2525] with esmtp (Farcaster) id b525f254-64fd-4b54-ab1d-bef2d53e8187; Tue, 9 Jul 2024 13:21:21 +0000 (UTC) X-Farcaster-Flow-ID: b525f254-64fd-4b54-ab1d-bef2d53e8187 Received: from EX19D008UEA004.ant.amazon.com (10.252.134.191) by EX19MTAUEB001.ant.amazon.com (10.252.135.108) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:20 +0000 Received: from EX19MTAUEC001.ant.amazon.com (10.252.135.222) by EX19D008UEA004.ant.amazon.com (10.252.134.191) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Tue, 9 Jul 2024 13:21:20 +0000 Received: from ua2d7e1a6107c5b.ant.amazon.com (172.19.88.180) by mail-relay.amazon.com (10.252.135.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34 via Frontend Transport; Tue, 9 Jul 2024 13:21:17 +0000 From: Patrick Roy To: , , , , , CC: Patrick Roy , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 8/8] kvm: gmem: Allow restricted 
userspace mappings Date: Tue, 9 Jul 2024 14:20:36 +0100 Message-ID: <20240709132041.3625501-9-roypat@amazon.co.uk> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240709132041.3625501-1-roypat@amazon.co.uk> References: <20240709132041.3625501-1-roypat@amazon.co.uk> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Allow mapping guest_memfd into userspace. Since AS_INACCESSIBLE is set on the underlying address_space struct, no GUP of guest_memfd will be possible. Signed-off-by: Patrick Roy --- virt/kvm/guest_memfd.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index dc9b0c2d0b0e..101ec2b248bf 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -319,7 +319,37 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot) return get_file_active(&slot->gmem.file); } +static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf) +{ + struct folio *folio; + + folio = kvm_gmem_get_folio(file_inode(vmf->vma->vm_file), vmf->pgoff, true); + + if (!folio) + return VM_FAULT_SIGBUS; + + vmf->page = folio_file_page(folio, vmf->pgoff); + + return VM_FAULT_LOCKED; +} + +static const struct vm_operations_struct kvm_gmem_vm_ops = { + .fault = kvm_gmem_fault +}; + +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma) +{ + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0) + return -EINVAL; + + vm_flags_set(vma, VM_DONTDUMP); + vma->vm_ops = &kvm_gmem_vm_ops; + + return 0; +} + static struct file_operations kvm_gmem_fops = { + .mmap = kvm_gmem_mmap, .open = generic_file_open, .release = kvm_gmem_release, .fallocate = kvm_gmem_fallocate, @@ -594,7 +624,6 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot, return -EFAULT; } - gmem = file->private_data; if (xa_load(&gmem->bindings, index) != slot) { WARN_ON_ONCE(xa_load(&gmem->bindings, index)); 
return -EIO;