From patchwork Thu Oct 10 08:59:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fuad Tabba
X-Patchwork-Id: 13829806
Date: Thu, 10 Oct 2024 09:59:29 +0100
In-Reply-To: <20241010085930.1546800-1-tabba@google.com>
References: <20241010085930.1546800-1-tabba@google.com>
X-Mailer: git-send-email 2.47.0.rc0.187.ge670bccf7e-goog
Message-ID: <20241010085930.1546800-11-tabba@google.com>
Subject: [PATCH v3 10/11] KVM: arm64: Handle guest_memfd()-backed guest page faults
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com,
 james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev,
 maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com, tabba@google.com

Add arm64 support for resolving guest page faults on guest_memfd()
backed memslots.
This support is not contingent on pKVM, or other confidential computing
support, and works in both VHE and nVHE modes. Without confidential
computing, this support is useful for testing and debugging. In the
future, it might also be useful should a user want to use guest_memfd()
for all code, whether it's for a protected guest or not.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 112 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 110 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 71ceea661701..250c59f0ca5b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1422,6 +1422,108 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static int guest_memfd_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     struct kvm_memory_slot *memslot, bool fault_is_perm)
+{
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	bool exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+	bool logging_active = memslot_is_logging(memslot);
+	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	bool write_fault = kvm_is_write_fault(vcpu);
+	struct mm_struct *mm = current->mm;
+	gfn_t gfn = gpa_to_gfn(fault_ipa);
+	struct kvm *kvm = vcpu->kvm;
+	struct page *page;
+	kvm_pfn_t pfn;
+	int ret;
+
+	/* For now, guest_memfd() only supports PAGE_SIZE granules. */
+	if (WARN_ON_ONCE(fault_is_perm &&
+			 kvm_vcpu_trap_get_perm_fault_granule(vcpu) != PAGE_SIZE)) {
+		return -EFAULT;
+	}
+
+	VM_BUG_ON(write_fault && exec_fault);
+
+	if (fault_is_perm && !write_fault && !exec_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	/*
+	 * Permission faults just need to update the existing leaf entry,
+	 * and so normally don't require allocations from the memcache. The
+	 * only exception to this is when dirty logging is enabled at runtime
+	 * and a write fault needs to collapse a block entry into a table.
+	 */
+	if (!fault_is_perm || (logging_active && write_fault)) {
+		ret = kvm_mmu_topup_memory_cache(memcache,
+						 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * Holds the folio lock until mapped in the guest and its refcount is
+	 * stable, to avoid races with paths that check if the folio is mapped
+	 * by the host.
+	 */
+	ret = kvm_gmem_get_pfn_locked(kvm, memslot, gfn, &pfn, NULL);
+	if (ret)
+		return ret;
+
+	page = pfn_to_page(pfn);
+
+	/*
+	 * Once it's faulted in, a guest_memfd() page will stay in memory.
+	 * Therefore, count it as locked.
+	 */
+	if (!fault_is_perm) {
+		ret = account_locked_vm(mm, 1, true);
+		if (ret)
+			goto unlock_page;
+	}
+
+	read_lock(&kvm->mmu_lock);
+	if (write_fault)
+		prot |= KVM_PGTABLE_PROT_W;
+
+	if (exec_fault)
+		prot |= KVM_PGTABLE_PROT_X;
+
+	if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
+		prot |= KVM_PGTABLE_PROT_X;
+
+	/*
+	 * Under the premise of getting a FSC_PERM fault, we just need to relax
+	 * permissions.
+	 */
+	if (fault_is_perm)
+		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+	else
+		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+					     __pfn_to_phys(pfn), prot,
+					     memcache,
+					     KVM_PGTABLE_WALK_HANDLE_FAULT |
+					     KVM_PGTABLE_WALK_SHARED);
+
+	/* Mark the page dirty only if the fault is handled successfully */
+	if (write_fault && !ret) {
+		kvm_set_pfn_dirty(pfn);
+		mark_page_dirty_in_slot(kvm, memslot, gfn);
+	}
+	read_unlock(&kvm->mmu_lock);
+
+	if (ret && !fault_is_perm)
+		account_locked_vm(mm, 1, false);
+unlock_page:
+	/* Drop the folio lock before the refcount, to avoid a use-after-free. */
+	unlock_page(page);
+	put_page(page);
+
+	return ret != -EAGAIN ? ret : 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1893,8 +1995,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-			     esr_fsc_is_permission_fault(esr));
+	if (kvm_slot_can_be_private(memslot)) {
+		ret = guest_memfd_abort(vcpu, fault_ipa, memslot,
+					esr_fsc_is_permission_fault(esr));
+	} else {
+		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+				     esr_fsc_is_permission_fault(esr));
+	}
+
 	if (ret == 0)
 		ret = 1;
 out: