From patchwork Fri Oct 27 18:21:51 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13438936
Reply-To: Sean Christopherson
Date: Fri, 27 Oct 2023 11:21:51 -0700
In-Reply-To: <20231027182217.3615211-1-seanjc@google.com>
References: <20231027182217.3615211-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.820.g83a721a137-goog
Message-ID: <20231027182217.3615211-10-seanjc@google.com>
Subject: [PATCH v13 09/35] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report
 faults to userspace
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 Sean Christopherson, Alexander Viro, Christian Brauner,
 "Matthew Wilcox (Oracle)", Andrew Morton
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li, Xu Yilun,
 Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, David Matlack,
 Yu Zhang, Isaku Yamahata, "Mickaël Salaün", Vlastimil Babka,
 Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand,
 Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A . Shutemov"

From: Chao Peng

Add a new KVM exit type to allow userspace to handle memory faults that
KVM cannot resolve, but that userspace *may* be able to handle (without
terminating the guest).

KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
conversions between private and shared memory. With guest private memory,
there will be two kinds of memory conversions:

  - explicit conversion: happens when the guest explicitly calls into KVM
    to map a range (as private or shared)

  - implicit conversion: happens when the guest attempts to access a gfn
    that is configured in the "wrong" state (private vs. shared)

On x86 (first architecture to support guest private memory), explicit
conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesirable
as there is (obviously) no hypercall, and there is no guarantee that the
guest actually intends to convert between private and shared, i.e. what
KVM thinks is an implicit conversion "request" could actually be the
result of a guest code bug.

KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
be implicit conversions.

Note! To allow for future possibilities where KVM reports
KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
perspective), not '0'! Due to historical baggage within KVM, exiting to
userspace with '0' from deep callstacks, e.g. in emulation paths, is
infeasible as doing so would require a near-complete overhaul of KVM,
whereas KVM already propagates -errno return codes to userspace even when
the -errno originated in a low-level helper.

Report the gpa+size instead of a single gfn even though the initial usage
is expected to always report single pages. It's entirely possible, likely
even, that KVM will someday support sub-page granularity faults, e.g.
Intel's sub-page protection feature allows for additional protections at
128-byte granularity.
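
To illustrate why the exit carries gpa+size rather than a single gfn, here
is a minimal userspace-side sketch. It is not part of this patch. It
assumes 4 KiB guest pages, and struct gfn_range, fault_to_gfns(),
fault_is_page_granular() and the GUEST_PAGE_* macros are hypothetical names
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages, as on x86. */
#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  (UINT64_C(1) << GUEST_PAGE_SHIFT)

/* Hypothetical helper type: an inclusive range of guest frame numbers. */
struct gfn_range {
	uint64_t first;
	uint64_t last;
};

/* Return the gfns touched by the byte range [gpa, gpa + size). */
static inline struct gfn_range fault_to_gfns(uint64_t gpa, uint64_t size)
{
	struct gfn_range r;

	/* The range must be non-empty and must not wrap around. */
	assert(size > 0 && gpa + size > gpa);

	r.first = gpa >> GUEST_PAGE_SHIFT;
	r.last = (gpa + size - 1) >> GUEST_PAGE_SHIFT;
	return r;
}

/* Nonzero if the range consists only of whole, page-aligned pages. */
static inline int fault_is_page_granular(uint64_t gpa, uint64_t size)
{
	return ((gpa | size) & (GUEST_PAGE_SIZE - 1)) == 0;
}
```

With these helpers, a 128-byte fault at gpa 0x3080 maps to the single gfn 3
but is not page granular, which is information a bare gfn could not carry.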

Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
Link: https://lore.kernel.org/all/ZQ3AmLO2SYv3DszH@google.com
Cc: Anish Moorthy
Cc: David Matlack
Suggested-by: Sean Christopherson
Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Reviewed-by: Paolo Bonzini
---
 Documentation/virt/kvm/api.rst | 41 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c             |  1 +
 include/linux/kvm_host.h       | 11 +++++++++
 include/uapi/linux/kvm.h       |  8 +++++++
 4 files changed, 61 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ace984acc125..860216536810 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6723,6 +6723,26 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			__u64 flags;
+			__u64 gpa;
+			__u64 size;
+		} memory_fault;
+
+KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
+could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
+guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
+describes properties of the faulting access that are likely pertinent.
+Currently, no flags are defined.
+
+Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
+accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
+or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT; userspace should assume
+kvm_run.exit_reason is stale/undefined for all other error numbers.
+
 ::
 
 		/* KVM_EXIT_NOTIFY */
@@ -7757,6 +7777,27 @@ This capability is aimed to mitigate the threat that malicious VMs can
 cause CPU stuck (due to event windows don't open up) and make the CPU
 unavailable to host or other VMs.
 
+7.34 KVM_CAP_MEMORY_FAULT_INFO
+------------------------------
+
+:Architectures: x86
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that KVM_RUN will fill
+kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
+there is a valid memslot but no backing VMA for the corresponding host virtual
+address.
+
+The information in kvm_run.memory_fault is valid if and only if KVM_RUN returns
+an error with errno=EFAULT or errno=EHWPOISON *and* kvm_run.exit_reason is set
+to KVM_EXIT_MEMORY_FAULT.
+
+Note: Userspaces which attempt to resolve memory faults so that they can retry
+KVM_RUN are encouraged to guard against repeatedly receiving the same
+error/annotated fault.
+
+See KVM_EXIT_MEMORY_FAULT for more information.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6409914428ca..ee3cd8c3c0ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4518,6 +4518,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
 	case KVM_CAP_IRQFD_RESAMPLE:
+	case KVM_CAP_MEMORY_FAULT_INFO:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4e741ff27af3..96aa930536b1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2327,4 +2327,15 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define KVM_DIRTY_RING_MAX_ENTRIES 65536
 
+static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+						 gpa_t gpa, gpa_t size)
+{
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+	vcpu->run->memory_fault.gpa = gpa;
+	vcpu->run->memory_fault.size = size;
+
+	/* Flags are not (yet) defined or communicated to userspace. */
+	vcpu->run->memory_fault.flags = 0;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index bd1abe067f28..7ae9987b48dd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -274,6 +274,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI 35
 #define KVM_EXIT_RISCV_CSR 36
 #define KVM_EXIT_NOTIFY 37
+#define KVM_EXIT_MEMORY_FAULT 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -520,6 +521,12 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+			__u64 flags;
+			__u64 gpa;
+			__u64 size;
+		} memory_fault;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1203,6 +1210,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
 #define KVM_CAP_USER_MEMORY2 230
+#define KVM_CAP_MEMORY_FAULT_INFO 231
 
 #ifdef KVM_CAP_IRQ_ROUTING
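
For VMM authors, the validity rule and the retry advice documented in the
api.rst hunks above could be sketched roughly as follows. This is an
illustration only and not part of the patch. struct memory_fault_info,
memory_fault_info_valid(), struct fault_guard, fault_guard_should_retry()
and FAULT_GUARD_MAX_REPEATS are hypothetical names. The example mirrors the
memory_fault layout locally instead of including <linux/kvm.h> so that it
builds against older headers, and the EHWPOISON fallback value is an
assumption that only matters where <errno.h> does not define it.

```c
#include <errno.h>
#include <stdint.h>

/* Value taken from the uapi hunk above; guarded in case real headers exist. */
#ifndef KVM_EXIT_MEMORY_FAULT
#define KVM_EXIT_MEMORY_FAULT 38
#endif

/* Assumption: fallback for platforms whose <errno.h> lacks EHWPOISON. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Local mirror of the kvm_run.memory_fault layout added by this patch. */
struct memory_fault_info {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/*
 * Nonzero if kvm_run.memory_fault may be trusted, given the return value of
 * the KVM_RUN ioctl, the errno it left behind, and kvm_run.exit_reason.
 */
static inline int memory_fault_info_valid(int ret, int err,
					  uint32_t exit_reason)
{
	return ret == -1 && (err == EFAULT || err == EHWPOISON) &&
	       exit_reason == KVM_EXIT_MEMORY_FAULT;
}

/* Hypothetical retry guard: remembers the last fault and its repeat count. */
struct fault_guard {
	struct memory_fault_info last;
	unsigned int repeats;
};

/* Arbitrary illustrative limit, not a value suggested by KVM. */
#define FAULT_GUARD_MAX_REPEATS 3

/* Record a fault; return nonzero if retrying KVM_RUN still seems worthwhile. */
static inline int fault_guard_should_retry(struct fault_guard *g,
					   const struct memory_fault_info *f)
{
	if (g->repeats && g->last.gpa == f->gpa && g->last.size == f->size &&
	    g->last.flags == f->flags) {
		g->repeats++;		/* same annotated fault again */
	} else {
		g->last = *f;		/* new fault: restart the count */
		g->repeats = 1;
	}
	return g->repeats <= FAULT_GUARD_MAX_REPEATS;
}
```

A run loop would call memory_fault_info_valid() right after KVM_RUN returns,
and only then copy kvm_run.memory_fault into the guard. Starting from a
zero-initialized struct fault_guard, the fourth identical fault in a row
makes fault_guard_should_retry() return 0.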