From patchwork Wed Apr 10 22:07:27 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625086
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 01/10] KVM: Document KVM_MAP_MEMORY ioctl
Date: Wed, 10 Apr 2024 15:07:27 -0700
Add documentation for the KVM_MAP_MEMORY ioctl. [1] It populates guest
memory without performing the extra, technology-specific initialization
operations [2]. For example, CoCo-related operations won't be performed;
concretely for TDX, this API won't invoke TDH.MEM.PAGE.ADD() or
TDH.MR.EXTEND(). Vendor-specific APIs are required for such operations.

The key point is to make this a vcpu ioctl rather than a VM ioctl. First,
populating guest memory requires a vcpu; with a VM ioctl, KVM would have
to pick one vcpu somehow. Second, a vcpu ioctl allows each vcpu to invoke
the ioctl in parallel, which helps scale with guest memory size, e.g.,
hundreds of GB.

[1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
[2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
v2:
- Make flags reserved for future use. (Sean, Michael)
- Clarified the supposed use case. (Kai)
- Dropped source member of struct kvm_memory_mapping. (Michael)
- Change the unit from pages to bytes. (Michael)
---
 Documentation/virt/kvm/api.rst | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..6ee3d2b51a2b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,58 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_MAP_MEMORY
+------------------------
+
+:Capability: KVM_CAP_MAP_MEMORY
+:Architectures: none
+:Type: vcpu ioctl
+:Parameters: struct kvm_memory_mapping (in/out)
+:Returns: 0 on success, < 0 on error
+
+Errors:
+
+  ========== =============================================================
+  EINVAL     invalid parameters
+  EAGAIN     The region was only partially processed. The caller should
+             issue the ioctl again with the updated parameters while
+             `size` > 0.
+  EINTR      An unmasked signal is pending. The region may have been
+             partially processed.
+  EFAULT     The parameter address was invalid, or the specified region
+             (`base_address` and `size`) was invalid, e.g., not covered
+             by a KVM memory slot.
+  EOPNOTSUPP The architecture doesn't support this operation. On x86,
+             two-dimensional paging supports this API; the x86 KVM shadow
+             MMU and the other architectures' KVM do not.
+  ========== =============================================================
+
+::
+
+  struct kvm_memory_mapping {
+	__u64 base_address;
+	__u64 size;
+	__u64 flags;
+  };
+
+KVM_MAP_MEMORY populates guest memory for the range described by
+`base_address`, an (L1) guest physical address (GPA), and `size` in bytes.
+`flags` must be zero; it is reserved for future use. When the ioctl
+returns, the input values are updated to point to the remaining range. If
+`size` > 0 on return, the caller should issue the ioctl again with the
+updated parameters.
+
+Multiple vcpus are allowed to call this ioctl simultaneously. It's not
+mandatory for all vcpus to issue it; a single vcpu can suffice. Invoking
+the ioctl from multiple vcpus processes the population in parallel for
+scalability. Parallel invocations may return EAGAIN due to race
+conditions.
+
+This is a "pure" population that doesn't trigger the underlying
+technology-specific initialization. For example, CoCo-related operations
+won't be performed; in the case of TDX, this API won't invoke
+TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific uAPIs are required
+for such operations.
+
+
 5. The kvm_run structure
 ========================
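To illustrate the retry contract documented above, here is a minimal
userspace sketch (not part of the patch); the vcpu_fd descriptor and the
GPA range are assumptions:

  #include <errno.h>
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /*
   * Pre-populate [gpa, gpa + size). The kernel advances base_address and
   * shrinks size on each call, so retrying on EAGAIN/EINTR resumes where
   * the previous call left off.
   */
  static int prepopulate(int vcpu_fd, __u64 gpa, __u64 size)
  {
          struct kvm_memory_mapping mapping = {
                  .base_address = gpa,
                  .size = size,
                  .flags = 0,     /* must be zero; reserved for future use */
          };

          while (mapping.size) {
                  if (ioctl(vcpu_fd, KVM_MAP_MEMORY, &mapping) &&
                      errno != EAGAIN && errno != EINTR)
                          return -errno;  /* EINVAL, EFAULT, EOPNOTSUPP, ... */
          }
          return 0;
  }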
From patchwork Wed Apr 10 22:07:28 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625087
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 02/10] KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory
Date: Wed, 10 Apr 2024 15:07:28 -0700

Add a new ioctl, KVM_MAP_MEMORY, to the KVM common code. It iterates over
the memory range and calls the arch-specific function. Add a stub arch
function as a weak symbol.

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Rick Edgecombe
---
v2:
- Drop need_resched(). (David, Sean, Kai)
- Move cond_resched() to the end of the loop. (Kai)
- Drop added check. (David)
- Use EINTR instead of ERESTART. (David, Sean)
- Fix srcu lock leak. (Kai, Sean)
- Add comment above copy_to_user().
- Drop pointless comment. (Sean)
- Drop kvm_arch_vcpu_pre_map_memory(). (Sean)
- Don't overwrite error code. (Sean, David)
- Make the parameter in bytes, not pages. (Michael)
- Drop source member in struct kvm_memory_mapping. (Sean, Michael)
---
 include/linux/kvm_host.h |  3 +++
 include/uapi/linux/kvm.h |  9 +++++++
 virt/kvm/kvm_main.c      | 54 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..e56a0c7e5b42 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2445,4 +2445,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
+			     struct kvm_memory_mapping *mapping);
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..972aa9e054d3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_MAP_MEMORY 236
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1548,4 +1549,12 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_MAP_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_memory_mapping)
+
+struct kvm_memory_mapping {
+	__u64 base_address;
+	__u64 size;
+	__u64 flags;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb49c2a60200..f2ad9e106cdb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4415,6 +4415,48 @@ static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
 	return fd;
 }
 
+__weak int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
+				    struct kvm_memory_mapping *mapping)
+{
+	return -EOPNOTSUPP;
+}
+
+static int kvm_vcpu_map_memory(struct kvm_vcpu *vcpu,
+			       struct kvm_memory_mapping *mapping)
+{
+	int idx, r;
+
+	if (mapping->flags)
+		return -EINVAL;
+
+	if (!PAGE_ALIGNED(mapping->base_address) ||
+	    !PAGE_ALIGNED(mapping->size) ||
+	    mapping->base_address + mapping->size <= mapping->base_address)
+		return -EINVAL;
+
+	vcpu_load(vcpu);
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	r = 0;
+	while (mapping->size) {
+		if (signal_pending(current)) {
+			r = -EINTR;
+			break;
+		}
+
+		r = kvm_arch_vcpu_map_memory(vcpu, mapping);
+		if (r)
+			break;
+
+		cond_resched();
+	}
+
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	vcpu_put(vcpu);
+
+	return r;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -4616,6 +4658,18 @@ static long kvm_vcpu_ioctl(struct file *filp,
 		r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
 		break;
 	}
+	case KVM_MAP_MEMORY: {
+		struct kvm_memory_mapping mapping;
+
+		r = -EFAULT;
+		if (copy_from_user(&mapping, argp, sizeof(mapping)))
+			break;
+		r = kvm_vcpu_map_memory(vcpu, &mapping);
+		/* Don't check error to tell the processed range. */
+		if (copy_to_user(argp, &mapping, sizeof(mapping)))
+			r = -EFAULT;
+		break;
+	}
 	default:
 		r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
 	}
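The __weak stub above lets an architecture opt in simply by providing a
strong definition of the same symbol. A standalone sketch of the pattern,
with hypothetical names:

  #include <errno.h>

  /* common.c: the weak default, used when no architecture overrides it. */
  __attribute__((weak)) int arch_op(void)
  {
          return -EOPNOTSUPP;     /* same convention as the stub above */
  }

  /* arch.c: a strong definition of the same name wins at link time, with
   * no #ifdef or registration boilerplate needed in the common code. */
  int arch_op(void)
  {
          return 0;
  }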
From patchwork Wed Apr 10 22:07:29 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625088
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 03/10] KVM: x86/mmu: Extract __kvm_mmu_do_page_fault()
Date: Wed, 10 Apr 2024 15:07:29 -0700

Extract __kvm_mmu_do_page_fault() from kvm_mmu_do_page_fault(). The inner
function initializes struct kvm_page_fault and calls the fault handler;
the outer function handles updating the stats and converting the return
code. KVM_MAP_MEMORY will call the KVM page fault handler.

This patch makes emulation_type always be set, regardless of the return
code. kvm_mmu_page_fault() is the only caller of kvm_mmu_do_page_fault(),
and it references the value only when RET_PF_EMULATE is returned.
Therefore, this adjustment doesn't affect functionality.

No functional change intended.

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
v2:
- Newly introduced. (Sean)
---
 arch/x86/kvm/mmu/mmu_internal.h | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e68a60974cf4..9baae6c223ee 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -287,8 +287,8 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 			      fault->is_private);
 }
 
-static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u64 err, bool prefetch, int *emulation_type)
+static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+					  u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
@@ -318,14 +318,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
 	}
 
-	/*
-	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
-	 * guest perspective and have already been counted at the time of the
-	 * original fault.
-	 */
-	if (!prefetch)
-		vcpu->stat.pf_taken++;
-
 	if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
 		r = kvm_tdp_page_fault(vcpu, &fault);
 	else
@@ -333,12 +325,30 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 	if (r == RET_PF_EMULATE && fault.is_private) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
-		return -EFAULT;
+		r = -EFAULT;
 	}
 
 	if (fault.write_fault_to_shadow_pgtable && emulation_type)
 		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
 
+	return r;
+}
+
+static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+					u64 err, bool prefetch, int *emulation_type)
+{
+	int r;
+
+	/*
+	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
+	 * guest perspective and have already been counted at the time of the
+	 * original fault.
+	 */
+	if (!prefetch)
+		vcpu->stat.pf_taken++;
+
+	r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch, emulation_type);
+
 	/*
 	 * Similar to above, prefetch faults aren't truly spurious, and the
 	 * async #PF path doesn't do emulation. Do count faults that are fixed
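The shape of the split, reduced to a sketch with hypothetical names (not
the kernel code): guest-visible accounting stays in the outer wrapper, so
a new caller such as KVM_MAP_MEMORY can reach the fault handler through
the inner helper without perturbing vcpu->stat:

  /* Inner helper: set up the fault and run the handler; no accounting. */
  static int inner_do_fault(struct vcpu *v, u64 gpa)
  {
          struct fault f = { .addr = gpa };

          return run_fault_handler(v, &f);
  }

  /* Outer wrapper: preserves the original behavior for existing callers. */
  static int do_fault(struct vcpu *v, u64 gpa)
  {
          v->stat_pf_taken++;     /* stats only on the guest-fault path */
          return inner_do_fault(v, gpa);
  }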
From patchwork Wed Apr 10 22:07:30 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625089
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 04/10] KVM: x86/mmu: Make __kvm_mmu_do_page_fault() return mapped level
Date: Wed, 10 Apr 2024 15:07:30 -0700

The guest memory population logic will need to know what page size or
level (4K, 2M, ...) was mapped.

Signed-off-by: Isaku Yamahata
---
v2:
- Newly added.
---
 arch/x86/kvm/mmu/mmu_internal.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 9baae6c223ee..b0a10f5a40dd 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -288,7 +288,8 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 }
 
 static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					  u64 err, bool prefetch, int *emulation_type)
+					  u64 err, bool prefetch,
+					  int *emulation_type, u8 *level)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
@@ -330,6 +331,8 @@ static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gp
 	if (fault.write_fault_to_shadow_pgtable && emulation_type)
 		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
 
+	if (level)
+		*level = fault.goal_level;
 	return r;
 }
 
@@ -347,7 +350,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	if (!prefetch)
 		vcpu->stat.pf_taken++;
 
-	r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch, emulation_type);
+	r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch,
+				    emulation_type, NULL);
 
 	/*
 	 * Similar to above, prefetch faults aren't truly spurious, and the
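For context, each x86 page-table level multiplies the mapping size by 512.
Assuming KVM's convention that PG_LEVEL_4K == 1, the returned level
translates to a mapping size as in this sketch:

  /* 4K at level 1, 2M at level 2 (4K << 9), 1G at level 3 (4K << 18). */
  static unsigned long level_to_bytes(unsigned int level)
  {
          return 4096UL << (9 * (level - 1));
  }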
From patchwork Wed Apr 10 22:07:31 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625090
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory
Date: Wed, 10 Apr 2024 15:07:31 -0700

Introduce a helper function to call the KVM fault handler. It allows a new
ioctl to invoke the KVM fault handler for population without seeing the
RET_PF_* enums or other definitions internal to the x86 KVM MMU.

The implementation is restricted to two-dimensional paging for simplicity.
Shadow paging uses GVAs for faulting instead of L1 GPAs, which would make
the API difficult to use.

Signed-off-by: Isaku Yamahata
---
v2:
- Make the helper function two-dimensional paging specific. (David)
- Return error when vcpu is in guest mode. (David)
- Rename goal_level to level in kvm_tdp_mmu_map_page(). (Sean)
- Update return code conversion. Don't check pfn.
  RET_PF_EMULATE => EINVAL, RET_PF_CONTINUE => EIO (Sean)
- Add WARN_ON_ONCE on RET_PF_CONTINUE and RET_PF_INVALID. (Sean)
- Drop unnecessary EXPORT_SYMBOL_GPL(). (Sean)
---
 arch/x86/kvm/mmu.h     |  3 +++
 arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e8b620a85627..51ff4f67e115 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
 	__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
 }
 
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level);
+
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
  * page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91dd4c44b7d8..a34f4af44cbd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4687,6 +4687,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }
 
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level)
+{
+	int r;
+
+	/* Restrict to TDP page fault. */
+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
+		return -EINVAL;
+
+	r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);
+	if (r < 0)
+		return r;
+
+	switch (r) {
+	case RET_PF_RETRY:
+		return -EAGAIN;
+
+	case RET_PF_FIXED:
+	case RET_PF_SPURIOUS:
+		return 0;
+
+	case RET_PF_EMULATE:
+		return -EINVAL;
+
+	case RET_PF_CONTINUE:
+	case RET_PF_INVALID:
+	default:
+		WARN_ON_ONCE(r);
+		return -EIO;
+	}
+}
+
 static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
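The switch above gives callers of kvm_tdp_map_page() a small errno-based
contract. A hedged sketch of a caller-side loop (not kernel code; the
signal check mirrors what the common-code loop in patch 02 does):

  /*
   * 0        mapped (RET_PF_FIXED / RET_PF_SPURIOUS); *level is valid
   * -EAGAIN  transient contention (RET_PF_RETRY); safe to call again
   * -EINVAL  not TDP, or the GPA requires emulation (RET_PF_EMULATE)
   * -EIO     unexpected internal result (warned once)
   */
  u8 level;
  int r;

  do {
          r = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
  } while (r == -EAGAIN && !signal_pending(current));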
From patchwork Wed Apr 10 22:07:32 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625091
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 06/10] KVM: x86: Implement kvm_arch_vcpu_map_memory()
Date: Wed, 10 Apr 2024 15:07:32 -0700

Wire the KVM_MAP_MEMORY ioctl to kvm_tdp_map_page() to populate guest
memory. When KVM_CREATE_VCPU creates a vCPU, it initializes the x86 KVM
MMU with kvm_mmu_create() and kvm_init_mmu(), so the vCPU is ready to
invoke the KVM page fault handler.

Signed-off-by: Isaku Yamahata
---
v2:
- Catch up the change of struct kvm_memory_mapping. (Sean)
- Removed mapping level check. Push it down into vendor code. (David, Sean)
- Rename goal_level to level. (Sean)
- Drop kvm_arch_pre_vcpu_map_memory(), directly call kvm_mmu_reload().
  (David, Sean)
- Fixed the update of mapping.
---
 arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d2619d3eee4..2c765de3531e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4713,6 +4713,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_MEMORY_FAULT_INFO:
+	case KVM_CAP_MAP_MEMORY:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
@@ -5867,6 +5868,35 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	}
 }
 
+int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
+			     struct kvm_memory_mapping *mapping)
+{
+	u64 end, error_code = 0;
+	u8 level = PG_LEVEL_4K;
+	int r;
+
+	/*
+	 * Shadow paging uses GVA for kvm page fault. The first implementation
+	 * supports GPA only to avoid confusion.
+	 */
+	if (!tdp_enabled)
+		return -EOPNOTSUPP;
+
+	/* reload is optimized for repeated call. */
+	kvm_mmu_reload(vcpu);
+
+	r = kvm_tdp_map_page(vcpu, mapping->base_address, error_code, &level);
+	if (r)
+		return r;
+
+	/* mapping->base_address is not necessarily aligned to level-hugepage. */
+	end = (mapping->base_address & KVM_HPAGE_MASK(level)) +
+		KVM_HPAGE_SIZE(level);
+	mapping->size -= end - mapping->base_address;
+	mapping->base_address = end;
+	return r;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
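A worked example of the advancement arithmetic above, with illustrative
values:

  /*
   * Suppose base_address = 0xc0001000 (4K-aligned, not 2M-aligned) and
   * the fault was mapped at the 2M level, so KVM_HPAGE_SIZE(level) is
   * 0x200000:
   *
   *   end   = (0xc0001000 & ~0x1fffff) + 0x200000
   *         =  0xc0000000 + 0x200000
   *         =  0xc0200000
   *   size -=  0xc0200000 - 0xc0001000   (i.e. 0x1ff000)
   *
   * The whole 2M page containing base_address is now mapped, and the
   * next iteration of the common-code loop resumes at the 2M boundary.
   */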
From patchwork Wed Apr 10 22:07:33 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625092
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 07/10] KVM: x86: Always populate L1 GPA for KVM_MAP_MEMORY
Date: Wed, 10 Apr 2024 15:07:33 -0700

Forcibly switch the vCPU out of guest mode and SMM mode before calling the
KVM page fault handler for KVM_MAP_MEMORY.

KVM_MAP_MEMORY populates guest memory by guest physical address (GPA). If
the vCPU is in guest mode, it would populate with L2 GPAs; if the vCPU is
in SMM mode, it would populate the SMM address space. The API would be
difficult to use as such. Change the vCPU MMU mode around populating the
guest memory to always populate with L1 GPAs.

There are several options to populate L1 GPAs irrespective of vCPU mode:

- Switch the vCPU MMU only: this patch.
  Pros: Concise implementation.
  Cons: Heavily dependent on the KVM MMU implementation.
- Use kvm_x86_nested_ops.get/set_state() to switch to/from guest mode, and
  __get/set_sregs2() to switch to/from SMM mode.
  Pros: Straightforward.
  Cons: May cause unintended side effects.
- Refactor the KVM page fault handler to not pass the vCPU; pass around
  the necessary parameters and struct kvm instead.
  Pros: The end result will clearly have no side effects.
  Cons: Requires a big refactoring.
- Return an error in guest mode or SMM mode: without this patch.
  Pros: No additional patch.
  Cons: Difficult to use.

Signed-off-by: Isaku Yamahata
---
v2:
- Newly added.
---
 arch/x86/kvm/x86.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c765de3531e..8ba9c1720ac9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5871,8 +5871,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 			     struct kvm_memory_mapping *mapping)
 {
+	struct kvm_mmu *mmu = NULL, *walk_mmu = NULL;
 	u64 end, error_code = 0;
 	u8 level = PG_LEVEL_4K;
+	bool is_smm;
 	int r;
 
 	/*
@@ -5882,18 +5884,40 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 	if (!tdp_enabled)
 		return -EOPNOTSUPP;
 
+	/* Force to use L1 GPA despite of vcpu MMU mode. */
+	is_smm = !!(vcpu->arch.hflags & HF_SMM_MASK);
+	if (is_smm ||
+	    vcpu->arch.mmu != &vcpu->arch.root_mmu ||
+	    vcpu->arch.walk_mmu != &vcpu->arch.root_mmu) {
+		vcpu->arch.hflags &= ~HF_SMM_MASK;
+		mmu = vcpu->arch.mmu;
+		walk_mmu = vcpu->arch.walk_mmu;
+		vcpu->arch.mmu = &vcpu->arch.root_mmu;
+		vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
+		kvm_mmu_reset_context(vcpu);
+	}
+
 	/* reload is optimized for repeated call. */
 	kvm_mmu_reload(vcpu);
 
 	r = kvm_tdp_map_page(vcpu, mapping->base_address, error_code, &level);
 	if (r)
-		return r;
+		goto out;
 
 	/* mapping->base_address is not necessarily aligned to level-hugepage. */
 	end = (mapping->base_address & KVM_HPAGE_MASK(level)) +
 		KVM_HPAGE_SIZE(level);
 	mapping->size -= end - mapping->base_address;
 	mapping->base_address = end;
+
+out:
+	/* Restore MMU state. */
+	if (is_smm || mmu) {
+		vcpu->arch.hflags |= is_smm ? HF_SMM_MASK : 0;
+		vcpu->arch.mmu = mmu;
+		vcpu->arch.walk_mmu = walk_mmu;
+		kvm_mmu_reset_context(vcpu);
+	}
 	return r;
 }
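The first option's control flow reduces to a save/override/restore
skeleton (hypothetical types; the real code must also reset and reload the
MMU context on both transitions):

  /* Run op() with the root MMU forced, then put everything back. */
  static int with_root_mmu(struct vcpu *v, int (*op)(struct vcpu *))
  {
          struct mmu *saved_mmu = v->mmu;
          struct mmu *saved_walk = v->walk_mmu;
          int r;

          v->mmu = &v->root_mmu;
          v->walk_mmu = &v->root_mmu;

          r = op(v);              /* errors must still reach the restore */

          v->mmu = saved_mmu;
          v->walk_mmu = saved_walk;
          return r;
  }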
From patchwork Wed Apr 10 22:07:34 2024
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13625093
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack, Federico Parola, Kai Huang
Subject: [PATCH v2 08/10] KVM: x86: Add a hook in kvm_arch_vcpu_map_memory()
Date: Wed, 10 Apr 2024 15:07:34 -0700

Add a hook to kvm_arch_vcpu_map_memory() for KVM_MAP_MEMORY that runs
before calling kvm_tdp_map_page(), to adjust the error code for the page
fault. The hook can hold vendor-specific logic to make those adjustments
and enforce the restrictions. SEV and TDX KVM will use the hook.

In the case of SEV and TDX, they need to adjust the KVM page fault error
code or refuse the operation due to their restrictions. TDX requires that
guest memory be populated before finalizing the VM.

Signed-off-by: Isaku Yamahata
---
v2:
- Make pre_mmu_map_page() take error_code.
- Drop post_mmu_map_page().
- Drop struct kvm_memory_map.source check.
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/x86.c                 | 28 ++++++++++++++++++++++++++++
 3 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 5187fcf4b610..a5d4f4d5265d 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -139,6 +139,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
+KVM_X86_OP_OPTIONAL(pre_mmu_map_page);
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3ce244ad44e5..2bf7f97f889b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1812,6 +1812,9 @@ struct kvm_x86_ops {
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva,
 				   unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+	int (*pre_mmu_map_page)(struct kvm_vcpu *vcpu,
+				struct kvm_memory_mapping *mapping,
+				u64 *error_code);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ba9c1720ac9..b76d854701d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5868,6 +5868,26 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	}
 }
 
+static int kvm_pre_mmu_map_page(struct kvm_vcpu *vcpu,
+				struct kvm_memory_mapping *mapping,
+				u64 *error_code)
+{
+	int r = 0;
+
+	if (vcpu->kvm->arch.vm_type == KVM_X86_DEFAULT_VM) {
+		/* nothing */
+	} else if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM) {
+		if (kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(mapping->base_address)))
+			*error_code |= PFERR_PRIVATE_ACCESS;
+	} else if (kvm_x86_ops.pre_mmu_map_page)
+		r = static_call(kvm_x86_pre_mmu_map_page)(vcpu, mapping,
+							  error_code);
+	else
+		r = -EOPNOTSUPP;
+
+	return r;
+}
+
 int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 			     struct kvm_memory_mapping *mapping)
 {
@@ -5900,6 +5920,14 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 	/* reload is optimized for repeated call. */
 	kvm_mmu_reload(vcpu);
 
+	/*
+	 * Adjust error_code for VM-type. max_level is adjusted by
+	 * gmem_max_level() callback.
+	 */
+	r = kvm_pre_mmu_map_page(vcpu, mapping, &error_code);
+	if (r)
+		goto out;
+
 	r = kvm_tdp_map_page(vcpu, mapping->base_address, error_code, &level);
 	if (r)
 		goto out;
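To illustrate what the hook enables (this is not actual TDX code), a
vendor implementation could look roughly like the following;
vm_is_finalized() and gpa_is_private() are assumed helpers:

  static int tdx_pre_mmu_map_page(struct kvm_vcpu *vcpu,
                                  struct kvm_memory_mapping *mapping,
                                  u64 *error_code)
  {
          /* TDX: population is only allowed before finalizing the VM. */
          if (vm_is_finalized(vcpu->kvm))         /* assumed helper */
                  return -EINVAL;

          /* Mark private GPAs so the fault is handled as a private access. */
          if (gpa_is_private(vcpu->kvm, mapping->base_address)) /* assumed */
                  *error_code |= PFERR_PRIVATE_ACCESS;

          return 0;
  }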
Matlack , Federico Parola , Kai Huang Subject: [PATCH v2 09/10] KVM: SVM: Implement pre_mmu_map_page() to refuse KVM_MAP_MEMORY Date: Wed, 10 Apr 2024 15:07:35 -0700 Message-ID: X-Mailer: git-send-email 2.43.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Implement vendor callback for KVM_MAP_MEMORY for SEV as EOPNOTSUPP because it should use SEV-specific ioctl instead. This patch is only to demonstrate how to implement the hook. Compile only tested. I leave the actual implementation to the SEV folks. Signed-off-by: Isaku Yamahata --- v2: - Newly added --- arch/x86/kvm/svm/sev.c | 6 ++++++ arch/x86/kvm/svm/svm.c | 2 ++ arch/x86/kvm/svm/svm.h | 9 +++++++++ 3 files changed, 17 insertions(+) diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 1642d7d49bde..ab17d7c16636 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -3322,3 +3322,9 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) return p; } + +int sev_pre_mmu_map_page(struct kvm_vcpu *vcpu, + struct kvm_memory_mapping *mapping, u64 *error_code) +{ + return -EOPNOTSUPP; +} diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 535018f152a3..a886d4409b00 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -5057,6 +5057,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector, .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons, .alloc_apic_backing_page = svm_alloc_apic_backing_page, + + .pre_mmu_map_page = sev_pre_mmu_map_page, }; /* diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 323901782547..c8dafcb0bfc6 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -689,6 +689,9 @@ int sev_mem_enc_unregister_region(struct kvm *kvm, int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd); int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd); void sev_guest_memory_reclaimed(struct kvm *kvm); +int sev_pre_mmu_map_page(struct kvm_vcpu *vcpu, + struct kvm_memory_mapping *mapping, u64 *error_code); + int sev_handle_vmgexit(struct kvm_vcpu *vcpu); /* These symbols are used in common code and are stubbed below. 
From patchwork Wed Apr 10 22:07:36 2024
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: kvm@vger.kernel.org
Cc: isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org,
    Sean Christopherson, Paolo Bonzini, Michael Roth, David Matlack,
    Federico Parola, Kai Huang
Subject: [PATCH v2 10/10] KVM: selftests: x86: Add test for KVM_MAP_MEMORY
Date: Wed, 10 Apr 2024 15:07:36 -0700
Message-ID: <32427791ef42e5efaafb05d2ac37fa4372715f47.1712785629.git.isaku.yamahata@intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a test case that exercises KVM_MAP_MEMORY and then runs the guest
to access the pre-populated area.  It tests the KVM_MAP_MEMORY ioctl
for KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
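As a usage note before the diff: KVM_MAP_MEMORY is an in/out interface
that may stop early, updating `base_address` and `size` to the remaining
range, so callers loop until `size` reaches zero.  The test's
map_memory() helper below does exactly this; a self-contained userspace
sketch of the same loop (vcpu_fd is assumed to be an existing KVM vCPU
fd, and the helper name is hypothetical):

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hedged sketch: populate [gpa, gpa + size) via KVM_MAP_MEMORY,
 * retrying on partial progress (EAGAIN) or a pending signal (EINTR).
 */
static int kvm_map_range(int vcpu_fd, __u64 gpa, __u64 size)
{
	struct kvm_memory_mapping mapping = {
		.base_address = gpa,
		.size = size,
		.flags = 0,
	};

	while (mapping.size) {
		if (ioctl(vcpu_fd, KVM_MAP_MEMORY, &mapping) < 0 &&
		    errno != EAGAIN && errno != EINTR)
			return -errno;
	}

	return 0;
}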
---
v2:
- Catch up with the uAPI change.
- Added SMM mode test case.
- Added guest mode test case.
---
 tools/include/uapi/linux/kvm.h                |   8 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/map_memory_test.c    | 479 ++++++++++++++++++
 3 files changed, 488 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/map_memory_test.c

diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index c3308536482b..c742c403256a 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -2227,4 +2227,12 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_MAP_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_memory_mapping)
+
+struct kvm_memory_mapping {
+	__u64 base_address;
+	__u64 size;
+	__u64 flags;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 871e2de3eb05..2b097b6ec267 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -144,6 +144,7 @@ TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
 TEST_GEN_PROGS_x86_64 += system_counter_offset_test
+TEST_GEN_PROGS_x86_64 += x86_64/map_memory_test
 
 # Compiled outputs used by test targets
 TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
diff --git a/tools/testing/selftests/kvm/x86_64/map_memory_test.c b/tools/testing/selftests/kvm/x86_64/map_memory_test.c
new file mode 100644
index 000000000000..d5728439542e
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/map_memory_test.c
@@ -0,0 +1,479 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024, Intel, Inc
+ *
+ * Author:
+ *   Isaku Yamahata <isaku.yamahata@intel.com>
+ */
+#include <linux/sizes.h>
+
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+/* Arbitrarily chosen value. Pick 3G */
+#define TEST_GVA	0xc0000000
+#define TEST_GPA	TEST_GVA
+#define TEST_SIZE	(SZ_2M + PAGE_SIZE)
+#define TEST_NPAGES	(TEST_SIZE / PAGE_SIZE)
+#define TEST_SLOT	10
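+
+/*
+ * TEST_SIZE spans one 2MB-aligned region plus one extra 4KB page so
+ * that KVM_MAP_MEMORY is exercised both for a potential huge-page
+ * mapping and for a trailing 4KB mapping; see the two successful
+ * map_memory() calls in __test_map_memory().
+ */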
+/* Nested: VMXON and VMCS12 for VMX, or VMCB for SVM */
+/* Arbitrarily chosen value. Pick 128MB after TEST_GVA. */
+#define NESTED_GVA	(TEST_GVA + 128 * 1024 * 1024)
+#define NESTED_GPA	(TEST_GPA + 128 * 1024 * 1024)
+#define NESTED_NPAGES	2
+#define NESTED_SIZE	(NESTED_NPAGES * PAGE_SIZE)
+#define NESTED_SLOT	11
+
+static void guest_code(uint64_t base_gpa)
+{
+	volatile uint64_t val __used;
+	int i;
+
+	/* Touch every page of the pre-populated area. */
+	for (i = 0; i < TEST_NPAGES; i++) {
+		uint64_t *src = (uint64_t *)(base_gpa + i * PAGE_SIZE);
+
+		val = *src;
+	}
+
+	GUEST_DONE();
+}
+
+static void map_memory(struct kvm_vcpu *vcpu, u64 base_address, u64 size,
+		       bool should_succeed)
+{
+	struct kvm_memory_mapping mapping = {
+		.base_address = base_address,
+		.size = size,
+		.flags = 0,
+	};
+	int ret;
+
+	/* Retry on partial progress or a pending signal. */
+	do {
+		ret = __vcpu_ioctl(vcpu, KVM_MAP_MEMORY, &mapping);
+	} while (ret && (errno == EAGAIN || errno == EINTR));
+
+	if (should_succeed)
+		__TEST_ASSERT_VM_VCPU_IOCTL(!ret, "KVM_MAP_MEMORY", ret, vcpu->vm);
+	else
+		/* No memory slot causes RET_PF_EMULATE. It results in -EINVAL. */
+		__TEST_ASSERT_VM_VCPU_IOCTL(ret && errno == EINVAL,
+					    "KVM_MAP_MEMORY", ret, vcpu->vm);
+}
+
+static void set_smm(struct kvm_vcpu *vcpu, bool enter_or_leave)
+{
+	struct kvm_vcpu_events events;
+
+	vcpu_events_get(vcpu, &events);
+
+	events.smi.smm = !!enter_or_leave;
+	events.smi.pending = 0;
+	events.flags |= KVM_VCPUEVENT_VALID_SMM;
+
+	vcpu_events_set(vcpu, &events);
+}
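+
+/*
+ * KVM_MAP_MEMORY is also exercised while the vCPU is in SMM and while
+ * it is in (VMX) guest mode.  set_smm() above flips the SMM state via
+ * KVM_SET_VCPU_EVENTS; the definitions below carry a local copy of the
+ * kernel's vmcs12 layout so the test can hand-craft just enough nested
+ * VMX state for KVM_SET_NESTED_STATE to accept.
+ */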
+/* Copied from arch/x86/kvm/vmx/vmcs12.h */
+#define VMCS12_REVISION	0x11e57ed0
+
+struct vmcs_hdr {
+	u32 revision_id:31;
+	u32 shadow_vmcs:1;
+};
+
+typedef u64 natural_width;
+
+struct __packed vmcs12 {
+	struct vmcs_hdr hdr;
+	u32 abort;
+
+	u32 launch_state;
+	u32 padding[7];
+
+	u64 io_bitmap_a;
+	u64 io_bitmap_b;
+	u64 msr_bitmap;
+	u64 vm_exit_msr_store_addr;
+	u64 vm_exit_msr_load_addr;
+	u64 vm_entry_msr_load_addr;
+	u64 tsc_offset;
+	u64 virtual_apic_page_addr;
+	u64 apic_access_addr;
+	u64 posted_intr_desc_addr;
+	u64 ept_pointer;
+	u64 eoi_exit_bitmap0;
+	u64 eoi_exit_bitmap1;
+	u64 eoi_exit_bitmap2;
+	u64 eoi_exit_bitmap3;
+	u64 xss_exit_bitmap;
+	u64 guest_physical_address;
+	u64 vmcs_link_pointer;
+	u64 guest_ia32_debugctl;
+	u64 guest_ia32_pat;
+	u64 guest_ia32_efer;
+	u64 guest_ia32_perf_global_ctrl;
+	u64 guest_pdptr0;
+	u64 guest_pdptr1;
+	u64 guest_pdptr2;
+	u64 guest_pdptr3;
+	u64 guest_bndcfgs;
+	u64 host_ia32_pat;
+	u64 host_ia32_efer;
+	u64 host_ia32_perf_global_ctrl;
+	u64 vmread_bitmap;
+	u64 vmwrite_bitmap;
+	u64 vm_function_control;
+	u64 eptp_list_address;
+	u64 pml_address;
+	u64 encls_exiting_bitmap;
+	u64 tsc_multiplier;
+	u64 padding64[1];
+
+	natural_width cr0_guest_host_mask;
+	natural_width cr4_guest_host_mask;
+	natural_width cr0_read_shadow;
+	natural_width cr4_read_shadow;
+	natural_width dead_space[4];
+	natural_width exit_qualification;
+	natural_width guest_linear_address;
+	natural_width guest_cr0;
+	natural_width guest_cr3;
+	natural_width guest_cr4;
+	natural_width guest_es_base;
+	natural_width guest_cs_base;
+	natural_width guest_ss_base;
+	natural_width guest_ds_base;
+	natural_width guest_fs_base;
+	natural_width guest_gs_base;
+	natural_width guest_ldtr_base;
+	natural_width guest_tr_base;
+	natural_width guest_gdtr_base;
+	natural_width guest_idtr_base;
+	natural_width guest_dr7;
+	natural_width guest_rsp;
+	natural_width guest_rip;
+	natural_width guest_rflags;
+	natural_width guest_pending_dbg_exceptions;
+	natural_width guest_sysenter_esp;
+	natural_width guest_sysenter_eip;
+	natural_width host_cr0;
+	natural_width host_cr3;
+	natural_width host_cr4;
+	natural_width host_fs_base;
+	natural_width host_gs_base;
+	natural_width host_tr_base;
+	natural_width host_gdtr_base;
+	natural_width host_idtr_base;
+	natural_width host_ia32_sysenter_esp;
+	natural_width host_ia32_sysenter_eip;
+	natural_width host_rsp;
+	natural_width host_rip;
+	natural_width paddingl[8];
+
+	u32 pin_based_vm_exec_control;
+	u32 cpu_based_vm_exec_control;
+	u32 exception_bitmap;
+	u32 page_fault_error_code_mask;
+	u32 page_fault_error_code_match;
+	u32 cr3_target_count;
+	u32 vm_exit_controls;
+	u32 vm_exit_msr_store_count;
+	u32 vm_exit_msr_load_count;
+	u32 vm_entry_controls;
+	u32 vm_entry_msr_load_count;
+	u32 vm_entry_intr_info_field;
+	u32 vm_entry_exception_error_code;
+	u32 vm_entry_instruction_len;
+	u32 tpr_threshold;
+	u32 secondary_vm_exec_control;
+	u32 vm_instruction_error;
+	u32 vm_exit_reason;
+	u32 vm_exit_intr_info;
+	u32 vm_exit_intr_error_code;
+	u32 idt_vectoring_info_field;
+	u32 idt_vectoring_error_code;
+	u32 vm_exit_instruction_len;
+	u32 vmx_instruction_info;
+	u32 guest_es_limit;
+	u32 guest_cs_limit;
+	u32 guest_ss_limit;
+	u32 guest_ds_limit;
+	u32 guest_fs_limit;
+	u32 guest_gs_limit;
+	u32 guest_ldtr_limit;
+	u32 guest_tr_limit;
+	u32 guest_gdtr_limit;
+	u32 guest_idtr_limit;
+	u32 guest_es_ar_bytes;
+	u32 guest_cs_ar_bytes;
+	u32 guest_ss_ar_bytes;
+	u32 guest_ds_ar_bytes;
+	u32 guest_fs_ar_bytes;
+	u32 guest_gs_ar_bytes;
+	u32 guest_ldtr_ar_bytes;
+	u32 guest_tr_ar_bytes;
+	u32 guest_interruptibility_info;
+	u32 guest_activity_state;
+	u32 guest_sysenter_cs;
+	u32 host_ia32_sysenter_cs;
+	u32 vmx_preemption_timer_value;
+	u32 padding32[7];
+
+	u16 virtual_processor_id;
+	u16 posted_intr_nv;
+	u16 guest_es_selector;
+	u16 guest_cs_selector;
+	u16 guest_ss_selector;
+	u16 guest_ds_selector;
+	u16 guest_fs_selector;
+	u16 guest_gs_selector;
+	u16 guest_ldtr_selector;
+	u16 guest_tr_selector;
+	u16 guest_intr_status;
+	u16 host_es_selector;
+	u16 host_cs_selector;
+	u16 host_ss_selector;
+	u16 host_ds_selector;
+	u16 host_fs_selector;
+	u16 host_gs_selector;
+	u16 host_tr_selector;
+	u16 guest_pml_index;
+};
+
+/* Fill values to make KVM vmx_set_nested_state() pass. */
+void vmx_vmcs12_init(struct vmcs12 *vmcs12)
+{
+	/* Set the VMCS12 revision id via the header's first word. */
+	*(__u32 *)(vmcs12) = VMCS12_REVISION;
+
+	vmcs12->vmcs_link_pointer = -1;
+
+#define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR	0x00000016
+	vmcs12->pin_based_vm_exec_control = PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+#define CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR	0x0401e172
+	vmcs12->cpu_based_vm_exec_control = CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+	vmcs12->secondary_vm_exec_control = 0;
+
+#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
+	vmcs12->vm_exit_controls = VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+
+#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
+	vmcs12->vm_entry_controls = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+
+#define VMXON_CR0_ALWAYSON	(X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
+#define VMXON_CR4_ALWAYSON	X86_CR4_VMXE
+
+	/* host */
+	vmcs12->host_cr0 = VMXON_CR0_ALWAYSON;
+	vmcs12->host_cr3 = TEST_GPA;
+	vmcs12->host_cr4 = VMXON_CR4_ALWAYSON;
+
+	/* Non-zero to make the KVM vmx consistency checks pass. */
+	vmcs12->host_cs_selector = 8;
+	vmcs12->host_ss_selector = 8;
+	vmcs12->host_ds_selector = 8;
+	vmcs12->host_es_selector = 8;
+	vmcs12->host_fs_selector = 8;
+	vmcs12->host_gs_selector = 8;
+	vmcs12->host_tr_selector = 8;
+
+	/* guest */
+	vmcs12->guest_cr0 = VMXON_CR0_ALWAYSON;
+	vmcs12->guest_cr4 = VMXON_CR4_ALWAYSON;
+
+	vmcs12->guest_cs_selector = 0xf000;
+	vmcs12->guest_cs_base = 0xffff0000UL;
+	vmcs12->guest_cs_limit = 0xffff;
+	vmcs12->guest_cs_ar_bytes = 0x93 | 0x08;
+
+	vmcs12->guest_ds_selector = 0;
+	vmcs12->guest_ds_base = 0;
+	vmcs12->guest_ds_limit = 0xffff;
+	vmcs12->guest_ds_ar_bytes = 0x93;
+
+	vmcs12->guest_es_selector = 0;
+	vmcs12->guest_es_base = 0;
+	vmcs12->guest_es_limit = 0xffff;
+	vmcs12->guest_es_ar_bytes = 0x93;
+
+	vmcs12->guest_fs_selector = 0;
+	vmcs12->guest_fs_base = 0;
+	vmcs12->guest_fs_limit = 0xffff;
+	vmcs12->guest_fs_ar_bytes = 0x93;
+
+	vmcs12->guest_gs_selector = 0;
+	vmcs12->guest_gs_base = 0;
+	vmcs12->guest_gs_limit = 0xffff;
+	vmcs12->guest_gs_ar_bytes = 0x93;
+
+	vmcs12->guest_ss_selector = 0;
+	vmcs12->guest_ss_base = 0;
+	vmcs12->guest_ss_limit = 0xffff;
+	vmcs12->guest_ss_ar_bytes = 0x93;
+
+	vmcs12->guest_ldtr_selector = 0;
+	vmcs12->guest_ldtr_base = 0;
+	vmcs12->guest_ldtr_limit = 0xfff;
+	vmcs12->guest_ldtr_ar_bytes = 0x008b;
+
+	vmcs12->guest_gdtr_base = 0;
+	vmcs12->guest_gdtr_limit = 0xffff;
+
+	vmcs12->guest_idtr_base = 0;
+	vmcs12->guest_idtr_limit = 0xffff;
+
+	/* ACTIVE = 0 */
+	vmcs12->guest_activity_state = 0;
+
+	vmcs12->guest_interruptibility_info = 0;
+	vmcs12->guest_pending_dbg_exceptions = 0;
+
+	vmcs12->vm_entry_intr_info_field = 0;
+}
+
+void vmx_state_set(struct kvm_vcpu *vcpu, struct kvm_nested_state *state,
+		   __u16 flags, __u64 vmxon_pa, __u64 vmcs12_pa)
+{
+	struct vmcs12 *vmcs12 = (struct vmcs12 *)state->data.vmx->vmcs12;
+
+	memset(state, 0, sizeof(*state) + KVM_STATE_NESTED_VMX_VMCS_SIZE);
+
+	state->flags = flags;
+	state->format = KVM_STATE_NESTED_FORMAT_VMX;
+	state->size = KVM_STATE_NESTED_VMX_VMCS_SIZE;
+
+	state->hdr.vmx.vmxon_pa = vmxon_pa;
+	state->hdr.vmx.vmcs12_pa = vmcs12_pa;
+	state->hdr.vmx.smm.flags = 0;
+	state->hdr.vmx.pad = 0;
+	state->hdr.vmx.flags = 0;
+	state->hdr.vmx.preemption_timer_deadline = 0;
+
+	vmx_vmcs12_init(vmcs12);
+
+	vcpu_nested_state_set(vcpu, state);
+}
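+
+/*
+ * Core scenario: create a VM with a slot covering TEST_SIZE, optionally
+ * make the range private (guest_memfd), enter SMM and/or nested guest
+ * mode, then pre-populate via KVM_MAP_MEMORY: the 2MB head and the 4KB
+ * tail of the slot must succeed, while one page past the slot must fail
+ * with EINVAL.  Finally, run the guest to read the populated range back.
+ */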
+static void __test_map_memory(unsigned long vm_type, bool private,
+			      bool smm, bool nested)
+{
+	struct kvm_nested_state *state = NULL;
+	const struct vm_shape shape = {
+		.mode = VM_MODE_DEFAULT,
+		.type = vm_type,
+	};
+	struct kvm_sregs sregs;
+	struct kvm_vcpu *vcpu;
+	struct kvm_regs regs;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    TEST_GPA, TEST_SLOT, TEST_NPAGES,
+				    private ? KVM_MEM_GUEST_MEMFD : 0);
+	virt_map(vm, TEST_GVA, TEST_GPA, TEST_NPAGES);
+
+	if (private)
+		vm_mem_set_private(vm, TEST_GPA, TEST_SIZE);
+
+	if (nested) {
+		size_t size = sizeof(*state);
+
+		if (kvm_cpu_has(X86_FEATURE_VMX)) {
+			size += KVM_STATE_NESTED_VMX_VMCS_SIZE;
+			vcpu_set_cpuid_feature(vcpu, X86_FEATURE_VMX);
+		}
+
+		state = malloc(size);
+		vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, NESTED_GPA,
+					    NESTED_SLOT, NESTED_NPAGES, 0);
+		virt_map(vm, NESTED_GVA, NESTED_GPA, NESTED_NPAGES);
+	}
+
+	if (smm)
+		set_smm(vcpu, true);
+
+	if (nested) {
+		/* Save the register state that vmx_state_set() clobbers. */
+		vcpu_regs_get(vcpu, &regs);
+		vcpu_sregs_get(vcpu, &sregs);
+		vmx_state_set(vcpu, state, KVM_STATE_NESTED_RUN_PENDING |
+			      KVM_STATE_NESTED_GUEST_MODE,
+			      NESTED_GPA, NESTED_GPA + PAGE_SIZE);
+	}
+
+	map_memory(vcpu, TEST_GPA, SZ_2M, true);
+	map_memory(vcpu, TEST_GPA + SZ_2M, PAGE_SIZE, true);
+	map_memory(vcpu, TEST_GPA + TEST_SIZE, PAGE_SIZE, false);
+
+	if (nested) {
+		/* Leave nested guest mode and restore the register state. */
+		vmx_state_set(vcpu, state, 0, -1, -1);
+		free(state);
+		vcpu_sregs_set(vcpu, &sregs);
+		vcpu_regs_set(vcpu, &regs);
+	}
+
+	if (smm)
+		set_smm(vcpu, false);
+
+	vcpu_args_set(vcpu, 1, TEST_GVA);
+	vcpu_run(vcpu);
+
+	run = vcpu->run;
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Wanted KVM_EXIT_IO, got exit reason: %u (%s)",
+		    run->exit_reason, exit_reason_str(run->exit_reason));
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_ABORT:
+		REPORT_GUEST_ASSERT(uc);
+		break;
+	case UCALL_DONE:
+		break;
+	default:
+		TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+		break;
+	}
+
+	kvm_vm_free(vm);
+}
+
+static void test_map_memory(unsigned long vm_type, bool private)
+{
+	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type))) {
+		pr_info("Skipping tests for vm_type 0x%lx\n", vm_type);
+		return;
+	}
+
+	__test_map_memory(vm_type, private, false, false);
+
+	if (kvm_has_cap(KVM_CAP_VCPU_EVENTS) && kvm_has_cap(KVM_CAP_X86_SMM))
+		__test_map_memory(vm_type, private, true, false);
+	else
+		pr_info("Skipping test for vm_type 0x%lx with SMM\n", vm_type);
+
+	if (!kvm_has_cap(KVM_CAP_NESTED_STATE)) {
+		pr_info("Skipping test for vm_type 0x%lx with nesting\n", vm_type);
+		return;
+	}
+
+	if (kvm_cpu_has(X86_FEATURE_SVM)) {
+		pr_info("Nested SVM case is not implemented yet, skipping\n");
+		return;
+	}
+	if (!kvm_cpu_has(X86_FEATURE_VMX)) {
+		pr_info("Skipping test for vm_type 0x%lx with nested VMX\n",
+			vm_type);
+		return;
+	}
+	__test_map_memory(vm_type, private, false, true);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_check_cap(KVM_CAP_MAP_MEMORY));
+
+	test_map_memory(KVM_X86_DEFAULT_VM, false);
+	test_map_memory(KVM_X86_SW_PROTECTED_VM, false);
+	test_map_memory(KVM_X86_SW_PROTECTED_VM, true);
+	return 0;
+}
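As a usage note: on a kernel tree with this series applied, the test
should build with the rest of the KVM selftests (for example, via
make -C tools/testing/selftests/kvm) and run as x86_64/map_memory_test;
the exact invocation may vary with the selftests setup.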