From patchwork Sat Dec 2 09:13:24 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13476832
From: Yan Zhao
To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com,
    seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
    kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org,
    yi.l.liu@intel.com, Yan Zhao
Subject: [RFC PATCH 01/42] KVM: Public header for KVM to export TDP
Date: Sat, 2 Dec 2023 17:13:24 +0800
Message-Id: <20231202091324.13436-1-yan.y.zhao@intel.com>
In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com>
References: <20231202091211.13376-1-yan.y.zhao@intel.com>

Introduce a public header with the data structures and interfaces for KVM
to export TDP page tables (EPT/NPT on x86) to components outside of KVM.

KVM exposes a TDP FD object which allows components outside of KVM to get
page table metadata, request mappings, and register invalidation callbacks
on the TDP page table exported by KVM.

Two symbols, kvm_tdp_fd_get() and kvm_tdp_fd_put(), are exported by KVM so
that components outside of KVM can get/put the TDP FD object.

A new header file, kvm_tdp_fd.h, is added because kvm_host.h is not
expected to be included from outside of KVM in the future, as far as I know.
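For illustration, here is a minimal sketch (not part of this patch) of how a
component outside of KVM might look up the exported TDP and read its
metadata; the my_import_tdp() name is hypothetical:

#include <linux/err.h>
#include <linux/kvm_tdp_fd.h>

/* Hypothetical consumer of the exported TDP; for illustration only. */
static int my_import_tdp(int fd)
{
        struct kvm_tdp_fd *tdp_fd;
        void *meta;

        /* Takes a reference on the KVM TDP FD object. */
        tdp_fd = kvm_tdp_fd_get(fd);
        if (IS_ERR(tdp_fd))
                return PTR_ERR(tdp_fd);

        /* Vendor specific metadata, e.g. page table type and root HPA. */
        meta = tdp_fd->ops->get_metadata(tdp_fd);
        if (IS_ERR(meta)) {
                kvm_tdp_fd_put(tdp_fd);
                return PTR_ERR(meta);
        }

        /*
         * Check the metadata, load the exported root into the importer's
         * hardware, register importer callbacks, etc.  The reference is
         * held for as long as the TDP stays imported.
         */
        return 0;
}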
Signed-off-by: Yan Zhao --- include/linux/kvm_tdp_fd.h | 137 +++++++++++++++++++++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 include/linux/kvm_tdp_fd.h diff --git a/include/linux/kvm_tdp_fd.h b/include/linux/kvm_tdp_fd.h new file mode 100644 index 0000000000000..3661779dd8cf5 --- /dev/null +++ b/include/linux/kvm_tdp_fd.h @@ -0,0 +1,137 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __KVM_TDP_FD_H +#define __KVM_TDP_FD_H + +#include +#include + +struct kvm_exported_tdp; +struct kvm_exported_tdp_ops; +struct kvm_tdp_importer_ops; + +/** + * struct kvm_tdp_fd - KVM TDP FD object + * + * Interface of exporting KVM TDP page table to external components of KVM. + * + * This KVM TDP FD object is created by KVM VM ioctl KVM_CREATE_TDP_FD. + * On object creation, KVM will find or create a TDP page table, mark it as + * exported and increase reference count of this exported TDP page table. + * + * On object destroy, the exported TDP page table is unmarked as exported with + * its reference count decreased. + * + * During the life cycle of KVM TDP FD object, ref count of KVM VM is hold. + * + * Components outside of KVM can get meta data (e.g. page table type, levels, + * root HPA,...), request page fault on the exported TDP page table and register + * themselves as importers to receive notification through kvm_exported_tdp_ops + * @ops. + * + * @file: struct file object associated with the KVM TDP FD object. + * @ops: kvm_exported_tdp_ops associated with the exported TDP page table. + * @priv: internal data structures used by KVM to manage TDP page table + * exported by KVM. + * + */ +struct kvm_tdp_fd { + /* Public */ + struct file *file; + const struct kvm_exported_tdp_ops *ops; + + /* private to KVM */ + struct kvm_exported_tdp *priv; +}; + +/** + * kvm_tdp_fd_get - Public interface to get KVM TDP FD object. + * + * @fd: fd of the KVM TDP FD object. + * @return: KVM TDP FD object if @fd corresponds to a valid KVM TDP FD file. + * -EBADF if @fd does not correspond a struct file. + * -EINVAL if @fd does not correspond to a KVM TDP FD file. + * + * Callers of this interface will get a KVM TDP FD object with ref count + * increased. + */ +struct kvm_tdp_fd *kvm_tdp_fd_get(int fd); + +/** + * kvm_tdp_fd_put - Public interface to put ref count of a KVM TDP FD object. + * + * @tdp: KVM TDP FD object. + * + * Put reference count of the KVM TDP FD object. + * After the last reference count of the TDP FD object goes away, + * kvm_tdp_fd_release() will be called to decrease KVM VM ref count and destroy + * the KVM TDP FD object. + */ +void kvm_tdp_fd_put(struct kvm_tdp_fd *tdp); + +struct kvm_tdp_fault_type { + u32 read:1; + u32 write:1; + u32 exec:1; +}; + +/** + * struct kvm_exported_tdp_ops - operations possible on KVM TDP FD object. + * @register_importer: This is called from components outside of KVM to register + * importer callback ops and the importer data. + * This callback is a must. + * Returns: 0 on success, negative error code on failure. + * -EBUSY if the importer ops is already registered. + * @unregister_importer:This is called from components outside of KVM if it does + * not want to receive importer callbacks any more. + * This callback is a must. + * @fault: This is called from components outside of KVM to trigger + * page fault on a GPA and to map physical page into the + * TDP page tables exported by KVM. + * This callback is optional. 
+ * If this callback is absent, components outside KVM will + * not be able to trigger page fault and map physical pages + * into the TDP page tables exported by KVM. + * @get_metadata: This is called from components outside of KVM to retrieve + * meta data of the TDP page tables exported by KVM, e.g. + * page table type,root HPA, levels, reserved zero bits... + * Returns: pointer to a vendor meta data on success. + * Error PTR on error. + * This callback is a must. + */ +struct kvm_exported_tdp_ops { + int (*register_importer)(struct kvm_tdp_fd *tdp_fd, + struct kvm_tdp_importer_ops *ops, + void *importer_data); + + void (*unregister_importer)(struct kvm_tdp_fd *tdp_fd, + struct kvm_tdp_importer_ops *ops); + + int (*fault)(struct kvm_tdp_fd *tdp_fd, struct mm_struct *mm, + unsigned long gfn, struct kvm_tdp_fault_type type); + + void *(*get_metadata)(struct kvm_tdp_fd *tdp_fd); +}; + +/** + * struct kvm_tdp_importer_ops - importer callbacks + * + * Components outside of KVM can be registered as importers of KVM's exported + * TDP page tables via register_importer op in kvm_exported_tdp_ops of a KVM TDP + * FD object. + * + * Each importer must define its own importer callbacks and KVM will notify + * importers of changes of the exported TDP page tables. + */ +struct kvm_tdp_importer_ops { + /** + * This is called by KVM to notify the importer that a range of KVM + * TDP has been invalidated. + * When @start is 0 and @size is -1, a whole of KVM TDP is invalidated. + * + * @data: the importer private data. + * @start: start GPA of the invalidated range. + * @size: length of in the invalidated range. + */ + void (*invalidate)(void *data, unsigned long start, unsigned long size); +}; +#endif /* __KVM_TDP_FD_H */ From patchwork Sat Dec 2 09:15:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476833 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jX5SEbC7" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A322D134; Sat, 2 Dec 2023 01:44:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510250; x=1733046250; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=USlqd3EMMW16iV2fDYRyDyMhXnAPYPczuIPu2d6BNb0=; b=jX5SEbC74aFHJecVVTiCJTOY+Pt/jWQ8UiSj4vnW1eIyWNWYs81zSLIG sIc4+4RJyPASsSJRR/+8npA2etrkDOMxiK49PraGEKkbF/pZKMztQk2TT 8g1tkA32BAAL3/WtfMN5K6aotm/hvjxJIL00mv47O+AE4wCP/0yW1qXIV PWVi/0BgDQggis2aeuc5pSrgYD2y1pf3fXk9JhhTl/MqylbF3Kj088j8e GaaumgWkTZFP8Q5+ook68MxEKEO/sHaTNMuV1VsMAkL6ZkiAk7S23aGpa a9BN9kCkAi6aaHmN0i9L2b5hr/C74KHO0PzEiJHRkSxoBTRen2i74AhfE g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="444223" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="444223" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:44:09 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="943354101" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="943354101" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:44:05 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, 
pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 02/42] KVM: x86: Arch header for kvm to export TDP for Intel Date: Sat, 2 Dec 2023 17:15:04 +0800 Message-Id: <20231202091504.13502-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Headers to define Intel specific meta data for TDP page tables exported by KVM. The meta data includes page table type, level, HPA of root page, max huge page level, and reserved zero bits currently. (Note, each vendor can define their own meta data format .e.g. it could be kvm_exported_tdp_meta_svm on AMD platform.) The consumer of the exported TDP (e.g. Intel vt-d driver) can retrieve and check the vendor specific meta data before loading the KVM exported TDP page tables to their own secondary MMU. Signed-off-by: Yan Zhao --- arch/x86/include/asm/kvm_exported_tdp.h | 43 +++++++++++++++++++++++++ include/linux/kvm_types.h | 12 +++++++ 2 files changed, 55 insertions(+) create mode 100644 arch/x86/include/asm/kvm_exported_tdp.h diff --git a/arch/x86/include/asm/kvm_exported_tdp.h b/arch/x86/include/asm/kvm_exported_tdp.h new file mode 100644 index 0000000000000..c7fe3f3cf89fb --- /dev/null +++ b/arch/x86/include/asm/kvm_exported_tdp.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_KVM_EXPORTED_TDP_H +#define _ASM_X86_KVM_EXPORTED_TDP_H +#define PT64_ROOT_MAX_LEVEL 5 + +#include +/** + * struct kvm_exported_tdp_meta_vmx - Intel specific meta data format of TDP + * page tables exported by KVM. + * + * Importers of KVM exported TDPs can decode meta data of the page tables with + * this structure. + * + * @type: Type defined across platforms to identify hardware + * platform of a KVM exported TDP. Importers of KVM + * exported TDP need to first check the type before + * decoding page table meta data. + * @level: Levels of the TDP exported by KVM. + * @root_hpa: HPA of the root page of TDP exported by KVM. + * @max_huge_page_level: Max huge page level allowed on the TDP exported by KVM. + * @rsvd_bits_mask: The must-be-zero bits of leaf and non-leaf PTEs. + * rsvd_bits_mask[0] or rsvd_bits_mask[1] is selected by + * bit 7 or a PTE. + * This field is provided as a way for importers to check + * if the must-be-zero bits from KVM is compatible to the + * importer side. KVM will ensure that the must-be-zero + * bits must not be set even for software purpose. + * (e.g. on Intel platform, bit 11 is usually used by KVM + * to identify a present SPTE, though bit 11 is ignored by + * EPT. However, Intel vt-d requires the bit 11 to be 0. + * Before importing KVM TDP, Intel vt-d driver needs to + * check if bit 11 is set in the must-be-zero bits by KVM + * to avoid possible DMAR fault.) 
+ */ +struct kvm_exported_tdp_meta_vmx { + enum kvm_exported_tdp_type type; + int level; + hpa_t root_hpa; + int max_huge_page_level; + u64 rsvd_bits_mask[2][PT64_ROOT_MAX_LEVEL]; +}; + +#endif diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 6f4737d5046a4..04deb8334ce42 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -123,4 +123,16 @@ struct kvm_vcpu_stat_generic { #define KVM_STATS_NAME_SIZE 48 +/** + * enum kvm_exported_tdp_type - Type defined across platforms for TDP exported + * by KVM. + * + * @KVM_TDP_TYPE_EPT: The TDP is of type EPT running on Intel platform. + * + * Currently, @KVM_TDP_TYPE_EPT is the only supported type for TDPs exported by + * KVM. + */ +enum kvm_exported_tdp_type { + KVM_TDP_TYPE_EPT = 1, +}; #endif /* __KVM_TYPES_H__ */ From patchwork Sat Dec 2 09:15:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476834 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="doAyBkor" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15A94181; Sat, 2 Dec 2023 01:44:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510280; x=1733046280; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=3m2805xxDC9GupP7oswWsUnW1kwrOEt/HuavVpPlfek=; b=doAyBkorSKI5GQlTAY56Lfn9XRT5lkLs1Q3aGbjDVVfRCbYIeEfa/YIa ZKk4DXV7s7Z+duit9oKQO23iVrNAbqiA30lXJykin+KRE7r1Vz7wUzVa3 WBdHhCtTmP3dE0uNjw08lOXkF/TseFZua7aGD9RtmmOzC+InIaKMoY1Mu 2LBkb0zxtZlHaI4JhAUKKD0SmZ36+J6y/D69sc9bZM9dE/hyV5OGsiMI8 B22Xp7MDsDtsgLvhcseB/pFll9eATs377JfLnXbnGsqdr3vPHAixltJFx 2EnCyJSho3M47VdLdDYQ+axaqAdagy2fcOQIDXVZr00K1fTFP36e8jRgY A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="390756406" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="390756406" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:44:39 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="860817503" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="860817503" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:44:36 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 03/42] KVM: Introduce VM ioctl KVM_CREATE_TDP_FD Date: Sat, 2 Dec 2023 17:15:41 +0800 Message-Id: <20231202091541.13568-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Introduce VM ioctl KVM_CREATE_TDP_FD to create KVM TDP FD object, which will act as an interface of KVM to export TDP page tables and communicate with external components of KVM. 
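For illustration, a minimal userspace sketch (not part of this patch) of
creating a TDP fd for address space 0, assuming the uAPI header from this
series is installed; the create_tdp_fd() helper name is hypothetical:

#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: create a TDP fd for address space 0 of a VM fd. */
static int create_tdp_fd(int vm_fd)
{
        struct kvm_create_tdp_fd ct = {
                .as_id = 0,     /* default address space */
                .mode  = 0,     /* reserved, must be 0 */
                .pad   = 0,     /* reserved, must be 0 */
        };

        if (ioctl(vm_fd, KVM_CREATE_TDP_FD, &ct) < 0)
                err(1, "KVM_CREATE_TDP_FD");

        /* ct.fd now refers to the KVM TDP FD object exported by the VM. */
        return ct.fd;
}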
Signed-off-by: Yan Zhao --- include/uapi/linux/kvm.h | 19 +++++++++++++++++++ virt/kvm/kvm_main.c | 19 +++++++++++++++++++ virt/kvm/tdp_fd.h | 10 ++++++++++ 3 files changed, 48 insertions(+) create mode 100644 virt/kvm/tdp_fd.h diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 211b86de35ac5..f181883c60fed 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1582,6 +1582,9 @@ struct kvm_s390_ucas_mapping { #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) +/* ioctl for vm fd to create tdp fd */ +#define KVM_CREATE_TDP_FD _IOWR(KVMIO, 0xe4, struct kvm_create_tdp_fd) + /* * ioctls for vcpu fds */ @@ -2267,4 +2270,20 @@ struct kvm_s390_zpci_op { /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) +/** + * struct kvm_create_tdp_fd - VM ioctl(KVM_CREATE_TDP_FD) + * Create a TDP fd object for a TDP exported by KVM. + * + * @as_id: in: Address space ID for this TDP. + * @mode: in: Mode of this tdp. + * Reserved for future usage. Currently, this field must be 0. + * @fd: out: fd of TDP fd object for a TDP exported by KVM. + * @pad: in: Reserved as 0. + */ +struct kvm_create_tdp_fd { + __u32 as_id; + __u32 mode; + __u32 fd; + __u32 pad; +}; #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 486800a7024b3..494b6301a6065 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -61,6 +61,7 @@ #include "async_pf.h" #include "kvm_mm.h" #include "vfio.h" +#include "tdp_fd.h" #include @@ -4973,6 +4974,24 @@ static long kvm_vm_ioctl(struct file *filp, case KVM_GET_STATS_FD: r = kvm_vm_ioctl_get_stats_fd(kvm); break; + case KVM_CREATE_TDP_FD: { + struct kvm_create_tdp_fd ct; + + r = -EFAULT; + if (copy_from_user(&ct, argp, sizeof(ct))) + goto out; + + r = kvm_create_tdp_fd(kvm, &ct); + if (r) + goto out; + + r = -EFAULT; + if (copy_to_user(argp, &ct, sizeof(ct))) + goto out; + + r = 0; + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } diff --git a/virt/kvm/tdp_fd.h b/virt/kvm/tdp_fd.h new file mode 100644 index 0000000000000..05c8a6d767469 --- /dev/null +++ b/virt/kvm/tdp_fd.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __TDP_FD_H +#define __TDP_FD_H + +static inline int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) +{ + return -EOPNOTSUPP; +} + +#endif /* __TDP_FD_H */ From patchwork Sat Dec 2 09:16:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476835 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="deRAPI2L" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47DC819F; Sat, 2 Dec 2023 01:45:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510315; x=1733046315; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4NU1FsCRind+58ulZNL2WYHJcOcY7NHFVydNqKJRdKM=; b=deRAPI2L0kO8iPTYwCLUjCsIDtb4zE70WsXrZ5dA9/wunNe/hqzZoto/ eGRcPH3t5YBH5lJTbYf+6vxDvn0S2M9URP5mNDWL8eaco+diJsD6eR1Nh ReA7LMEcbGw4euzzZIF/r/Eth+h3VNkqc96CUMkyzrbFim3F6zrKS26Xo I6BM7xXVke8mBS2fROvZAUuJSMedYj7IFhnF4F8+E0OQ6IL1EL5hTcc5Y 5edk2rKYaRMELLrEaX6ePlkLcBnxImZKbMKjifbD7od5/Dbg1V+UJ3YwA p8JifwJJNHnRNiy2/ONm5ElW42lHwTw8BWDM216FLsOwXsmlk7cod5kCA w==; 
X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="397478868" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="397478868" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:45:14 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="913852587" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="913852587" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:45:09 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 04/42] KVM: Skeleton of KVM TDP FD object Date: Sat, 2 Dec 2023 17:16:15 +0800 Message-Id: <20231202091615.13643-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: This is a skeleton implementation of KVM TDP FD object. The KVM TDP FD object is created by ioctl KVM_CREATE_TDP_FD in kvm_create_tdp_fd(), which contains Public part (defined in ): - A file object for reference count file reference count is 1 on creating KVM TDP FD object. On the reference count of the file object goes to 0, its .release() handler will destroy the KVM TDP FD object. - ops kvm_exported_tdp_ops (empty implementation in this patch). Private part (kvm_exported_tdp object defined in this patch) : The kvm_exported_tdp object is linked in kvm->exported_tdp_list, one for each KVM address space. It records address space id, and "kvm" pointer for TDP FD object, and KVM VM ref is hold during object life cycle. In later patches, this kvm_exported_tdp object will be associated to a TDP page table exported by KVM. Two symbols kvm_tdp_fd_get() and kvm_tdp_fd_put() are implemented and exported to external components to get/put KVM TDP FD object. Signed-off-by: Yan Zhao --- include/linux/kvm_host.h | 18 ++++ virt/kvm/Kconfig | 3 + virt/kvm/Makefile.kvm | 1 + virt/kvm/kvm_main.c | 5 + virt/kvm/tdp_fd.c | 208 +++++++++++++++++++++++++++++++++++++++ virt/kvm/tdp_fd.h | 5 + 6 files changed, 240 insertions(+) create mode 100644 virt/kvm/tdp_fd.c diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4944136efaa22..122f47c94ecae 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -44,6 +44,7 @@ #include #include +#include #ifndef KVM_MAX_VCPU_IDS #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS @@ -808,6 +809,11 @@ struct kvm { struct notifier_block pm_notifier; #endif char stats_id[KVM_STATS_NAME_SIZE]; + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + struct list_head exported_tdp_list; + spinlock_t exported_tdplist_lock; +#endif }; #define kvm_err(fmt, ...) 
\ @@ -2318,4 +2324,16 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + +struct kvm_exported_tdp { + struct kvm_tdp_fd *tdp_fd; + + struct kvm *kvm; + u32 as_id; + /* head at kvm->exported_tdp_list */ + struct list_head list_node; +}; + +#endif /* CONFIG_HAVE_KVM_EXPORTED_TDP */ #endif diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 484d0873061ca..63b5d55c84e95 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -92,3 +92,6 @@ config HAVE_KVM_PM_NOTIFIER config KVM_GENERIC_HARDWARE_ENABLING bool + +config HAVE_KVM_EXPORTED_TDP + bool diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm index 2c27d5d0c367c..fad4638e407c5 100644 --- a/virt/kvm/Makefile.kvm +++ b/virt/kvm/Makefile.kvm @@ -12,3 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o +kvm-$(CONFIG_HAVE_KVM_EXPORTED_TDP) += $(KVM)/tdp_fd.o diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 494b6301a6065..9fa9132055807 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1232,6 +1232,11 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list); #endif +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + INIT_LIST_HEAD(&kvm->exported_tdp_list); + spin_lock_init(&kvm->exported_tdplist_lock); +#endif + r = kvm_init_mmu_notifier(kvm); if (r) goto out_err_no_mmu_notifier; diff --git a/virt/kvm/tdp_fd.c b/virt/kvm/tdp_fd.c new file mode 100644 index 0000000000000..a5c4c3597e94f --- /dev/null +++ b/virt/kvm/tdp_fd.c @@ -0,0 +1,208 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * KVM TDP FD + * + */ +#include +#include +#include + +#include "tdp_fd.h" + +static inline int is_tdp_fd_file(struct file *file); +static const struct file_operations kvm_tdp_fd_fops; +static const struct kvm_exported_tdp_ops exported_tdp_ops; + +int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) +{ + struct kvm_exported_tdp *tdp; + struct kvm_tdp_fd *tdp_fd; + int as_id = ct->as_id; + int ret, fd; + + if (as_id >= KVM_ADDRESS_SPACE_NUM || ct->pad || ct->mode) + return -EINVAL; + + /* for each address space, only one exported tdp is allowed */ + spin_lock(&kvm->exported_tdplist_lock); + list_for_each_entry(tdp, &kvm->exported_tdp_list, list_node) { + if (tdp->as_id != as_id) + continue; + + spin_unlock(&kvm->exported_tdplist_lock); + return -EEXIST; + } + spin_unlock(&kvm->exported_tdplist_lock); + + tdp_fd = kzalloc(sizeof(*tdp_fd), GFP_KERNEL_ACCOUNT); + if (!tdp) + return -ENOMEM; + + tdp = kzalloc(sizeof(*tdp), GFP_KERNEL_ACCOUNT); + if (!tdp) { + kfree(tdp_fd); + return -ENOMEM; + } + tdp_fd->priv = tdp; + tdp->tdp_fd = tdp_fd; + tdp->as_id = as_id; + + if (!kvm_get_kvm_safe(kvm)) { + ret = -ENODEV; + goto out; + } + tdp->kvm = kvm; + + tdp_fd->file = anon_inode_getfile("tdp_fd", &kvm_tdp_fd_fops, + tdp_fd, O_RDWR | O_CLOEXEC); + if (!tdp_fd->file) { + ret = -EFAULT; + goto out; + } + + fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC); + if (fd < 0) + goto out; + + fd_install(fd, tdp_fd->file); + ct->fd = fd; + tdp_fd->ops = &exported_tdp_ops; + + spin_lock(&kvm->exported_tdplist_lock); + list_add(&tdp->list_node, &kvm->exported_tdp_list); + spin_unlock(&kvm->exported_tdplist_lock); + return 0; + +out: + if (tdp_fd->file) + 
fput(tdp_fd->file); + + if (tdp->kvm) + kvm_put_kvm_no_destroy(tdp->kvm); + kfree(tdp); + kfree(tdp_fd); + return ret; +} + +static int kvm_tdp_fd_release(struct inode *inode, struct file *file) +{ + struct kvm_exported_tdp *tdp; + struct kvm_tdp_fd *tdp_fd; + + if (!is_tdp_fd_file(file)) + return -EINVAL; + + tdp_fd = file->private_data; + tdp = tdp_fd->priv; + + if (WARN_ON(!tdp || !tdp->kvm)) + return -EFAULT; + + spin_lock(&tdp->kvm->exported_tdplist_lock); + list_del(&tdp->list_node); + spin_unlock(&tdp->kvm->exported_tdplist_lock); + + kvm_put_kvm(tdp->kvm); + kfree(tdp); + kfree(tdp_fd); + return 0; +} + +static long kvm_tdp_fd_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + /* Do not support ioctl currently. May add it in future */ + return -ENODEV; +} + +static int kvm_tdp_fd_mmap(struct file *filp, struct vm_area_struct *vma) +{ + return -ENODEV; +} + +static const struct file_operations kvm_tdp_fd_fops = { + .unlocked_ioctl = kvm_tdp_fd_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .release = kvm_tdp_fd_release, + .mmap = kvm_tdp_fd_mmap, +}; + +static inline int is_tdp_fd_file(struct file *file) +{ + return file->f_op == &kvm_tdp_fd_fops; +} + +static int kvm_tdp_register_importer(struct kvm_tdp_fd *tdp_fd, + struct kvm_tdp_importer_ops *ops, void *data) +{ + return -EOPNOTSUPP; +} + +static void kvm_tdp_unregister_importer(struct kvm_tdp_fd *tdp_fd, + struct kvm_tdp_importer_ops *ops) +{ +} + +static void *kvm_tdp_get_metadata(struct kvm_tdp_fd *tdp_fd) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static int kvm_tdp_fault(struct kvm_tdp_fd *tdp_fd, struct mm_struct *mm, + unsigned long gfn, struct kvm_tdp_fault_type type) +{ + return -EOPNOTSUPP; +} + +static const struct kvm_exported_tdp_ops exported_tdp_ops = { + .register_importer = kvm_tdp_register_importer, + .unregister_importer = kvm_tdp_unregister_importer, + .get_metadata = kvm_tdp_get_metadata, + .fault = kvm_tdp_fault, +}; + +/** + * kvm_tdp_fd_get - Public interface to get KVM TDP FD object. + * + * @fd: fd of the KVM TDP FD object. + * @return: KVM TDP FD object if @fd corresponds to a valid KVM TDP FD file. + * -EBADF if @fd does not correspond a struct file. + * -EINVAL if @fd does not correspond to a KVM TDP FD file. + * + * Callers of this interface will get a KVM TDP FD object with ref count + * increased. + */ +struct kvm_tdp_fd *kvm_tdp_fd_get(int fd) +{ + struct file *file; + + file = fget(fd); + if (!file) + return ERR_PTR(-EBADF); + + if (!is_tdp_fd_file(file)) { + fput(file); + return ERR_PTR(-EINVAL); + } + return file->private_data; +} +EXPORT_SYMBOL_GPL(kvm_tdp_fd_get); + +/** + * kvm_tdp_fd_put - Public interface to put ref count of a KVM TDP FD object. + * + * @tdp_fd: KVM TDP FD object. + * + * Put reference count of the KVM TDP FD object. + * After the last reference count of the TDP fd goes away, + * kvm_tdp_fd_release() will be called to decrease KVM VM ref count and destroy + * the KVM TDP FD object. 
+ */ +void kvm_tdp_fd_put(struct kvm_tdp_fd *tdp_fd) +{ + if (WARN_ON(!tdp_fd || !tdp_fd->file || !is_tdp_fd_file(tdp_fd->file))) + return; + + fput(tdp_fd->file); +} +EXPORT_SYMBOL_GPL(kvm_tdp_fd_put); diff --git a/virt/kvm/tdp_fd.h b/virt/kvm/tdp_fd.h index 05c8a6d767469..85da9d8cc1ce4 100644 --- a/virt/kvm/tdp_fd.h +++ b/virt/kvm/tdp_fd.h @@ -2,9 +2,14 @@ #ifndef __TDP_FD_H #define __TDP_FD_H +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct); + +#else static inline int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) { return -EOPNOTSUPP; } +#endif /* CONFIG_HAVE_KVM_EXPORTED_TDP */ #endif /* __TDP_FD_H */ From patchwork Sat Dec 2 09:16:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476836 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="cvnSYbts" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E02B71A6; Sat, 2 Dec 2023 01:46:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510364; x=1733046364; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=i9OyDLqUL8MDcPXZYwlSXijUUvemlGAamKue93VaY0k=; b=cvnSYbts+s4GoWYobm/2DkUYUfrDOubaPo2zk169Kai1B3S1t6zn0sRq 4ZrXjgSJEOvrh1W5WAAPAp3UZM9sTPbpekEDKGwDB0rLAUFR0rI+qubzR TN+/ELfGzAOfNsBcAJkgszuYi90xyCJE3c/omEs3ArXUUHDQvAmFRDeZ1 /gWOkuWgQ0RSvZq22tlqcN+Jh/l1oVq7BCeL2ipPPNefMVf55kpWDjjJd 5rNZ2lMIjjwBli7sgjhHt1S/E7gD7l6uXyLLenidaXR6kZk8dsxrPV+lp wJeWZ78jz03rDZzUjukKn0onBtjnh3w6yeJjq8N3y6mGHnLFalrx3Ba4D w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="6886198" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="6886198" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:46:03 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="746278747" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="746278747" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:45:59 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 05/42] KVM: Embed "arch" object and call arch init/destroy in TDP FD Date: Sat, 2 Dec 2023 17:16:59 +0800 Message-Id: <20231202091659.13707-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Embed "arch" object in private "kvm_exported_tdp" object of KVM TDP FD object in order to associate a TDP page table to this private object. With later patches for arch x86, the overall data structure hierarchy on x86 for TDP FD to export TDP is outlined below for preview. 
kvm_tdp_fd .------ | ops-|-->kvm_exported_tdp_ops | file | public ----------------------------------------------------------------------- | priv-|-->kvm_exported_tdp private '------' .-----------. | tdp_fd | | as_id | | kvm | | importers | | arch -|-->kvm_arch_exported_tdp | list_node | .------. '-----------' | mmu -|--> kvm_exported_tdp_mmu | meta | .-----------. '--|---' | common -|--> kvm_mmu_common | | root_page | | '-----------' | | | +-->kvm_exported_tdp_meta_vmx .--------------------. | type | | level | | root_hpa | | max_huge_page_level| | rsvd_bits_mask | '--------------------' Signed-off-by: Yan Zhao --- include/linux/kvm_host.h | 17 +++++++++++++++++ virt/kvm/tdp_fd.c | 12 +++++++++--- 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 122f47c94ecae..5a74b2b0ac81f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2327,6 +2327,9 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) #ifdef CONFIG_HAVE_KVM_EXPORTED_TDP struct kvm_exported_tdp { +#ifdef __KVM_HAVE_ARCH_EXPORTED_TDP + struct kvm_arch_exported_tdp arch; +#endif struct kvm_tdp_fd *tdp_fd; struct kvm *kvm; @@ -2335,5 +2338,19 @@ struct kvm_exported_tdp { struct list_head list_node; }; +#ifdef __KVM_HAVE_ARCH_EXPORTED_TDP +int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp); +void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp); +#else +static inline int kvm_arch_exported_tdp_init(struct kvm *kvm, + struct kvm_exported_tdp *tdp) +{ + return -EOPNOTSUPP; +} +static inline void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp) +{ +} +#endif /* __KVM_HAVE_ARCH_EXPORTED_TDP */ + #endif /* CONFIG_HAVE_KVM_EXPORTED_TDP */ #endif diff --git a/virt/kvm/tdp_fd.c b/virt/kvm/tdp_fd.c index a5c4c3597e94f..7e68199ea9643 100644 --- a/virt/kvm/tdp_fd.c +++ b/virt/kvm/tdp_fd.c @@ -52,17 +52,20 @@ int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) goto out; } tdp->kvm = kvm; + ret = kvm_arch_exported_tdp_init(kvm, tdp); + if (ret) + goto out; tdp_fd->file = anon_inode_getfile("tdp_fd", &kvm_tdp_fd_fops, tdp_fd, O_RDWR | O_CLOEXEC); if (!tdp_fd->file) { ret = -EFAULT; - goto out; + goto out_uninit; } fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC); if (fd < 0) - goto out; + goto out_uninit; fd_install(fd, tdp_fd->file); ct->fd = fd; @@ -73,10 +76,12 @@ int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) spin_unlock(&kvm->exported_tdplist_lock); return 0; -out: +out_uninit: if (tdp_fd->file) fput(tdp_fd->file); + kvm_arch_exported_tdp_destroy(tdp); +out: if (tdp->kvm) kvm_put_kvm_no_destroy(tdp->kvm); kfree(tdp); @@ -102,6 +107,7 @@ static int kvm_tdp_fd_release(struct inode *inode, struct file *file) list_del(&tdp->list_node); spin_unlock(&tdp->kvm->exported_tdplist_lock); + kvm_arch_exported_tdp_destroy(tdp); kvm_put_kvm(tdp->kvm); kfree(tdp); kfree(tdp_fd); From patchwork Sat Dec 2 09:17:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476837 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="fXwYKADO" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CBAA1181; Sat, 2 Dec 2023 01:46:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510396; 
From: Yan Zhao
To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com,
    seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
    kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org,
    yi.l.liu@intel.com, Yan Zhao
Subject: [RFC PATCH 06/42] KVM: Register/Unregister importers to KVM exported TDP
Date: Sat, 2 Dec 2023 17:17:38 +0800
Message-Id: <20231202091738.13770-1-yan.y.zhao@intel.com>
In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com>
References: <20231202091211.13376-1-yan.y.zhao@intel.com>

Each TDP exported by KVM has its own list of importers. External components
can register/unregister themselves as importers, each with its own unique
importer ops.

The sequence for an external component to register/unregister as an
importer is (a sketch follows below):
1. call kvm_tdp_fd_get() to get a KVM TDP fd object.
2. call tdp_fd->ops->register_importer() to register itself as an importer.
3. call tdp_fd->ops->unregister_importer() to unregister itself as an importer.
4. call kvm_tdp_fd_put() to put the KVM TDP fd object.

When a KVM TDP fd object is destroyed, all remaining importers are
force-unregistered. No extra notification is sent to the importers at that
time, because the force-unregister only happens when an importer calls
kvm_tdp_fd_put() without first calling tdp_fd->ops->unregister_importer().
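A minimal sketch of this sequence from an importer's point of view; the
my_* names are hypothetical and only for illustration:

#include <linux/err.h>
#include <linux/kvm_tdp_fd.h>

static void my_invalidate(void *data, unsigned long start, unsigned long size)
{
        /* start == 0 && size == -1UL means the whole TDP was invalidated. */
}

static struct kvm_tdp_importer_ops my_importer_ops = {
        .invalidate = my_invalidate,
};

static int my_attach(int tdp_fd_nr, void *my_data)
{
        struct kvm_tdp_fd *tdp_fd;
        int ret;

        tdp_fd = kvm_tdp_fd_get(tdp_fd_nr);                     /* step 1 */
        if (IS_ERR(tdp_fd))
                return PTR_ERR(tdp_fd);

        ret = tdp_fd->ops->register_importer(tdp_fd, &my_importer_ops,
                                             my_data);          /* step 2 */
        if (ret)
                kvm_tdp_fd_put(tdp_fd);
        return ret;
}

static void my_detach(struct kvm_tdp_fd *tdp_fd)
{
        tdp_fd->ops->unregister_importer(tdp_fd, &my_importer_ops); /* step 3 */
        kvm_tdp_fd_put(tdp_fd);                                      /* step 4 */
}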
Signed-off-by: Yan Zhao --- include/linux/kvm_host.h | 5 +++ virt/kvm/tdp_fd.c | 68 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 72 insertions(+), 1 deletion(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5a74b2b0ac81f..f73d32eef8833 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2334,6 +2334,11 @@ struct kvm_exported_tdp { struct kvm *kvm; u32 as_id; + + /* protect importers list */ + spinlock_t importer_lock; + struct list_head importers; + /* head at kvm->exported_tdp_list */ struct list_head list_node; }; diff --git a/virt/kvm/tdp_fd.c b/virt/kvm/tdp_fd.c index 7e68199ea9643..3271da1a4b2c1 100644 --- a/virt/kvm/tdp_fd.c +++ b/virt/kvm/tdp_fd.c @@ -13,6 +13,13 @@ static inline int is_tdp_fd_file(struct file *file); static const struct file_operations kvm_tdp_fd_fops; static const struct kvm_exported_tdp_ops exported_tdp_ops; +struct kvm_tdp_importer { + struct kvm_tdp_importer_ops *ops; + void *data; + struct list_head node; +}; +static void kvm_tdp_unregister_all_importers(struct kvm_exported_tdp *tdp); + int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) { struct kvm_exported_tdp *tdp; @@ -56,6 +63,9 @@ int kvm_create_tdp_fd(struct kvm *kvm, struct kvm_create_tdp_fd *ct) if (ret) goto out; + INIT_LIST_HEAD(&tdp->importers); + spin_lock_init(&tdp->importer_lock); + tdp_fd->file = anon_inode_getfile("tdp_fd", &kvm_tdp_fd_fops, tdp_fd, O_RDWR | O_CLOEXEC); if (!tdp_fd->file) { @@ -107,6 +117,7 @@ static int kvm_tdp_fd_release(struct inode *inode, struct file *file) list_del(&tdp->list_node); spin_unlock(&tdp->kvm->exported_tdplist_lock); + kvm_tdp_unregister_all_importers(tdp); kvm_arch_exported_tdp_destroy(tdp); kvm_put_kvm(tdp->kvm); kfree(tdp); @@ -141,12 +152,67 @@ static inline int is_tdp_fd_file(struct file *file) static int kvm_tdp_register_importer(struct kvm_tdp_fd *tdp_fd, struct kvm_tdp_importer_ops *ops, void *data) { - return -EOPNOTSUPP; + struct kvm_tdp_importer *importer, *tmp; + struct kvm_exported_tdp *tdp; + + if (!tdp_fd || !tdp_fd->priv || !ops) + return -EINVAL; + + tdp = tdp_fd->priv; + importer = kzalloc(sizeof(*importer), GFP_KERNEL); + if (!importer) + return -ENOMEM; + + spin_lock(&tdp->importer_lock); + list_for_each_entry(tmp, &tdp->importers, node) { + if (tmp->ops != ops) + continue; + + kfree(importer); + spin_unlock(&tdp->importer_lock); + return -EBUSY; + } + + importer->ops = ops; + importer->data = data; + list_add(&importer->node, &tdp->importers); + + spin_unlock(&tdp->importer_lock); + + return 0; } static void kvm_tdp_unregister_importer(struct kvm_tdp_fd *tdp_fd, struct kvm_tdp_importer_ops *ops) { + struct kvm_tdp_importer *importer, *n; + struct kvm_exported_tdp *tdp; + + if (!tdp_fd || !tdp_fd->priv) + return; + + tdp = tdp_fd->priv; + spin_lock(&tdp->importer_lock); + list_for_each_entry_safe(importer, n, &tdp->importers, node) { + if (importer->ops != ops) + continue; + + list_del(&importer->node); + kfree(importer); + } + spin_unlock(&tdp->importer_lock); +} + +static void kvm_tdp_unregister_all_importers(struct kvm_exported_tdp *tdp) +{ + struct kvm_tdp_importer *importer, *n; + + spin_lock(&tdp->importer_lock); + list_for_each_entry_safe(importer, n, &tdp->importers, node) { + list_del(&importer->node); + kfree(importer); + } + spin_unlock(&tdp->importer_lock); } static void *kvm_tdp_get_metadata(struct kvm_tdp_fd *tdp_fd) From patchwork Sat Dec 2 09:18:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13476838
From: Yan Zhao
To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com,
    seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
    kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org,
    yi.l.liu@intel.com, Yan Zhao
Subject: [RFC PATCH 07/42] KVM: Forward page fault requests to arch specific code for exported TDP
Date: Sat, 2 Dec 2023 17:18:12 +0800
Message-Id: <20231202091812.13830-1-yan.y.zhao@intel.com>
In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com>
References: <20231202091211.13376-1-yan.y.zhao@intel.com>

Implement the .fault op of the KVM TDP FD object and pass page fault
requests from importers of the KVM TDP FD to KVM arch specific code.

Since the thread in which an importer calls the .fault op is not a vCPU
thread and may be a kernel thread, the thread's "mm" is checked and
kthread_use_mm() is called when necessary.
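For illustration, a minimal sketch (not part of this patch) of an importer
requesting a mapping through the .fault op, e.g. from an I/O page fault
handler; how the importer obtained the VM's mm_struct is left out, and the
my_request_mapping() name is hypothetical:

#include <linux/errno.h>
#include <linux/kvm_tdp_fd.h>
#include <linux/mm_types.h>

static int my_request_mapping(struct kvm_tdp_fd *tdp_fd, struct mm_struct *mm,
                              unsigned long gfn, bool write)
{
        struct kvm_tdp_fault_type type = {
                .read  = 1,
                .write = write,
        };

        /* The .fault op is optional, so importers must tolerate its absence. */
        if (!tdp_fd->ops->fault)
                return -EOPNOTSUPP;

        /*
         * May be called from a kernel thread; KVM adopts @mm via
         * kthread_use_mm() internally when needed.
         */
        return tdp_fd->ops->fault(tdp_fd, mm, gfn, type);
}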
Signed-off-by: Yan Zhao --- include/linux/kvm_host.h | 9 +++++++++ virt/kvm/tdp_fd.c | 28 +++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f73d32eef8833..b76919eec9b72 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2346,6 +2346,8 @@ struct kvm_exported_tdp { #ifdef __KVM_HAVE_ARCH_EXPORTED_TDP int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp); void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp); +int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, unsigned long gfn, + struct kvm_tdp_fault_type type); #else static inline int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp) @@ -2355,6 +2357,13 @@ static inline int kvm_arch_exported_tdp_init(struct kvm *kvm, static inline void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp) { } + +static inline int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, + unsigned long gfn, + struct kvm_tdp_fault_type type) +{ + return -EOPNOTSUPP; +} #endif /* __KVM_HAVE_ARCH_EXPORTED_TDP */ #endif /* CONFIG_HAVE_KVM_EXPORTED_TDP */ diff --git a/virt/kvm/tdp_fd.c b/virt/kvm/tdp_fd.c index 3271da1a4b2c1..02c9066391ebe 100644 --- a/virt/kvm/tdp_fd.c +++ b/virt/kvm/tdp_fd.c @@ -223,7 +223,33 @@ static void *kvm_tdp_get_metadata(struct kvm_tdp_fd *tdp_fd) static int kvm_tdp_fault(struct kvm_tdp_fd *tdp_fd, struct mm_struct *mm, unsigned long gfn, struct kvm_tdp_fault_type type) { - return -EOPNOTSUPP; + bool kthread = current->mm == NULL; + int ret = -EINVAL; + + if (!tdp_fd || !tdp_fd->priv || !tdp_fd->priv->kvm) + return -EINVAL; + + if (!type.read && !type.write && !type.exec) + return -EINVAL; + + if (!mm || tdp_fd->priv->kvm->mm != mm) + return -EINVAL; + + if (!mmget_not_zero(mm)) + return -EPERM; + + if (kthread) + kthread_use_mm(mm); + else if (current->mm != mm) + goto out; + + ret = kvm_arch_fault_exported_tdp(tdp_fd->priv, gfn, type); + + if (kthread) + kthread_unuse_mm(mm); +out: + mmput(mm); + return ret; } static const struct kvm_exported_tdp_ops exported_tdp_ops = { From patchwork Sat Dec 2 09:18:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476839 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="e4OnIPat" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A29E181; Sat, 2 Dec 2023 01:47:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510471; x=1733046471; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=fHaCSLze+0JkRVgR65Mc5+ZZ7ruPmv2R2QFTSzLWrGg=; b=e4OnIPatRMyAfv2tmRqGQ1vaQsxcj35oTePJ234Poi9Vsfz2iDUfRCVk Ocn3KhYphXUxKxgyW5ULzwWMQacit7VNtOogJfO3dV2+N9JkD+tATbcu1 trr5o91Bwx4X0qfkxE+lWNfotXccWJr584lYnmXJIlIoVrEHpchqSaycA CuBMTz2Sjsdh73clU4KqY6eLPoBxaGLiNRDsKTcCNo5g/TyXAKa2lxFJ8 5k8ld7g8VEZ/tnhMeviV/ACW8iahZXgCGZmGFfg2JowxsQWfShYT32jY5 QuzoOYZxfQflqxNT4TzP6+ZbAc+06KWSla4kqwAh/A9oAVdCyhA+6GmjE w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="6886424" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="6886424" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:47:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: 
E=McAfee;i="6600,9927,10911"; a="746278898" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="746278898" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:47:47 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 08/42] KVM: Add a helper to notify importers that KVM exported TDP is flushed Date: Sat, 2 Dec 2023 17:18:50 +0800 Message-Id: <20231202091850.13890-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Introduce a helper in KVM TDP FD to notify importers that TDP page tables are invalidated. This helper will be called by arch code (e.g. VMX specific code). Currently, the helper will notify all importers of all KVM exported TDPs. Signed-off-by: Yan Zhao --- include/linux/kvm_host.h | 3 +++ virt/kvm/tdp_fd.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b76919eec9b72..a8af95194767f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2366,5 +2366,8 @@ static inline int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, } #endif /* __KVM_HAVE_ARCH_EXPORTED_TDP */ +void kvm_tdp_fd_flush_notify(struct kvm *kvm, unsigned long gfn, unsigned long npages); + #endif /* CONFIG_HAVE_KVM_EXPORTED_TDP */ + #endif diff --git a/virt/kvm/tdp_fd.c b/virt/kvm/tdp_fd.c index 02c9066391ebe..8c16af685a061 100644 --- a/virt/kvm/tdp_fd.c +++ b/virt/kvm/tdp_fd.c @@ -304,3 +304,41 @@ void kvm_tdp_fd_put(struct kvm_tdp_fd *tdp_fd) fput(tdp_fd->file); } EXPORT_SYMBOL_GPL(kvm_tdp_fd_put); + +static void kvm_tdp_fd_flush(struct kvm_exported_tdp *tdp, unsigned long gfn, + unsigned long npages) +{ +#define INVALID_NPAGES (-1UL) + bool all = (gfn == 0) && (npages == INVALID_NPAGES); + struct kvm_tdp_importer *importer; + unsigned long start, size; + + if (all) { + start = 0; + size = -1UL; + } else { + start = gfn << PAGE_SHIFT; + size = npages << PAGE_SHIFT; + } + + spin_lock(&tdp->importer_lock); + + list_for_each_entry(importer, &tdp->importers, node) { + if (!importer->ops->invalidate) + continue; + + importer->ops->invalidate(importer->data, start, size); + } + spin_unlock(&tdp->importer_lock); +} + +void kvm_tdp_fd_flush_notify(struct kvm *kvm, unsigned long gfn, unsigned long npages) +{ + struct kvm_exported_tdp *tdp; + + spin_lock(&kvm->exported_tdplist_lock); + list_for_each_entry(tdp, &kvm->exported_tdp_list, list_node) + kvm_tdp_fd_flush(tdp, gfn, npages); + spin_unlock(&kvm->exported_tdplist_lock); +} +EXPORT_SYMBOL_GPL(kvm_tdp_fd_flush_notify); From patchwork Sat Dec 2 09:19:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476840 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QSQzl9EO" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.93]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B44B172A; Sat, 2 Dec 2023 01:48:22 -0800 (PST)
From: Yan Zhao
To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com,
    seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
    kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org,
    yi.l.liu@intel.com, Yan Zhao
Subject: [RFC PATCH 09/42] iommu: Add IOMMU_DOMAIN_KVM
Date: Sat, 2 Dec 2023 17:19:24 +0800
Message-Id: <20231202091924.13947-1-yan.y.zhao@intel.com>
In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com>
References: <20231202091211.13376-1-yan.y.zhao@intel.com>

Introduce a new domain type to share stage 2 mappings from KVM. Paging
structure allocation/free for this new domain type is managed by KVM. The
IOMMU side only gets the page table root address from KVM (by parsing the
vendor specific data passed in from KVM through IOMMUFD) and sets it in the
IOMMU hardware.

This new domain can be allocated by the domain_alloc_kvm op and attached to
a device through the existing iommu_attach_device/group() interfaces.

Page mapping/unmapping is managed by KVM too, therefore the map/unmap ops
are not implemented.

Signed-off-by: Yan Zhao --- include/linux/iommu.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index c79378833c758..9ecee72e2d6c4 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -171,6 +171,8 @@ struct iommu_domain_geometry { #define __IOMMU_DOMAIN_NESTED (1U << 6) /* User-managed address space nested on a stage-2 translation */ +#define __IOMMU_DOMAIN_KVM (1U << 7) /* KVM-managed stage-2 translation */ + #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ /* * This are the possible domain-types @@ -187,6 +189,7 @@ struct iommu_domain_geometry { * invalidation. * IOMMU_DOMAIN_SVA - DMA addresses are shared process addresses * represented by mm_struct's. + * IOMMU_DOMAIN_KVM - DMA mappings on stage 2, managed by KVM. * IOMMU_DOMAIN_PLATFORM - Legacy domain for drivers that do their own * dma_api stuff. Do not use in new drivers.
*/ @@ -201,6 +204,7 @@ struct iommu_domain_geometry { #define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA) #define IOMMU_DOMAIN_PLATFORM (__IOMMU_DOMAIN_PLATFORM) #define IOMMU_DOMAIN_NESTED (__IOMMU_DOMAIN_NESTED) +#define IOMMU_DOMAIN_KVM (__IOMMU_DOMAIN_KVM) struct iommu_domain { unsigned type;
From patchwork Sat Dec 2 09:20:07 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13476841
From: Yan Zhao
To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com,
    seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
    kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org,
    yi.l.liu@intel.com, Yan Zhao
Subject: [RFC PATCH 10/42] iommu: Add new iommu op to create domains managed by KVM
Date: Sat, 2 Dec 2023 17:20:07 +0800
Message-Id: <20231202092007.14026-1-yan.y.zhao@intel.com>
In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com>
References: <20231202091211.13376-1-yan.y.zhao@intel.com>

Introduce a new iommu_domain op to create domains managed by KVM through
IOMMUFD. These domains have a few properties that differ from kernel-owned
domains and user-owned domains:

- They must not be PAGING domains. Page mapping/unmapping is controlled by
  KVM.
- They must be stage 2 mappings translating GPA to HPA.
- Paging structure allocation/free is not managed by the IOMMU driver, but
  by KVM.
- TLB flushes are notified by KVM.

The new op makes it clear that the domain is being created through IOMMUFD.
A driver specific structure describing the metadata of the paging
structures from KVM is passed in via the op param "data".

IOMMU drivers that cannot support VFIO/IOMMUFD should not support this op.
For now this new op is only supposed to be used by IOMMUFD, hence no
wrapper for it. IOMMUFD would call the callback directly.
As for domain free, IOMMUFD would use iommu_domain_free(). Signed-off-by: Yan Zhao --- include/linux/iommu.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9ecee72e2d6c4..0ce23ee399d35 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -522,6 +522,13 @@ __iommu_copy_struct_from_user_array(void *dst_data, * @domain_alloc_paging: Allocate an iommu_domain that can be used for * UNMANAGED, DMA, and DMA_FQ domain types. * @domain_alloc_sva: Allocate an iommu_domain for Shared Virtual Addressing. + * @domain_alloc_kvm: Allocate an iommu domain with type IOMMU_DOMAIN_KVM. + * It's called by IOMMUFD and must fully initialize the new + * domain before return. + * The @data is of type "const void *" whose format is defined + * in kvm arch specific header "asm/kvm_exported_tdp.h". + * Unpon success, domain of type IOMMU_DOMAIN_KVM is returned. + * Upon failure, ERR_PTR is returned. * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -564,6 +571,8 @@ struct iommu_ops { struct iommu_domain *(*domain_alloc_paging)(struct device *dev); struct iommu_domain *(*domain_alloc_sva)(struct device *dev, struct mm_struct *mm); + struct iommu_domain *(*domain_alloc_kvm)(struct device *dev, u32 flags, + const void *data); struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev); From patchwork Sat Dec 2 09:20:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476842 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="avJdL1ij" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0D84134; Sat, 2 Dec 2023 01:49:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510584; x=1733046584; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Lo5VlYM9pcr8va0tr3dWmZ/2vpXpVd1PrW/H2X7PBXI=; b=avJdL1ijTwbTKlTIlEeFphxcjD+BQDaAQYMOEHo6LKb+6R77offDuTjp Kv18skulvLIg3B+yEDvGTXVUaM/kssdH+Eo99j/S0QzBygQaj2I9MqWyx tIqtJFtq0rp1fHe4syDerp9AWm2m1aW6IK7A9/A+alkLEABb8VM4Ak6yO zasIih/PIMSYxTTLGBX8YYDBKD/lO/7EXfa753eLjb/xwZm6CVvfm2q65 TpcI9w+uy25oEr/7v2N/TGNO9KOKSfcDvrp83IENeFcEWJ2GD4kJpf//I i/VaqBSmyv1NCCMibMeJZWSiHJXmOoAX+dnAeDeNRz4QxKLSTWp3riT40 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="392459180" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="392459180" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:49:39 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="887937515" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="887937515" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:49:36 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao 
Subject: [RFC PATCH 11/42] iommu: Add new domain op cache_invalidate_kvm Date: Sat, 2 Dec 2023 17:20:41 +0800 Message-Id: <20231202092041.14084-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: On KVM invalidates mappings that are shared to IOMMU stage 2 paging structures, IOMMU driver needs to invalidate hardware TLBs accordingly. The new op cache_invalidate_kvm is called from IOMMUFD to invalidate hardware TLBs upon receiving invalidation notifications from KVM. Signed-off-by: Yan Zhao --- include/linux/iommu.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0ce23ee399d35..0b056d5a6b3a3 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -636,6 +636,9 @@ struct iommu_ops { * forward a driver specific error code to user space. * Both the driver data structure and the error code * must be defined in include/uapi/linux/iommufd.h + * @cache_invalidate_kvm: Synchronously flush hardware TLBs for KVM managed + * stage 2 IO page tables. + * The @domain must be IOMMU_DOMAIN_KVM. * @iova_to_phys: translate iova to physical address * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, * including no-snoop TLPs on PCIe or other platform @@ -665,6 +668,8 @@ struct iommu_domain_ops { int (*cache_invalidate_user)(struct iommu_domain *domain, struct iommu_user_data_array *array, u32 *error_code); + void (*cache_invalidate_kvm)(struct iommu_domain *domain, + unsigned long iova, unsigned long size); phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); From patchwork Sat Dec 2 09:21:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476843 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bynbUnr1" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7D34197; Sat, 2 Dec 2023 01:50:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510612; x=1733046612; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=WCJPtEQsnD+jMdSa58ByeKd9Udd++7gtpjWFBhxkT8o=; b=bynbUnr1p6FdWy3LqmAyL6aefJu7Unp06M5joR+PftSgrlnnWLwr0Pgv yo9VKa/GofQMK2WFUmylPmM/3N9/GROo4q1AVgSFQTP3HXXUBvfF9anl0 pYb3RSobbyttdC9zETXEtH4txe9EZrZlfk8OxDuO0rEmir/9txVQEXhDY oKrxNtSkSjRDisx2NAB7Se6YcxwHG14Gk41KEn2b97W4LgnUWHSJeRNuX IZccfX9H2OhqUSnX5pu4jhPVPYUei1hpw2CXmYsRuHs2ykHIEIXV7RQNQ v96KWXkGjNym3A6/gE2K9kASPv3bG61SMMVY4aeHg/q5lW1v9WNC3oB/B g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="479794120" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="479794120" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:50:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="11414203" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:50:09 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, 
jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 12/42] iommufd: Introduce allocation data info and flag for KVM managed HWPT Date: Sat, 2 Dec 2023 17:21:13 +0800 Message-Id: <20231202092113.14141-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add allocation data info iommu_hwpt_kvm_info to allow IOMMUFD to create a KVM managed HWPT via ioctl IOMMU_HWPT_ALLOC. As KVM managed HWPT serves as stage-2 page tables whose paging structure and page mapping/unmapping are managed by KVM, there's no need to connect KVM managed HWPT to IOAS or parent HWPT. Signed-off-by: Yan Zhao --- include/uapi/linux/iommufd.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 71c009cc614a4..08570f3a751fc 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -390,6 +390,15 @@ struct iommu_hwpt_vtd_s1 { __u32 __reserved; }; +/** + * struct iommu_hwpt_kvm_info - KVM managed stage-2 page table info + * (IOMMU_HWPT_DATA_KVM) + * @fd: The fd of the page table shared from KVM + */ +struct iommu_hwpt_kvm_info { + __aligned_u64 fd; +}; + /** * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 Context Descriptor Table info * (IOMMU_HWPT_DATA_ARM_SMMUV3) @@ -413,11 +422,13 @@ struct iommu_hwpt_arm_smmuv3 { * @IOMMU_HWPT_DATA_NONE: no data * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table + * @IOMMU_HWPT_DATA_KVM: KVM managed stage-2 page table */ enum iommu_hwpt_data_type { IOMMU_HWPT_DATA_NONE, IOMMU_HWPT_DATA_VTD_S1, IOMMU_HWPT_DATA_ARM_SMMUV3, + IOMMU_HWPT_DATA_KVM, }; /** @@ -447,6 +458,10 @@ enum iommu_hwpt_data_type { * must be set to a pre-defined type corresponding to an I/O page table * type supported by the underlying IOMMU hardware. * + * A KVM-managed HWPT will be created if @data_type is IOMMU_HWPT_DATA_KVM. + * @pt_id is not queried if data_type is IOMMU_HWPT_DATA_KVM because KVM-managed + * HWPT doesn't have any IOAS or parent HWPT associated. + * * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr * must be given. 
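To show how a VMM could exercise the new data type, here is a hedged userspace sketch of the IOMMU_HWPT_ALLOC call; it assumes the KVM TDP fd has already been obtained from KVM (via the TDP FD interface introduced earlier in this series) and that iommufd/dev_id come from the usual IOMMUFD device binding, with all variable and function names being illustrative:

/* Userspace sketch, assuming kvm_tdp_fd and dev_id are already set up. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_kvm_hwpt(int iommufd, __u32 dev_id, int kvm_tdp_fd, __u32 *out_hwpt_id)
{
	struct iommu_hwpt_kvm_info kvm_data = { .fd = kvm_tdp_fd };
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,
		/* pt_id is left unset: a KVM HWPT has no IOAS or parent HWPT */
		.data_type = IOMMU_HWPT_DATA_KVM,
		.data_len = sizeof(kvm_data),
		.data_uptr = (uintptr_t)&kvm_data,
	};

	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;

	*out_hwpt_id = cmd.out_hwpt_id;
	return 0;
}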
From patchwork Sat Dec 2 09:21:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476844 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="IEVmV/1g" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 236C81A4; Sat, 2 Dec 2023 01:50:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510647; x=1733046647; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=rqH35GdOrKoZEkvGITnBvH+ikDglRlVlt01Gqn8iCuQ=; b=IEVmV/1gdxf356DBwO04fRRnMuNVHGI2FFm259u7fyHoV3w9F0Xt5KLg IlVhnvkJsXcDPS60fWmJLQ/OU2eTm3JusESpxxuII8RJ/LePJCqXHqVGH /p9eFpDJdnDciK/N8xGG+GEANlJL6bYQ+D/6GVUtn2pHWe/iODzI8Pg98 lBtNTlGNr2FedY/wpwWIqKx35TRi1AOJC6K896JJlc1Q2huCE7NT950Sc r5hahkRLUP+OzWlBaWDEaGikURFzq/2uhtx1NzhHbdBQ34+tUHiapl7+9 5gDvK8sJKvIZUpVJiLLH3QEUmCVVDGcej+r9yjK+YQ27wu3mSHDpsyk1X w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="479794143" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="479794143" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:50:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="11414277" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:50:43 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 13/42] iommufd: Add a KVM HW pagetable object Date: Sat, 2 Dec 2023 17:21:47 +0800 Message-Id: <20231202092147.14208-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add new obj type IOMMUFD_OBJ_HWPT_KVM for KVM HW page tables, which correspond to iommu stage 2 domains whose paging strcutures and mappings are managed by KVM. Extend the IOMMU_HWPT_ALLOC ioctl to accept KVM HW page table specific data of "struct iommu_hwpt_kvm_info". The real allocator iommufd_hwpt_kvm_alloc() is now an empty function and will be implemented in next patch when config IOMMUFD_KVM_HWPT is on. 
Signed-off-by: Yan Zhao --- drivers/iommu/iommufd/device.c | 13 +++++---- drivers/iommu/iommufd/hw_pagetable.c | 29 +++++++++++++++++++- drivers/iommu/iommufd/iommufd_private.h | 35 +++++++++++++++++++++++++ drivers/iommu/iommufd/main.c | 4 +++ 4 files changed, 75 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 59d3a07300d93..83af6b7e2784b 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -629,7 +629,8 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, switch (pt_obj->type) { case IOMMUFD_OBJ_HWPT_NESTED: - case IOMMUFD_OBJ_HWPT_PAGING: { + case IOMMUFD_OBJ_HWPT_PAGING: + case IOMMUFD_OBJ_HWPT_KVM: { struct iommufd_hw_pagetable *hwpt = container_of(pt_obj, struct iommufd_hw_pagetable, obj); @@ -667,8 +668,9 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, /** * iommufd_device_attach - Connect a device to an iommu_domain * @idev: device to attach - * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING - * Output the IOMMUFD_OBJ_HWPT_PAGING ID + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING, or + * IOMMUFD_OBJ_HWPT_KVM + * Output the IOMMUFD_OBJ_HWPT_PAGING ID or IOMMUFD_OBJ_HWPT_KVM ID * * This connects the device to an iommu_domain, either automatically or manually * selected. Once this completes the device could do DMA. @@ -696,8 +698,9 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD); /** * iommufd_device_replace - Change the device's iommu_domain * @idev: device to change - * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING - * Output the IOMMUFD_OBJ_HWPT_PAGING ID + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING, or + * IOMMUFD_OBJ_HWPT_KVM + * Output the IOMMUFD_OBJ_HWPT_PAGING ID or IOMMUFD_OBJ_HWPT_KVM ID * * This is the same as:: * diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 367459d92f696..c8430ec42cdf8 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -273,6 +273,31 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (IS_ERR(idev)) return PTR_ERR(idev); + if (cmd->data_type == IOMMU_HWPT_DATA_KVM) { + struct iommu_hwpt_kvm_info kvm_data; + struct iommufd_hwpt_kvm *hwpt_kvm; + + if (!cmd->data_len || cmd->data_len != sizeof(kvm_data) || + !cmd->data_uptr) { + rc = -EINVAL; + goto out_put_idev; + } + rc = copy_struct_from_user(&kvm_data, sizeof(kvm_data), + u64_to_user_ptr(cmd->data_uptr), + cmd->data_len); + if (rc) + goto out_put_idev; + + hwpt_kvm = iommufd_hwpt_kvm_alloc(ucmd->ictx, idev, cmd->flags, + &kvm_data); + if (IS_ERR(hwpt_kvm)) { + rc = PTR_ERR(hwpt_kvm); + goto out_put_idev; + } + hwpt = &hwpt_kvm->common; + goto out_respond; + } + pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY); if (IS_ERR(pt_obj)) { rc = -EINVAL; @@ -310,6 +335,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_put_pt; } +out_respond: cmd->out_hwpt_id = hwpt->obj.id; rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); if (rc) @@ -323,7 +349,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (ioas) mutex_unlock(&ioas->mutex); out_put_pt: - iommufd_put_object(pt_obj); + if (cmd->data_type != IOMMU_HWPT_DATA_KVM) + iommufd_put_object(pt_obj); out_put_idev: iommufd_put_object(&idev->obj); return rc; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 160521800d9b4..a46a6e3e537f9 100644 --- 
a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -125,6 +125,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_DEVICE, IOMMUFD_OBJ_HWPT_PAGING, IOMMUFD_OBJ_HWPT_NESTED, + IOMMUFD_OBJ_HWPT_KVM, IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, #ifdef CONFIG_IOMMUFD_TEST @@ -266,17 +267,33 @@ struct iommufd_hwpt_nested { struct iommufd_hwpt_paging *parent; }; +struct iommufd_hwpt_kvm { + struct iommufd_hw_pagetable common; + void *context; +}; + static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) { return hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING; } +static inline bool hwpt_is_kvm(struct iommufd_hw_pagetable *hwpt) +{ + return hwpt->obj.type == IOMMUFD_OBJ_HWPT_KVM; +} + static inline struct iommufd_hwpt_paging * to_hwpt_paging(struct iommufd_hw_pagetable *hwpt) { return container_of(hwpt, struct iommufd_hwpt_paging, common); } +static inline struct iommufd_hwpt_kvm * +to_hwpt_kvm(struct iommufd_hw_pagetable *hwpt) +{ + return container_of(hwpt, struct iommufd_hwpt_kvm, common); +} + static inline struct iommufd_hwpt_paging * iommufd_get_hwpt_paging(struct iommufd_ucmd *ucmd, u32 id) { @@ -413,4 +430,22 @@ static inline bool iommufd_selftest_is_mock_dev(struct device *dev) return false; } #endif + +struct iommu_hwpt_kvm_info; +static inline struct iommufd_hwpt_kvm * +iommufd_hwpt_kvm_alloc(struct iommufd_ctx *ictx, + struct iommufd_device *idev, u32 flags, + const struct iommu_hwpt_kvm_info *kvm_data) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline void iommufd_hwpt_kvm_abort(struct iommufd_object *obj) +{ +} + +static inline void iommufd_hwpt_kvm_destroy(struct iommufd_object *obj) +{ +} + #endif diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 6edef860f91cc..0798c1279133f 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -499,6 +499,10 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { .destroy = iommufd_hwpt_nested_destroy, .abort = iommufd_hwpt_nested_abort, }, + [IOMMUFD_OBJ_HWPT_KVM] = { + .destroy = iommufd_hwpt_kvm_destroy, + .abort = iommufd_hwpt_kvm_abort, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, From patchwork Sat Dec 2 09:22:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476845 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aDfxH2Fv" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 020C6134; Sat, 2 Dec 2023 01:51:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510675; x=1733046675; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=oxDYXG3AJzaeGNtpp4op0WAgqkteWZvXNKWbORIdg58=; b=aDfxH2FvMZzJm83tpn9m4uJHPdlpv+39z7N8v06ZXJ12HyePB5DL4shj vgl/li599ggQ0qiZkpTsa15owob3Edxgg4DEJGhgCAO/Ztsn+gzg8sQKn UFDJCQfe2Lqt81C9vUxrEFRxhIS0FuNj6PpylAseERcwQe8Bi7gpghXQd jvqugmLodeVTPmz3f3iNdATbhDvPwRx3eFyRh3sQ3g8/VAKEo82E6UKmK LPZnzeSXWh56DH11+aiMZ/m0UL8r/Eus2BgfACmpjYTkujxHNkjKOp2QH K0sLIDIEDQJ32X6E+VxJmRhHoBsJNklF/NaHqYQDRSTchxHDrYR9/U4+p A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="479794167" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="479794167" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmsmga105.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:51:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="11414337" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:51:12 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 14/42] iommufd: Enable KVM HW page table object to be proxy between KVM and IOMMU Date: Sat, 2 Dec 2023 17:22:16 +0800 Message-Id: <20231202092216.14278-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Enable IOMMUFD KVM HW page table object to serve as proxy between KVM and IOMMU driver. Config IOMMUFD_KVM_HWPT is added to turn on/off this ability. KVM HW page table object first gets KVM TDP fd object via KVM exported interface kvm_tdp_fd_get() and then queries KVM for vendor meta data of page tables exported (shared) by KVM. It then passes the meta data to IOMMU driver to create a IOMMU_DOMAIN_KVM domain via op domain_alloc_kvm. IOMMU driver is responsible to check compatibility between IOMMU hardware and the KVM exported page tables. After successfully creating IOMMU_DOMAIN_KVM domain, IOMMUFD KVM HW page table object registers invalidation callback to KVM to receive invalidation notifications. It then passes the notification to IOMMU driver via op cache_invalidate_kvm to invalidate hardware TLBs. Signed-off-by: Yan Zhao --- drivers/iommu/iommufd/Kconfig | 10 ++ drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/hw_pagetable_kvm.c | 183 +++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 9 ++ 4 files changed, 203 insertions(+) create mode 100644 drivers/iommu/iommufd/hw_pagetable_kvm.c diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig index 99d4b075df49e..d79e0c1e00a4d 100644 --- a/drivers/iommu/iommufd/Kconfig +++ b/drivers/iommu/iommufd/Kconfig @@ -32,6 +32,16 @@ config IOMMUFD_VFIO_CONTAINER Unless testing IOMMUFD, say N here. +config IOMMUFD_KVM_HWPT + bool "Supports KVM managed HW page tables" + default n + help + Selecting this option will allow IOMMUFD to create IOMMU stage 2 + page tables whose paging structure and mappings are managed by + KVM MMU. IOMMUFD serves as proxy between KVM and IOMMU driver to + allow IOMMU driver to get paging structure meta data and cache + invalidate notifications from KVM. 
+ config IOMMUFD_TEST bool "IOMMU Userspace API Test support" depends on DEBUG_KERNEL diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile index 34b446146961c..ae1e0b5c300dc 100644 --- a/drivers/iommu/iommufd/Makefile +++ b/drivers/iommu/iommufd/Makefile @@ -8,6 +8,7 @@ iommufd-y := \ pages.o \ vfio_compat.o +iommufd-$(CONFIG_IOMMUFD_KVM_HWPT) += hw_pagetable_kvm.o iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o obj-$(CONFIG_IOMMUFD) += iommufd.o diff --git a/drivers/iommu/iommufd/hw_pagetable_kvm.c b/drivers/iommu/iommufd/hw_pagetable_kvm.c new file mode 100644 index 0000000000000..e0e205f384ed5 --- /dev/null +++ b/drivers/iommu/iommufd/hw_pagetable_kvm.c @@ -0,0 +1,183 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include + +#include "../iommu-priv.h" +#include "iommufd_private.h" + +static void iommufd_kvmtdp_invalidate(void *data, + unsigned long start, unsigned long size) +{ + void (*invalidate_fn)(struct iommu_domain *domain, + unsigned long iova, unsigned long size); + struct iommufd_hw_pagetable *hwpt = data; + + if (!hwpt || !hwpt_is_kvm(hwpt)) + return; + + invalidate_fn = hwpt->domain->ops->cache_invalidate_kvm; + + if (!invalidate_fn) + return; + + invalidate_fn(hwpt->domain, start, size); + +} + +struct kvm_tdp_importer_ops iommufd_import_ops = { + .invalidate = iommufd_kvmtdp_invalidate, +}; + +static inline int kvmtdp_register(struct kvm_tdp_fd *tdp_fd, void *data) +{ + if (!tdp_fd->ops->register_importer || !tdp_fd->ops->unregister_importer) + return -EOPNOTSUPP; + + return tdp_fd->ops->register_importer(tdp_fd, &iommufd_import_ops, data); +} + +static inline void kvmtdp_unregister(struct kvm_tdp_fd *tdp_fd) +{ + WARN_ON(!tdp_fd->ops->unregister_importer); + + tdp_fd->ops->unregister_importer(tdp_fd, &iommufd_import_ops); +} + +static inline void *kvmtdp_get_metadata(struct kvm_tdp_fd *tdp_fd) +{ + if (!tdp_fd->ops->get_metadata) + return ERR_PTR(-EOPNOTSUPP); + + return tdp_fd->ops->get_metadata(tdp_fd); +} + +/* + * Get KVM TDP FD object and ensure tdp_fd->ops is available + */ +static inline struct kvm_tdp_fd *kvmtdp_get(int fd) +{ + struct kvm_tdp_fd *tdp_fd = NULL; + struct kvm_tdp_fd *(*get_func)(int fd) = NULL; + void (*put_func)(struct kvm_tdp_fd *) = NULL; + + get_func = symbol_get(kvm_tdp_fd_get); + + if (!get_func) + goto out; + + put_func = symbol_get(kvm_tdp_fd_put); + if (!put_func) + goto out; + + tdp_fd = get_func(fd); + if (!tdp_fd) + goto out; + + if (tdp_fd->ops) { + /* success */ + goto out; + } + + put_func(tdp_fd); + tdp_fd = NULL; + +out: + if (get_func) + symbol_put(kvm_tdp_fd_get); + + if (put_func) + symbol_put(kvm_tdp_fd_put); + + return tdp_fd; +} + +static void kvmtdp_put(struct kvm_tdp_fd *tdp_fd) +{ + void (*put_func)(struct kvm_tdp_fd *) = NULL; + + put_func = symbol_get(kvm_tdp_fd_put); + WARN_ON(!put_func); + + put_func(tdp_fd); + + symbol_put(kvm_tdp_fd_put); +} + +void iommufd_hwpt_kvm_destroy(struct iommufd_object *obj) +{ + struct kvm_tdp_fd *tdp_fd; + struct iommufd_hwpt_kvm *hwpt_kvm = + container_of(obj, struct iommufd_hwpt_kvm, common.obj); + + if (hwpt_kvm->common.domain) + iommu_domain_free(hwpt_kvm->common.domain); + + tdp_fd = hwpt_kvm->context; + kvmtdp_unregister(tdp_fd); + kvmtdp_put(tdp_fd); +} + +void iommufd_hwpt_kvm_abort(struct iommufd_object *obj) +{ + iommufd_hwpt_kvm_destroy(obj); +} + +struct iommufd_hwpt_kvm * +iommufd_hwpt_kvm_alloc(struct iommufd_ctx *ictx, + struct iommufd_device *idev, u32 flags, + const struct iommu_hwpt_kvm_info *kvm_data) +{ + + const struct iommu_ops
*ops = dev_iommu_ops(idev->dev); + struct iommufd_hwpt_kvm *hwpt_kvm; + struct iommufd_hw_pagetable *hwpt; + struct kvm_tdp_fd *tdp_fd; + void *meta_data; + int rc; + + if (!ops->domain_alloc_kvm) + return ERR_PTR(-EOPNOTSUPP); + + if (kvm_data->fd < 0) + return ERR_PTR(-EINVAL); + + tdp_fd = kvmtdp_get(kvm_data->fd); + if (!tdp_fd) + return ERR_PTR(-EOPNOTSUPP); + + meta_data = kvmtdp_get_metadata(tdp_fd); + if (!meta_data || IS_ERR(meta_data)) { + rc = -EFAULT; + goto out_put_tdp; + } + + hwpt_kvm = __iommufd_object_alloc(ictx, hwpt_kvm, IOMMUFD_OBJ_HWPT_KVM, + common.obj); + if (IS_ERR(hwpt_kvm)) { + rc = PTR_ERR(hwpt_kvm); + goto out_put_tdp; + } + + hwpt_kvm->context = tdp_fd; + hwpt = &hwpt_kvm->common; + + hwpt->domain = ops->domain_alloc_kvm(idev->dev, flags, meta_data); + if (IS_ERR(hwpt->domain)) { + rc = PTR_ERR(hwpt->domain); + hwpt->domain = NULL; + goto out_abort; + } + + rc = kvmtdp_register(tdp_fd, hwpt); + if (rc) + goto out_abort; + + return hwpt_kvm; + +out_abort: + iommufd_object_abort_and_destroy(ictx, &hwpt->obj); +out_put_tdp: + kvmtdp_put(tdp_fd); + return ERR_PTR(rc); +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index a46a6e3e537f9..2c3149b1d5b55 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -432,6 +432,14 @@ static inline bool iommufd_selftest_is_mock_dev(struct device *dev) #endif struct iommu_hwpt_kvm_info; +#ifdef CONFIG_IOMMUFD_KVM_HWPT +struct iommufd_hwpt_kvm * +iommufd_hwpt_kvm_alloc(struct iommufd_ctx *ictx, + struct iommufd_device *idev, u32 flags, + const struct iommu_hwpt_kvm_info *kvm_data); +void iommufd_hwpt_kvm_abort(struct iommufd_object *obj); +void iommufd_hwpt_kvm_destroy(struct iommufd_object *obj); +#else static inline struct iommufd_hwpt_kvm * iommufd_hwpt_kvm_alloc(struct iommufd_ctx *ictx, struct iommufd_device *idev, u32 flags, @@ -447,5 +455,6 @@ static inline void iommufd_hwpt_kvm_abort(struct iommufd_object *obj) static inline void iommufd_hwpt_kvm_destroy(struct iommufd_object *obj) { } +#endif #endif From patchwork Sat Dec 2 09:22:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476846 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="PIS21ewV" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7ADA198; Sat, 2 Dec 2023 01:51:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510705; x=1733046705; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=5tHj8FjpQu1zOKHpkDFSKBA08iJcnSpgfGv/y1ivWLI=; b=PIS21ewVdl6gu0+KyLc6oR27/mLImYs8GDdGT0vGnzMPTh0vaYOjxGtX KUk7TxUrdomXdrNBJFmiHBv2fkByQW97XFzS4+THiSTpHlFHW19+Ug9wt u7sWe68qaPrDiFn+nGEdFhX15piqHpgv2rQqw0/mGSBE8wAPj2V0/8L9g 5Ej/MzpswILuT8bkAMGgCH3IGeWVcrz3flIzBXDNw4lwJj7f47x/xWert vK0HQlUU0efw9Eq6r0oBNwKE7ns6QaV6UXo5g2AYVw9QGb9wiwhIFEdII lcjSsn2mEWD1OOOwbnBgmJ+I2rUZoucyKKhIW40gjvXgQ3b0Gi7FuZinY A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="6886646" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="6886646" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:51:45 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="746280146" X-IronPort-AV: 
E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="746280146" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:51:40 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 15/42] iommufd: Add iopf handler to KVM hw pagetable Date: Sat, 2 Dec 2023 17:22:45 +0800 Message-Id: <20231202092245.14335-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add iopf handler to KVM HW page table. The iopf handler is implemented to forward IO page fault requests to KVM and return complete status back to IOMMU driver via iommu_page_response(). Signed-off-by: Yan Zhao --- drivers/iommu/iommufd/hw_pagetable_kvm.c | 87 ++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/drivers/iommu/iommufd/hw_pagetable_kvm.c b/drivers/iommu/iommufd/hw_pagetable_kvm.c index e0e205f384ed5..bff9fa3d9f703 100644 --- a/drivers/iommu/iommufd/hw_pagetable_kvm.c +++ b/drivers/iommu/iommufd/hw_pagetable_kvm.c @@ -6,6 +6,89 @@ #include "../iommu-priv.h" #include "iommufd_private.h" +static int iommufd_kvmtdp_fault(void *data, struct mm_struct *mm, + unsigned long addr, u32 perm) +{ + struct iommufd_hw_pagetable *hwpt = data; + struct kvm_tdp_fault_type fault_type = {0}; + unsigned long gfn = addr >> PAGE_SHIFT; + struct kvm_tdp_fd *tdp_fd; + int ret; + + if (!hwpt || !hwpt_is_kvm(hwpt)) + return IOMMU_PAGE_RESP_INVALID; + + tdp_fd = to_hwpt_kvm(hwpt)->context; + if (!tdp_fd->ops->fault) + return IOMMU_PAGE_RESP_INVALID; + + fault_type.read = !!(perm & IOMMU_FAULT_PERM_READ); + fault_type.write = !!(perm & IOMMU_FAULT_PERM_WRITE); + fault_type.exec = !!(perm & IOMMU_FAULT_PERM_EXEC); + + ret = tdp_fd->ops->fault(tdp_fd, mm, gfn, fault_type); + return ret ? IOMMU_PAGE_RESP_FAILURE : IOMMU_PAGE_RESP_SUCCESS; +} + +static int iommufd_kvmtdp_complete_group(struct device *dev, struct iopf_fault *iopf, + enum iommu_page_response_code status) +{ + struct iommu_page_response resp = { + .pasid = iopf->fault.prm.pasid, + .grpid = iopf->fault.prm.grpid, + .code = status, + }; + + if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) && + (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID)) + resp.flags = IOMMU_PAGE_RESP_PASID_VALID; + + return iommu_page_response(dev, &resp); +} + +static void iommufd_kvmtdp_handle_iopf(struct work_struct *work) +{ + struct iopf_fault *iopf; + struct iopf_group *group; + enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS; + struct iommu_domain *domain; + void *fault_data; + int ret; + + group = container_of(work, struct iopf_group, work); + domain = group->domain; + fault_data = domain->fault_data; + + list_for_each_entry(iopf, &group->faults, list) { + /* + * For the moment, errors are sticky: don't handle subsequent + * faults in the group if there is an error. 
+ */ + if (status != IOMMU_PAGE_RESP_SUCCESS) + break; + + status = iommufd_kvmtdp_fault(fault_data, domain->mm, + iopf->fault.prm.addr, + iopf->fault.prm.perm); + } + + ret = iommufd_kvmtdp_complete_group(group->dev, &group->last_fault, status); + + iopf_free_group(group); + +} + +static int iommufd_kvmtdp_iopf_handler(struct iopf_group *group) +{ + struct iommu_fault_param *fault_param = group->dev->iommu->fault_param; + + INIT_WORK(&group->work, iommufd_kvmtdp_handle_iopf); + if (!queue_work(fault_param->queue->wq, &group->work)) + return -EBUSY; + + return 0; +} + static void iommufd_kvmtdp_invalidate(void *data, unsigned long start, unsigned long size) { @@ -169,6 +252,10 @@ iommufd_hwpt_kvm_alloc(struct iommufd_ctx *ictx, goto out_abort; } + hwpt->domain->mm = current->mm; + hwpt->domain->iopf_handler = iommufd_kvmtdp_iopf_handler; + hwpt->domain->fault_data = hwpt; + rc = kvmtdp_register(tdp_fd, hwpt); if (rc) goto out_abort; From patchwork Sat Dec 2 09:23:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476847 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="exT2k//C" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29CF0134; Sat, 2 Dec 2023 01:52:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510730; x=1733046730; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=qgwPOU0dccalAjjMk9ImV6B09aYt2Eqp8Z37kksWeQw=; b=exT2k//CH2x7cUhAJhUegDfTCDhV71i78bd0gAb/uE0KpTicm/uXnRly rd5BX4U4jpO+4ZnXXBskmf+HbPTAaGoxkYSGTIUYC+ZQGX9Co/ikeA+QV M0kN07q46Q8GLWExk61fN6jckTLAYhig/95FHtyY44kpx2zoIA0zdiDGf hIA9TI8myjUZezXYjYvDuH1CVRrWsYFgmDLTCzlsKjR+nYMfCarkG1g0N 6yvlo4IOxBvvcZWRbDvr7aIjXxLgL0icm+DfhB7GzNQIWajujJmxXlG/M frOT8coGfYEO2nhT+WYQuyVE2PMiBr0BdRCbAVPK12msBK7R+51faR/F2 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="6886663" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="6886663" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:52:10 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="746280168" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="746280168" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:52:05 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 16/42] iommufd: Enable device feature IOPF during device attachment to KVM HWPT Date: Sat, 2 Dec 2023 17:23:11 +0800 Message-Id: <20231202092311.14392-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Enable device feature IOPF during device attachment to KVM HWPT and abort the attachment if feature enabling is failed. "pin" is not done by KVM HWPT. 
If VMM wants to create KVM HWPT, it must know that all devices attached to this HWPT support IOPF so that pin-all is skipped. Signed-off-by: Yan Zhao --- drivers/iommu/iommufd/device.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 83af6b7e2784b..4ea447e052ce1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -381,10 +381,26 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, goto err_unresv; idev->igroup->hwpt = hwpt; } + if (hwpt_is_kvm(hwpt)) { + /* + * Feature IOPF requires ats is enabled which is true only + * after device is attached to iommu domain. + * So enable dev feature IOPF after iommu_attach_group(). + * -EBUSY will be returned if feature IOPF is already on. + */ + rc = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF); + if (rc && rc != -EBUSY) + goto err_detach; + } refcount_inc(&hwpt->obj.users); list_add_tail(&idev->group_item, &idev->igroup->device_list); mutex_unlock(&idev->igroup->lock); return 0; +err_detach: + if (list_empty(&idev->igroup->device_list)) { + iommu_detach_group(hwpt->domain, idev->igroup->group); + idev->igroup->hwpt = NULL; + } err_unresv: if (hwpt_is_paging(hwpt)) iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, @@ -408,6 +424,8 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) if (hwpt_is_paging(hwpt)) iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, idev->dev); + if (hwpt_is_kvm(hwpt)) + iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF); mutex_unlock(&idev->igroup->lock); /* Caller must destroy hwpt */ From patchwork Sat Dec 2 09:23:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476848 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jeGcBlzy" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59A29197; Sat, 2 Dec 2023 01:52:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510772; x=1733046772; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4XuvqJzELS5xzXnhxrYHtRshvrsQAG9FckEEWisSVfA=; b=jeGcBlzyeYqbIYpbHw9YFd3s1iQIiYgnehIrJV7lLBqhWECps9dgrXpX CHPfy55cz2Hd/jvkJd/0FeaujyYhcnQyZOxKPQe6fqGahDZ7fwN6jToze +6XOJj7Dk72ccwcPxeohPEDVLV8TBe9cUVxJJ5YY/sTfPef1nQ6H7lsc3 W7JNOEKmkKcjDvT56AaQw7tb8FeVurWLVnPY7VjFvL1nsiJKZ+T4LBumv E0TtlmtnhWf9zAdks1rnGMdEvYti0vhy89UK/YlXUHeBasIjwaObJjQhv pnic/iNLkVY/8XqRT2+Ev2BsiqRmr3g5nsVNq0zMvFAFhwIW4NCJ9bdp9 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="397479499" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="397479499" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:52:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="913853337" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="913853337" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:52:47 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, 
will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 17/42] iommu/vt-d: Make some macros and helpers to be extern Date: Sat, 2 Dec 2023 17:23:52 +0800 Message-Id: <20231202092352.14452-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: This makes the macros and helpers visible to outside of iommu.c, which is a preparation for next patch to create domain of IOMMU_DOMAIN_KVM. Signed-off-by: Yan Zhao --- drivers/iommu/intel/iommu.c | 39 +++---------------------------------- drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 36 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5df6c21781e1c..924006cda18c5 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -49,7 +49,6 @@ #define MAX_AGAW_PFN_WIDTH (MAX_AGAW_WIDTH - VTD_PAGE_SHIFT) #define __DOMAIN_MAX_PFN(gaw) ((((uint64_t)1) << ((gaw) - VTD_PAGE_SHIFT)) - 1) -#define __DOMAIN_MAX_ADDR(gaw) ((((uint64_t)1) << (gaw)) - 1) /* We limit DOMAIN_MAX_PFN to fit in an unsigned long, and DOMAIN_MAX_ADDR to match. That way, we can use 'unsigned long' for PFNs with impunity. */ @@ -62,10 +61,6 @@ #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT) -/* page table handling */ -#define LEVEL_STRIDE (9) -#define LEVEL_MASK (((u64)1 << LEVEL_STRIDE) - 1) - static inline int agaw_to_level(int agaw) { return agaw + 2; @@ -76,11 +71,6 @@ static inline int agaw_to_width(int agaw) return min_t(int, 30 + agaw * LEVEL_STRIDE, MAX_AGAW_WIDTH); } -static inline int width_to_agaw(int width) -{ - return DIV_ROUND_UP(width - 30, LEVEL_STRIDE); -} - static inline unsigned int level_to_offset_bits(int level) { return (level - 1) * LEVEL_STRIDE; @@ -281,8 +271,6 @@ static LIST_HEAD(dmar_satc_units); #define for_each_rmrr_units(rmrr) \ list_for_each_entry(rmrr, &dmar_rmrr_units, list) -static void intel_iommu_domain_free(struct iommu_domain *domain); - int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON); int intel_iommu_sm = IS_ENABLED(CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON); @@ -450,12 +438,6 @@ int iommu_calculate_agaw(struct intel_iommu *iommu) return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH); } -static inline bool iommu_paging_structure_coherency(struct intel_iommu *iommu) -{ - return sm_supported(iommu) ? 
- ecap_smpwc(iommu->ecap) : ecap_coherent(iommu->ecap); -} - static void domain_update_iommu_coherency(struct dmar_domain *domain) { struct iommu_domain_info *info; @@ -1757,7 +1739,7 @@ static bool first_level_by_default(unsigned int type) return type != IOMMU_DOMAIN_UNMANAGED; } -static struct dmar_domain *alloc_domain(unsigned int type) +struct dmar_domain *alloc_domain(unsigned int type) { struct dmar_domain *domain; @@ -1842,20 +1824,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu) spin_unlock(&iommu->lock); } -static inline int guestwidth_to_adjustwidth(int gaw) -{ - int agaw; - int r = (gaw - 12) % 9; - - if (r == 0) - agaw = gaw; - else - agaw = gaw + 9 - r; - if (agaw > 64) - agaw = 64; - return agaw; -} - static void domain_exit(struct dmar_domain *domain) { if (domain->pgd) { @@ -4106,7 +4074,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags, return domain; } -static void intel_iommu_domain_free(struct iommu_domain *domain) +void intel_iommu_domain_free(struct iommu_domain *domain) { if (domain != &si_domain->domain) domain_exit(to_dmar_domain(domain)); @@ -4155,8 +4123,7 @@ int prepare_domain_attach_device(struct iommu_domain *domain, return 0; } -static int intel_iommu_attach_device(struct iommu_domain *domain, - struct device *dev) +int intel_iommu_attach_device(struct iommu_domain *domain, struct device *dev) { struct device_domain_info *info = dev_iommu_priv_get(dev); int ret; diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 6acb0211e85fe..c76f558ae6323 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -1021,4 +1021,39 @@ static inline const char *decode_prq_descriptor(char *str, size_t size, return str; } +#define __DOMAIN_MAX_ADDR(gaw) ((((uint64_t)1) << (gaw)) - 1) + +/* page table handling */ +#define LEVEL_STRIDE (9) +#define LEVEL_MASK (((u64)1 << LEVEL_STRIDE) - 1) + +int intel_iommu_attach_device(struct iommu_domain *domain, struct device *dev); +void intel_iommu_domain_free(struct iommu_domain *domain); +struct dmar_domain *alloc_domain(unsigned int type); + +static inline int guestwidth_to_adjustwidth(int gaw) +{ + int agaw; + int r = (gaw - 12) % 9; + + if (r == 0) + agaw = gaw; + else + agaw = gaw + 9 - r; + if (agaw > 64) + agaw = 64; + return agaw; +} + +static inline bool iommu_paging_structure_coherency(struct intel_iommu *iommu) +{ + return sm_supported(iommu) ? 
+ ecap_smpwc(iommu->ecap) : ecap_coherent(iommu->ecap); +} + +static inline int width_to_agaw(int width) +{ + return DIV_ROUND_UP(width - 30, LEVEL_STRIDE); +} + #endif From patchwork Sat Dec 2 09:24:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476849 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hoCjmwOM" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA27119F; Sat, 2 Dec 2023 01:53:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510799; x=1733046799; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=qE3X4Xn/o+H/su5ms6o1r+LMtnk1FlXs0DKcPdLLVE8=; b=hoCjmwOMlB3pI9t+pPyu6bdaEZLcQwe15iv9cn2brZ8I/zpPfgEA/QSx MXUZ66ZR/LqAY6FadBepxMoVjXDKrJSo3k9vSgOd9v2x0ne8CS/4FQ50e mkzlLOKjxeqZvSu9hJUHa61JUge47bYA0c7Wvf9LXHNjGBZif4/uBjMpP R8qDTiEv7KkLyUbhcy4He1Hffg3GNtJ7bt+A4gE3/ni7fftVkF65RM0Yp lQWixxLN47rgjUGLziqEjAWQDlvoFqm/ONBS5+kRHSPCECXP+nyXF3HnS wJk6RtIqCxBPcetFEJIFIMwzlqZfq3YoCkgMbun64I+y8dbQaLMKWmLH+ w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="397479511" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="397479511" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:53:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="913853413" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="913853413" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:53:15 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 18/42] iommu/vt-d: Support of IOMMU_DOMAIN_KVM domain in Intel IOMMU Date: Sat, 2 Dec 2023 17:24:21 +0800 Message-Id: <20231202092421.14524-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add support of the IOMMU_DOMAIN_KVM domain type. Paging structure allocation/free and page mapping/unmapping of this domain are managed by KVM rather than by the Intel IOMMU driver. The meta data of the paging structures of a KVM domain is read from the allocation "data" passed in from KVM through IOMMUFD. The format to parse the meta data is defined in arch header "asm/kvm_exported_tdp.h". The KVM domain's gaw (guest address width), agaw, pgd, max_addr and max super page level are all read from the paging structure meta data from KVM. Snoop and paging structure coherency are forced to be true. IOMMU hardware capabilities are checked against the requirements of the KVM domain at domain allocation phase and later at device attachment phase (in a later patch). CONFIG_INTEL_IOMMU_KVM is provided to turn on/off KVM domain support.
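Since the layout of the vendor meta data is defined by KVM elsewhere in this series, the sketch below only restates the fields this driver actually consumes (type, level, root_hpa, max_huge_page_level); treat it as an illustrative assumption about what "asm/kvm_exported_tdp.h" provides, not the authoritative definition:

/*
 * Illustrative only: fields mirror what intel_iommu_domain_alloc_kvm()
 * reads from the KVM-provided meta data in this patch; the real
 * structure is owned by KVM in asm/kvm_exported_tdp.h.
 */
struct kvm_exported_tdp_meta_vmx {
	int type;			/* must be KVM_TDP_TYPE_EPT for VT-d reuse */
	int level;			/* 4-level or 5-level EPT */
	int max_huge_page_level;	/* largest huge page level KVM may install */
	u64 root_hpa;			/* root of the shared EPT paging structure */
};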
Signed-off-by: Yan Zhao --- drivers/iommu/intel/Kconfig | 9 +++ drivers/iommu/intel/Makefile | 1 + drivers/iommu/intel/iommu.c | 18 ++++- drivers/iommu/intel/iommu.h | 5 ++ drivers/iommu/intel/kvm.c | 128 +++++++++++++++++++++++++++++++++++ 5 files changed, 160 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/intel/kvm.c diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig index a4a125666293f..78078103d4280 100644 --- a/drivers/iommu/intel/Kconfig +++ b/drivers/iommu/intel/Kconfig @@ -108,4 +108,13 @@ config INTEL_IOMMU_PERF_EVENTS to aid performance tuning and debug. These are available on modern processors which support Intel VT-d 4.0 and later. +config INTEL_IOMMU_KVM + bool "Support of stage 2 paging structures/mappings managed by KVM" + help + Selecting this option will enable Intel IOMMU to use paging + structures shared from KVM MMU as the stage 2 paging structures + in IOMMU hardware. The page mapping/unmapping, paging struture + allocation/free of this stage 2 paging structures are not managed + by Intel IOMMU driver, but by KVM MMU. + endif # INTEL_IOMMU diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile index 5dabf081a7793..c097bdd6ee13d 100644 --- a/drivers/iommu/intel/Makefile +++ b/drivers/iommu/intel/Makefile @@ -7,3 +7,4 @@ obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o obj-$(CONFIG_INTEL_IOMMU_SVM) += svm.o obj-$(CONFIG_IRQ_REMAP) += irq_remapping.o obj-$(CONFIG_INTEL_IOMMU_PERF_EVENTS) += perfmon.o +obj-$(CONFIG_INTEL_IOMMU_KVM) += kvm.o diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 924006cda18c5..fcdee40f30ed1 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -375,6 +375,15 @@ static inline int domain_type_is_si(struct dmar_domain *domain) return domain->domain.type == IOMMU_DOMAIN_IDENTITY; } +static inline int domain_type_is_kvm(struct dmar_domain *domain) +{ +#ifdef CONFIG_INTEL_IOMMU_KVM + return domain->domain.type == IOMMU_DOMAIN_KVM; +#else + return false; +#endif +} + static inline int domain_pfn_supported(struct dmar_domain *domain, unsigned long pfn) { @@ -1735,6 +1744,9 @@ static bool first_level_by_default(unsigned int type) if (intel_cap_flts_sanity() ^ intel_cap_slts_sanity()) return intel_cap_flts_sanity(); + if (type == IOMMU_DOMAIN_KVM) + return false; + /* Both levels are available, decide it based on domain type */ return type != IOMMU_DOMAIN_UNMANAGED; } @@ -1826,7 +1838,8 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu) static void domain_exit(struct dmar_domain *domain) { - if (domain->pgd) { + /* pgd of kvm domain is managed by KVM */ + if (!domain_type_is_kvm(domain) && (domain->pgd)) { LIST_HEAD(freelist); domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist); @@ -4892,6 +4905,9 @@ const struct iommu_ops intel_iommu_ops = { .hw_info = intel_iommu_hw_info, .domain_alloc = intel_iommu_domain_alloc, .domain_alloc_user = intel_iommu_domain_alloc_user, +#ifdef CONFIG_INTEL_IOMMU_KVM + .domain_alloc_kvm = intel_iommu_domain_alloc_kvm, +#endif .probe_device = intel_iommu_probe_device, .probe_finalize = intel_iommu_probe_finalize, .release_device = intel_iommu_release_device, diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index c76f558ae6323..8826e9248f6ed 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -1056,4 +1056,9 @@ static inline int width_to_agaw(int width) return DIV_ROUND_UP(width - 30, LEVEL_STRIDE); } +#ifdef CONFIG_INTEL_IOMMU_KVM 
+struct iommu_domain * +intel_iommu_domain_alloc_kvm(struct device *dev, u32 flags, const void *data); +#endif + #endif diff --git a/drivers/iommu/intel/kvm.c b/drivers/iommu/intel/kvm.c new file mode 100644 index 0000000000000..188ec90083051 --- /dev/null +++ b/drivers/iommu/intel/kvm.c @@ -0,0 +1,128 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include "iommu.h" + +/** + * Check IOMMU hardware Snoop related caps + * + * - force_snooping: Force snoop cpu caches per current KVM implementation. + * - scalable-mode: To enable PGSNP bit in PASIDTE to overwrite SNP + * bit (bit 11) in stage 2 leaves. + * - paging structure coherency: As KVM will not call clflush_cache_range() + */ +static bool is_coherency(struct intel_iommu *iommu) +{ + return ecap_sc_support(iommu->ecap) && sm_supported(iommu) && + iommu_paging_structure_coherency(iommu); +} + +static bool is_iommu_cap_compatible_to_kvm_domain(struct dmar_domain *domain, + struct intel_iommu *iommu) +{ + if (!is_coherency(iommu)) + return false; + + if (domain->iommu_superpage > fls(cap_super_page_val(iommu->cap))) + return false; + + if (domain->agaw > iommu->agaw || domain->agaw > cap_mgaw(iommu->cap)) + return false; + + return true; +} + +/* + * Cache coherency is always enforced in KVM domain. + * IOMMU hardware caps will be checked to allow the cache coherency before + * device attachment to the KVM domain. + */ +static bool kvm_domain_enforce_cache_coherency(struct iommu_domain *domain) +{ + return true; +} + +static const struct iommu_domain_ops intel_kvm_domain_ops = { + .free = intel_iommu_domain_free, + .enforce_cache_coherency = kvm_domain_enforce_cache_coherency, +}; + +struct iommu_domain * +intel_iommu_domain_alloc_kvm(struct device *dev, u32 flags, const void *data) +{ + bool request_nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT; + const struct kvm_exported_tdp_meta_vmx *tdp = data; + struct dmar_domain *dmar_domain; + struct iommu_domain *domain; + struct intel_iommu *iommu; + int adjust_width; + + iommu = device_to_iommu(dev, NULL, NULL); + + if (!iommu) + return ERR_PTR(-ENODEV); + /* + * In theory, a KVM domain can be nested as a parent domain to a user + * domain. Turn it off as we don't want to handle cases like IO page + * fault on nested domain for now. + */ + if (request_nest_parent) { + pr_err("KVM domain does not work as nested parent currently\n"); + return ERR_PTR(-EOPNOTSUPP); + } + + if (!tdp || tdp->type != KVM_TDP_TYPE_EPT) { + pr_err("No meta data or wrong KVM TDP type\n"); + return ERR_PTR(-EINVAL); + } + + if (tdp->level != 4 && tdp->level != 5) { + pr_err("Unsupported KVM TDP level %d in IOMMU\n", tdp->level); + return ERR_PTR(-EOPNOTSUPP); + } + + dmar_domain = alloc_domain(IOMMU_DOMAIN_KVM); + if (!dmar_domain) + return ERR_PTR(-ENOMEM); + + if (dmar_domain->use_first_level) + WARN_ON("KVM domain is applying to IOMMU flpt\n"); + + domain = &dmar_domain->domain; + domain->ops = &intel_kvm_domain_ops; + domain->type = IOMMU_DOMAIN_KVM; + + /* read dmar domain meta data from "tdp" */ + dmar_domain->gaw = tdp->level == 4 ?
ADDR_WIDTH_4LEVEL : ADDR_WIDTH_5LEVEL; + adjust_width = guestwidth_to_adjustwidth(dmar_domain->gaw); + dmar_domain->agaw = width_to_agaw(adjust_width); + dmar_domain->iommu_superpage = tdp->max_huge_page_level - 1; + dmar_domain->max_addr = (1 << dmar_domain->gaw); + dmar_domain->pgd = phys_to_virt(tdp->root_hpa); + + dmar_domain->nested_parent = false; + dmar_domain->dirty_tracking = false; + + /* + * force_snooping and paging strucure coherency in KVM domain + * IOMMU hareware cap will be checked before device attach + */ + dmar_domain->force_snooping = true; + dmar_domain->iommu_coherency = true; + + /* no need to let iommu_map/unmap see pgsize_bitmap */ + domain->pgsize_bitmap = 0; + + /* force aperture */ + domain->geometry.aperture_start = 0; + domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw); + domain->geometry.force_aperture = true; + + if (!is_iommu_cap_compatible_to_kvm_domain(dmar_domain, iommu)) { + pr_err("Unsupported KVM TDP\n"); + kfree(dmar_domain); + return ERR_PTR(-EOPNOTSUPP); + } + + return domain; +} From patchwork Sat Dec 2 09:24:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476850 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="eWHszJLN" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A5F7AD50; Sat, 2 Dec 2023 01:53:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510832; x=1733046832; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=DNvEQ5Df+enLLKlvPpgNl8iSMA8LDSmYdeB3/CStJvw=; b=eWHszJLNQJpKxhmd4cOBWfgH2FcceSsLHm0/+Eaq7DmcxmUaU6zWQ2q5 TLTcyJ7S6IfQBGW7ZtE0dyHDUES+kgi6i7EJIEpB9zQCqz/yUIyjkWWzz 6hh/V8edrecTCYxVNMcCUs9yFJz/ha51H6ZmgRVbVOEuAYkPXeP/lfFYt QUTOQIMyaynYXcfJ5pY3Eouv4Y7AOD1WTlQX55Pq7DqZ5ydrnF1vfnKl6 PkZhw6+Qg9ZHBBb37EbYvdxo9iqKZLHfcbBeU9j7OuT40g8KyWXEkWokb OLVjRywITAeACNkHF06HGplHt263b+U5MPggfldLqw1osDXEW/YZxCoke g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="625567" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="625567" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:53:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="719780723" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="719780723" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:53:47 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 19/42] iommu/vt-d: Set bit PGSNP in PASIDTE if domain cache coherency is enforced Date: Sat, 2 Dec 2023 17:24:52 +0800 Message-Id: <20231202092452.14581-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Set bit PGSNP (Page Snoop, bit 88) in PASIDTE 
when attaching device to a domain whose cache coherency is enforced. Signed-off-by: Yan Zhao --- drivers/iommu/intel/pasid.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index 74e8e4c17e814..a42955b5e666f 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -679,10 +679,11 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu, pasid_set_address_width(pte, agaw); pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY); pasid_set_fault_enable(pte); + if (domain->force_snooping) + pasid_set_pgsnp(pte); pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); if (domain->dirty_tracking) pasid_set_ssade(pte); - pasid_set_present(pte); spin_unlock(&iommu->lock); From patchwork Sat Dec 2 09:25:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476851 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="icl1q9Od" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6ECBCC; Sat, 2 Dec 2023 01:54:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510862; x=1733046862; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=CgOHdWHDwhSGt7y0GRo0NgHb+w74ehYb80zPfydmvj4=; b=icl1q9OdqATdQwtP/FyMtCYDZbDn0HnlhxcsatMDLyMDZXhAVHmTW/zf YhO0mEv29762zymrDL7ukM5HwZAvgCXHWLpJGRDtG+fQ9j3P1IodEzy5G 8cMVbIPzycmfrvbzszdJoK+Z9tmy+QnylTtbIcAE075LW6+G6mgd+nUMC Q6ZV1sC3Se+Ea8b5oVSkLPXmy+TmMaXeWdbSIpO9N1G0spcsTTNteLdEx aALNu7hQClE0VQ23u46xSK8BRok3ouVvG8aI8D0aK8aGVmYdFahKMoWOF whfuWZtwYgl2S9ZMIkZb7WLYSWNoLTRGX+F6/Sb6Eap0mwpYWzgqZXneR w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="424756075" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="424756075" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:54:22 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="836021580" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="836021580" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:54:19 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 20/42] iommu/vt-d: Support attach devices to IOMMU_DOMAIN_KVM domain Date: Sat, 2 Dec 2023 17:25:24 +0800 Message-Id: <20231202092524.14647-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: IOMMU_DOMAIN_KVM domain reuses intel_iommu_attach_device() for device attachment. But unlike attaching to other dmar_domain, domain caps (e.g. iommu_superpage) are not updated after device attach. Instead, IOMMU caps are checked for compatibility before domain attachment. 
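For illustration, the attach ordering this yields can be sketched as below. This is not code from the series; it condenses the intel_iommu_attach_device() path, and the use of dev_iommu_priv_get() to reach the device's IOMMU is only assumed here for brevity.

/*
 * Sketch of the attach flow for an IOMMU_DOMAIN_KVM domain: the IOMMU is
 * validated against the KVM domain up front, and the domain capabilities
 * are never recomputed from the attached IOMMU afterwards.
 */
static int kvm_domain_attach_sketch(struct iommu_domain *domain,
				    struct device *dev)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
	struct device_domain_info *info = dev_iommu_priv_get(dev);
	int ret;

	/* Fail early if this IOMMU cannot serve the KVM page table. */
	ret = prepare_kvm_domain_attach(dmar_domain, info->iommu);
	if (ret)
		return ret;

	/* Reuse the ordinary VT-d attach path; caps stay as set at alloc time. */
	return intel_iommu_attach_device(domain, dev);
}

In the actual patch the check is reached from prepare_domain_attach_device(), so no new attach callback body is needed beyond wiring .attach_dev to intel_iommu_attach_device().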
Signed-off-by: Yan Zhao --- drivers/iommu/intel/iommu.c | 11 +++++++++++ drivers/iommu/intel/iommu.h | 7 +++++++ drivers/iommu/intel/kvm.c | 9 +++++++++ 3 files changed, 27 insertions(+) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index fcdee40f30ed1..9cc42b3d24f65 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -552,6 +552,13 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain) /* Some capabilities may be different across iommus */ void domain_update_iommu_cap(struct dmar_domain *domain) { + /* + * No need to adjust iommu cap of kvm domain. + * Instead, iommu will be checked in pre-attach phase. + */ + if (domain_type_is_kvm(domain)) + return; + domain_update_iommu_coherency(domain); domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL); @@ -4104,6 +4111,9 @@ int prepare_domain_attach_device(struct iommu_domain *domain, if (!iommu) return -ENODEV; + if (domain_type_is_kvm(dmar_domain)) + return prepare_kvm_domain_attach(dmar_domain, iommu); + if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap)) return -EINVAL; @@ -4117,6 +4127,7 @@ int prepare_domain_attach_device(struct iommu_domain *domain, if (dmar_domain->max_addr > (1LL << addr_width)) return -EINVAL; + dmar_domain->gaw = addr_width; /* diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 8826e9248f6ed..801700bc7d820 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -1059,6 +1059,13 @@ static inline int width_to_agaw(int width) #ifdef CONFIG_INTEL_IOMMU_KVM struct iommu_domain * intel_iommu_domain_alloc_kvm(struct device *dev, u32 flags, const void *data); +int prepare_kvm_domain_attach(struct dmar_domain *domain, struct intel_iommu *iommu); +#else +static inline int prepare_kvm_domain_attach(struct dmar_domain *domain, + struct intel_iommu *iommu) +{ + return 0; +} #endif #endif diff --git a/drivers/iommu/intel/kvm.c b/drivers/iommu/intel/kvm.c index 188ec90083051..1ce334785430b 100644 --- a/drivers/iommu/intel/kvm.c +++ b/drivers/iommu/intel/kvm.c @@ -32,6 +32,14 @@ static bool is_iommu_cap_compatible_to_kvm_domain(struct dmar_domain *domain, return true; } +int prepare_kvm_domain_attach(struct dmar_domain *domain, struct intel_iommu *iommu) +{ + if (is_iommu_cap_compatible_to_kvm_domain(domain, iommu)) + return 0; + + return -EINVAL; +} + /* * Cache coherency is always enforced in KVM domain. 
* IOMMU hardware caps will be checked to allow the cache coherency before @@ -43,6 +51,7 @@ static bool kvm_domain_enforce_cache_coherency(struct iommu_domain *domain) } static const struct iommu_domain_ops intel_kvm_domain_ops = { + .attach_dev = intel_iommu_attach_device, .free = intel_iommu_domain_free, .enforce_cache_coherency = kvm_domain_enforce_cache_coherency, }; From patchwork Sat Dec 2 09:26:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476852 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="GpvP4aUk" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3AA98CC; Sat, 2 Dec 2023 01:55:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510901; x=1733046901; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=qum6bCO8m1UvZ+EuWUSsWmgfywLvc9j17rJU1lVG7vc=; b=GpvP4aUkodMLkK1ZQa9oV57K5vnj9XouXDzVO88zrBYZmeFNyEiViWle 8ipuBfa4HMy7sAhQEsJzA9Ll1SUB95xxj7I+BlcTZR3kEHHbc3C47T544 ZmI4MaqS/GimDRi9JRi5T1moEgsBZ8OndUm78AcL+dFaw5sR1TVt+Fj3A fTg9Ulp22qN6AjvNBBD6rAFOqoFZoCXG36o89v/gNcqQmL+ILc0so4+mp CdEL5jQBzBC2J/e9NNoNJeZOsvMxteQTv1tMcUHS9cjI0R7QPhMDL0dSU f0B2L9x82rKuUOIyTZJfbYyQU3EdLrYG+GtlrrPbGqVrOcOtkvo9u23B/ Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="393322186" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="393322186" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:55:00 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="804336985" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="804336985" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:54:56 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 21/42] iommu/vt-d: Check reserved bits for IOMMU_DOMAIN_KVM domain Date: Sat, 2 Dec 2023 17:26:02 +0800 Message-Id: <20231202092602.14704-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Compatibility check between IOMMU driver and KVM. rsvd_bits_mask is provided by KVM to guarantee that the set bits are must-be-zero bits in PTEs. Intel vt-d driver can check it to see if all must-be-zero bits required by IOMMU side are included. In this RFC, only bit 11 is checked for simplicity and demo purpose. 
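For illustration only, the demo check generalizes naturally to a full must-be-zero mask supplied by the IOMMU side. The sketch below is not part of the patch and assumes, as in KVM's struct rsvd_bits_validate, that the two rows of rsvd_bits_mask[][] cover the non-huge and huge SPTE encodings of each level.

/*
 * Sketch of a generalized check: the IOMMU driver supplies the full set of
 * bits it needs KVM to keep zero, and every such bit must be reserved by
 * KVM at every level and for both SPTE encodings.
 */
static int check_tdp_reserved_bits_mask(const struct kvm_exported_tdp_meta_vmx *tdp,
					u64 iommu_mbz_bits)
{
	int i;

	for (i = PT64_ROOT_MAX_LEVEL; --i >= 0;) {
		if ((tdp->rsvd_bits_mask[0][i] & iommu_mbz_bits) != iommu_mbz_bits ||
		    (tdp->rsvd_bits_mask[1][i] & iommu_mbz_bits) != iommu_mbz_bits)
			return -EINVAL;
	}
	return 0;
}

The RFC's check_tdp_reserved_bits() in the diff below is then the special case iommu_mbz_bits == BIT(11).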
Signed-off-by: Yan Zhao --- drivers/iommu/intel/kvm.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/drivers/iommu/intel/kvm.c b/drivers/iommu/intel/kvm.c index 1ce334785430b..998d6daaf7ea1 100644 --- a/drivers/iommu/intel/kvm.c +++ b/drivers/iommu/intel/kvm.c @@ -32,6 +32,18 @@ static bool is_iommu_cap_compatible_to_kvm_domain(struct dmar_domain *domain, return true; } +static int check_tdp_reserved_bits(const struct kvm_exported_tdp_meta_vmx *tdp) +{ + int i; + + for (i = PT64_ROOT_MAX_LEVEL; --i >= 0;) { + if (!(tdp->rsvd_bits_mask[0][i] & BIT(11)) || + !(tdp->rsvd_bits_mask[1][i] & BIT(11))) + return -EFAULT; + } + return 0; +} + int prepare_kvm_domain_attach(struct dmar_domain *domain, struct intel_iommu *iommu) { if (is_iommu_cap_compatible_to_kvm_domain(domain, iommu)) @@ -90,6 +102,11 @@ intel_iommu_domain_alloc_kvm(struct device *dev, u32 flags, const void *data) return ERR_PTR(-EOPNOTSUPP); } + if (check_tdp_reserved_bits(tdp)) { + pr_err("Reserved bits incompatible between KVM and IOMMU\n"); + return ERR_PTR(-EOPNOTSUPP); + } + dmar_domain = alloc_domain(IOMMU_DOMAIN_KVM); if (!dmar_domain) return ERR_PTR(-ENOMEM); From patchwork Sat Dec 2 09:26:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476853 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="dYmF9aQn" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AFD001A6; Sat, 2 Dec 2023 01:55:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510929; x=1733046929; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=BZYTTtyQtvoxZgxXT0A+sCWE4cIHFzWJmMiUL2ynGlI=; b=dYmF9aQnVmkJu07WnrokIVdrrNK9YE4DbiE3Nqmtjm6cynHIf0YWYgH0 6BCgO/jI1g2n/b0zcjGw55hAujJNPX7Jb4R2uGd1gqgcmtCAYbdNxWAOY nqyJBC/15s6TjfQVxKrKedMxfKMR50TJimQ3Yob9z8x76AFWfKuhalSw+ zo/cXC/FT9IKNo9SHsuPeqcxsgfKOvAioRtHcfKzC2rfh562cLHnfTZy0 +6XkkyalKT9TAombryRzQEnIlB9zA9hdKuhCQhzePMvPAjJk05LtyS9Uv 2CVCsg86/1jnWJ/+yZKgKA5c00SakhhoMB7IoPgS0Wb7I4LFe2FZmKkMT Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="393322196" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="393322196" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:55:29 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="804337094" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="804337094" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:55:24 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 22/42] iommu/vt-d: Support cache invalidate of IOMMU_DOMAIN_KVM domain Date: Sat, 2 Dec 2023 17:26:30 +0800 Message-Id: <20231202092630.14764-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: 
kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Support invalidation of hardware TLBs on KVM invalidates mappings on domain of type IOMMU_DOMAIN_KVM. Signed-off-by: Yan Zhao --- drivers/iommu/intel/kvm.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/drivers/iommu/intel/kvm.c b/drivers/iommu/intel/kvm.c index 998d6daaf7ea1..56cb8f9bf1da0 100644 --- a/drivers/iommu/intel/kvm.c +++ b/drivers/iommu/intel/kvm.c @@ -62,10 +62,41 @@ static bool kvm_domain_enforce_cache_coherency(struct iommu_domain *domain) return true; } +static void domain_flush_iotlb_psi(struct dmar_domain *domain, + unsigned long iova, unsigned long size) +{ + struct iommu_domain_info *info; + unsigned long i; + + if (!IS_ALIGNED(size, VTD_PAGE_SIZE) || + !IS_ALIGNED(iova, VTD_PAGE_SIZE)) { + pr_err("Invalid KVM domain invalidation: iova=0x%lx, size=0x%lx\n", + iova, size); + return; + } + + xa_for_each(&domain->iommu_array, i, info) + iommu_flush_iotlb_psi(info->iommu, domain, + iova >> VTD_PAGE_SHIFT, + size >> VTD_PAGE_SHIFT, 1, 0); +} + +static void kvm_domain_cache_invalidate(struct iommu_domain *domain, + unsigned long iova, unsigned long size) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + + if (iova == 0 && size == -1UL) + intel_flush_iotlb_all(domain); + else + domain_flush_iotlb_psi(dmar_domain, iova, size); +} + static const struct iommu_domain_ops intel_kvm_domain_ops = { .attach_dev = intel_iommu_attach_device, .free = intel_iommu_domain_free, .enforce_cache_coherency = kvm_domain_enforce_cache_coherency, + .cache_invalidate_kvm = kvm_domain_cache_invalidate, }; struct iommu_domain * From patchwork Sat Dec 2 09:26:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476854 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="eNuKTgKc" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF2881A4; Sat, 2 Dec 2023 01:55:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510958; x=1733046958; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=ezFO7DlF7HjinwT5gG8xfHRYCaAw4XvRxjdlb8ZqGls=; b=eNuKTgKc87Scq9Rm16NVSPj59OTv1bDkEDSclTBM0pWKJF0EBJNsszKQ aU94kgzVru+dKRPxJ/bJQdxtaMTXrMCmvdQQpMUcRtWybJtB/WIDd2xUm StTHuOaIABnHz8zGU5WclBVLKZ1FYQJjpCS02pVc/Lr0wQhrbW14QpVt7 Fs6z1IOZydy8SKJUKyx5WcCR5UehRLBVq8z1h+UvYHGImqnqBwyWefc5i BguP12WGb6IpQoyPGgMmmWgqlDJHQfrKbM4WtgKTBZm//EwnlxVxy/GF9 KGAaRyxlunqkWYFMndMD2LOsdjwi03w1J50MSzHhd5yikfinIRU+fOx6X A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="625623" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="625623" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:55:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="719780952" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="719780952" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:55:53 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, 
robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 23/42] iommu/vt-d: Allow pasid 0 in IOPF Date: Sat, 2 Dec 2023 17:26:57 +0800 Message-Id: <20231202092657.14822-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Pasid 0 is allowed when IOPFs are triggered in second level page tables. Page requests/response with pasid 0 or without pasid are also permitted by vt-d hardware spec. FIXME: Current .page_response and intel_svm_enable_prq() are bound to SVM and is compiled only with CONFIG_INTEL_IOMMU_SVM. e.g. .page_response = intel_svm_page_response, Need to move prq enableing and page response code outside of svm.c and SVM independent. Signed-off-by: Yan Zhao --- drivers/iommu/intel/svm.c | 37 ++++++++++++++++++++----------------- 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 659de9c160241..a2a63a85baa9f 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -628,6 +628,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) int head, tail, handled; struct pci_dev *pdev; u64 address; + bool bad_req = false; /* * Clear PPR bit before reading head/tail registers, to ensure that @@ -642,30 +643,29 @@ static irqreturn_t prq_event_thread(int irq, void *d) req = &iommu->prq[head / sizeof(*req)]; address = (u64)req->addr << VTD_PAGE_SHIFT; - if (unlikely(!req->pasid_present)) { - pr_err("IOMMU: %s: Page request without PASID\n", + if (unlikely(!req->pasid_present)) + pr_info("IOMMU: %s: Page request without PASID\n", iommu->name); -bad_req: - handle_bad_prq_event(iommu, req, QI_RESP_INVALID); - goto prq_advance; - } if (unlikely(!is_canonical_address(address))) { pr_err("IOMMU: %s: Address is not canonical\n", iommu->name); - goto bad_req; + bad_req = true; + goto prq_advance; } if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) { pr_err("IOMMU: %s: Page request in Privilege Mode\n", iommu->name); - goto bad_req; + bad_req = true; + goto prq_advance; } if (unlikely(req->exe_req && req->rd_req)) { pr_err("IOMMU: %s: Execution request not supported\n", iommu->name); - goto bad_req; + bad_req = true; + goto prq_advance; } /* Drop Stop Marker message. No need for a response. */ @@ -679,8 +679,10 @@ static irqreturn_t prq_event_thread(int irq, void *d) * If prq is to be handled outside iommu driver via receiver of * the fault notifiers, we skip the page response here. 
*/ - if (!pdev) - goto bad_req; + if (!pdev) { + bad_req = true; + goto prq_advance; + } if (intel_svm_prq_report(iommu, &pdev->dev, req)) handle_bad_prq_event(iommu, req, QI_RESP_INVALID); @@ -688,8 +690,14 @@ static irqreturn_t prq_event_thread(int irq, void *d) trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1, req->priv_data[0], req->priv_data[1], iommu->prq_seq_number++); + pci_dev_put(pdev); + prq_advance: + if (bad_req) { + handle_bad_prq_event(iommu, req, QI_RESP_INVALID); + bad_req = false; + } head = (head + sizeof(*req)) & PRQ_RING_MASK; } @@ -747,12 +755,7 @@ int intel_svm_page_response(struct device *dev, private_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA; last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; - if (!pasid_present) { - ret = -EINVAL; - goto out; - } - - if (prm->pasid == 0 || prm->pasid >= PASID_MAX) { + if (prm->pasid >= PASID_MAX) { ret = -EINVAL; goto out; } From patchwork Sat Dec 2 09:27:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476855 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jQW+ry6G" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E83C2D50; Sat, 2 Dec 2023 01:56:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701510987; x=1733046987; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=nTuWdwSlGx9t9K0jy24gxhIhSwr530ozUXC9RhUGrZc=; b=jQW+ry6GfZlgqpJEaOZOqoqp5zhtRugU00GSigD/XsDp+yYE7vDPOwUU 49Ptf25Sghii6ngyEhhm4MUR2KKThQK2hpLX+B/FdHBTqSPHdzgX6dQgg AwC3IGiPclDsCbDisTpzR8n9KlNt8OAWZkkHVS1Ti/qZgR0tM5CYcDJPu ehWq1v3ecNuvHlFOxr5nRldFNc+ilDKqtoVspeVLLiVwgr+kd/69F9RtN AYc6C1sR0wx94Mi4zFyJk9t2JT0qQbQTu/7PpJDGxpZOJVkcJk/CEcQhO RCbFJubCbtJqMhog/lHWBrXPNAwrpFnbbxfd2iQCvIkSmMx0lpGSpy6pV w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="625650" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="625650" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:56:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="719781015" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="719781015" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:56:22 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 24/42] KVM: x86/mmu: Move bit SPTE_MMU_PRESENT from bit 11 to bit 59 Date: Sat, 2 Dec 2023 17:27:27 +0800 Message-Id: <20231202092727.14888-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add a config CONFIG_HAVE_KVM_MMU_PRESENT_HIGH to support locating SPTE_MMU_PRESENT bit from bit 11 to bit 59 and mark bit 11 as reserved 0. 
Though locating SPTE_MMU_PRESENT bit at low bit 11 has lower footprint, sometimes it's not allowed for bit 11 to be set, e.g. when KVM's TDP is exported and shared to IOMMU as stage 2 page tables, bit 11 must be reserved as 0 in Intel vt-d. For the 19 bits MMIO GEN masks, w/o CONFIG_HAVE_KVM_MMU_PRESENT_HIGH, it's divided into 2 parts, Low: bit 3 - 10 High: bit 52 - 62 w/ CONFIG_HAVE_KVM_MMU_PRESENT_HIGH, it's divided into 3 parts, Low: bit 3 - 11 Mid: bit 52 - 58 High: bit 60 - 62 It is ok for MMIO GEN mask to take bit 11 because MMIO GEN mask is for generation info of emulated MMIOs and therefore will not be directly accessed by Intel vt-d hardware. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 7 ++++ arch/x86/kvm/mmu/spte.c | 3 ++ arch/x86/kvm/mmu/spte.h | 77 ++++++++++++++++++++++++++++++++++++----- virt/kvm/Kconfig | 3 ++ 4 files changed, 81 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index c57e181bba21b..69af78e508197 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4926,6 +4926,13 @@ static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context) reserved_hpa_bits(), false, max_huge_page_level); + if (IS_ENABLED(CONFIG_HAVE_KVM_MMU_PRESENT_HIGH)) { + for (i = PT64_ROOT_MAX_LEVEL; --i >= 0;) { + shadow_zero_check->rsvd_bits_mask[0][i] |= rsvd_bits(11, 11); + shadow_zero_check->rsvd_bits_mask[1][i] |= rsvd_bits(11, 11); + } + } + if (!shadow_me_mask) return; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 4a599130e9c99..179156cd995df 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -64,6 +64,9 @@ static u64 generation_mmio_spte_mask(u64 gen) WARN_ON_ONCE(gen & ~MMIO_SPTE_GEN_MASK); mask = (gen << MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_SPTE_GEN_LOW_MASK; +#ifdef CONFIG_HAVE_KVM_MMU_PRESENT_HIGH + mask |= (gen << MMIO_SPTE_GEN_MID_SHIFT) & MMIO_SPTE_GEN_MID_MASK; +#endif mask |= (gen << MMIO_SPTE_GEN_HIGH_SHIFT) & MMIO_SPTE_GEN_HIGH_MASK; return mask; } diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index a129951c9a885..b88b686a4ecbc 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -7,13 +7,20 @@ #include "mmu_internal.h" /* - * A MMU present SPTE is backed by actual memory and may or may not be present - * in hardware. E.g. MMIO SPTEs are not considered present. Use bit 11, as it - * is ignored by all flavors of SPTEs and checking a low bit often generates - * better code than for a high bit, e.g. 56+. MMU present checks are pervasive - * enough that the improved code generation is noticeable in KVM's footprint. - */ +* A MMU present SPTE is backed by actual memory and may or may not be present +* in hardware. E.g. MMIO SPTEs are not considered present. Use bit 11, as it +* is ignored by all flavors of SPTEs and checking a low bit often generates +* better code than for a high bit, e.g. 56+. MMU present checks are pervasive +* enough that the improved code generation is noticeable in KVM's footprint. +* However, sometimes it's desired to have present bit in high bits. e.g. +* if a KVM TDP is exported to IOMMU side, bit 11 could be a reserved bit in +* IOMMU side. Add a config to decide MMU present bit is at bit 11 or bit 59. 
+*/ +#ifdef CONFIG_HAVE_KVM_MMU_PRESENT_HIGH +#define SPTE_MMU_PRESENT_MASK BIT_ULL(59) +#else #define SPTE_MMU_PRESENT_MASK BIT_ULL(11) +#endif /* * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also @@ -111,19 +118,66 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK)); * checking for MMIO spte cache hits. */ +#ifdef CONFIG_HAVE_KVM_MMU_PRESENT_HIGH + #define MMIO_SPTE_GEN_LOW_START 3 -#define MMIO_SPTE_GEN_LOW_END 10 +#define MMIO_SPTE_GEN_LOW_END 11 +#define MMIO_SPTE_GEN_MID_START 52 +#define MMIO_SPTE_GEN_MID_END 58 +#define MMIO_SPTE_GEN_HIGH_START 60 +#define MMIO_SPTE_GEN_HIGH_END 62 +#define MMIO_SPTE_GEN_LOW_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, \ + MMIO_SPTE_GEN_LOW_START) +#define MMIO_SPTE_GEN_MID_MASK GENMASK_ULL(MMIO_SPTE_GEN_MID_END, \ + MMIO_SPTE_GEN_MID_START) +#define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \ + MMIO_SPTE_GEN_HIGH_START) +static_assert(!(SPTE_MMU_PRESENT_MASK & + (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_MID_MASK | + MMIO_SPTE_GEN_HIGH_MASK))); +/* + * The SPTE MMIO mask must NOT overlap the MMIO generation bits or the + * MMU-present bit. The generation obviously co-exists with the magic MMIO + * mask/value, and MMIO SPTEs are considered !MMU-present. + * + * The SPTE MMIO mask is allowed to use hardware "present" bits (i.e. all EPT + * RWX bits), all physical address bits (legal PA bits are used for "fast" MMIO + * and so they're off-limits for generation; additional checks ensure the mask + * doesn't overlap legal PA bits), and bit 63 (carved out for future usage). + */ +#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | GENMASK_ULL(2, 0)) +static_assert(!(SPTE_MMIO_ALLOWED_MASK & + (SPTE_MMU_PRESENT_MASK | MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_MID_MASK | + MMIO_SPTE_GEN_HIGH_MASK))); + +#define MMIO_SPTE_GEN_LOW_BITS (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1) +#define MMIO_SPTE_GEN_MID_BITS (MMIO_SPTE_GEN_MID_END - MMIO_SPTE_GEN_MID_START + 1) +#define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1) +/* remember to adjust the comment above as well if you change these */ +static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_MID_BITS == 7 && + MMIO_SPTE_GEN_HIGH_BITS == 3); + +#define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0) +#define MMIO_SPTE_GEN_MID_SHIFT (MMIO_SPTE_GEN_MID_START - MMIO_SPTE_GEN_LOW_BITS) +#define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_MID_BITS - \ + MMIO_SPTE_GEN_LOW_BITS) + +#define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + \ + MMIO_SPTE_GEN_MID_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0) + +#else /* !CONFIG_HAVE_KVM_MMU_PRESENT_HIGH */ + +#define MMIO_SPTE_GEN_LOW_START 3 +#define MMIO_SPTE_GEN_LOW_END 10 #define MMIO_SPTE_GEN_HIGH_START 52 #define MMIO_SPTE_GEN_HIGH_END 62 - #define MMIO_SPTE_GEN_LOW_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, \ MMIO_SPTE_GEN_LOW_START) #define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \ MMIO_SPTE_GEN_HIGH_START) static_assert(!(SPTE_MMU_PRESENT_MASK & (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK))); - /* * The SPTE MMIO mask must NOT overlap the MMIO generation bits or the * MMU-present bit. 
The generation obviously co-exists with the magic MMIO @@ -149,6 +203,8 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11); #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0) +#endif /* #ifdef CONFIG_HAVE_KVM_MMU_PRESENT_HIGH */ + extern u64 __read_mostly shadow_host_writable_mask; extern u64 __read_mostly shadow_mmu_writable_mask; extern u64 __read_mostly shadow_nx_mask; @@ -465,6 +521,9 @@ static inline u64 get_mmio_spte_generation(u64 spte) u64 gen; gen = (spte & MMIO_SPTE_GEN_LOW_MASK) >> MMIO_SPTE_GEN_LOW_SHIFT; +#ifdef CONFIG_HAVE_KVM_MMU_PRESENT_HIGH + gen |= (spte & MMIO_SPTE_GEN_MID_MASK) >> MMIO_SPTE_GEN_MID_SHIFT; +#endif gen |= (spte & MMIO_SPTE_GEN_HIGH_MASK) >> MMIO_SPTE_GEN_HIGH_SHIFT; return gen; } diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 63b5d55c84e95..b00f9f5180292 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -95,3 +95,6 @@ config KVM_GENERIC_HARDWARE_ENABLING config HAVE_KVM_EXPORTED_TDP bool + +config HAVE_KVM_MMU_PRESENT_HIGH + bool From patchwork Sat Dec 2 09:27:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476856 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="lrkZhbct" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4C23197; Sat, 2 Dec 2023 01:56:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511017; x=1733047017; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=eYf3VU1p/NWLVybSdv7z8At46N23CSKFjZR1lIqKJ0M=; b=lrkZhbctPs/Ty3TBP1+Y1Vu4oBa0MtvbSE/rv3zULvFGOJAf3bkKvEY7 I4s0r1RUqgvIBFT93l+I9OyfmAN8ZtgApGj1tNGjwgyLDRVtga49B9Ko6 lJSyyAwNlMhbuNFcmQjO9OGesp8Uyf5pUNQWclApZi/QeBE6ng+xpS3Y0 OGsiAH78TGqSq4pmJ/97jiuWxqTtWhfERCcHmigXEVPIgM5D4AO+1HGD9 59yyROnQgSSwWgE7pVn5AM/4WA5IJfpKigaPFQYSEslgzIHyCv8JAdryC m7dXJADNfjiF0EZ76vCmtvhFY6XtCeX9433ch+YKCTTVpRsAfbGJZ77hr w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="457913578" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="457913578" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:56:56 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="840460888" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="840460888" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:56:52 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 25/42] KVM: x86/mmu: Abstract "struct kvm_mmu_common" from "struct kvm_mmu" Date: Sat, 2 Dec 2023 17:27:58 +0800 Message-Id: <20231202092758.14978-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Abstract "struct kvm_mmu_common" 
and move 3 common fields "root, root_role, shadow_zero_check" from "struct kvm_mmu" to "struct kvm_mmu_common". "struct kvm_mmu_common" is a preparation for later patches to introduce "struct kvm_exported_tdp_mmu" which is used by KVM to export TDP. Opportunistically, a new param "struct kvm_mmu_common *mmu_common" is added to make_spte(), so that is_rsvd_spte() in make_spte() can use &mmu_common->shadow_zero_check directly without asking it from vcpu. No functional changes expected. Signed-off-by: Yan Zhao --- arch/x86/include/asm/kvm_host.h | 22 +++-- arch/x86/kvm/mmu.h | 6 +- arch/x86/kvm/mmu/mmu.c | 168 ++++++++++++++++---------------- arch/x86/kvm/mmu/mmu_internal.h | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 9 +- arch/x86/kvm/mmu/spte.c | 7 +- arch/x86/kvm/mmu/spte.h | 3 +- arch/x86/kvm/mmu/tdp_mmu.c | 13 +-- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/nested.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/x86.c | 8 +- 12 files changed, 127 insertions(+), 119 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d7036982332e3..16e01eee34a99 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -437,12 +437,25 @@ struct kvm_mmu_root_info { struct kvm_mmu_page; struct kvm_page_fault; +struct kvm_mmu_common { + struct kvm_mmu_root_info root; + union kvm_mmu_page_role root_role; + + /* + * check zero bits on shadow page table entries, these + * bits include not only hardware reserved bits but also + * the bits spte never used. + */ + struct rsvd_bits_validate shadow_zero_check; +}; + /* * x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit, * and 2-level 32-bit). The kvm_mmu structure abstracts the details of the * current mmu mode. */ struct kvm_mmu { + struct kvm_mmu_common common; unsigned long (*get_guest_pgd)(struct kvm_vcpu *vcpu); u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index); int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); @@ -453,9 +466,7 @@ struct kvm_mmu { struct x86_exception *exception); int (*sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i); - struct kvm_mmu_root_info root; union kvm_cpu_role cpu_role; - union kvm_mmu_page_role root_role; /* * The pkru_mask indicates if protection key checks are needed. It @@ -478,13 +489,6 @@ struct kvm_mmu { u64 *pml4_root; u64 *pml5_root; - /* - * check zero bits on shadow page table entries, these - * bits include not only hardware reserved bits but also - * the bits spte never used. 
- */ - struct rsvd_bits_validate shadow_zero_check; - struct rsvd_bits_validate guest_rsvd_check; u64 pdptrs[4]; /* pae */ diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index bb8c86eefac04..e9631cc23a594 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -126,7 +126,7 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new, static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu) { - if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE)) + if (likely(vcpu->arch.mmu->common.root.hpa != INVALID_PAGE)) return 0; return kvm_mmu_load(vcpu); @@ -148,13 +148,13 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu) static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu) { - u64 root_hpa = vcpu->arch.mmu->root.hpa; + u64 root_hpa = vcpu->arch.mmu->common.root.hpa; if (!VALID_PAGE(root_hpa)) return; static_call(kvm_x86_load_mmu_pgd)(vcpu, root_hpa, - vcpu->arch.mmu->root_role.level); + vcpu->arch.mmu->common.root_role.level); } static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 69af78e508197..cfeb066f38687 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -643,7 +643,7 @@ static bool mmu_spte_age(u64 *sptep) static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu) { - return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct; + return tdp_mmu_enabled && vcpu->arch.mmu->common.root_role.direct; } static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu) @@ -1911,7 +1911,7 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp) static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) { - union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role; + union kvm_mmu_page_role root_role = vcpu->arch.mmu->common.root_role; /* * Ignore various flags when verifying that it's safe to sync a shadow @@ -2363,11 +2363,11 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato { iterator->addr = addr; iterator->shadow_addr = root; - iterator->level = vcpu->arch.mmu->root_role.level; + iterator->level = vcpu->arch.mmu->common.root_role.level; if (iterator->level >= PT64_ROOT_4LEVEL && vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL && - !vcpu->arch.mmu->root_role.direct) + !vcpu->arch.mmu->common.root_role.direct) iterator->level = PT32E_ROOT_LEVEL; if (iterator->level == PT32E_ROOT_LEVEL) { @@ -2375,7 +2375,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato * prev_root is currently only used for 64-bit hosts. So only * the active root_hpa is valid here. 
*/ - BUG_ON(root != vcpu->arch.mmu->root.hpa); + BUG_ON(root != vcpu->arch.mmu->common.root.hpa); iterator->shadow_addr = vcpu->arch.mmu->pae_root[(addr >> 30) & 3]; @@ -2389,7 +2389,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, u64 addr) { - shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa, + shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->common.root.hpa, addr); } @@ -2771,7 +2771,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) gpa_t gpa; int r; - if (vcpu->arch.mmu->root_role.direct) + if (vcpu->arch.mmu->common.root_role.direct) return 0; gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL); @@ -2939,7 +2939,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot, was_rmapped = 1; } - wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch, + wrprot = make_spte(vcpu, &vcpu->arch.mmu->common, + sp, slot, pte_access, gfn, pfn, *sptep, prefetch, true, host_writable, &spte); if (*sptep == spte) { @@ -3577,7 +3578,7 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu, /* Before acquiring the MMU lock, see if we need to do any real work. */ free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT) - && VALID_PAGE(mmu->root.hpa); + && VALID_PAGE(mmu->common.root.hpa); if (!free_active_root) { for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) @@ -3597,10 +3598,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu, &invalid_list); if (free_active_root) { - if (kvm_mmu_is_dummy_root(mmu->root.hpa)) { + if (kvm_mmu_is_dummy_root(mmu->common.root.hpa)) { /* Nothing to cleanup for dummy roots. */ - } else if (root_to_sp(mmu->root.hpa)) { - mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list); + } else if (root_to_sp(mmu->common.root.hpa)) { + mmu_free_root_page(kvm, &mmu->common.root.hpa, &invalid_list); } else if (mmu->pae_root) { for (i = 0; i < 4; ++i) { if (!IS_VALID_PAE_ROOT(mmu->pae_root[i])) @@ -3611,8 +3612,8 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu, mmu->pae_root[i] = INVALID_PAE_ROOT; } } - mmu->root.hpa = INVALID_PAGE; - mmu->root.pgd = 0; + mmu->common.root.hpa = INVALID_PAGE; + mmu->common.root.pgd = 0; } kvm_mmu_commit_zap_page(kvm, &invalid_list); @@ -3631,7 +3632,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu) * This should not be called while L2 is active, L2 can't invalidate * _only_ its own roots, e.g. INVVPID unconditionally exits. 
*/ - WARN_ON_ONCE(mmu->root_role.guest_mode); + WARN_ON_ONCE(mmu->common.root_role.guest_mode); for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { root_hpa = mmu->prev_roots[i].hpa; @@ -3650,7 +3651,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots); static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level) { - union kvm_mmu_page_role role = vcpu->arch.mmu->root_role; + union kvm_mmu_page_role role = vcpu->arch.mmu->common.root_role; struct kvm_mmu_page *sp; role.level = level; @@ -3668,7 +3669,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.mmu; - u8 shadow_root_level = mmu->root_role.level; + u8 shadow_root_level = mmu->common.root_role.level; hpa_t root; unsigned i; int r; @@ -3680,10 +3681,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) if (tdp_mmu_enabled) { root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu); - mmu->root.hpa = root; + mmu->common.root.hpa = root; } else if (shadow_root_level >= PT64_ROOT_4LEVEL) { root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level); - mmu->root.hpa = root; + mmu->common.root.hpa = root; } else if (shadow_root_level == PT32E_ROOT_LEVEL) { if (WARN_ON_ONCE(!mmu->pae_root)) { r = -EIO; @@ -3698,7 +3699,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) mmu->pae_root[i] = root | PT_PRESENT_MASK | shadow_me_value; } - mmu->root.hpa = __pa(mmu->pae_root); + mmu->common.root.hpa = __pa(mmu->pae_root); } else { WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level); r = -EIO; @@ -3706,7 +3707,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) } /* root.pgd is ignored for direct MMUs. */ - mmu->root.pgd = 0; + mmu->common.root.pgd = 0; out_unlock: write_unlock(&vcpu->kvm->mmu_lock); return r; @@ -3785,7 +3786,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) root_gfn = root_pgd >> PAGE_SHIFT; if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) { - mmu->root.hpa = kvm_mmu_get_dummy_root(); + mmu->common.root.hpa = kvm_mmu_get_dummy_root(); return 0; } @@ -3819,8 +3820,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) */ if (mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) { root = mmu_alloc_root(vcpu, root_gfn, 0, - mmu->root_role.level); - mmu->root.hpa = root; + mmu->common.root_role.level); + mmu->common.root.hpa = root; goto set_root_pgd; } @@ -3835,7 +3836,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) * the shadow page table may be a PAE or a long mode page table. 
*/ pm_mask = PT_PRESENT_MASK | shadow_me_value; - if (mmu->root_role.level >= PT64_ROOT_4LEVEL) { + if (mmu->common.root_role.level >= PT64_ROOT_4LEVEL) { pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK; if (WARN_ON_ONCE(!mmu->pml4_root)) { @@ -3844,7 +3845,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) } mmu->pml4_root[0] = __pa(mmu->pae_root) | pm_mask; - if (mmu->root_role.level == PT64_ROOT_5LEVEL) { + if (mmu->common.root_role.level == PT64_ROOT_5LEVEL) { if (WARN_ON_ONCE(!mmu->pml5_root)) { r = -EIO; goto out_unlock; @@ -3876,15 +3877,15 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) mmu->pae_root[i] = root | pm_mask; } - if (mmu->root_role.level == PT64_ROOT_5LEVEL) - mmu->root.hpa = __pa(mmu->pml5_root); - else if (mmu->root_role.level == PT64_ROOT_4LEVEL) - mmu->root.hpa = __pa(mmu->pml4_root); + if (mmu->common.root_role.level == PT64_ROOT_5LEVEL) + mmu->common.root.hpa = __pa(mmu->pml5_root); + else if (mmu->common.root_role.level == PT64_ROOT_4LEVEL) + mmu->common.root.hpa = __pa(mmu->pml4_root); else - mmu->root.hpa = __pa(mmu->pae_root); + mmu->common.root.hpa = __pa(mmu->pae_root); set_root_pgd: - mmu->root.pgd = root_pgd; + mmu->common.root.pgd = root_pgd; out_unlock: write_unlock(&vcpu->kvm->mmu_lock); @@ -3894,7 +3895,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.mmu; - bool need_pml5 = mmu->root_role.level > PT64_ROOT_4LEVEL; + bool need_pml5 = mmu->common.root_role.level > PT64_ROOT_4LEVEL; u64 *pml5_root = NULL; u64 *pml4_root = NULL; u64 *pae_root; @@ -3905,9 +3906,9 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu) * equivalent level in the guest's NPT to shadow. Allocate the tables * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare. 
*/ - if (mmu->root_role.direct || + if (mmu->common.root_role.direct || mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL || - mmu->root_role.level < PT64_ROOT_4LEVEL) + mmu->common.root_role.level < PT64_ROOT_4LEVEL) return 0; /* @@ -4003,16 +4004,16 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) int i; struct kvm_mmu_page *sp; - if (vcpu->arch.mmu->root_role.direct) + if (vcpu->arch.mmu->common.root_role.direct) return; - if (!VALID_PAGE(vcpu->arch.mmu->root.hpa)) + if (!VALID_PAGE(vcpu->arch.mmu->common.root.hpa)) return; vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); if (vcpu->arch.mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) { - hpa_t root = vcpu->arch.mmu->root.hpa; + hpa_t root = vcpu->arch.mmu->common.root.hpa; if (!is_unsync_root(root)) return; @@ -4134,7 +4135,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) if (!is_shadow_present_pte(sptes[leaf])) leaf++; - rsvd_check = &vcpu->arch.mmu->shadow_zero_check; + rsvd_check = &vcpu->arch.mmu->common.shadow_zero_check; for (level = root; level >= leaf; level--) reserved |= is_rsvd_spte(rsvd_check, sptes[level], level); @@ -4233,7 +4234,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, arch.token = alloc_apf_token(vcpu); arch.gfn = gfn; - arch.direct_map = vcpu->arch.mmu->root_role.direct; + arch.direct_map = vcpu->arch.mmu->common.root_role.direct; arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu); return kvm_setup_async_pf(vcpu, cr2_or_gpa, @@ -4244,7 +4245,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) { int r; - if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) || + if ((vcpu->arch.mmu->common.root_role.direct != work->arch.direct_map) || work->wakeup_all) return; @@ -4252,7 +4253,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) if (unlikely(r)) return; - if (!vcpu->arch.mmu->root_role.direct && + if (!vcpu->arch.mmu->common.root_role.direct && work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu)) return; @@ -4348,7 +4349,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, static bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { - struct kvm_mmu_page *sp = root_to_sp(vcpu->arch.mmu->root.hpa); + struct kvm_mmu_page *sp = root_to_sp(vcpu->arch.mmu->common.root.hpa); /* Special roots, e.g. pae_root, are not backed by shadow pages. */ if (sp && is_obsolete_sp(vcpu->kvm, sp)) @@ -4374,7 +4375,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault int r; /* Dummy roots are used only for shadowing bad guest roots. */ - if (WARN_ON_ONCE(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) + if (WARN_ON_ONCE(kvm_mmu_is_dummy_root(vcpu->arch.mmu->common.root.hpa))) return RET_PF_RETRY; if (page_fault_handle_page_track(vcpu, fault)) @@ -4555,9 +4556,9 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd, /* * Find out if a previously cached root matching the new pgd/role is available, * and insert the current root as the MRU in the cache. - * If a matching root is found, it is assigned to kvm_mmu->root and + * If a matching root is found, it is assigned to kvm_mmu->common.root and * true is returned. - * If no match is found, kvm_mmu->root is left invalid, the LRU root is + * If no match is found, kvm_mmu->common.root is left invalid, the LRU root is * evicted to make room for the current root, and false is returned. 
*/ static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *mmu, @@ -4566,7 +4567,7 @@ static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *m { uint i; - if (is_root_usable(&mmu->root, new_pgd, new_role)) + if (is_root_usable(&mmu->common.root, new_pgd, new_role)) return true; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { @@ -4578,8 +4579,8 @@ static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *m * 2 C 0 1 3 * 3 C 0 1 2 (on exit from the loop) */ - swap(mmu->root, mmu->prev_roots[i]); - if (is_root_usable(&mmu->root, new_pgd, new_role)) + swap(mmu->common.root, mmu->prev_roots[i]); + if (is_root_usable(&mmu->common.root, new_pgd, new_role)) return true; } @@ -4589,10 +4590,11 @@ static bool cached_root_find_and_keep_current(struct kvm *kvm, struct kvm_mmu *m /* * Find out if a previously cached root matching the new pgd/role is available. - * On entry, mmu->root is invalid. - * If a matching root is found, it is assigned to kvm_mmu->root, the LRU entry - * of the cache becomes invalid, and true is returned. - * If no match is found, kvm_mmu->root is left invalid and false is returned. + * On entry, mmu->common.root is invalid. + * If a matching root is found, it is assigned to kvm_mmu->common.root, the LRU + * entry of the cache becomes invalid, and true is returned. + * If no match is found, kvm_mmu->common.root is left invalid and false is + * returned. */ static bool cached_root_find_without_current(struct kvm *kvm, struct kvm_mmu *mmu, gpa_t new_pgd, @@ -4607,7 +4609,7 @@ static bool cached_root_find_without_current(struct kvm *kvm, struct kvm_mmu *mm return false; hit: - swap(mmu->root, mmu->prev_roots[i]); + swap(mmu->common.root, mmu->prev_roots[i]); /* Bubble up the remaining roots. */ for (; i < KVM_MMU_NUM_PREV_ROOTS - 1; i++) mmu->prev_roots[i] = mmu->prev_roots[i + 1]; @@ -4622,10 +4624,10 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu, * Limit reuse to 64-bit hosts+VMs without "special" roots in order to * avoid having to deal with PDPTEs and other complexities. */ - if (VALID_PAGE(mmu->root.hpa) && !root_to_sp(mmu->root.hpa)) + if (VALID_PAGE(mmu->common.root.hpa) && !root_to_sp(mmu->common.root.hpa)) kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT); - if (VALID_PAGE(mmu->root.hpa)) + if (VALID_PAGE(mmu->common.root.hpa)) return cached_root_find_and_keep_current(kvm, mmu, new_pgd, new_role); else return cached_root_find_without_current(kvm, mmu, new_pgd, new_role); @@ -4634,7 +4636,7 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu, void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd) { struct kvm_mmu *mmu = vcpu->arch.mmu; - union kvm_mmu_page_role new_role = mmu->root_role; + union kvm_mmu_page_role new_role = mmu->common.root_role; /* * Return immediately if no usable root was found, kvm_mmu_reload() @@ -4669,7 +4671,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd) * count. Otherwise, clear the write flooding count. */ if (!new_role.direct) { - struct kvm_mmu_page *sp = root_to_sp(vcpu->arch.mmu->root.hpa); + struct kvm_mmu_page *sp = root_to_sp(vcpu->arch.mmu->common.root.hpa); if (!WARN_ON_ONCE(!sp)) __clear_sp_write_flooding_count(sp); @@ -4863,7 +4865,7 @@ static inline u64 reserved_hpa_bits(void) * follow the features in guest. */ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, - struct kvm_mmu *context) + struct kvm_mmu_common *context) { /* @amd adds a check on bit of SPTEs, which KVM shouldn't use anyways. 
*/ bool is_amd = true; @@ -4909,7 +4911,7 @@ static inline bool boot_cpu_is_amd(void) * the direct page table on host, use as much mmu features as * possible, however, kvm currently does not do execution-protection. */ -static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context) +static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu_common *context) { struct rsvd_bits_validate *shadow_zero_check; int i; @@ -4947,7 +4949,7 @@ static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context) * is the shadow page table for intel nested guest. */ static void -reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly) +reset_ept_shadow_zero_bits_mask(struct kvm_mmu_common *context, bool execonly) { __reset_rsvds_bits_mask_ept(&context->shadow_zero_check, reserved_hpa_bits(), execonly, @@ -5223,11 +5225,11 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu, union kvm_mmu_page_role root_role = kvm_calc_tdp_mmu_root_page_role(vcpu, cpu_role); if (cpu_role.as_u64 == context->cpu_role.as_u64 && - root_role.word == context->root_role.word) + root_role.word == context->common.root_role.word) return; context->cpu_role.as_u64 = cpu_role.as_u64; - context->root_role.word = root_role.word; + context->common.root_role.word = root_role.word; context->page_fault = kvm_tdp_page_fault; context->sync_spte = NULL; context->get_guest_pgd = get_guest_cr3; @@ -5242,7 +5244,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu, context->gva_to_gpa = paging32_gva_to_gpa; reset_guest_paging_metadata(vcpu, context); - reset_tdp_shadow_zero_bits_mask(context); + reset_tdp_shadow_zero_bits_mask(&context->common); } static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *context, @@ -5250,11 +5252,11 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte union kvm_mmu_page_role root_role) { if (cpu_role.as_u64 == context->cpu_role.as_u64 && - root_role.word == context->root_role.word) + root_role.word == context->common.root_role.word) return; context->cpu_role.as_u64 = cpu_role.as_u64; - context->root_role.word = root_role.word; + context->common.root_role.word = root_role.word; if (!is_cr0_pg(context)) nonpaging_init_context(context); @@ -5264,7 +5266,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte paging32_init_context(context); reset_guest_paging_metadata(vcpu, context); - reset_shadow_zero_bits_mask(vcpu, context); + reset_shadow_zero_bits_mask(vcpu, &context->common); } static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, @@ -5356,7 +5358,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, if (new_mode.as_u64 != context->cpu_role.as_u64) { /* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. 
*/ context->cpu_role.as_u64 = new_mode.as_u64; - context->root_role.word = new_mode.base.word; + context->common.root_role.word = new_mode.base.word; context->page_fault = ept_page_fault; context->gva_to_gpa = ept_gva_to_gpa; @@ -5365,7 +5367,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, update_permission_bitmask(context, true); context->pkru_mask = 0; reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level); - reset_ept_shadow_zero_bits_mask(context, execonly); + reset_ept_shadow_zero_bits_mask(&context->common, execonly); } kvm_mmu_new_pgd(vcpu, new_eptp); @@ -5451,9 +5453,9 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu) * that problem is swept under the rug; KVM's CPUID API is horrific and * it's all but impossible to solve it without introducing a new API. */ - vcpu->arch.root_mmu.root_role.word = 0; - vcpu->arch.guest_mmu.root_role.word = 0; - vcpu->arch.nested_mmu.root_role.word = 0; + vcpu->arch.root_mmu.common.root_role.word = 0; + vcpu->arch.guest_mmu.common.root_role.word = 0; + vcpu->arch.nested_mmu.common.root_role.word = 0; vcpu->arch.root_mmu.cpu_role.ext.valid = 0; vcpu->arch.guest_mmu.cpu_role.ext.valid = 0; vcpu->arch.nested_mmu.cpu_role.ext.valid = 0; @@ -5477,13 +5479,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) { int r; - r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct); + r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->common.root_role.direct); if (r) goto out; r = mmu_alloc_special_roots(vcpu); if (r) goto out; - if (vcpu->arch.mmu->root_role.direct) + if (vcpu->arch.mmu->common.root_role.direct) r = mmu_alloc_direct_roots(vcpu); else r = mmu_alloc_shadow_roots(vcpu); @@ -5511,9 +5513,9 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu) struct kvm *kvm = vcpu->kvm; kvm_mmu_free_roots(kvm, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL); - WARN_ON_ONCE(VALID_PAGE(vcpu->arch.root_mmu.root.hpa)); + WARN_ON_ONCE(VALID_PAGE(vcpu->arch.root_mmu.common.root.hpa)); kvm_mmu_free_roots(kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); - WARN_ON_ONCE(VALID_PAGE(vcpu->arch.guest_mmu.root.hpa)); + WARN_ON_ONCE(VALID_PAGE(vcpu->arch.guest_mmu.common.root.hpa)); vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); } @@ -5549,7 +5551,7 @@ static void __kvm_mmu_free_obsolete_roots(struct kvm *kvm, struct kvm_mmu *mmu) unsigned long roots_to_free = 0; int i; - if (is_obsolete_root(kvm, mmu->root.hpa)) + if (is_obsolete_root(kvm, mmu->common.root.hpa)) roots_to_free |= KVM_MMU_ROOT_CURRENT; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { @@ -5719,7 +5721,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err void *insn, int insn_len) { int r, emulation_type = EMULTYPE_PF; - bool direct = vcpu->arch.mmu->root_role.direct; + bool direct = vcpu->arch.mmu->common.root_role.direct; /* * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP @@ -5732,7 +5734,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS)) error_code &= ~PFERR_IMPLICIT_ACCESS; - if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa))) + if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->common.root.hpa))) return RET_PF_RETRY; r = RET_PF_INVALID; @@ -5762,7 +5764,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err * paging in both guests. If true, we simply unprotect the page * and resume the guest. 
*/ - if (vcpu->arch.mmu->root_role.direct && + if (vcpu->arch.mmu->common.root_role.direct && (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) { kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)); return 1; @@ -5844,7 +5846,7 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, return; if (roots & KVM_MMU_ROOT_CURRENT) - __kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->root.hpa); + __kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->common.root.hpa); for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { if (roots & KVM_MMU_ROOT_PREVIOUS(i)) @@ -5990,8 +5992,8 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) struct page *page; int i; - mmu->root.hpa = INVALID_PAGE; - mmu->root.pgd = 0; + mmu->common.root.hpa = INVALID_PAGE; + mmu->common.root.pgd = 0; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index decc1f1536694..7699596308386 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -299,7 +299,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, }; int r; - if (vcpu->arch.mmu->root_role.direct) { + if (vcpu->arch.mmu->common.root_role.direct) { fault.gfn = fault.addr >> PAGE_SHIFT; fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn); } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index c85255073f672..84509af0d7f9d 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -648,7 +648,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, if (FNAME(gpte_changed)(vcpu, gw, top_level)) goto out_gpte_changed; - if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa))) + if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->common.root.hpa))) goto out_gpte_changed; /* @@ -657,7 +657,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, * loading a dummy root and handling the resulting page fault, e.g. if * userspace create a memslot in the interim. 
*/ - if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) { + if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->common.root.hpa))) { kvm_make_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu); goto out_gpte_changed; } @@ -960,9 +960,8 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int spte = *sptep; host_writable = spte & shadow_host_writable_mask; slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - make_spte(vcpu, sp, slot, pte_access, gfn, - spte_to_pfn(spte), spte, true, false, - host_writable, &spte); + make_spte(vcpu, &vcpu->arch.mmu->common, sp, slot, pte_access, + gfn, spte_to_pfn(spte), spte, true, false, host_writable, &spte); return mmu_spte_update(sptep, spte); } diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 179156cd995df..9060a56e45569 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -137,7 +137,8 @@ bool spte_has_volatile_bits(u64 spte) return false; } -bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, +bool make_spte(struct kvm_vcpu *vcpu, + struct kvm_mmu_common *mmu_common, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch, bool can_unsync, @@ -237,9 +238,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, if (prefetch) spte = mark_spte_for_access_track(spte); - WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level), + WARN_ONCE(is_rsvd_spte(&mmu_common->shadow_zero_check, spte, level), "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, - get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); + get_rsvd_bits(&mmu_common->shadow_zero_check, spte, level)); if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { /* Enforced by kvm_mmu_hugepage_adjust. 
*/ diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index b88b686a4ecbc..8f747268a4874 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -530,7 +530,8 @@ static inline u64 get_mmio_spte_generation(u64 spte) bool spte_has_volatile_bits(u64 spte); -bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, +bool make_spte(struct kvm_vcpu *vcpu, + struct kvm_mmu_common *mmu_common, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch, bool can_unsync, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6cd4dd631a2fa..6657685a28709 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -219,7 +219,7 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp, hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) { - union kvm_mmu_page_role role = vcpu->arch.mmu->root_role; + union kvm_mmu_page_role role = vcpu->arch.mmu->common.root_role; struct kvm *kvm = vcpu->kvm; struct kvm_mmu_page *root; @@ -640,7 +640,7 @@ static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter, else #define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end) \ - for_each_tdp_pte(_iter, root_to_sp(_mmu->root.hpa), _start, _end) + for_each_tdp_pte(_iter, root_to_sp(_mmu->common.root.hpa), _start, _end) /* * Yield if the MMU lock is contended or this thread needs to return control @@ -964,9 +964,10 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, if (unlikely(!fault->slot)) new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); else - wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, - fault->pfn, iter->old_spte, fault->prefetch, true, - fault->map_writable, &new_spte); + wrprot = make_spte(vcpu, &vcpu->arch.mmu->common, sp, fault->slot, + ACC_ALL, iter->gfn, fault->pfn, iter->old_spte, + fault->prefetch, true, fault->map_writable, + &new_spte); if (new_spte == iter->old_spte) ret = RET_PF_SPURIOUS; @@ -1769,7 +1770,7 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, gfn_t gfn = addr >> PAGE_SHIFT; int leaf = -1; - *root_level = vcpu->arch.mmu->root_role.level; + *root_level = vcpu->arch.mmu->common.root_role.level; tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { leaf = iter.level; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 7121463123584..4941f53234a00 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3900,7 +3900,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu) static void svm_flush_tlb_current(struct kvm_vcpu *vcpu) { - hpa_t root_tdp = vcpu->arch.mmu->root.hpa; + hpa_t root_tdp = vcpu->arch.mmu->common.root.hpa; /* * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index c5ec0ef51ff78..43451fca00605 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5720,7 +5720,7 @@ static int handle_invept(struct kvm_vcpu *vcpu) VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID); roots_to_free = 0; - if (nested_ept_root_matches(mmu->root.hpa, mmu->root.pgd, + if (nested_ept_root_matches(mmu->common.root.hpa, mmu->common.root.pgd, operand.eptp)) roots_to_free |= KVM_MMU_ROOT_CURRENT; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index be20a60047b1f..1cc717a718e9c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3190,7 +3190,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu) static void 
vmx_flush_tlb_current(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.mmu; - u64 root_hpa = mmu->root.hpa; + u64 root_hpa = mmu->common.root.hpa; /* No flush required if the current context is invalid. */ if (!VALID_PAGE(root_hpa)) @@ -3198,7 +3198,7 @@ static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu) if (enable_ept) ept_sync_context(construct_eptp(vcpu, root_hpa, - mmu->root_role.level)); + mmu->common.root_role.level)); else vpid_sync_context(vmx_get_current_vpid(vcpu)); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2c924075f6f11..9ac8682c70ae7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8688,7 +8688,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))) return false; - if (!vcpu->arch.mmu->root_role.direct) { + if (!vcpu->arch.mmu->common.root_role.direct) { /* * Write permission should be allowed since only * write access need to be emulated. @@ -8721,7 +8721,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, kvm_release_pfn_clean(pfn); /* The instructions are well-emulated on direct mmu. */ - if (vcpu->arch.mmu->root_role.direct) { + if (vcpu->arch.mmu->common.root_role.direct) { unsigned int indirect_shadow_pages; write_lock(&vcpu->kvm->mmu_lock); @@ -8789,7 +8789,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt, vcpu->arch.last_retry_eip = ctxt->eip; vcpu->arch.last_retry_addr = cr2_or_gpa; - if (!vcpu->arch.mmu->root_role.direct) + if (!vcpu->arch.mmu->common.root_role.direct) gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL); kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); @@ -9089,7 +9089,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, ctxt->exception.address = cr2_or_gpa; /* With shadow page tables, cr2 contains a GVA or nGPA. 
*/ - if (vcpu->arch.mmu->root_role.direct) { + if (vcpu->arch.mmu->common.root_role.direct) { ctxt->gpa_available = true; ctxt->gpa_val = cr2_or_gpa; } From patchwork Sat Dec 2 09:28:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476857 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XWDZVrUQ" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 601D4181; Sat, 2 Dec 2023 01:57:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511044; x=1733047044; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=juntWTT8wckOCqH1OE3wLmKIfNnolWpXdTK3qwEkqLY=; b=XWDZVrUQic7KQgcyyYqGRqZLIG2kJJ3CD0a5dIkTcQt6Xa4/b1Nyz/SV DwFUpmJfmi6JwHnvOI71w2n179mVKUs4YeSpcCTGL3mdSJP1l7ivjpKqE LKK0yjYo/0CJ6EWBdJx4VALeGF7QILUmRBig0xni7zUZLBI/i53f+Phsm vGhtR9GWGu8XJ/8XrQa4CmbUbNhT5CrwBOJLkEFGAnTgJoZ2YS3ovFj6w PATLLpWKCbOhQDCd4m3ozSak7YtuFUfhMqbRx8QCWlkYrm6dd/7vCl6lU Xo6C/zYnErami3Ad5BQacnEpTMn/DdNIl+RDt5CE0xM2FDkMcqiQxPDBT w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="457913602" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="457913602" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:57:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="840460946" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="840460946" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:57:19 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 26/42] KVM: x86/mmu: introduce new op get_default_mt_mask to kvm_x86_ops Date: Sat, 2 Dec 2023 17:28:25 +0800 Message-Id: <20231202092825.15041-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Introduce a new op get_default_mt_mask to kvm_x86_ops to get default memory types when no non-coherent DMA devices are attached. For VMX, when there's no non-coherent DMA devices, guest MTRRs and vCPUs CR0.CD mode are not queried to get memory types of EPT. So, introduce a new op get_default_mt_mask that does not require param "vcpu" to get memory types. This is a preparation patch for later KVM MMU to export TDP, because IO page fault requests are in non-vcpu context and have no "vcpu" to get memory type from op get_mt_mask. 
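As a rough illustration of how a caller is expected to choose between the two ops (this sketch is not part of the patch; the wrapper function is hypothetical, while the static_call names are the ones used elsewhere in this series):

/*
 * Minimal sketch: pick an EPT memory type with or without a vCPU at hand.
 * In vCPU context the existing get_mt_mask op consults guest MTRRs and
 * CR0.CD; in non-vCPU context (e.g. an IO page fault) the new
 * get_default_mt_mask op is used instead, which assumes no non-coherent
 * DMA device is attached.
 */
static u8 ept_memtype_for_fault(struct kvm *kvm, struct kvm_vcpu *vcpu,
				gfn_t gfn, bool is_mmio)
{
	if (vcpu)
		return static_call(kvm_x86_get_mt_mask)(vcpu, gfn, is_mmio);

	return static_call(kvm_x86_get_default_mt_mask)(kvm, is_mmio);
}

A later patch in this series wires the new op into make_spte() in exactly this vcpu/no-vcpu fashion.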
Signed-off-by: Yan Zhao --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ 3 files changed, 13 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 26b628d84594b..d751407b1056c 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -92,6 +92,7 @@ KVM_X86_OP_OPTIONAL(sync_pir_to_irr) KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) +KVM_X86_OP_OPTIONAL_RET0(get_default_mt_mask) KVM_X86_OP(load_mmu_pgd) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 16e01eee34a99..1f6ac04e0f952 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1679,6 +1679,7 @@ struct kvm_x86_ops { int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr); u8 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); + u8 (*get_default_mt_mask)(struct kvm *kvm, bool is_mmio); void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 1cc717a718e9c..f290dd3094da6 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7614,6 +7614,16 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT; } +static u8 vmx_get_default_mt_mask(struct kvm *kvm, bool is_mmio) +{ + WARN_ON(kvm_arch_has_noncoherent_dma(kvm)); + + if (is_mmio) + return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; + + return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; +} + static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx, u32 new_ctl) { /* @@ -8295,6 +8305,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .set_tss_addr = vmx_set_tss_addr, .set_identity_map_addr = vmx_set_identity_map_addr, .get_mt_mask = vmx_get_mt_mask, + .get_default_mt_mask = vmx_get_default_mt_mask, .get_exit_info = vmx_get_exit_info, From patchwork Sat Dec 2 09:28:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476858 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="lXEjt0Fz" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0B9419F; Sat, 2 Dec 2023 01:57:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511069; x=1733047069; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=55tsKcT+FAhok+Xkgfzjx0zHipfnMqIv8/jCkGwDm9g=; b=lXEjt0FzZuDXYIEH5JkY5eNLzil4BbGWwfV3tA/c7zAhNUmH52SdTQML 8LN9jgr8bSOeCaVsSUR/IhgSLpE7fcQKMucT3nfEhEWDf/2G3E9j6E+3d 34H05kRy0mqf5g/M3GPfgSCzomefEK/iDuggEVVNlmXAOH1B1Y3w68zrl 6OojjSfWMI4vLPLKOtAbyE7mT2hES98aAfRr7dbXc1g4RUf/QmD23Hcxs K/803m2dIrNKg/tyYArCI0VL/kDl5uj3M2hUwDbY9ahEstDvG2wFyfgxv JnYQbRNj7tjjcpQYDQ1wQm88u7EgblUvZSyOznhaNaTnNLcgPSt6oWLdb Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="12304410" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="12304410" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orvoesa101.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:57:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="1101537132" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="1101537132" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:57:44 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 27/42] KVM: x86/mmu: change param "vcpu" to "kvm" in kvm_mmu_hugepage_adjust() Date: Sat, 2 Dec 2023 17:28:50 +0800 Message-Id: <20231202092850.15107-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: kvm_mmu_hugepage_adjust() requires "vcpu" only to get "vcpu->kvm". Switch to pass in "kvm" directly. No functional changes expected. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 8 ++++---- arch/x86/kvm/mmu/mmu_internal.h | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index cfeb066f38687..b461bab51255e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3159,7 +3159,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, return min(host_level, max_level); } -void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +void kvm_mmu_hugepage_adjust(struct kvm *kvm, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; kvm_pfn_t mask; @@ -3179,8 +3179,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault * Enforce the iTLB multihit workaround after capturing the requested * level, which will be used to do precise, accurate accounting. 
*/ - fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, - fault->gfn, fault->max_level); + fault->req_level = kvm_mmu_max_mapping_level(kvm, slot, fault->gfn, + fault->max_level); if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed) return; @@ -3222,7 +3222,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) int ret; gfn_t base_gfn = fault->gfn; - kvm_mmu_hugepage_adjust(vcpu, fault); + kvm_mmu_hugepage_adjust(vcpu->kvm, fault); trace_kvm_mmu_spte_requested(fault); for_each_shadow_entry(vcpu, fault->addr, it) { diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 7699596308386..1e9be0604e348 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -339,7 +339,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int kvm_mmu_max_mapping_level(struct kvm *kvm, const struct kvm_memory_slot *slot, gfn_t gfn, int max_level); -void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); +void kvm_mmu_hugepage_adjust(struct kvm *kvm, struct kvm_page_fault *fault); void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level); void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 84509af0d7f9d..13c6390824a3e 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -716,7 +716,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, * are being shadowed by KVM, i.e. allocating a new shadow page may * affect the allowed hugepage size. */ - kvm_mmu_hugepage_adjust(vcpu, fault); + kvm_mmu_hugepage_adjust(vcpu->kvm, fault); trace_kvm_mmu_spte_requested(fault); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6657685a28709..5d76d4849e8aa 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1047,7 +1047,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) struct kvm_mmu_page *sp; int ret = RET_PF_RETRY; - kvm_mmu_hugepage_adjust(vcpu, fault); + kvm_mmu_hugepage_adjust(vcpu->kvm, fault); trace_kvm_mmu_spte_requested(fault); From patchwork Sat Dec 2 09:29:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476859 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="g0FBx8eH" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4791E3; Sat, 2 Dec 2023 01:58:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511103; x=1733047103; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=/sZNp1HTRIQfF9DKgY191Av3Iz3lMdrk+plprrEgK+w=; b=g0FBx8eHz1PxIow3zhCSsV9sgvMBKfNWPsderNGvX4z2P4Osbh2W+86+ Htv6ql1KBx9jl8Bi/Y2IJdgudgZhVqQHc4g69C89XHmAdHZ9UTfsX7Ix0 xgxE+xv43w+Ttt1ScmAWkCTMZVPdSiTckVk13M5xqwoUqx9nUXuFvyWWY cvY6tmWpIB6SsisubCuda4vyfFdCZh8I+m37+tMkNmPq/cVP1+X8r2OjV dtjIe02uxLQvkzruxukuwcLaGxFIY8mO3f+u/FkoMCbv+J6kNzhLHv5u4 j83luoNNQojQ+DjMFhHNf9pDH2hIab80iYrCqJz/hNdVDwFOXWizKyQx2 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="393322323" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="393322323" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com 
with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:58:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="804337625" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="804337625" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:58:19 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 28/42] KVM: x86/mmu: change "vcpu" to "kvm" in page_fault_handle_page_track() Date: Sat, 2 Dec 2023 17:29:20 +0800 Message-Id: <20231202092920.15167-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: page_fault_handle_page_track() only uses param "vcpu" to refer to "vcpu->kvm", change it to "kvm" directly. No functional changes expected. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 8 ++++---- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b461bab51255e..73437c1b1943e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4186,7 +4186,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct) return RET_PF_RETRY; } -static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu, +static bool page_fault_handle_page_track(struct kvm *kvm, struct kvm_page_fault *fault) { if (unlikely(fault->rsvd)) @@ -4199,7 +4199,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu, * guest is writing the page which is write tracked which can * not be fixed by page fault handler. 
*/ - if (kvm_gfn_is_write_tracked(vcpu->kvm, fault->slot, fault->gfn)) + if (kvm_gfn_is_write_tracked(kvm, fault->slot, fault->gfn)) return true; return false; @@ -4378,7 +4378,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (WARN_ON_ONCE(kvm_mmu_is_dummy_root(vcpu->arch.mmu->common.root.hpa))) return RET_PF_RETRY; - if (page_fault_handle_page_track(vcpu, fault)) + if (page_fault_handle_page_track(vcpu->kvm, fault)) return RET_PF_EMULATE; r = fast_page_fault(vcpu, fault); @@ -4458,7 +4458,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, { int r; - if (page_fault_handle_page_track(vcpu, fault)) + if (page_fault_handle_page_track(vcpu->kvm, fault)) return RET_PF_EMULATE; r = fast_page_fault(vcpu, fault); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 13c6390824a3e..f685b036f6637 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -803,7 +803,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault fault->max_level = walker.level; fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn); - if (page_fault_handle_page_track(vcpu, fault)) { + if (page_fault_handle_page_track(vcpu->kvm, fault)) { shadow_page_table_clear_flood(vcpu, fault->addr); return RET_PF_EMULATE; } From patchwork Sat Dec 2 09:29:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476860 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QNgizjN+" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C1933D48; Sat, 2 Dec 2023 01:58:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511127; x=1733047127; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=9hhVkoMnDMz5pGlI3lvmW362r0q8LSuE7t4ADMue25I=; b=QNgizjN+13FOvANliVFwV8k6h+zGqT7x0n8BKu5LWYxPTOuGTbt+KP+k X8pbvr14EsJrFHXcPOB+W/giYeGEAwVg89NXE61TU9tLldWNLM8f0DioU g+KEKK5eCkbv56ngVkfBY0DJR2jPeTgOoMY2MfK8F+sbj5Y+SkuSTKZWi UgmSq3qdc6EtXEOtUFHOHmSg5sVT/LWWgnpXlScsqz3DVHH47ooJtl8zG Rg/nedfxtMw5FZu2E6m7MD2xCWV26qTrQBEylrIC/9PJdDHrBmZkHGa7Q ixNNBO3nM2ETttdu9C8HD4dqrEsWxV1kIYDtPpPHZK8on7WEJ97SWjCuc A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="393322340" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="393322340" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:58:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="804337647" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="804337647" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:58:43 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 29/42] KVM: x86/mmu: remove param "vcpu" from kvm_mmu_get_tdp_level() Date: Sat, 2 Dec 2023 17:29:48 +0800 Message-Id: <20231202092948.15224-1-yan.y.zhao@intel.com> X-Mailer: 
git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: kvm_mmu_get_tdp_level() only requires param "vcpu" for cpuid_maxphyaddr(). So, pass in the value of cpuid_maxphyaddr() directly to avoid param "vcpu". This is a preparation patch for later KVM MMU to export TDP. No functional changes expected. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 73437c1b1943e..abdf49b5cdd79 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5186,14 +5186,14 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu, reset_guest_paging_metadata(vcpu, mmu); } -static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu) +static inline int kvm_mmu_get_tdp_level(int maxphyaddr) { /* tdp_root_level is architecture forced level, use it if nonzero */ if (tdp_root_level) return tdp_root_level; /* Use 5-level TDP if and only if it's useful/necessary. */ - if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48) + if (max_tdp_level == 5 && maxphyaddr <= 48) return 4; return max_tdp_level; @@ -5211,7 +5211,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, role.smm = cpu_role.base.smm; role.guest_mode = cpu_role.base.guest_mode; role.ad_disabled = !kvm_ad_enabled(); - role.level = kvm_mmu_get_tdp_level(vcpu); + role.level = kvm_mmu_get_tdp_level(cpuid_maxphyaddr(vcpu)); role.direct = true; role.has_4_byte_gpte = false; @@ -5310,7 +5310,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, WARN_ON_ONCE(cpu_role.base.direct); root_role = cpu_role.base; - root_role.level = kvm_mmu_get_tdp_level(vcpu); + root_role.level = kvm_mmu_get_tdp_level(cpuid_maxphyaddr(vcpu)); if (root_role.level == PT64_ROOT_5LEVEL && cpu_role.base.level == PT64_ROOT_4LEVEL) root_role.passthrough = 1; @@ -6012,7 +6012,8 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) * other exception is for shadowing L1's 32-bit or PAE NPT on 64-bit * KVM; that horror is handled on-demand by mmu_alloc_special_roots(). 
*/ - if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL) + if (tdp_enabled && + kvm_mmu_get_tdp_level(cpuid_maxphyaddr(vcpu)) > PT32E_ROOT_LEVEL) return 0; page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_DMA32); From patchwork Sat Dec 2 09:30:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476861 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="O1YzupEp" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 998CFCC; Sat, 2 Dec 2023 01:59:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511160; x=1733047160; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=wAzhYUH58e3QaRHxpmVeodzpQMpT0HkCfncirII7TQo=; b=O1YzupEpSEyeQiZfgQr5dumCrIRT7Crl2oGy5iIHdBPHtRQto/iYQd8t F00XLSUgnjZojfkHhq9lVGVmNkvkIMvpqhsNMXyZcYELKJ2M2ptPgC6fz fSQXHDVmsQiLi62rUIxdaUPDvMoNW+sfzaRHkwPh/I606wovvNBwuuTR3 0hY3VGVOtWR2fHuxpnziPKZuA3C7hrnrQjkofme7zWRtvsvxJt0LwsTL2 ko1kMFegXVtISNNmhL7IenEEmQHeHs6Z+NWsdHlO9kYlwoSTAk6JQrXXN 9/nXjoYqh3FWUMnbjS7JbT2yYGtQmAQ1WSQdsv6PUV2d5pYxw7i0zJryb g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="606859" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="606859" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:59:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="18038610" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:59:15 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 30/42] KVM: x86/mmu: remove param "vcpu" from kvm_calc_tdp_mmu_root_page_role() Date: Sat, 2 Dec 2023 17:30:21 +0800 Message-Id: <20231202093021.15281-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: kvm_calc_tdp_mmu_root_page_role() only requires "vcpu" to get maxphyaddr for kvm_mmu_get_tdp_level(). So, just pass in the value of maxphyaddr from the caller to get rid of param "vcpu". This is a preparation patch for later KVM MMU to export TDP. No functional changes expected. 
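As an illustration of what dropping "vcpu" enables (not part of this patch; the wrapper below is hypothetical, and kvm->arch.maxphyaddr is assumed from the description of patch 35, which relies on vCPUs being homogeneous):

/*
 * Minimal sketch: a non-vCPU path, such as the later TDP-export code,
 * can now derive the TDP root page role from a per-VM maximum physical
 * address width instead of needing a vCPU.
 */
static union kvm_mmu_page_role
calc_exported_tdp_root_role(struct kvm *kvm, union kvm_cpu_role cpu_role)
{
	return kvm_calc_tdp_mmu_root_page_role(kvm->arch.maxphyaddr, cpu_role);
}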
Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index abdf49b5cdd79..bcf17aef29119 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5200,7 +5200,7 @@ static inline int kvm_mmu_get_tdp_level(int maxphyaddr) } static union kvm_mmu_page_role -kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, +kvm_calc_tdp_mmu_root_page_role(int maxphyaddr, union kvm_cpu_role cpu_role) { union kvm_mmu_page_role role = {0}; @@ -5211,7 +5211,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, role.smm = cpu_role.base.smm; role.guest_mode = cpu_role.base.guest_mode; role.ad_disabled = !kvm_ad_enabled(); - role.level = kvm_mmu_get_tdp_level(cpuid_maxphyaddr(vcpu)); + role.level = kvm_mmu_get_tdp_level(maxphyaddr); role.direct = true; role.has_4_byte_gpte = false; @@ -5222,7 +5222,9 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu, union kvm_cpu_role cpu_role) { struct kvm_mmu *context = &vcpu->arch.root_mmu; - union kvm_mmu_page_role root_role = kvm_calc_tdp_mmu_root_page_role(vcpu, cpu_role); + union kvm_mmu_page_role root_role; + + root_role = kvm_calc_tdp_mmu_root_page_role(cpuid_maxphyaddr(vcpu), cpu_role); if (cpu_role.as_u64 == context->cpu_role.as_u64 && root_role.word == context->common.root_role.word) From patchwork Sat Dec 2 09:30:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476862 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Jt/wD5QU" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D4A8E3; Sat, 2 Dec 2023 01:59:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511192; x=1733047192; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=BB3oVFRCJKuJwn2fpuBKAK+sq6q4cIt28G0VfFVlmdc=; b=Jt/wD5QUYehfDm9wIlqvtSpVlHoBtUUDS2RwK9li/6MaqrZlSDOxwGDR TqM5Fj2JLGqrURwezVfFS2dveQVyLB30E72MVG1D9v0CjK/jdjDXGaUmx xYoa+HUAcDw8O3II53hLYMnDSDoZZ9VcXbKERzOW3ctnMn3HtiZkKdy3T IdTnQMq71ggJO5H60Ul92MlGy8dDmLQOWZ1Nl12WkQAQ2ncqWJgfXZIsl XjG6gq7cUEVclZrwnDBtptf/zXnnnHnRuHd409voG5PmKH8dd3J6EExUh CFGOYSi1+y9TRi6iD4VR5bK5J/a0ib7me+IthXvrwCx1hWunzZJMbwDWE A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="424756329" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="424756329" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:59:52 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="836022446" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="836022446" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 01:59:48 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 31/42] KVM: x86/mmu: add extra param "kvm" to kvm_faultin_pfn() Date: Sat, 2 Dec 2023 17:30:49 +0800 Message-Id: 
<20231202093049.15341-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add an extra param "kvm" to kvm_faultin_pfn() to allow param "vcpu" to be NULL in future to allow page faults in non-vcpu context. It is a preparation for later KVM MMU to export TDP. No-slot mapping (for emulated MMIO cache), async pf, sig pending PFN are not compatible to page fault in non-vcpu context. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 35 +++++++++++++++++++--------------- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 2 files changed, 21 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index bcf17aef29119..df5651ea99139 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3266,9 +3266,10 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn) send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current); } -static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +static int kvm_handle_error_pfn(struct kvm *kvm, struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) { - if (is_sigpending_pfn(fault->pfn)) { + if (is_sigpending_pfn(fault->pfn) && vcpu) { kvm_handle_signal_exit(vcpu); return -EINTR; } @@ -3289,12 +3290,15 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa return -EFAULT; } -static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu, +static int kvm_handle_noslot_fault(struct kvm *kvm, struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, unsigned int access) { gva_t gva = fault->is_tdp ? 0 : fault->addr; + if (!vcpu) + return -EFAULT; + vcpu_cache_mmio_info(vcpu, gva, fault->gfn, access & shadow_mmio_access_mask); @@ -4260,7 +4264,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL); } -static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +static int __kvm_faultin_pfn(struct kvm *kvm, struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; bool async; @@ -4275,7 +4280,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (!kvm_is_visible_memslot(slot)) { /* Don't expose private memslots to L2. */ - if (is_guest_mode(vcpu)) { + if (vcpu && is_guest_mode(vcpu)) { fault->slot = NULL; fault->pfn = KVM_PFN_NOSLOT; fault->map_writable = false; @@ -4288,7 +4293,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault * when the AVIC is re-enabled. 
*/ if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT && - !kvm_apicv_activated(vcpu->kvm)) + !kvm_apicv_activated(kvm)) return RET_PF_EMULATE; } @@ -4299,7 +4304,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ - if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) { + if (!fault->prefetch && vcpu && kvm_can_do_async_pf(vcpu)) { trace_kvm_try_async_get_page(fault->addr, fault->gfn); if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) { trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn); @@ -4321,23 +4326,23 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault return RET_PF_CONTINUE; } -static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, - unsigned int access) +static int kvm_faultin_pfn(struct kvm *kvm, struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault, unsigned int access) { int ret; - fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq; + fault->mmu_seq = kvm->mmu_invalidate_seq; smp_rmb(); - ret = __kvm_faultin_pfn(vcpu, fault); + ret = __kvm_faultin_pfn(kvm, vcpu, fault); if (ret != RET_PF_CONTINUE) return ret; if (unlikely(is_error_pfn(fault->pfn))) - return kvm_handle_error_pfn(vcpu, fault); + return kvm_handle_error_pfn(kvm, vcpu, fault); if (unlikely(!fault->slot)) - return kvm_handle_noslot_fault(vcpu, fault, access); + return kvm_handle_noslot_fault(kvm, vcpu, fault, access); return RET_PF_CONTINUE; } @@ -4389,7 +4394,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (r) return r; - r = kvm_faultin_pfn(vcpu, fault, ACC_ALL); + r = kvm_faultin_pfn(vcpu->kvm, vcpu, fault, ACC_ALL); if (r != RET_PF_CONTINUE) return r; @@ -4469,7 +4474,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, if (r) return r; - r = kvm_faultin_pfn(vcpu, fault, ACC_ALL); + r = kvm_faultin_pfn(vcpu->kvm, vcpu, fault, ACC_ALL); if (r != RET_PF_CONTINUE) return r; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index f685b036f6637..054d1a203f0ca 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -812,7 +812,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (r) return r; - r = kvm_faultin_pfn(vcpu, fault, walker.pte_access); + r = kvm_faultin_pfn(vcpu->kvm, vcpu, fault, walker.pte_access); if (r != RET_PF_CONTINUE) return r; From patchwork Sat Dec 2 09:31:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476863 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="dj8DtcpB" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9FC88181; Sat, 2 Dec 2023 02:00:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511219; x=1733047219; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=eyjI0l/aTUHZutjS+4cOREv8PCxRPavsmMsLSGZ4+qo=; b=dj8DtcpBbcGAiquQ0YiOO6uWvzXx0YTekQcnBA3k2WO38fOf0k6OetBs xFqcd4ypxBpub92LO0CGqnqqTwtUfEJ7RLFHJ9CT/gBbF9wrbCGOP0NJo dr4P8YyUHdYPpoEev9HxMtXL8bDxn2l3FC70VTHmztcVRmOwzlWDZ6cpa Xn8Og5K0O1I9H9EiEqo39rkkRBHdhcNgTqlZNhIH/l09lJcdXLCelHBBJ 1wZ5vL/cwKqCnQLgiIi1D7LI2hwC7U6Aik/Um0dTcR3DStiNiRkm0+Ql2 nsz9GLrCI0CLES6+/v4fGDSdUpVtSqVOwir8dV/iSmg06qmzlhi3T5GiF A==; 
X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="396395293" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="396395293" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:00:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="799015506" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="799015506" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:00:15 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 32/42] KVM: x86/mmu: add extra param "kvm" to make_mmio_spte() Date: Sat, 2 Dec 2023 17:31:19 +0800 Message-Id: <20231202093119.15407-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add an extra param "kvm" to make_mmio_spte() to allow param "vcpu" to be NULL in future to allow generating mmio spte in non-vcpu context. When "vcpu" is NULL, kvm_memslots() rather than kvm_vcpu_memslots() is called to get memslots pointer, so MMIO SPTEs are not allowed to be generated for SMM mode in non-vCPU context. This is a preparation patch for later KVM MMU to export TDP. Note: actually, if the exported TDP is mapped in non-vCPU context, it will not reach make_mmio_spte() due to earlier failure in kvm_handle_noslot_fault(). make_mmio_spte() is modified in this patch to avoid the check of "vcpu" in the caller. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/mmu/spte.c | 5 +++-- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 2 +- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index df5651ea99139..e4cae4ff20770 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -296,7 +296,7 @@ static void kvm_flush_remote_tlbs_sptep(struct kvm *kvm, u64 *sptep) static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn, unsigned int access) { - u64 spte = make_mmio_spte(vcpu, gfn, access); + u64 spte = make_mmio_spte(vcpu->kvm, vcpu, gfn, access); trace_mark_mmio_spte(sptep, gfn, spte); mmu_spte_set(sptep, spte); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 9060a56e45569..daeab3b9eee1e 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -71,9 +71,10 @@ static u64 generation_mmio_spte_mask(u64 gen) return mask; } -u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) +u64 make_mmio_spte(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) { - u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK; + struct kvm_memslots *memslots = vcpu ? 
kvm_vcpu_memslots(vcpu) : kvm_memslots(kvm); + u64 gen = memslots->generation & MMIO_SPTE_GEN_MASK; u64 spte = generation_mmio_spte_mask(gen); u64 gpa = gfn << PAGE_SHIFT; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 8f747268a4874..4ad19c469bd73 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -539,7 +539,7 @@ bool make_spte(struct kvm_vcpu *vcpu, u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page_role role, int index); u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled); -u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access); +u64 make_mmio_spte(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 gfn, unsigned int access); u64 mark_spte_for_access_track(u64 spte); /* Restore an acc-track PTE back to a regular PTE */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 5d76d4849e8aa..892cf1f5b57a8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -962,7 +962,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, return RET_PF_RETRY; if (unlikely(!fault->slot)) - new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); + new_spte = make_mmio_spte(vcpu->kvm, vcpu, iter->gfn, ACC_ALL); else wrprot = make_spte(vcpu, &vcpu->arch.mmu->common, sp, fault->slot, ACC_ALL, iter->gfn, fault->pfn, iter->old_spte, From patchwork Sat Dec 2 09:31:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476864 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ejFyGcms" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B9E1E3; Sat, 2 Dec 2023 02:00:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511250; x=1733047250; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=cQ/PBVpj7VdBPsR5uXHKk49rWc/LsdPwooGhW6lyHNs=; b=ejFyGcmsSgnKkdEkzBy7J5CqQhofODqM52qRnWmpWtRTqLicvdusE/UL OGXkJyfGkuBRjQotDagPyCcjsXZxx6/FiRkB0rSO/6MAeRtDHqvAi7CaX u6EZhrOgRE1i8aaC0DkDEfbO/mWPwzh+OD/CmmYHBi5yf6H/7ZJYfMLAK nikAnxovFRhpSR+Bf3eCDMUs0PfjgDT7jgzNcKo55qfvZMOPTGeQce25k e+rhcVZW4Iexa8TOS4m24XMbgLm/Zkps8XhUa2J8yKCaTo0VzajTKdAND mYCNP2oC/o98MXvEizEk7wctaopMgLXPoAMiEB3NKdy5M86zEkZuK91bq g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="372983418" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="372983418" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:00:50 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="773709825" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="773709825" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:00:46 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 33/42] KVM: x86/mmu: add extra param "kvm" to make_spte() Date: Sat, 2 Dec 2023 17:31:46 +0800 Message-Id: 
<20231202093146.15477-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add an extra param "kvm" to make_spte() to allow param "vcpu" to be NULL in future to allow generating spte in non-vcpu context. "vcpu" is only used in make_spte() to get memory type mask if shadow_memtype_mask is true, which applies only to VMX when EPT is enabled. VMX only requires param "vcpu" when non-coherent DMA devices are attached to check vcpu's CR0.CD and guest MTRRs. So, if non-coherent DMAs are not attached, make_spte() can call kvm_x86_get_default_mt_mask() to get default memory type for non-vCPU context. This is a preparation patch for later KVM MMU to export TDP. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/spte.c | 18 ++++++++++++------ arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 2 +- 5 files changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e4cae4ff20770..c9b587b30dae3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2939,7 +2939,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot, was_rmapped = 1; } - wrprot = make_spte(vcpu, &vcpu->arch.mmu->common, + wrprot = make_spte(vcpu->kvm, vcpu, &vcpu->arch.mmu->common, sp, slot, pte_access, gfn, pfn, *sptep, prefetch, true, host_writable, &spte); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 054d1a203f0ca..fb4767a9e966e 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -960,7 +960,7 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int spte = *sptep; host_writable = spte & shadow_host_writable_mask; slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - make_spte(vcpu, &vcpu->arch.mmu->common, sp, slot, pte_access, + make_spte(vcpu->kvm, vcpu, &vcpu->arch.mmu->common, sp, slot, pte_access, gfn, spte_to_pfn(spte), spte, true, false, host_writable, &spte); return mmu_spte_update(sptep, spte); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index daeab3b9eee1e..5e73a679464c0 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -138,7 +138,7 @@ bool spte_has_volatile_bits(u64 spte) return false; } -bool make_spte(struct kvm_vcpu *vcpu, +bool make_spte(struct kvm *kvm, struct kvm_vcpu *vcpu, struct kvm_mmu_common *mmu_common, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, @@ -179,7 +179,7 @@ bool make_spte(struct kvm_vcpu *vcpu, * just to optimize a mode that is anything but performance critical. 
*/ if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) && - is_nx_huge_page_enabled(vcpu->kvm)) { + is_nx_huge_page_enabled(kvm)) { pte_access &= ~ACC_EXEC_MASK; } @@ -194,9 +194,15 @@ bool make_spte(struct kvm_vcpu *vcpu, if (level > PG_LEVEL_4K) spte |= PT_PAGE_SIZE_MASK; - if (shadow_memtype_mask) - spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn, + if (shadow_memtype_mask) { + if (vcpu) + spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn, kvm_is_mmio_pfn(pfn)); + else + spte |= static_call(kvm_x86_get_default_mt_mask)(kvm, + kvm_is_mmio_pfn(pfn)); + } + if (host_writable) spte |= shadow_host_writable_mask; else @@ -225,7 +231,7 @@ bool make_spte(struct kvm_vcpu *vcpu, * e.g. it's write-tracked (upper-level SPs) or has one or more * shadow pages and unsync'ing pages is not allowed. */ - if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, can_unsync, prefetch)) { + if (mmu_try_to_unsync_pages(kvm, slot, gfn, can_unsync, prefetch)) { wrprot = true; pte_access &= ~ACC_WRITE_MASK; spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask); @@ -246,7 +252,7 @@ bool make_spte(struct kvm_vcpu *vcpu, if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { /* Enforced by kvm_mmu_hugepage_adjust. */ WARN_ON_ONCE(level > PG_LEVEL_4K); - mark_page_dirty_in_slot(vcpu->kvm, slot, gfn); + mark_page_dirty_in_slot(kvm, slot, gfn); } *new_spte = spte; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 4ad19c469bd73..f1532589b7083 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -530,7 +530,7 @@ static inline u64 get_mmio_spte_generation(u64 spte) bool spte_has_volatile_bits(u64 spte); -bool make_spte(struct kvm_vcpu *vcpu, +bool make_spte(struct kvm *kvm, struct kvm_vcpu *vcpu, struct kvm_mmu_common *mmu_common, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 892cf1f5b57a8..a45d1b71cd62a 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -964,7 +964,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, if (unlikely(!fault->slot)) new_spte = make_mmio_spte(vcpu->kvm, vcpu, iter->gfn, ACC_ALL); else - wrprot = make_spte(vcpu, &vcpu->arch.mmu->common, sp, fault->slot, + wrprot = make_spte(vcpu->kvm, vcpu, &vcpu->arch.mmu->common, sp, fault->slot, ACC_ALL, iter->gfn, fault->pfn, iter->old_spte, fault->prefetch, true, fault->map_writable, &new_spte); From patchwork Sat Dec 2 09:32:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476865 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WwNJySS6" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C89FD197; Sat, 2 Dec 2023 02:01:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511293; x=1733047293; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=uSoSY6TiRDOPE0HLSqzCfFuL+LmvcCuprKj793XIBHM=; b=WwNJySS6Cr0O/UYFtRhXvzlgYvQtaB6XcPn+ZduIu0tpIeOqaHwQ1xiM mgvFNrZtI4XggLjue5kigE/R458Ok89nlcSBPpVf0/hZmiZ2u+tr79+oK NWO0PWHUlatzrwYxnzAN5XVbqDadwO5Pk/X8mG27b13nolLGoNJIZW469 20RchAugsiJIO4nB6uvP8pSmlj6BFM7GK9+mRrDSFbPqM3kLp/FT1Zloz j6fRSOGrM+a/O/ptHBydRSEqdPcLsv05bzWkiD1+/mHZ/vbFXmcjDcjSB 
cs2wmhcv1iBWHZlnHKwGvPkQ10l1h6VT0prNRpHdkziTLFoIlMoOkn506 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="396395349" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="396395349" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:01:33 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="799015715" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="799015715" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:01:17 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 34/42] KVM: x86/mmu: add extra param "kvm" to tdp_mmu_map_handle_target_level() Date: Sat, 2 Dec 2023 17:32:22 +0800 Message-Id: <20231202093222.15534-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add an extra param "kvm" to tdp_mmu_map_handle_target_level() to allow for mapping in non-vCPU context in future. "vcpu" is only required in tdp_mmu_map_handle_target_level() for accounting of MMIO SPTEs. As kvm_faultin_pfn() now will return error for non-slot PFNs, no MMIO SPTEs should be generated and accounted in non-vCPU context. So, just let tdp_mmu_map_handle_target_level() warn if MMIO SPTEs are encountered in non-vCPU context. This is a preparation patch for later KVM MMU to export TDP. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/tdp_mmu.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a45d1b71cd62a..5edff3b4698b7 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -949,7 +949,9 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) * Installs a last-level SPTE to handle a TDP page fault. 
* (NPT/EPT violation/misconfiguration) */ -static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, +static int tdp_mmu_map_handle_target_level(struct kvm *kvm, + struct kvm_vcpu *vcpu, + struct kvm_mmu_common *mmu_common, struct kvm_page_fault *fault, struct tdp_iter *iter) { @@ -958,24 +960,26 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int ret = RET_PF_FIXED; bool wrprot = false; + WARN_ON(!kvm); + if (WARN_ON_ONCE(sp->role.level != fault->goal_level)) return RET_PF_RETRY; if (unlikely(!fault->slot)) new_spte = make_mmio_spte(vcpu->kvm, vcpu, iter->gfn, ACC_ALL); else - wrprot = make_spte(vcpu->kvm, vcpu, &vcpu->arch.mmu->common, sp, fault->slot, + wrprot = make_spte(kvm, vcpu, mmu_common, sp, fault->slot, ACC_ALL, iter->gfn, fault->pfn, iter->old_spte, fault->prefetch, true, fault->map_writable, &new_spte); if (new_spte == iter->old_spte) ret = RET_PF_SPURIOUS; - else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte)) + else if (tdp_mmu_set_spte_atomic(kvm, iter, new_spte)) return RET_PF_RETRY; else if (is_shadow_present_pte(iter->old_spte) && !is_last_spte(iter->old_spte, iter->level)) - kvm_flush_remote_tlbs_gfn(vcpu->kvm, iter->gfn, iter->level); + kvm_flush_remote_tlbs_gfn(kvm, iter->gfn, iter->level); /* * If the page fault was caused by a write but the page is write @@ -989,10 +993,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, /* If a MMIO SPTE is installed, the MMIO will need to be emulated. */ if (unlikely(is_mmio_spte(new_spte))) { - vcpu->stat.pf_mmio_spte_created++; - trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, - new_spte); - ret = RET_PF_EMULATE; + /* if without vcpu, no mmio spte should be installed */ + if (!WARN_ON(!vcpu)) { + vcpu->stat.pf_mmio_spte_created++; + trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, + new_spte); + ret = RET_PF_EMULATE; + } } else { trace_kvm_mmu_set_spte(iter->level, iter->gfn, rcu_dereference(iter->sptep)); @@ -1114,7 +1121,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) goto retry; map_target_level: - ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter); + ret = tdp_mmu_map_handle_target_level(vcpu->kvm, vcpu, &vcpu->arch.mmu->common, + fault, &iter); retry: rcu_read_unlock(); From patchwork Sat Dec 2 09:32:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476866 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gcXXIdQP" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D02C10C2; Sat, 2 Dec 2023 02:01:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511318; x=1733047318; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4u8cSeWDKWAI+5kWXYXLrN5RJVNgC+yTrQ8oaHPCpOs=; b=gcXXIdQPBqP8Dnkc9IzQ474n58moiLDYrBEARNKLbsztuM+wwvSuLL18 XkRerZPmbUR85sLWrA00YU+hkybxy80X62YaAUJoYiBSAv0Vfg1yQpFYM IJexhn6p1PhRQMuMM/99QuD358r2elO9vzvFD0DZOZkR5Qx4O6dOSBGhR q0DnCjLKY9V5ziK6a6DoNL9IhNeYq2ryy16847wnFtD+t4ZJy9KS+thcA ZFcmmjSnhkm3yOYsVj933hQCKx+cJQeHVsUd4/NwAc6jr/DwP21AhNVOP OtBHoAtXm9u6+2RE5SkpdSZlU/AHUr+p++v9jf9ozQOpRVwSTE0v1OuoR w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="424756492" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="424756492" Received: from fmsmga008.fm.intel.com 
([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:01:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="836022857" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="836022857" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:01:54 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 35/42] KVM: x86/mmu: Get/Put TDP root page to be exported Date: Sat, 2 Dec 2023 17:32:59 +0800 Message-Id: <20231202093259.15609-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Get/Put the root page of a KVM exported TDP page table based on KVM TDP MMU. When KVM TDP FD requests a TDP to export, it provides an address space to indicate page roles of the TDP to export. In this RFC, KVM address space 0 is supported only. So, TDP MMU will select a root page role with smm=0 and guest_mode=0. (Level of root page role is from kvm->arch.maxphyaddr, based on the assumption that vCPUs are homogeneous.) TDP MMU then searches list tdp_mmu_roots for a existing root, or create a new root if no one is found. A specific kvm->arch.exported_tdp_header_cache is used to allocate the root page in non-vCPU context. The found/created root page will be marked as "exported". When KVM TDP fd puts the exported FD, the mark of "exported" on root page role will be removed. No matter the root page role is exported or not, vCPUs just load TDP root according to its vCPU modes. In this way, KVM is able to share the TDP page tables in KVM address space 0 to IOMMU side. tdp_mmu_roots | role | smm | guest_mode +------+-----------+----------+ ------|----------------- | | | | 0 | 0 | 0 ==> address space 0 | v v v 1 | 1 | 0 | .--------. .--------. .--------. 2 | 0 | 1 | | root | | root | | root | 3 | 1 | 1 | |(role 1)| |(role 2)| |(role 3)| | '--------' '--------' '--------' | ^ | | create or get .------. | +--------------------| vCPU | | fault '------' | smm=1 | guest_mode=0 | (set root as exported) v .--------. create or get .---------------. create or get .------. | TDP FD |------------------->| root (role 0) |<-----------------| vCPU | '--------' fault '---------------' fault '------' . smm=0 . guest_mode=0 . non-vCPU context <---|---> vCPU context . . This patch actually needs to be split into several smaller ones. It's tempted to be kept in a single big patch to show a bigger picture. Will split them into smaller ones in next version. 
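For reference, the find-or-create flow described above boils down to the sketch below. This is a condensed, illustrative view only; the authoritative code is kvm_tdp_mmu_get_exported_root() in the diff that follows. The role passed in is computed by kvm_calc_tdp_mmu_root_page_role(kvm->arch.maxphyaddr, cpu_role) with smm=0 and guest_mode=0, the caller marks the returned root as exported, and locking, refcounting and error handling are omitted here.

/*
 * Condensed sketch of the exported-root selection described above; see
 * kvm_tdp_mmu_get_exported_root() below for the real implementation.
 */
static struct kvm_mmu_page *get_exported_root_sketch(struct kvm *kvm,
						     union kvm_mmu_page_role role)
{
	struct kvm_mmu_page *root;

	/* Reuse an existing root with the same role (smm=0, guest_mode=0). */
	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role))
		if (root->role.word == role.word && kvm_tdp_mmu_get_root(root))
			return root;

	/* Otherwise allocate one from the exported TDP caches (non-vCPU context). */
	root = tdp_mmu_alloc_sp_exported_cache(kvm);
	tdp_mmu_init_sp(root, NULL, 0, role);
	return root;
}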
Signed-off-by: Yan Zhao --- arch/x86/include/asm/kvm_host.h | 18 +++++ arch/x86/kvm/mmu.h | 5 ++ arch/x86/kvm/mmu/mmu.c | 129 ++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/mmu_internal.h | 4 + arch/x86/kvm/mmu/tdp_mmu.c | 47 ++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 6 ++ arch/x86/kvm/x86.c | 17 +++++ 7 files changed, 226 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1f6ac04e0f952..860502720e3e7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1476,7 +1476,25 @@ struct kvm_arch { */ #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1) struct kvm_mmu_memory_cache split_desc_cache; + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + struct kvm_mmu_memory_cache exported_tdp_header_cache; + struct kvm_mmu_memory_cache exported_tdp_page_cache; + struct mutex exported_tdp_cache_lock; + int maxphyaddr; +#endif +}; + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +#define __KVM_HAVE_ARCH_EXPORTED_TDP +struct kvm_exported_tdp_mmu { + struct kvm_mmu_common common; + struct kvm_mmu_page *root_page; }; +struct kvm_arch_exported_tdp { + struct kvm_exported_tdp_mmu mmu; +}; +#endif struct kvm_vm_stat { struct kvm_vm_stat_generic generic; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index e9631cc23a594..3d11f2068572d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -251,6 +251,11 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu); int kvm_mmu_post_init_vm(struct kvm *kvm); void kvm_mmu_pre_destroy_vm(struct kvm *kvm); +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +int kvm_mmu_get_exported_tdp(struct kvm *kvm, struct kvm_exported_tdp *tdp); +void kvm_mmu_put_exported_tdp(struct kvm_exported_tdp *tdp); +#endif + static inline bool kvm_shadow_root_allocated(struct kvm *kvm) { /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index c9b587b30dae3..3e2475c678c27 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5468,6 +5468,13 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.nested_mmu.cpu_role.ext.valid = 0; kvm_mmu_reset_context(vcpu); +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + if (vcpu->kvm->arch.maxphyaddr) + vcpu->kvm->arch.maxphyaddr = min(vcpu->kvm->arch.maxphyaddr, + vcpu->arch.maxphyaddr); + else + vcpu->kvm->arch.maxphyaddr = vcpu->arch.maxphyaddr; +#endif /* * Changing guest CPUID after KVM_RUN is forbidden, see the comment in * kvm_arch_vcpu_ioctl(). 
@@ -6216,6 +6223,13 @@ void kvm_mmu_init_vm(struct kvm *kvm) kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache; kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO; + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + mutex_init(&kvm->arch.exported_tdp_cache_lock); + kvm->arch.exported_tdp_header_cache.kmem_cache = mmu_page_header_cache; + kvm->arch.exported_tdp_header_cache.gfp_zero = __GFP_ZERO; + kvm->arch.exported_tdp_page_cache.gfp_zero = __GFP_ZERO; +#endif } static void mmu_free_vm_memory_caches(struct kvm *kvm) @@ -7193,3 +7207,118 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm) if (kvm->arch.nx_huge_page_recovery_thread) kthread_stop(kvm->arch.nx_huge_page_recovery_thread); } + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +static bool kvm_mmu_is_expoted_allowed(struct kvm *kvm, int as_id) +{ + if (as_id != 0) { + pr_err("unsupported address space to export TDP\n"); + return false; + } + + /* + * Currently, exporting TDP is based on TDP MMU and is not enabled on + * hyperv, one of the reasons is because of hyperv's tlb flush way + */ + if (!tdp_mmu_enabled || IS_ENABLED(CONFIG_HYPERV) || + !IS_ENABLED(CONFIG_HAVE_KVM_MMU_PRESENT_HIGH)) { + pr_err("Not allowed to create exported tdp, please check config\n"); + return false; + } + + /* we need max phys addr of vcpus, so oneline vcpus must > 0 */ + if (!atomic_read(&kvm->online_vcpus)) { + pr_err("Exported tdp must be created after vCPUs created\n"); + return false; + } + + if (kvm->arch.maxphyaddr < 32) { + pr_err("Exported tdp must be created on 64-bit platform\n"); + return false; + } + /* + * Do not allow noncoherent DMA if TDP is exported, because mapping of + * the exported TDP may not be at vCPU context, but noncoherent DMA + * requires vCPU mode and guest vCPU MTRRs to get the right memory type. + */ + if (kvm_arch_has_noncoherent_dma(kvm)) { + pr_err("Not allowed to create exported tdp for noncoherent DMA\n"); + return false; + } + + return true; +} + +static void init_kvm_exported_tdp_mmu(struct kvm *kvm, int as_id, + struct kvm_exported_tdp_mmu *mmu) +{ + WARN_ON(!kvm->arch.maxphyaddr); + + union kvm_cpu_role cpu_role = { 0 }; + + cpu_role.base.smm = !!as_id; + cpu_role.base.guest_mode = 0; + + mmu->common.root_role = kvm_calc_tdp_mmu_root_page_role(kvm->arch.maxphyaddr, + cpu_role); + reset_tdp_shadow_zero_bits_mask(&mmu->common); +} + +static int mmu_topup_exported_tdp_caches(struct kvm *kvm) +{ + int r; + + lockdep_assert_held(&kvm->arch.exported_tdp_cache_lock); + + r = kvm_mmu_topup_memory_cache(&kvm->arch.exported_tdp_header_cache, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + + return kvm_mmu_topup_memory_cache(&kvm->arch.exported_tdp_page_cache, + PT64_ROOT_MAX_LEVEL); +} + +int kvm_mmu_get_exported_tdp(struct kvm *kvm, struct kvm_exported_tdp *tdp) +{ + struct kvm_exported_tdp_mmu *mmu = &tdp->arch.mmu; + struct kvm_mmu_page *root; + int ret; + + if (!kvm_mmu_is_expoted_allowed(kvm, tdp->as_id)) + return -EINVAL; + + init_kvm_exported_tdp_mmu(kvm, tdp->as_id, mmu); + + mutex_lock(&kvm->arch.exported_tdp_cache_lock); + ret = mmu_topup_exported_tdp_caches(kvm); + if (ret) { + mutex_unlock(&kvm->arch.exported_tdp_cache_lock); + return ret; + } + write_lock(&kvm->mmu_lock); + root = kvm_tdp_mmu_get_exported_root(kvm, mmu); + WARN_ON(root->exported); + root->exported = true; + mmu->common.root.hpa = __pa(root->spt); + mmu->root_page = root; + write_unlock(&kvm->mmu_lock); + + mutex_unlock(&kvm->arch.exported_tdp_cache_lock); + + return 0; +} + +void kvm_mmu_put_exported_tdp(struct kvm_exported_tdp *tdp) +{ + struct 
kvm_exported_tdp_mmu *mmu = &tdp->arch.mmu; + struct kvm *kvm = tdp->kvm; + + write_lock(&kvm->mmu_lock); + mmu->root_page->exported = false; + kvm_tdp_mmu_put_exported_root(kvm, mmu->root_page); + mmu->common.root.hpa = INVALID_PAGE; + mmu->root_page = NULL; + write_unlock(&kvm->mmu_lock); +} +#endif diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 1e9be0604e348..9294bb7e56c08 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -130,6 +130,10 @@ struct kvm_mmu_page { /* Used for freeing the page asynchronously if it is a TDP MMU page. */ struct rcu_head rcu_head; #endif + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + bool exported; +#endif }; extern struct kmem_cache *mmu_page_header_cache; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 5edff3b4698b7..47edf54961e89 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1824,3 +1824,50 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr, */ return rcu_dereference(sptep); } + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +static struct kvm_mmu_page *tdp_mmu_alloc_sp_exported_cache(struct kvm *kvm) +{ + struct kvm_mmu_page *sp; + + sp = kvm_mmu_memory_cache_alloc(&kvm->arch.exported_tdp_header_cache); + sp->spt = kvm_mmu_memory_cache_alloc(&kvm->arch.exported_tdp_page_cache); + + return sp; +} + +struct kvm_mmu_page *kvm_tdp_mmu_get_exported_root(struct kvm *kvm, + struct kvm_exported_tdp_mmu *mmu) +{ + union kvm_mmu_page_role role = mmu->common.root_role; + struct kvm_mmu_page *root; + + lockdep_assert_held_write(&kvm->mmu_lock); + + for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) { + if (root->role.word == role.word && + kvm_tdp_mmu_get_root(root)) + goto out; + + } + + root = tdp_mmu_alloc_sp_exported_cache(kvm); + tdp_mmu_init_sp(root, NULL, 0, role); + + refcount_set(&root->tdp_mmu_root_count, 2); + + spin_lock(&kvm->arch.tdp_mmu_pages_lock); + list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots); + spin_unlock(&kvm->arch.tdp_mmu_pages_lock); + +out: + return root; +} + +void kvm_tdp_mmu_put_exported_root(struct kvm *kvm, struct kvm_mmu_page *root) +{ + tdp_mmu_zap_root(kvm, root, false); + kvm_tdp_mmu_put_root(kvm, root, false); +} + +#endif diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 733a3aef3a96e..1d36ed378848b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -75,4 +75,10 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; } #endif +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +struct kvm_mmu_page *kvm_tdp_mmu_get_exported_root(struct kvm *kvm, + struct kvm_exported_tdp_mmu *mmu); +void kvm_tdp_mmu_put_exported_root(struct kvm *kvm, struct kvm_mmu_page *root); +#endif + #endif /* __KVM_X86_MMU_TDP_MMU_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9ac8682c70ae7..afc0e5372ddce 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13429,6 +13429,23 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_arch_no_poll); +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP +int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp) +{ + int ret; + + ret = kvm_mmu_get_exported_tdp(kvm, tdp); + if (ret) + return ret; + + return 0; +} + +void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp) +{ + kvm_mmu_put_exported_tdp(tdp); +} +#endif int kvm_spec_ctrl_test_value(u64 value) { From patchwork Sat Dec 
2 09:33:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476867 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SyLHtsbV" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 806CC19F; Sat, 2 Dec 2023 02:02:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511349; x=1733047349; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=hmQiFx37c9kQD8NB0bOovdRfd/CdlZnR5EzplmD7wsQ=; b=SyLHtsbVOIzjpge9tnxB/Y0SmYF18pY2YhHcm9fEGKMDXw+RfG2v/s6y AE9+Gd0nAAT6yaPfv0Lzklsqi9f3N3vqPFmWhMcUjXk5lWAXH/bybxAaX pvceSYnkQeYM5RbqwrCgczkq/FfV9QcJBRL73KL+NwnpWgxWEQ+0s+Be1 qrVLRl0d2xunzwilHMuSKdy8lXR4LfIPWFtd4PU3+u830mynjlYz042tW nI8irqPC/VTbxY/+VbYeUJGroU27D7Bm2UFCEnGypFJId8E2WiwlYQb2R LlOGOXzBv5WVgi1OtjD1UkUujczpzSxrkJ7AY+02S8gRGQ8kS4aiysI4j g==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="12304628" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="12304628" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:02:28 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="1101537971" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="1101537971" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:02:24 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 36/42] KVM: x86/mmu: Keep exported TDP root valid Date: Sat, 2 Dec 2023 17:33:25 +0800 Message-Id: <20231202093325.15676-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Keep exported TDP root always valid and zap all leaf entries to replace the "root role invalid" operation. Unlike TDP roots accessed by vCPUs only, update of TDP root exported to external components must be in an atomic way, like 1. allocating new root, 2. updating and notifying new root to external components, 3. making old root invalid, So, it's more efficient to just zap all leaf entries of the exported TDP. Though zapping all leaf entries will make "fast zap" not fast enough, as with commit 0df9dab891ff ("KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously"), zap of root is anyway required to be done synchronously in kvm_mmu_zap_all_fast() before completing memslot removal. Besides, it's also safe to skip invalidating "exported" root in kvm_tdp_mmu_invalidate_all_roots() for path kvm_mmu_uninit_tdp_mmu(), because when the VM is shutting down, as TDP FD will hold reference count of kvm, kvm_mmu_uninit_tdp_mmu() --> kvm_tdp_mmu_invalidate_all_roots() will not come until the TDP root is unmarked as "exported" and put. All children entries are also zapped before the root is put. 
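In other words, the exported root is zapped in place inside the same write-lock critical section that completes the memslot update, instead of being invalidated and replaced. A condensed view of the resulting ordering in kvm_mmu_zap_all_fast() is shown below; it is simplified from the diff that follows and is not a literal copy.

/* Condensed (illustrative) ordering of kvm_mmu_zap_all_fast() after this patch. */
write_lock(&kvm->mmu_lock);
kvm_tdp_mmu_invalidate_all_roots(kvm);		/* now skips roots marked exported */
kvm_zap_obsolete_pages(kvm);
if (tdp_mmu_enabled)
	kvm_tdp_mmu_zap_exported_roots(kvm);	/* zap leaf SPTEs; the root stays valid */
write_unlock(&kvm->mmu_lock);
/* Invalidated (non-exported) roots are zapped afterwards, outside the write lock. */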
Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 3 +++ arch/x86/kvm/mmu/tdp_mmu.c | 40 +++++++++++++++++++++++++++++++++----- arch/x86/kvm/mmu/tdp_mmu.h | 1 + 3 files changed, 39 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3e2475c678c27..37a903fff582a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6187,6 +6187,9 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) kvm_zap_obsolete_pages(kvm); + if (tdp_mmu_enabled) + kvm_tdp_mmu_zap_exported_roots(kvm); + write_unlock(&kvm->mmu_lock); /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 47edf54961e89..36a309ad27d47 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -897,12 +897,38 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) read_unlock(&kvm->mmu_lock); } +void kvm_tdp_mmu_zap_exported_roots(struct kvm *kvm) +{ +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + struct kvm_mmu_page *root; + bool flush; + + lockdep_assert_held_write(&kvm->mmu_lock); + + rcu_read_lock(); + + list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) { + if (!root->exported) + continue; + + flush = tdp_mmu_zap_leafs(kvm, root, 0, -1ULL, false, false); + if (flush) + kvm_flush_remote_tlbs(kvm); + } + + rcu_read_unlock(); +#endif +} + /* - * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that - * is about to be zapped, e.g. in response to a memslots update. The actual - * zapping is done separately so that it happens with mmu_lock with read, - * whereas invalidating roots must be done with mmu_lock held for write (unless - * the VM is being destroyed). + * Mark each TDP MMU root (except exported root) as invalid to prevent vCPUs from + * reusing a root that is about to be zapped, e.g. in response to a memslots + * update. + * The actual zapping is done separately so that it happens with mmu_lock + * with read, whereas invalidating roots must be done with mmu_lock held for write + * (unless the VM is being destroyed). + * For exported root, zap is done in kvm_tdp_mmu_zap_exported_roots() before + * the memslot update completes with mmu_lock held for write. * * Note, kvm_tdp_mmu_zap_invalidated_roots() is gifted the TDP MMU's reference. * See kvm_tdp_mmu_get_vcpu_root_hpa(). @@ -932,6 +958,10 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) * or get/put references to roots. */ list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + if (root->exported) + continue; +#endif /* * Note, invalid roots can outlive a memslot update! 
Invalid * roots must be *zapped* before the memslot update completes, diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 1d36ed378848b..df42350022a3f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -25,6 +25,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); +void kvm_tdp_mmu_zap_exported_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); From patchwork Sat Dec 2 09:33:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476868 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="EicHjEmX" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA9C410C2; Sat, 2 Dec 2023 02:02:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511373; x=1733047373; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Ny7K/1p6mPwz6jo6BVXvIZBJop8cXIZXURz+qH/CFNo=; b=EicHjEmXeYd0aRT3O1uvzEQaTHkCrIbJs0qB+ccIc9RM7okvcDHLA9pu /w8T6XJalLwDywlu0IFjgyIl79fWZiSsZPYYqebqXLZ6Vqs0YZQxRFrbE YkgxD6GydOchIi+J43ARuKcH96ceQBtlShqa9LmmCNk7HzhIIc7s5Sz0e 0pRVz9l2Ne9qI7mkMX2GofghZWEeNd9ZzwShCTi6ufqZnOGWbFRWofuVZ DyxV8yUkcaqOrSTckSo6jHHe5JmisaO1Wxo0mbdpjaGtNHlCRwrBGXKiU rOxLNjX+kZpc1Yi1CECnWbuserHTwtjA6EDNXZkPH+0QhjWdPOmU4KM+h w==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="424756538" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="424756538" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:02:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="836023023" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="836023023" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:02:49 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 37/42] KVM: x86: Implement KVM exported TDP fault handler on x86 Date: Sat, 2 Dec 2023 17:33:55 +0800 Message-Id: <20231202093355.15745-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Implement fault handler of KVM exported TDP on x86. The fault handler will fail if the GFN to be faulted is in emulated MMIO range or in write-tracked range. kvm_tdp_mmu_map_exported_root() is actually a duplicate of kvm_tdp_mmu_map() except that its shadow pages are allocated from exported TDP specific header/page caches in kvm arch rather than from each vCPU's header/page caches. 
The exported TDP specific header/page caches are used is because fault handler of KVM exported TDP is not called in vCPU thread. Will seek to remove the duplication in future. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu.h | 1 + arch/x86/kvm/mmu/mmu.c | 57 +++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.c | 81 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 2 + arch/x86/kvm/x86.c | 22 +++++++++++ 5 files changed, 163 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3d11f2068572d..a6e6802fb4d56 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -254,6 +254,7 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm); #ifdef CONFIG_HAVE_KVM_EXPORTED_TDP int kvm_mmu_get_exported_tdp(struct kvm *kvm, struct kvm_exported_tdp *tdp); void kvm_mmu_put_exported_tdp(struct kvm_exported_tdp *tdp); +int kvm_mmu_fault_exported_tdp(struct kvm_exported_tdp *tdp, unsigned long gfn, u32 err); #endif static inline bool kvm_shadow_root_allocated(struct kvm *kvm) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 37a903fff582a..b4b1ede30642d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7324,4 +7324,61 @@ void kvm_mmu_put_exported_tdp(struct kvm_exported_tdp *tdp) mmu->root_page = NULL; write_unlock(&kvm->mmu_lock); } + +int kvm_mmu_fault_exported_tdp(struct kvm_exported_tdp *tdp, unsigned long gfn, u32 err) +{ + struct kvm *kvm = tdp->kvm; + struct kvm_page_fault fault = { + .addr = gfn << PAGE_SHIFT, + .error_code = err, + .prefetch = false, + .exec = err & PFERR_FETCH_MASK, + .write = err & PFERR_WRITE_MASK, + .present = err & PFERR_PRESENT_MASK, + .rsvd = err & PFERR_RSVD_MASK, + .user = err & PFERR_USER_MASK, + .is_tdp = true, + .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(kvm), + .max_level = KVM_MAX_HUGEPAGE_LEVEL, + .req_level = PG_LEVEL_4K, + .goal_level = PG_LEVEL_4K, + .gfn = gfn, + .slot = gfn_to_memslot(kvm, gfn), + }; + struct kvm_exported_tdp_mmu *mmu = &tdp->arch.mmu; + int r; + + if (page_fault_handle_page_track(kvm, &fault)) + return -EINVAL; +retry: + r = kvm_faultin_pfn(kvm, NULL, &fault, ACC_ALL); + if (r != RET_PF_CONTINUE) + goto out; + + mutex_lock(&kvm->arch.exported_tdp_cache_lock); + r = mmu_topup_exported_tdp_caches(kvm); + if (r) + goto out_cache; + + r = RET_PF_RETRY; + read_lock(&kvm->mmu_lock); + if (fault.slot && mmu_invalidate_retry_hva(kvm, fault.mmu_seq, fault.hva)) + goto out_mmu; + + if (mmu->root_page && is_obsolete_sp(kvm, mmu->root_page)) + goto out_mmu; + + r = kvm_tdp_mmu_map_exported_root(kvm, mmu, &fault); + +out_mmu: + read_unlock(&kvm->mmu_lock); +out_cache: + mutex_unlock(&kvm->arch.exported_tdp_cache_lock); + kvm_release_pfn_clean(fault.pfn); +out: + if (r == RET_PF_RETRY) + goto retry; + + return r == RET_PF_FIXED ? 
0 : -EFAULT; +} #endif diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 36a309ad27d47..e7587aefc3304 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1900,4 +1900,85 @@ void kvm_tdp_mmu_put_exported_root(struct kvm *kvm, struct kvm_mmu_page *root) kvm_tdp_mmu_put_root(kvm, root, false); } +int kvm_tdp_mmu_map_exported_root(struct kvm *kvm, struct kvm_exported_tdp_mmu *mmu, + struct kvm_page_fault *fault) +{ + struct tdp_iter iter; + struct kvm_mmu_page *sp; + int ret = RET_PF_RETRY; + + kvm_mmu_hugepage_adjust(kvm, fault); + + trace_kvm_mmu_spte_requested(fault); + + rcu_read_lock(); + + tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) { + int r; + + if (fault->nx_huge_page_workaround_enabled) + disallowed_hugepage_adjust(fault, iter.old_spte, iter.level); + + /* + * If SPTE has been frozen by another thread, just give up and + * retry, avoiding unnecessary page table allocation and free. + */ + if (is_removed_spte(iter.old_spte)) + goto retry; + + if (iter.level == fault->goal_level) + goto map_target_level; + + /* Step down into the lower level page table if it exists. */ + if (is_shadow_present_pte(iter.old_spte) && + !is_large_pte(iter.old_spte)) + continue; + + /* + * The SPTE is either non-present or points to a huge page that + * needs to be split. + */ + sp = tdp_mmu_alloc_sp_exported_cache(kvm); + tdp_mmu_init_child_sp(sp, &iter); + + sp->nx_huge_page_disallowed = fault->huge_page_disallowed; + + if (is_shadow_present_pte(iter.old_spte)) + r = tdp_mmu_split_huge_page(kvm, &iter, sp, true); + else + r = tdp_mmu_link_sp(kvm, &iter, sp, true); + + /* + * Force the guest to retry if installing an upper level SPTE + * failed, e.g. because a different task modified the SPTE. + */ + if (r) { + tdp_mmu_free_sp(sp); + goto retry; + } + + if (fault->huge_page_disallowed && + fault->req_level >= iter.level) { + spin_lock(&kvm->arch.tdp_mmu_pages_lock); + if (sp->nx_huge_page_disallowed) + track_possible_nx_huge_page(kvm, sp); + spin_unlock(&kvm->arch.tdp_mmu_pages_lock); + } + } + + /* + * The walk aborted before reaching the target level, e.g. because the + * iterator detected an upper level SPTE was frozen during traversal. 
+ */ + WARN_ON_ONCE(iter.level == fault->goal_level); + goto retry; + +map_target_level: + ret = tdp_mmu_map_handle_target_level(kvm, NULL, &mmu->common, fault, &iter); + +retry: + rcu_read_unlock(); + return ret; +} + #endif diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index df42350022a3f..a3ea418aaffed 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -80,6 +80,8 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; } struct kvm_mmu_page *kvm_tdp_mmu_get_exported_root(struct kvm *kvm, struct kvm_exported_tdp_mmu *mmu); void kvm_tdp_mmu_put_exported_root(struct kvm *kvm, struct kvm_mmu_page *root); +int kvm_tdp_mmu_map_exported_root(struct kvm *kvm, struct kvm_exported_tdp_mmu *mmu, + struct kvm_page_fault *fault); #endif #endif /* __KVM_X86_MMU_TDP_MMU_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index afc0e5372ddce..2886eac0590d8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13445,6 +13445,28 @@ void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp) { kvm_mmu_put_exported_tdp(tdp); } + +int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, unsigned long gfn, + struct kvm_tdp_fault_type type) +{ + u32 err = 0; + int ret; + + if (type.read) + err |= PFERR_PRESENT_MASK | PFERR_USER_MASK; + + if (type.write) + err |= PFERR_WRITE_MASK; + + if (type.exec) + err |= PFERR_FETCH_MASK; + + mutex_lock(&tdp->kvm->slots_lock); + ret = kvm_mmu_fault_exported_tdp(tdp, gfn, err); + mutex_unlock(&tdp->kvm->slots_lock); + return ret; +} + #endif int kvm_spec_ctrl_test_value(u64 value) From patchwork Sat Dec 2 09:35:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476869 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MkAQbsSB" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A16B1A6; Sat, 2 Dec 2023 02:04:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511449; x=1733047449; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=TMgeUuQ49CQ1kIvreWulUxr+utMzPKzhJ7mH5ahCx+8=; b=MkAQbsSBTH+R5d87sfxQdIO7c479rVEIBbz81oK1Nvi5MBQLQxQvy2Y/ 09eB6KhTl5lRhUFEWSNPPZEuDqwMlbvHS3FpSNhYPUbSYc/eemsSnMnfO XeuJ3kdX8df/hZBYFIOq8EIK1o6PrgPGmvfTkLhgES4qeVgbZSENGhPXi EPdMOQx0SpfwBK/gRaoKYtS/RECswHPKeu7cfffYKukls2lkTTaw0P+WK o7XCbdCJK9tPvW5YrQXg5F1Q/5usa02SFUk+OEt+pfIany/gs38PdYL4D C3Q9QZz93CRjz9JWbns4g99imQ9JlpwqVwRsK/N/N3PMmWnJNo07hB6w6 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="392459598" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="392459598" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:04:09 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="887939763" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="887939763" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:04:05 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, 
kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 38/42] KVM: x86: "compose" and "get" interface for meta data of exported TDP Date: Sat, 2 Dec 2023 17:35:10 +0800 Message-Id: <20231202093510.15817-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Added two fields .exported_tdp_meta_size and .exported_tdp_meta_compose in kvm_x86_ops to allow vendor specific code to compose meta data of exported TDP and provided an arch interface for external components to get the composed meta data. As the meta data is consumed in IOMMU's vendor driver to check if the exported TDP is compatible to the IOMMU hardware before reusing them as IOMMU's stage 2 page tables, it's better to compose them in KVM's vendor spcific code too. Signed-off-by: Yan Zhao --- arch/x86/include/asm/kvm-x86-ops.h | 3 +++ arch/x86/include/asm/kvm_host.h | 7 +++++++ arch/x86/kvm/x86.c | 23 ++++++++++++++++++++++- include/linux/kvm_host.h | 6 ++++++ virt/kvm/tdp_fd.c | 2 +- 5 files changed, 39 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index d751407b1056c..baf3efaa148c2 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -136,6 +136,9 @@ KVM_X86_OP(msr_filter_changed) KVM_X86_OP(complete_emulated_msr) KVM_X86_OP(vcpu_deliver_sipi_vector) KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons); +#if IS_ENABLED(CONFIG_HAVE_KVM_EXPORTED_TDP) +KVM_X86_OP_OPTIONAL(exported_tdp_meta_compose); +#endif #undef KVM_X86_OP #undef KVM_X86_OP_OPTIONAL diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 860502720e3e7..412a1b2088f09 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -1493,6 +1494,7 @@ struct kvm_exported_tdp_mmu { }; struct kvm_arch_exported_tdp { struct kvm_exported_tdp_mmu mmu; + void *meta; }; #endif @@ -1784,6 +1786,11 @@ struct kvm_x86_ops { * Returns vCPU specific APICv inhibit reasons */ unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu); + +#ifdef CONFIG_HAVE_KVM_EXPORTED_TDP + unsigned long exported_tdp_meta_size; + void (*exported_tdp_meta_compose)(struct kvm_exported_tdp *tdp); +#endif }; struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2886eac0590d8..468bcde414691 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13432,18 +13432,39 @@ EXPORT_SYMBOL_GPL(kvm_arch_no_poll); #ifdef CONFIG_HAVE_KVM_EXPORTED_TDP int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp) { + void *meta; int ret; + if (!kvm_x86_ops.exported_tdp_meta_size || + !kvm_x86_ops.exported_tdp_meta_compose) + return -EOPNOTSUPP; + + meta = __vmalloc(kvm_x86_ops.exported_tdp_meta_size, + GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!meta) + return -ENOMEM; + + tdp->arch.meta = meta; + ret = kvm_mmu_get_exported_tdp(kvm, tdp); - if (ret) + if (ret) { + kvfree(meta); return ret; + } + static_call(kvm_x86_exported_tdp_meta_compose)(tdp); return 0; } void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp) { kvm_mmu_put_exported_tdp(tdp); + kvfree(tdp->arch.meta); +} + +void *kvm_arch_exported_tdp_get_metadata(struct kvm_exported_tdp 
*tdp) +{ + return tdp->arch.meta; } int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, unsigned long gfn, diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a8af95194767f..48324c846d90b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2348,6 +2348,7 @@ int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp); void kvm_arch_exported_tdp_destroy(struct kvm_exported_tdp *tdp); int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, unsigned long gfn, struct kvm_tdp_fault_type type); +void *kvm_arch_exported_tdp_get_metadata(struct kvm_exported_tdp *tdp); #else static inline int kvm_arch_exported_tdp_init(struct kvm *kvm, struct kvm_exported_tdp *tdp) @@ -2364,6 +2365,11 @@ static inline int kvm_arch_fault_exported_tdp(struct kvm_exported_tdp *tdp, { return -EOPNOTSUPP; } + +static inline void *kvm_arch_exported_tdp_get_metadata(struct kvm_exported_tdp *tdp) +{ + return NULL; +} #endif /* __KVM_HAVE_ARCH_EXPORTED_TDP */ void kvm_tdp_fd_flush_notify(struct kvm *kvm, unsigned long gfn, unsigned long npages); diff --git a/virt/kvm/tdp_fd.c b/virt/kvm/tdp_fd.c index 8c16af685a061..e4a2453a5547f 100644 --- a/virt/kvm/tdp_fd.c +++ b/virt/kvm/tdp_fd.c @@ -217,7 +217,7 @@ static void kvm_tdp_unregister_all_importers(struct kvm_exported_tdp *tdp) static void *kvm_tdp_get_metadata(struct kvm_tdp_fd *tdp_fd) { - return ERR_PTR(-EOPNOTSUPP); + return kvm_arch_exported_tdp_get_metadata(tdp_fd->priv); } static int kvm_tdp_fault(struct kvm_tdp_fd *tdp_fd, struct mm_struct *mm, From patchwork Sat Dec 2 09:35:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476870 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kuuNYiyc" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AFE5E19F; Sat, 2 Dec 2023 02:04:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511473; x=1733047473; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=aDYDeaOzsRKl3ev8dC58rNFjF38aQ9l/2QXjjp0gHNA=; b=kuuNYiyc/pDkoT46m+N9x8z6QqhoASjGdUluT1jJFXb85y1a90tJGoO9 U8ocQEdYuKY16rNuFoFZeK+gdEsxTDBzSuhg16enrqveXfWpzYG5bDSLL wEzvqI2VNJcxubuUqgXKWGI/AqpA3zos7jcZ1g6RAQ87udXb9efiIbgyj Jnn99j02n7MKDrN9QcKFhB5ROGxJg68Gi5pE9HqZwI27WnHzDgUDJgtOn qtMCbQz8x+fAAU0uaqXmxzJNXwUNcmcBp/u1L3lALUVku5axXKzh8NWyl t1IVCcTpr+A25kiSMCGz0wRTJadaooWBWDuvdz3siYO68dNSKlyGmLdqs Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="392459638" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="392459638" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:04:33 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="887939832" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="887939832" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:04:30 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, 
yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 39/42] KVM: VMX: add config KVM_INTEL_EXPORTED_EPT Date: Sat, 2 Dec 2023 17:35:35 +0800 Message-Id: <20231202093535.15874-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add config KVM_INTEL_EXPORTED_EPT to let kvm_intel.ko support exporting EPT to KVM external components (e.g. Intel VT-d). This config will turn on HAVE_KVM_EXPORTED_TDP and HAVE_KVM_MMU_PRESENT_HIGH automatically. HAVE_KVM_MMU_PRESENT_HIGH will make bit 11 reserved as 0. Signed-off-by: Yan Zhao --- arch/x86/kvm/Kconfig | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 950c12868d304..7126344077ab5 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -99,6 +99,19 @@ config X86_SGX_KVM If unsure, say N. +config KVM_INTEL_EXPORTED_EPT + bool "export EPT to be used by other modules (e.g. iommufd)" + depends on KVM_INTEL + select HAVE_KVM_EXPORTED_TDP + select HAVE_KVM_MMU_PRESENT_HIGH if X86_64 + help + Intel EPT is architecturally guaranteed of compatible to stage 2 + page tables in Intel IOMMU. + + Enable this feature to allow Intel EPT to be exported and used + directly as stage 2 page tables in Intel IOMMU. + + config KVM_AMD tristate "KVM for AMD processors support" depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON) From patchwork Sat Dec 2 09:36:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476871 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="oIzpXtFm" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 507AB197; Sat, 2 Dec 2023 02:05:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511501; x=1733047501; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=salWrsDL9dpOvmr91p0gKRVzj1lK++JFGh51YeQPB6Q=; b=oIzpXtFmjMjZ3HkDSzsUvWt8FnfccUhdTn+CskBij29VX6Pa2Cdt+7XY sXFVClfmOJGKMiHUAKtJCuWs3M1Tp3NHMBrSYNanNEqhAbMzDrebyCwPU MSNJlY9i7tGr1IJ6DjJ+9/HYNHCYvKHAx+hUYiH4ZJzCaQtIsCZ1diHMy 8+aKynWhtMI3dO2AAC7Hn2hZgnbI9xatq8N2HvPmdZ1B3mQLRKnsO4IEH +sMC1c8TvLqj22F7DUNOIBoOmnlSSqK9cduzkVIyCE2X7hXwYwWwSEFLD 0mdc3DQgWhDNVUJvhwiKqmMnE8J2jg9rYpdGbfF8nQLNqDRDPvz4DORJC Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="457913864" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="457913864" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:05:00 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="840461606" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="840461606" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:04:57 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, 
Yan Zhao Subject: [RFC PATCH 40/42] KVM: VMX: Compose VMX specific meta data for KVM exported TDP Date: Sat, 2 Dec 2023 17:36:01 +0800 Message-Id: <20231202093601.15931-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Compose VMX specific meta data of KVM exported TDP. The format of the meta data is defined in "asm/kvm_exported_tdp.h". Intel VT-d driver can include "asm/kvm_exported_tdp.h" to decode this meta data in order to check page table format, level, reserved zero bits before loading KVM page tables with root HPA. Signed-off-by: Yan Zhao --- arch/x86/kvm/vmx/vmx.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index f290dd3094da6..7965bc32f87de 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -48,6 +48,7 @@ #include #include #include +#include #include "capabilities.h" #include "cpuid.h" @@ -8216,6 +8217,22 @@ static void vmx_vm_destroy(struct kvm *kvm) free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm)); } +#ifdef CONFIG_KVM_INTEL_EXPORTED_EPT +void kvm_exported_tdp_compose_meta(struct kvm_exported_tdp *tdp) +{ + struct kvm_exported_tdp_meta_vmx *meta = tdp->arch.meta; + struct kvm_mmu_common *context = &tdp->arch.mmu.common; + void *rsvd_bits_mask = context->shadow_zero_check.rsvd_bits_mask; + + meta->root_hpa = context->root.hpa; + meta->level = context->root_role.level; + meta->max_huge_page_level = min(ept_caps_to_lpage_level(vmx_capability.ept), + KVM_MAX_HUGEPAGE_LEVEL); + memcpy(meta->rsvd_bits_mask, rsvd_bits_mask, sizeof(meta->rsvd_bits_mask)); + meta->type = KVM_TDP_TYPE_EPT; +} +#endif + static struct kvm_x86_ops vmx_x86_ops __initdata = { .name = KBUILD_MODNAME, @@ -8357,6 +8374,11 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + +#ifdef CONFIG_KVM_INTEL_EXPORTED_EPT + .exported_tdp_meta_size = sizeof(struct kvm_exported_tdp_meta_vmx), + .exported_tdp_meta_compose = kvm_exported_tdp_compose_meta, +#endif }; static unsigned int vmx_handle_intel_pt_intr(void) From patchwork Sat Dec 2 09:36:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476872 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KceNHWxV" Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E366B1A6; Sat, 2 Dec 2023 02:05:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511531; x=1733047531; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=2TvSYcYyRxZJXd7jYUAEf7Q9pF2z8JkVIapNJrcQxZk=; b=KceNHWxVMoHHTCmvnwDci7v4XGhRA6PCaY/ADkAwT9wtdPMJuhU39Z95 lNJ5WhQo/wnsCjMXGFEtTODwDNVqzp9y1mUfL8dRoVhX4WCCqcQvtvaTk 4+3CCJHY3NTVG8LQiWyKwqMf1MIzguE8PFYHFVl+EFKwDwbYyLpfCm6Hg 5tiZDzi4uOg9vfXeHZispiHtTJHqZ26+4nIzECS3tCMCqJuzCWtS+SaHk khXqUGefse4Nz4Tk7ZRAlBNHgCrZYzDzsK7jtC6kKOKe7JGac1QSwK5mK qbEAKQNj5TDE4m0oIEe+vBaUlu2T/D/LTYsQ4EXVAO3p0pTaTtwDr/lmN Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="392459741" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; 
d="scan'208";a="392459741" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:05:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="887939973" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="887939973" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:05:28 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 41/42] KVM: VMX: Implement ops .flush_remote_tlbs* in VMX when EPT is on Date: Sat, 2 Dec 2023 17:36:33 +0800 Message-Id: <20231202093633.15991-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Add VMX implementation of ops of flush_remote_tlbs* in kvm_x86_ops when enable_ept is on and CONFIG_HYPERV is off. Without ops flush_remote_tlbs* in VMX, kvm_flush_remote_tlbs*() just makes all cpus request KVM_REQ_TLB_FLUSH after finding the two ops are non-present. So, by also making all cpu requests KVM_REQ_TLB_FLUSH in ops flush_remote_tlbs* in VMX, no functional changes should be introduced. The two ops allow vendor code (e.g. VMX) to control when to notify IOMMU to flush TLBs. This is useful for contidions when sequence to flush CPU TLBs and IOTLBs is important. Signed-off-by: Yan Zhao --- arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 7965bc32f87de..2fec351a3fa5b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7544,6 +7544,17 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu) return err; } +static int vmx_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, gfn_t nr_pages) +{ + kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH); + return 0; +} + +static int vmx_flush_remote_tlbs(struct kvm *kvm) +{ + return vmx_flush_remote_tlbs_range(kvm, 0, -1ULL); +} + #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. 
See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" @@ -8528,6 +8539,11 @@ static __init int hardware_setup(void) vmx_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs; vmx_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range; } +#else + if (enable_ept) { + vmx_x86_ops.flush_remote_tlbs = vmx_flush_remote_tlbs; + vmx_x86_ops.flush_remote_tlbs_range = vmx_flush_remote_tlbs_range; + } #endif if (!cpu_has_vmx_ple()) { From patchwork Sat Dec 2 09:37:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13476873 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="F8wGCTFZ" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4DAE01A4; Sat, 2 Dec 2023 02:06:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701511571; x=1733047571; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Kur6VrDVtWtZgYRN/PKml2utDvkLe+OiXJiTkkUWmVs=; b=F8wGCTFZk2QUknO7gbXDejUEEb48++ssQQ0oTcPgn+IvpZwj373BhywE oBuan7FuN2fOBZtxFCvFyDuYB+uXrCiP9hLsI09Xk2EJXrBZPiTSWYt+E +gepyrYcVIStGEJ9pYti4pyaUrjuURhupDe4W6pd8lc7V9UedhyVWf0Ft ypAMK62lK2pZctfH8ScnM/Kjukr7QfHYTyAw5LsruhPudT6VpDABAdV1p ffJxv4xL4sQl5ga9DmyWgd2WT4OBjNK7WPgqzqu0nggO40NOGrCwXEeM2 F3Mtpn3X0PHtz2PiffhMztIAKC7n839j5uoLFI5ci+eOZ4/K9F4TbLQC/ Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="383989384" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="383989384" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:06:10 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10911"; a="913854410" X-IronPort-AV: E=Sophos;i="6.04,245,1695711600"; d="scan'208";a="913854410" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2023 02:06:07 -0800 From: Yan Zhao To: iommu@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alex.williamson@redhat.com, jgg@nvidia.com, pbonzini@redhat.com, seanjc@google.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, yi.l.liu@intel.com, Yan Zhao Subject: [RFC PATCH 42/42] KVM: VMX: Notify importers of exported TDP to flush TLBs on KVM flushes EPT Date: Sat, 2 Dec 2023 17:37:12 +0800 Message-Id: <20231202093712.16049-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20231202091211.13376-1-yan.y.zhao@intel.com> References: <20231202091211.13376-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Call TDP FD helper to notify importers of exported TDP to flush TLBs when KVM flushes EPT. 
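Combined with the previous patch, the VMX hook ends up doing the CPU TLB flush and the importer notification back to back. A condensed view of the result, matching the diffs of patches 41 and 42, is:

static int vmx_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, gfn_t nr_pages)
{
	/* 1. Ask all vCPUs to flush CPU TLBs, as before. */
	kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
#if IS_ENABLED(CONFIG_KVM_INTEL_EXPORTED_EPT)
	/* 2. Notify importers of the exported EPT to flush IOTLBs for the range. */
	kvm_tdp_fd_flush_notify(kvm, gfn, nr_pages);
#endif
	return 0;
}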
Signed-off-by: Yan Zhao --- arch/x86/kvm/vmx/vmx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2fec351a3fa5b..3a2b6ddcde108 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7547,6 +7547,9 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu) static int vmx_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, gfn_t nr_pages) { kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH); +#if IS_ENABLED(CONFIG_KVM_INTEL_EXPORTED_EPT) + kvm_tdp_fd_flush_notify(kvm, gfn, nr_pages); +#endif return 0; }
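To close the loop, the meta data composed in patch 40 is what an IOMMU driver would inspect before adopting the exported EPT root as its stage-2 page table. The sketch below is illustrative only and is not part of this series; the iommu_supports_level(), iommu_supports_lpage_level() and load_stage2_root() helpers are hypothetical stand-ins for whatever capability checks and root programming the real VT-d code performs.

/* Hypothetical importer-side compatibility check (not part of this series). */
static int adopt_kvm_ept(struct kvm_exported_tdp_meta_vmx *meta)
{
	if (meta->type != KVM_TDP_TYPE_EPT)
		return -EINVAL;		/* only EPT-format tables can be shared */

	if (!iommu_supports_level(meta->level))			/* hypothetical helper */
		return -EINVAL;

	if (!iommu_supports_lpage_level(meta->max_huge_page_level))	/* hypothetical helper */
		return -EINVAL;

	/* meta->rsvd_bits_mask would also be compared against IOMMU reserved bits here. */

	load_stage2_root(meta->root_hpa);	/* hypothetical: program the IOMMU stage-2 root */
	return 0;
}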