From patchwork Mon Jan 15 10:37:13 2024
X-Patchwork-Submitter: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>
X-Patchwork-Id: 13519512
From: Zhenzhong Duan <zhenzhong.duan@intel.com>
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
    peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
    nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com,
    yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com,
    Zhenzhong Duan, Cornelia Huck, Paolo Bonzini,
    kvm@vger.kernel.org (open list:Overall KVM CPUs)
Subject: [PATCH rfcv1 01/23] Update linux header to support nested hwpt alloc
Date: Mon, 15 Jan 2024 18:37:13 +0800
Message-Id: <20240115103735.132209-2-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>
MIME-Version: 1.0

Repo:
https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
commit id: 7c22f835c4c9b

Placeholder, not for upstream.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/standard-headers/drm/drm_fourcc.h    |   2 +
 include/standard-headers/linux/fuse.h        |  10 +-
 include/standard-headers/linux/pci_regs.h    |  24 +-
 include/standard-headers/linux/vhost_types.h |   7 +
 .../standard-headers/linux/virtio_config.h   |   5 +
 include/standard-headers/linux/virtio_pci.h  |  11 +
 linux-headers/asm-arm64/kvm.h                |  32 +++
 linux-headers/asm-generic/unistd.h           |  14 +-
 linux-headers/asm-loongarch/bitsperlong.h    |   1 +
 linux-headers/asm-loongarch/kvm.h            | 108 ++++++++
 linux-headers/asm-loongarch/mman.h           |   1 +
 linux-headers/asm-loongarch/unistd.h         |   5 +
 linux-headers/asm-mips/unistd_n32.h          |   4 +
 linux-headers/asm-mips/unistd_n64.h          |   4 +
 linux-headers/asm-mips/unistd_o32.h          |   4 +
 linux-headers/asm-powerpc/unistd_32.h        |   4 +
 linux-headers/asm-powerpc/unistd_64.h        |   4 +
 linux-headers/asm-riscv/kvm.h                |  12 +
 linux-headers/asm-s390/unistd_32.h           |   4 +
 linux-headers/asm-s390/unistd_64.h           |   4 +
 linux-headers/asm-x86/unistd_32.h            |   4 +
 linux-headers/asm-x86/unistd_64.h            |   3 +
 linux-headers/asm-x86/unistd_x32.h           |   3 +
 linux-headers/linux/iommufd.h                | 259 +++++++++++++++++-
 linux-headers/linux/kvm.h                    |  11 +
 linux-headers/linux/psp-sev.h                |   1 +
 linux-headers/linux/stddef.h                 |   9 +-
 linux-headers/linux/userfaultfd.h            |   9 +-
 linux-headers/linux/vfio.h                   |  47 +++-
 linux-headers/linux/vhost.h                  |   8 +
 30 files changed, 583 insertions(+), 31 deletions(-)
 create mode 100644 linux-headers/asm-loongarch/bitsperlong.h
 create mode 100644 linux-headers/asm-loongarch/kvm.h
 create mode 100644 linux-headers/asm-loongarch/mman.h
 create mode 100644 linux-headers/asm-loongarch/unistd.h

diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 72279f4d25..3afb70160f 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -322,6 +322,8 @@ extern "C" {
  * index 1 = Cr:Cb plane, [39:0] Cr1:Cb1:Cr0:Cb0 little endian
  */
 #define DRM_FORMAT_NV15 fourcc_code('N', 'V', '1', '5') /* 2x2 subsampled Cr:Cb plane */
+#define DRM_FORMAT_NV20 fourcc_code('N', 'V', '2', '0') /* 2x1 subsampled Cr:Cb plane */
+#define DRM_FORMAT_NV30 fourcc_code('N', 'V', '3', '0') /* non-subsampled Cr:Cb plane */
 
 /*
  * 2 plane YCbCr MSB aligned
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index 6b9793842c..fc0dcd10ae 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -209,7 +209,7 @@
  *  - add FUSE_HAS_EXPIRE_ONLY
  *
  *  7.39
- *  - add FUSE_DIRECT_IO_RELAX
+ *  - add FUSE_DIRECT_IO_ALLOW_MMAP
  *  - add FUSE_STATX and related structures
  */
@@ -405,8 +405,7 @@ struct fuse_file_lock {
  * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
  *			symlink and mknod (single group that matches parent)
  * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
- * FUSE_DIRECT_IO_RELAX: relax restrictions in FOPEN_DIRECT_IO mode, for now
- *		       allow shared mmap
+ * FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode.
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -445,7 +444,10 @@ struct fuse_file_lock {
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
-#define FUSE_DIRECT_IO_RELAX	(1ULL << 36)
+#define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
+
+/* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
+#define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 
 /**
  * CUSE INIT request/reply flags
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index e5f558d964..a39193213f 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -80,6 +80,7 @@
 #define  PCI_HEADER_TYPE_NORMAL		0
 #define  PCI_HEADER_TYPE_BRIDGE		1
 #define  PCI_HEADER_TYPE_CARDBUS	2
+#define  PCI_HEADER_TYPE_MFD		0x80	/* Multi-Function Device (possible) */
 
 #define PCI_BIST		0x0f	/* 8 bits */
 #define  PCI_BIST_CODE_MASK	0x0f	/* Return result */
@@ -637,6 +638,7 @@
 #define PCI_EXP_RTCAP		0x1e	/* Root Capabilities */
 #define  PCI_EXP_RTCAP_CRSVIS	0x0001	/* CRS Software Visibility capability */
 #define PCI_EXP_RTSTA		0x20	/* Root Status */
+#define  PCI_EXP_RTSTA_PME_RQ_ID 0x0000ffff /* PME Requester ID */
 #define  PCI_EXP_RTSTA_PME	0x00010000 /* PME status */
 #define  PCI_EXP_RTSTA_PENDING	0x00020000 /* PME pending */
 /*
@@ -930,12 +932,13 @@
 /* Process Address Space ID */
 #define PCI_PASID_CAP		0x04    /* PASID feature register */
-#define  PCI_PASID_CAP_EXEC	0x02	/* Exec permissions Supported */
-#define  PCI_PASID_CAP_PRIV	0x04	/* Privilege Mode Supported */
+#define  PCI_PASID_CAP_EXEC	0x0002	/* Exec permissions Supported */
+#define  PCI_PASID_CAP_PRIV	0x0004	/* Privilege Mode Supported */
+#define  PCI_PASID_CAP_WIDTH	0x1f00
 #define PCI_PASID_CTRL		0x06    /* PASID control register */
-#define  PCI_PASID_CTRL_ENABLE	0x01	/* Enable bit */
-#define  PCI_PASID_CTRL_EXEC	0x02	/* Exec permissions Enable */
-#define  PCI_PASID_CTRL_PRIV	0x04	/* Privilege Mode Enable */
+#define  PCI_PASID_CTRL_ENABLE	0x0001	/* Enable bit */
+#define  PCI_PASID_CTRL_EXEC	0x0002	/* Exec permissions Enable */
+#define  PCI_PASID_CTRL_PRIV	0x0004	/* Privilege Mode Enable */
 #define PCI_EXT_CAP_PASID_SIZEOF	8
 
 /* Single Root I/O Virtualization */
@@ -975,6 +978,8 @@
 #define  PCI_LTR_VALUE_MASK	0x000003ff
 #define  PCI_LTR_SCALE_MASK	0x00001c00
 #define  PCI_LTR_SCALE_SHIFT	10
+#define  PCI_LTR_NOSNOOP_VALUE	0x03ff0000 /* Max No-Snoop Latency Value */
+#define  PCI_LTR_NOSNOOP_SCALE	0x1c000000 /* Scale for Max Value */
 #define PCI_EXT_CAP_LTR_SIZEOF	8
 
 /* Access Control Service */
@@ -1042,9 +1047,16 @@
 #define PCI_EXP_DPC_STATUS		0x08	/* DPC Status */
 #define  PCI_EXP_DPC_STATUS_TRIGGER	    0x0001 /* Trigger Status */
 #define  PCI_EXP_DPC_STATUS_TRIGGER_RSN	    0x0006 /* Trigger Reason */
+#define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR  0x0000 /* Uncorrectable error */
+#define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE    0x0002 /* Rcvd ERR_NONFATAL */
+#define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE     0x0004 /* Rcvd ERR_FATAL */
+#define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT 0x0006 /* Reason in Trig Reason Extension field */
 #define  PCI_EXP_DPC_STATUS_INTERRUPT	    0x0008 /* Interrupt Status */
 #define  PCI_EXP_DPC_RP_BUSY		    0x0010 /* Root Port Busy */
 #define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT 0x0060 /* Trig Reason Extension */
+#define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_RP_PIO     0x0000 /* RP PIO error */
+#define  PCI_EXP_DPC_STATUS_TRIGGER_RSN_SW_TRIGGER 0x0020 /* DPC SW Trigger bit */
+#define  PCI_EXP_DPC_RP_PIO_FEP		    0x1f00 /* RP PIO First Err Ptr */
 
 #define PCI_EXP_DPC_SOURCE_ID		 0x0A	/* DPC Source Identifier */
@@ -1088,6 +1100,8 @@
 #define  PCI_L1SS_CTL1_LTR_L12_TH_VALUE	0x03ff0000  /* LTR_L1.2_THRESHOLD_Value */
 #define  PCI_L1SS_CTL1_LTR_L12_TH_SCALE	0xe0000000  /* LTR_L1.2_THRESHOLD_Scale */
 #define PCI_L1SS_CTL2			0x0c	/* Control 2 Register */
+#define  PCI_L1SS_CTL2_T_PWR_ON_SCALE	0x00000003  /* T_POWER_ON Scale */
+#define  PCI_L1SS_CTL2_T_PWR_ON_VALUE	0x000000f8  /* T_POWER_ON Value */
 
 /* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
 #define PCI_DVSEC_HEADER1		0x4 /* Designated Vendor-Specific Header1 */
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 5ad07e134a..fd54044936 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -185,5 +185,12 @@ struct vhost_vdpa_iova_range {
  * DRIVER_OK
  */
 #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
+/* Device may expose the virtqueue's descriptor area, driver area and
+ * device area to a different group for ASID binding than where its
+ * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
+ */
+#define VHOST_BACKEND_F_DESC_ASID    0x7
+/* IOTLB don't flush memory mapping across device reset */
+#define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
 
 #endif
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 8a7d0dc8b0..bfd1ca643e 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -103,6 +103,11 @@
  */
 #define VIRTIO_F_NOTIFICATION_DATA	38
 
+/* This feature indicates that the driver uses the data provided by the device
+ * as a virtqueue identifier in available buffer notifications.
+ */
+#define VIRTIO_F_NOTIF_CONFIG_DATA	39
+
 /*
  * This feature indicates that the driver can reset a queue individually.
  */
diff --git a/include/standard-headers/linux/virtio_pci.h b/include/standard-headers/linux/virtio_pci.h
index be912cfc95..b7fdfd0668 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -166,6 +166,17 @@ struct virtio_pci_common_cfg {
 	uint32_t queue_used_hi;		/* read-write */
 };
 
+/*
+ * Warning: do not use sizeof on this: use offsetofend for
+ * specific fields you need.
+ */
+struct virtio_pci_modern_common_cfg {
+	struct virtio_pci_common_cfg cfg;
+
+	uint16_t queue_notify_data;	/* read-write */
+	uint16_t queue_reset;		/* read-write */
+};
+
 /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
 struct virtio_pci_cfg_cap {
 	struct virtio_pci_cap cap;
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 38e5957526..c59ea55cd8 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -491,6 +491,38 @@ struct kvm_smccc_filter {
 #define KVM_HYPERCALL_EXIT_SMC		(1U << 0)
 #define KVM_HYPERCALL_EXIT_16BIT	(1U << 1)
 
+/*
+ * Get feature ID registers userspace writable mask.
+ *
+ * From DDI0487J.a, D19.2.66 ("ID_AA64MMFR2_EL1, AArch64 Memory Model
+ * Feature Register 2"):
+ *
+ * "The Feature ID space is defined as the System register space in
+ * AArch64 with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7},
+ * op2=={0-7}."
+ *
+ * This covers all currently known R/O registers that indicate
+ * anything useful feature wise, including the ID registers.
+ *
+ * If we ever need to introduce a new range, it will be described as
+ * such in the range field.
+ */
+#define KVM_ARM_FEATURE_ID_RANGE_IDX(op0, op1, crn, crm, op2)	\
+	({							\
+		__u64 __op1 = (op1) & 3;			\
+		__op1 -= (__op1 == 3);				\
+		(__op1 << 6 | ((crm) & 7) << 3 | (op2));	\
+	})
+
+#define KVM_ARM_FEATURE_ID_RANGE	0
+#define KVM_ARM_FEATURE_ID_RANGE_SIZE	(3 * 8 * 8)
+
+struct reg_mask_range {
+	__u64 addr;		/* Pointer to mask array */
+	__u32 range;		/* Requested range */
+	__u32 reserved[13];
+};
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index abe087c53b..756b013fb8 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -71,7 +71,7 @@ __SYSCALL(__NR_fremovexattr, sys_fremovexattr)
 #define __NR_getcwd 17
 __SYSCALL(__NR_getcwd, sys_getcwd)
 #define __NR_lookup_dcookie 18
-__SC_COMP(__NR_lookup_dcookie, sys_lookup_dcookie, compat_sys_lookup_dcookie)
+__SYSCALL(__NR_lookup_dcookie, sys_ni_syscall)
 #define __NR_eventfd2 19
 __SYSCALL(__NR_eventfd2, sys_eventfd2)
 #define __NR_epoll_create1 20
@@ -816,15 +816,21 @@ __SYSCALL(__NR_process_mrelease, sys_process_mrelease)
 __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
 #define __NR_set_mempolicy_home_node 450
 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
-
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
-
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+#define __NR_map_shadow_stack 453
+__SYSCALL(__NR_map_shadow_stack, sys_map_shadow_stack)
+#define __NR_futex_wake 454
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
+#define __NR_futex_wait 455
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_requeue 456
+__SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
 #undef __NR_syscalls
-#define __NR_syscalls 453
+#define __NR_syscalls 457
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-loongarch/bitsperlong.h b/linux-headers/asm-loongarch/bitsperlong.h
new file mode 100644
index 0000000000..6dc0bb0c13
--- /dev/null
+++ b/linux-headers/asm-loongarch/bitsperlong.h
@@ -0,0 +1 @@
+#include <asm-generic/bitsperlong.h>
diff --git a/linux-headers/asm-loongarch/kvm.h b/linux-headers/asm-loongarch/kvm.h
new file mode 100644
index 0000000000..c6ad2ee610
--- /dev/null
+++ b/linux-headers/asm-loongarch/kvm.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2020-2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef __UAPI_ASM_LOONGARCH_KVM_H
+#define __UAPI_ASM_LOONGARCH_KVM_H
+
+#include <linux/types.h>
+
+/*
+ * KVM LoongArch specific structures and definitions.
+ *
+ * Some parts derived from the x86 version of this file.
+ */
+
+#define __KVM_HAVE_READONLY_MEM
+
+#define KVM_COALESCED_MMIO_PAGE_OFFSET	1
+#define KVM_DIRTY_LOG_PAGE_OFFSET	64
+
+/*
+ * for KVM_GET_REGS and KVM_SET_REGS
+ */
+struct kvm_regs {
+	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
+	__u64 gpr[32];
+	__u64 pc;
+};
+
+/*
+ * for KVM_GET_FPU and KVM_SET_FPU
+ */
+struct kvm_fpu {
+	__u32 fcsr;
+	__u64 fcc;    /* 8x8 */
+	struct kvm_fpureg {
+		__u64 val64[4];
+	} fpr[32];
+};
+
+/*
+ * For LoongArch, we use KVM_SET_ONE_REG and KVM_GET_ONE_REG to access various
+ * registers.  The id field is broken down as follows:
+ *
+ *  bits[63..52] - As per linux/kvm.h
+ *  bits[51..32] - Must be zero.
+ *  bits[31..16] - Register set.
+ *
+ * Register set = 0: GP registers from kvm_regs (see definitions below).
+ *
+ * Register set = 1: CSR registers.
+ *
+ * Register set = 2: KVM specific registers (see definitions below).
+ *
+ * Register set = 3: FPU / SIMD registers (see definitions below).
+ *
+ * Other sets registers may be added in the future.  Each set would
+ * have its own identifier in bits[31..16].
+ */
+
+#define KVM_REG_LOONGARCH_GPR		(KVM_REG_LOONGARCH | 0x00000ULL)
+#define KVM_REG_LOONGARCH_CSR		(KVM_REG_LOONGARCH | 0x10000ULL)
+#define KVM_REG_LOONGARCH_KVM		(KVM_REG_LOONGARCH | 0x20000ULL)
+#define KVM_REG_LOONGARCH_FPSIMD	(KVM_REG_LOONGARCH | 0x30000ULL)
+#define KVM_REG_LOONGARCH_CPUCFG	(KVM_REG_LOONGARCH | 0x40000ULL)
+#define KVM_REG_LOONGARCH_MASK		(KVM_REG_LOONGARCH | 0x70000ULL)
+#define KVM_CSR_IDX_MASK		0x7fff
+#define KVM_CPUCFG_IDX_MASK		0x7fff
+
+/*
+ * KVM_REG_LOONGARCH_KVM - KVM specific control registers.
+ */
+
+#define KVM_REG_LOONGARCH_COUNTER	(KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 1)
+#define KVM_REG_LOONGARCH_VCPU_RESET	(KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 2)
+
+#define LOONGARCH_REG_SHIFT		3
+#define LOONGARCH_REG_64(TYPE, REG)	(TYPE | KVM_REG_SIZE_U64 | (REG << LOONGARCH_REG_SHIFT))
+#define KVM_IOC_CSRID(REG)		LOONGARCH_REG_64(KVM_REG_LOONGARCH_CSR, REG)
+#define KVM_IOC_CPUCFG(REG)		LOONGARCH_REG_64(KVM_REG_LOONGARCH_CPUCFG, REG)
+
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+/* definition of registers in kvm_run */
+struct kvm_sync_regs {
+};
+
+/* dummy definition */
+struct kvm_sregs {
+};
+
+struct kvm_iocsr_entry {
+	__u32 addr;
+	__u32 pad;
+	__u64 data;
+};
+
+#define KVM_NR_IRQCHIPS		1
+#define KVM_IRQCHIP_NUM_PINS	64
+#define KVM_MAX_CORES		256
+
+#endif /* __UAPI_ASM_LOONGARCH_KVM_H */
diff --git a/linux-headers/asm-loongarch/mman.h b/linux-headers/asm-loongarch/mman.h
new file mode 100644
index 0000000000..8eebf89f5a
--- /dev/null
+++ b/linux-headers/asm-loongarch/mman.h
@@ -0,0 +1 @@
+#include <asm-generic/mman.h>
diff --git a/linux-headers/asm-loongarch/unistd.h b/linux-headers/asm-loongarch/unistd.h
new file mode 100644
index 0000000000..fcb668984f
--- /dev/null
+++ b/linux-headers/asm-loongarch/unistd.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3
+
+#include <asm-generic/unistd.h>
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index 46d8500654..994b6f008f 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -381,5 +381,9 @@
 #define __NR_set_mempolicy_home_node (__NR_Linux + 450)
 #define __NR_cachestat (__NR_Linux + 451)
 #define __NR_fchmodat2 (__NR_Linux + 452)
+#define __NR_map_shadow_stack (__NR_Linux + 453)
+#define __NR_futex_wake (__NR_Linux + 454)
+#define __NR_futex_wait (__NR_Linux + 455)
+#define __NR_futex_requeue (__NR_Linux + 456)
 
 #endif /* _ASM_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index c2f7ac673b..41dcf5877a 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -357,5 +357,9 @@
 #define __NR_set_mempolicy_home_node (__NR_Linux + 450)
 #define __NR_cachestat (__NR_Linux + 451)
 #define __NR_fchmodat2 (__NR_Linux + 452)
+#define __NR_map_shadow_stack (__NR_Linux + 453)
+#define __NR_futex_wake (__NR_Linux + 454)
+#define __NR_futex_wait (__NR_Linux + 455)
+#define __NR_futex_requeue (__NR_Linux + 456)
 
 #endif /* _ASM_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index 757c68f2ad..ae9d334d96 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -427,5 +427,9 @@
 #define __NR_set_mempolicy_home_node (__NR_Linux + 450)
 #define __NR_cachestat (__NR_Linux + 451)
 #define __NR_fchmodat2 (__NR_Linux + 452)
+#define __NR_map_shadow_stack (__NR_Linux + 453)
+#define __NR_futex_wake (__NR_Linux + 454)
+#define __NR_futex_wait (__NR_Linux + 455)
+#define __NR_futex_requeue (__NR_Linux + 456)
 
 #endif /* _ASM_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index 8ef94bbac1..b9b23d66d7 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -434,6 +434,10 @@
 #define __NR_set_mempolicy_home_node 450
 #define __NR_cachestat 451
 #define __NR_fchmodat2 452
+#define __NR_map_shadow_stack 453
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index 0e7ee43e88..cbb4b3e8f7 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -406,6 +406,10 @@
 #define __NR_set_mempolicy_home_node 450
 #define __NR_cachestat 451
 #define __NR_fchmodat2 452
+#define __NR_map_shadow_stack 453
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index 992c5e4071..60d3b21dea 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -80,6 +80,7 @@ struct kvm_riscv_csr {
 	unsigned long sip;
 	unsigned long satp;
 	unsigned long scounteren;
+	unsigned long senvcfg;
 };
 
 /* AIA CSR registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
@@ -93,6 +94,11 @@ struct kvm_riscv_aia_csr {
 	unsigned long iprio2h;
 };
 
+/* Smstateen CSR for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_smstateen_csr {
+	unsigned long sstateen0;
+};
+
 /* TIMER registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
 struct kvm_riscv_timer {
 	__u64 frequency;
@@ -131,6 +137,8 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_ZICSR,
 	KVM_RISCV_ISA_EXT_ZIFENCEI,
 	KVM_RISCV_ISA_EXT_ZIHPM,
+	KVM_RISCV_ISA_EXT_SMSTATEEN,
+	KVM_RISCV_ISA_EXT_ZICOND,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
@@ -148,6 +156,7 @@ enum KVM_RISCV_SBI_EXT_ID {
 	KVM_RISCV_SBI_EXT_PMU,
 	KVM_RISCV_SBI_EXT_EXPERIMENTAL,
 	KVM_RISCV_SBI_EXT_VENDOR,
+	KVM_RISCV_SBI_EXT_DBCN,
 	KVM_RISCV_SBI_EXT_MAX,
 };
 
@@ -178,10 +187,13 @@ enum KVM_RISCV_SBI_EXT_ID {
 #define KVM_REG_RISCV_CSR		(0x03 << KVM_REG_RISCV_TYPE_SHIFT)
 #define KVM_REG_RISCV_CSR_GENERAL	(0x0 << KVM_REG_RISCV_SUBTYPE_SHIFT)
 #define KVM_REG_RISCV_CSR_AIA		(0x1 << KVM_REG_RISCV_SUBTYPE_SHIFT)
+#define KVM_REG_RISCV_CSR_SMSTATEEN	(0x2 << KVM_REG_RISCV_SUBTYPE_SHIFT)
 #define KVM_REG_RISCV_CSR_REG(name)	\
 		(offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
 #define KVM_REG_RISCV_CSR_AIA_REG(name)	\
 	(offsetof(struct kvm_riscv_aia_csr, name) / sizeof(unsigned long))
+#define KVM_REG_RISCV_CSR_SMSTATEEN_REG(name)	\
+	(offsetof(struct kvm_riscv_smstateen_csr, name) / sizeof(unsigned long))
 
 /* Timer registers are mapped as type 4 */
 #define KVM_REG_RISCV_TIMER		(0x04 << KVM_REG_RISCV_TYPE_SHIFT)
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index 716fa368ca..c093e6d5f9 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -425,5 +425,9 @@
 #define __NR_set_mempolicy_home_node 450
 #define __NR_cachestat 451
 #define __NR_fchmodat2 452
+#define __NR_map_shadow_stack 453
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index b2a11b1d13..114c0569a4 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -373,5 +373,9 @@
 #define __NR_set_mempolicy_home_node 450
 #define __NR_cachestat 451
 #define __NR_fchmodat2 452
+#define __NR_map_shadow_stack 453
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index d749ad1c24..329649c377 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -443,6 +443,10 @@
 #define __NR_set_mempolicy_home_node 450
 #define __NR_cachestat 451
 #define __NR_fchmodat2 452
+#define __NR_map_shadow_stack 453
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index cea67282eb..4583606ce6 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -366,6 +366,9 @@
 #define __NR_cachestat 451
 #define __NR_fchmodat2 452
 #define __NR_map_shadow_stack 453
+#define __NR_futex_wake 454
+#define __NR_futex_wait 455
+#define __NR_futex_requeue 456
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index 5b2e79bf4c..146d74d8e4 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -318,6 +318,9 @@
 #define __NR_set_mempolicy_home_node (__X32_SYSCALL_BIT + 450)
 #define __NR_cachestat (__X32_SYSCALL_BIT + 451)
 #define __NR_fchmodat2 (__X32_SYSCALL_BIT + 452)
+#define __NR_futex_wake (__X32_SYSCALL_BIT + 454)
+#define __NR_futex_wait (__X32_SYSCALL_BIT + 455)
+#define __NR_futex_requeue (__X32_SYSCALL_BIT + 456)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/iommufd.h b/linux-headers/linux/iommufd.h
index 218bf7ac98..72e8f4b9dd 100644
--- a/linux-headers/linux/iommufd.h
+++ b/linux-headers/linux/iommufd.h
@@ -47,6 +47,9 @@ enum {
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_GET_HW_INFO,
+	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
+	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
+	IOMMUFD_CMD_HWPT_INVALIDATE,
 };
 
@@ -347,20 +350,86 @@ struct iommu_vfio_ioas {
 };
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
 
+/**
+ * enum iommufd_hwpt_alloc_flags - Flags for HWPT allocation
+ * @IOMMU_HWPT_ALLOC_NEST_PARENT: If set, allocate a HWPT that can serve as
+ *                                the parent HWPT in a nesting configuration.
+ * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
+ *                                   enforced on device attachment
+ */
+enum iommufd_hwpt_alloc_flags {
+	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
+	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+};
+
+/**
+ * enum iommu_hwpt_vtd_s1_flags - Intel VT-d stage-1 page table
+ *                                entry attributes
+ * @IOMMU_VTD_S1_SRE: Supervisor request
+ * @IOMMU_VTD_S1_EAFE: Extended access enable
+ * @IOMMU_VTD_S1_WPE: Write protect enable
+ */
+enum iommu_hwpt_vtd_s1_flags {
+	IOMMU_VTD_S1_SRE = 1 << 0,
+	IOMMU_VTD_S1_EAFE = 1 << 1,
+	IOMMU_VTD_S1_WPE = 1 << 2,
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1 - Intel VT-d stage-1 page table
+ *                            info (IOMMU_HWPT_DATA_VTD_S1)
+ * @flags: Combination of enum iommu_hwpt_vtd_s1_flags
+ * @pgtbl_addr: The base address of the stage-1 page table.
+ * @addr_width: The address width of the stage-1 page table
+ * @__reserved: Must be 0
+ */
+struct iommu_hwpt_vtd_s1 {
+	__aligned_u64 flags;
+	__aligned_u64 pgtbl_addr;
+	__u32 addr_width;
+	__u32 __reserved;
+};
+
+/**
+ * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
+ * @IOMMU_HWPT_DATA_NONE: no data
+ * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
+ */
+enum iommu_hwpt_data_type {
+	IOMMU_HWPT_DATA_NONE,
+	IOMMU_HWPT_DATA_VTD_S1,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
- * @flags: Must be 0
+ * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS to connect this HWPT to
+ * @pt_id: The IOAS or HWPT to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
+ * @data_type: One of enum iommu_hwpt_data_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
  * underlying iommu driver's iommu_domain kernel object.
  *
- * A HWPT will be created with the IOVA mappings from the given IOAS.
+ * A kernel-managed HWPT will be created with the mappings from the given
+ * IOAS via the @pt_id. The @data_type for this allocation must be set to
+ * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a
+ * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags.
+ *
+ * A user-managed nested HWPT will be created from a given parent HWPT via
+ * @pt_id, in which the parent HWPT must be allocated previously via the
+ * same ioctl from a given IOAS (@pt_id). In this case, the @data_type
+ * must be set to a pre-defined type corresponding to an I/O page table
+ * type supported by the underlying IOMMU hardware.
+ *
+ * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and
+ * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
+ * must be given.
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -369,13 +438,26 @@ struct iommu_hwpt_alloc {
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 data_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
+/**
+ * enum iommu_hw_info_vtd_flags - Flags for VT-d hw_info
+ * @IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17: If set, disallow read-only mappings
+ *                                         on a nested_parent domain.
+ *                                         https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
+ */
+enum iommu_hw_info_vtd_flags {
+	IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 = 1 << 0,
+};
+
 /**
  * struct iommu_hw_info_vtd - Intel VT-d hardware information
  *
- * @flags: Must be 0
+ * @flags: Combination of enum iommu_hw_info_vtd_flags
  * @__reserved: Must be 0
  *
  * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
@@ -404,6 +486,20 @@ enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_INTEL_VTD,
 };
 
+/**
+ * enum iommufd_hw_capabilities
+ * @IOMMU_HW_CAP_DIRTY_TRACKING: IOMMU hardware support for dirty tracking
+ *                               If available, it means the following APIs
+ *                               are supported:
+ *
+ *                                   IOMMU_HWPT_GET_DIRTY_BITMAP
+ *                                   IOMMU_HWPT_SET_DIRTY_TRACKING
+ *
+ */
+enum iommufd_hw_capabilities {
+	IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
+};
+
 /**
  * struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO)
  * @size: sizeof(struct iommu_hw_info)
@@ -415,6 +511,8 @@ enum iommu_hw_info_type {
  *             the iommu type specific hardware information data
  * @out_data_type: Output the iommu hardware info type as defined in the enum
  *                 iommu_hw_info_type.
+ * @out_capabilities: Output the generic iommu capability info type as defined
+ *                    in the enum iommu_hw_capabilities.
 * @__reserved: Must be 0
 *
 * Query an iommu type specific hardware information data from an iommu behind
@@ -439,6 +537,159 @@ struct iommu_hw_info {
 	__aligned_u64 data_uptr;
 	__u32 out_data_type;
 	__u32 __reserved;
+	__aligned_u64 out_capabilities;
 };
 #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
 
+/*
+ * enum iommufd_hwpt_set_dirty_tracking_flags - Flags for steering dirty
+ *                                              tracking
+ * @IOMMU_HWPT_DIRTY_TRACKING_ENABLE: Enable dirty tracking
+ */
+enum iommufd_hwpt_set_dirty_tracking_flags {
+	IOMMU_HWPT_DIRTY_TRACKING_ENABLE = 1,
+};
+
+/**
+ * struct iommu_hwpt_set_dirty_tracking - ioctl(IOMMU_HWPT_SET_DIRTY_TRACKING)
+ * @size: sizeof(struct iommu_hwpt_set_dirty_tracking)
+ * @flags: Combination of enum iommufd_hwpt_set_dirty_tracking_flags
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain
+ * @__reserved: Must be 0
+ *
+ * Toggle dirty tracking on an HW pagetable.
+ */
+struct iommu_hwpt_set_dirty_tracking {
+	__u32 size;
+	__u32 flags;
+	__u32 hwpt_id;
+	__u32 __reserved;
+};
+#define IOMMU_HWPT_SET_DIRTY_TRACKING _IO(IOMMUFD_TYPE, \
+					  IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING)
+
+/**
+ * enum iommufd_hwpt_get_dirty_bitmap_flags - Flags for getting dirty bits
+ * @IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR: Just read the PTEs without clearing
+ *                                        any dirty bits metadata. This flag
+ *                                        can be passed in the expectation
+ *                                        where the next operation is an unmap
+ *                                        of the same IOVA range.
+ *
+ */
+enum iommufd_hwpt_get_dirty_bitmap_flags {
+	IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR = 1,
+};
+
+/**
+ * struct iommu_hwpt_get_dirty_bitmap - ioctl(IOMMU_HWPT_GET_DIRTY_BITMAP)
+ * @size: sizeof(struct iommu_hwpt_get_dirty_bitmap)
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain
+ * @flags: Combination of enum iommufd_hwpt_get_dirty_bitmap_flags
+ * @__reserved: Must be 0
+ * @iova: base IOVA of the bitmap first bit
+ * @length: IOVA range size
+ * @page_size: page size granularity of each bit in the bitmap
+ * @data: bitmap where to set the dirty bits. The bitmap bits each
+ *        represent a page_size which you deviate from an arbitrary iova.
+ *
+ * Checking a given IOVA is dirty:
+ *
+ *  data[(iova / page_size) / 64] & (1ULL << ((iova / page_size) % 64))
+ *
+ * Walk the IOMMU pagetables for a given IOVA range to return a bitmap
+ * with the dirty IOVAs. In doing so it will also by default clear any
+ * dirty bit metadata set in the IOPTE.
+ */
+struct iommu_hwpt_get_dirty_bitmap {
+	__u32 size;
+	__u32 hwpt_id;
+	__u32 flags;
+	__u32 __reserved;
+	__aligned_u64 iova;
+	__aligned_u64 length;
+	__aligned_u64 page_size;
+	__aligned_u64 data;
+};
+#define IOMMU_HWPT_GET_DIRTY_BITMAP _IO(IOMMUFD_TYPE, \
+					IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP)
+
+/**
+ * enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation
+ *                                        Data Type
+ * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1
+ */
+enum iommu_hwpt_invalidate_data_type {
+	IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
+};
+
+/**
+ * enum iommu_hwpt_vtd_s1_invalidate_flags - Flags for Intel VT-d
+ *                                           stage-1 cache invalidation
+ * @IOMMU_VTD_INV_FLAGS_LEAF: Indicates whether the invalidation applies
+ *                            to all-levels page structure cache or just
+ *                            the leaf PTE cache.
+ */ +enum iommu_hwpt_vtd_s1_invalidate_flags { + IOMMU_VTD_INV_FLAGS_LEAF = 1 << 0, +}; + +/** + * struct iommu_hwpt_vtd_s1_invalidate - Intel VT-d cache invalidation + * (IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) + * @addr: The start address of the range to be invalidated. It needs to + * be 4KB aligned. + * @npages: Number of contiguous 4K pages to be invalidated. + * @flags: Combination of enum iommu_hwpt_vtd_s1_invalidate_flags + * @__reserved: Must be 0 + * + * The Intel VT-d specific invalidation data for user-managed stage-1 cache + * invalidation in nested translation. Userspace uses this structure to + * tell the impacted cache scope after modifying the stage-1 page table. + * + * Invalidating all the caches related to the page table by setting @addr + * to be 0 and @npages to be U64_MAX. + * + * The device TLB will be invalidated automatically if ATS is enabled. + */ +struct iommu_hwpt_vtd_s1_invalidate { + __aligned_u64 addr; + __aligned_u64 npages; + __u32 flags; + __u32 __reserved; +}; + +/** + * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) + * @size: sizeof(struct iommu_hwpt_invalidate) + * @hwpt_id: ID of a nested HWPT for cache invalidation + * @data_uptr: User pointer to an array of driver-specific cache invalidation + * data. + * @data_type: One of enum iommu_hwpt_invalidate_data_type, defining the data + * type of all the entries in the invalidation request array. It + * should be a type supported by the hwpt pointed by @hwpt_id. + * @entry_len: Length (in bytes) of a request entry in the request array + * @entry_num: Input the number of cache invalidation requests in the array. + * Output the number of requests successfully handled by kernel. + * @__reserved: Must be 0. + * + * Invalidate the iommu cache for user-managed page table. Modifications on a + * user-managed page table should be followed by this operation to sync cache. 
+ * Each ioctl can support one or more cache invalidation requests in the array + * that has a total size of @entry_len * @entry_num. + * + * An empty invalidation request array by setting @entry_num==0 is allowed, and + * @entry_len and @data_uptr would be ignored in this case. This can be used to + * check if the given @data_type is supported or not by kernel. + */ +struct iommu_hwpt_invalidate { + __u32 size; + __u32 hwpt_id; + __aligned_u64 data_uptr; + __u32 data_type; + __u32 entry_len; + __u32 entry_num; + __u32 __reserved; +}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) #endif diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 0d74ee999a..549fea3a97 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -264,6 +264,7 @@ struct kvm_xen_exit { #define KVM_EXIT_RISCV_SBI 35 #define KVM_EXIT_RISCV_CSR 36 #define KVM_EXIT_NOTIFY 37 +#define KVM_EXIT_LOONGARCH_IOCSR 38 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. 
*/ @@ -336,6 +337,13 @@ struct kvm_run { __u32 len; __u8 is_write; } mmio; + /* KVM_EXIT_LOONGARCH_IOCSR */ + struct { + __u64 phys_addr; + __u8 data[8]; + __u32 len; + __u8 is_write; + } iocsr_io; /* KVM_EXIT_HYPERCALL */ struct { __u64 nr; @@ -1188,6 +1196,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_COUNTER_OFFSET 227 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 +#define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230 #ifdef KVM_CAP_IRQ_ROUTING @@ -1358,6 +1367,7 @@ struct kvm_dirty_tlb { #define KVM_REG_ARM64 0x6000000000000000ULL #define KVM_REG_MIPS 0x7000000000000000ULL #define KVM_REG_RISCV 0x8000000000000000ULL +#define KVM_REG_LOONGARCH 0x9000000000000000ULL #define KVM_REG_SIZE_SHIFT 52 #define KVM_REG_SIZE_MASK 0x00f0000000000000ULL @@ -1558,6 +1568,7 @@ struct kvm_s390_ucas_mapping { #define KVM_ARM_MTE_COPY_TAGS _IOR(KVMIO, 0xb4, struct kvm_arm_copy_mte_tags) /* Available with KVM_CAP_COUNTER_OFFSET */ #define KVM_ARM_SET_COUNTER_OFFSET _IOW(KVMIO, 0xb5, struct kvm_arm_counter_offset) +#define KVM_ARM_GET_REG_WRITABLE_MASKS _IOR(KVMIO, 0xb6, struct reg_mask_range) /* ioctl for vm fd */ #define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device) diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h index 12ccb70099..bcb21339ee 100644 --- a/linux-headers/linux/psp-sev.h +++ b/linux-headers/linux/psp-sev.h @@ -68,6 +68,7 @@ typedef enum { SEV_RET_INVALID_PARAM, SEV_RET_RESOURCE_LIMIT, SEV_RET_SECURE_DATA_INVALID, + SEV_RET_INVALID_KEY = 0x27, SEV_RET_MAX, } sev_ret_code; diff --git a/linux-headers/linux/stddef.h b/linux-headers/linux/stddef.h index 9bb07083ac..bf9749dd14 100644 --- a/linux-headers/linux/stddef.h +++ b/linux-headers/linux/stddef.h @@ -27,8 +27,13 @@ union { \ struct { MEMBERS } ATTRS; \ struct TAG { MEMBERS } ATTRS NAME; \ - } + } ATTRS +#ifdef __cplusplus +/* sizeof(struct{}) is 1 in C++, not 0, can't use C version of the macro. 
*/ +#define __DECLARE_FLEX_ARRAY(T, member) \ + T member[0] +#else /** * __DECLARE_FLEX_ARRAY() - Declare a flexible array usable in a union * @@ -49,3 +54,5 @@ #ifndef __counted_by #define __counted_by(m) #endif + +#endif /* _LINUX_STDDEF_H */ diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h index 59978fbaae..953c75feda 100644 --- a/linux-headers/linux/userfaultfd.h +++ b/linux-headers/linux/userfaultfd.h @@ -40,7 +40,8 @@ UFFD_FEATURE_EXACT_ADDRESS | \ UFFD_FEATURE_WP_HUGETLBFS_SHMEM | \ UFFD_FEATURE_WP_UNPOPULATED | \ - UFFD_FEATURE_POISON) + UFFD_FEATURE_POISON | \ + UFFD_FEATURE_WP_ASYNC) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -216,6 +217,11 @@ struct uffdio_api { * (i.e. empty ptes). This will be the default behavior for shmem * & hugetlbfs, so this flag only affects anonymous memory behavior * when userfault write-protection mode is registered. + * + * UFFD_FEATURE_WP_ASYNC indicates that userfaultfd write-protection + * asynchronous mode is supported in which the write fault is + * automatically resolved and write-protection is un-set. + * It implies UFFD_FEATURE_WP_UNPOPULATED. 
*/ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -232,6 +238,7 @@ struct uffdio_api { #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12) #define UFFD_FEATURE_WP_UNPOPULATED (1<<13) #define UFFD_FEATURE_POISON (1<<14) +#define UFFD_FEATURE_WP_ASYNC (1<<15) __u64 features; __u64 ioctls; diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index acf72b4999..8e175ece31 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -277,8 +277,8 @@ struct vfio_region_info { #define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */ __u32 index; /* Region index */ __u32 cap_offset; /* Offset within info struct of first cap */ - __u64 size; /* Region size (bytes) */ - __u64 offset; /* Region offset from start of device fd */ + __aligned_u64 size; /* Region size (bytes) */ + __aligned_u64 offset; /* Region offset from start of device fd */ }; #define VFIO_DEVICE_GET_REGION_INFO _IO(VFIO_TYPE, VFIO_BASE + 8) @@ -294,8 +294,8 @@ struct vfio_region_info { #define VFIO_REGION_INFO_CAP_SPARSE_MMAP 1 struct vfio_region_sparse_mmap_area { - __u64 offset; /* Offset of mmap'able area within region */ - __u64 size; /* Size of mmap'able area */ + __aligned_u64 offset; /* Offset of mmap'able area within region */ + __aligned_u64 size; /* Size of mmap'able area */ }; struct vfio_region_info_cap_sparse_mmap { @@ -450,9 +450,9 @@ struct vfio_device_migration_info { VFIO_DEVICE_STATE_V1_RESUMING) __u32 reserved; - __u64 pending_bytes; - __u64 data_offset; - __u64 data_size; + __aligned_u64 pending_bytes; + __aligned_u64 data_offset; + __aligned_u64 data_size; }; /* @@ -476,7 +476,7 @@ struct vfio_device_migration_info { struct vfio_region_info_cap_nvlink2_ssatgt { struct vfio_info_cap_header header; - __u64 tgt; + __aligned_u64 tgt; }; /* @@ -816,7 +816,7 @@ struct vfio_device_gfx_plane_info { __u32 drm_plane_type; /* type of plane: DRM_PLANE_TYPE_* */ /* out */ __u32 drm_format; /* drm format of plane */ - 
__u64 drm_format_mod; /* tiled mode */ + __aligned_u64 drm_format_mod; /* tiled mode */ __u32 width; /* width of plane */ __u32 height; /* height of plane */ __u32 stride; /* stride of plane */ @@ -829,6 +829,7 @@ struct vfio_device_gfx_plane_info { __u32 region_index; /* region index */ __u32 dmabuf_id; /* dma-buf id */ }; + __u32 reserved; }; #define VFIO_DEVICE_QUERY_GFX_PLANE _IO(VFIO_TYPE, VFIO_BASE + 14) @@ -863,9 +864,10 @@ struct vfio_device_ioeventfd { #define VFIO_DEVICE_IOEVENTFD_32 (1 << 2) /* 4-byte write */ #define VFIO_DEVICE_IOEVENTFD_64 (1 << 3) /* 8-byte write */ #define VFIO_DEVICE_IOEVENTFD_SIZE_MASK (0xf) - __u64 offset; /* device fd offset of write */ - __u64 data; /* data to be written */ + __aligned_u64 offset; /* device fd offset of write */ + __aligned_u64 data; /* data to be written */ __s32 fd; /* -1 for de-assignment */ + __u32 reserved; }; #define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE + 16) @@ -1434,6 +1436,27 @@ struct vfio_device_feature_mig_data_size { #define VFIO_DEVICE_FEATURE_MIG_DATA_SIZE 9 +/** + * Upon VFIO_DEVICE_FEATURE_SET, set or clear the BUS mastering for the device + * based on the operation specified in op flag. + * + * The functionality is incorporated for devices that needs bus master control, + * but the in-band device interface lacks the support. Consequently, it is not + * applicable to PCI devices, as bus master control for PCI devices is managed + * in-band through the configuration space. At present, this feature is supported + * only for CDX devices. + * When the device's BUS MASTER setting is configured as CLEAR, it will result in + * blocking all incoming DMA requests from the device. On the other hand, configuring + * the device's BUS MASTER setting as SET (enable) will grant the device the + * capability to perform DMA to the host memory. 
+ */ +struct vfio_device_feature_bus_master { + __u32 op; +#define VFIO_DEVICE_FEATURE_CLEAR_MASTER 0 /* Clear Bus Master */ +#define VFIO_DEVICE_FEATURE_SET_MASTER 1 /* Set Bus Master */ +}; +#define VFIO_DEVICE_FEATURE_BUS_MASTER 10 + /* -------- API for Type1 VFIO IOMMU -------- */ /** @@ -1449,7 +1472,7 @@ struct vfio_iommu_type1_info { __u32 flags; #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ #define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */ - __u64 iova_pgsizes; /* Bitmap of supported page sizes */ + __aligned_u64 iova_pgsizes; /* Bitmap of supported page sizes */ __u32 cap_offset; /* Offset within info struct of first cap */ __u32 pad; }; diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h index f5c48b61ab..649560c685 100644 --- a/linux-headers/linux/vhost.h +++ b/linux-headers/linux/vhost.h @@ -219,4 +219,12 @@ */ #define VHOST_VDPA_RESUME _IO(VHOST_VIRTIO, 0x7E) +/* Get the group for the descriptor table including driver & device areas + * of a virtqueue: read index, write group in num. + * The virtqueue index is stored in the index field of vhost_vring_state. + * The group ID of the descriptor table for this specific virtqueue + * is returned via num field of vhost_vring_state. 
+ */
+#define VHOST_VDPA_GET_VRING_DESC_GROUP _IOWR(VHOST_VIRTIO, 0x7F, \
+					      struct vhost_vring_state)
 #endif

From patchwork Mon Jan 15 10:37:14 2024
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Subject: [PATCH rfcv1 02/23] backends/iommufd: add helpers for allocating
 user-managed HWPT
Date: Mon, 15 Jan 2024 18:37:14 +0800
Message-Id: <20240115103735.132209-3-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
Sender:
qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Include helper to allocate user-managed hwpt and helper for cache invalidation as user-managed HWPT needs to sync cache per modifications. Signed-off-by: Nicolin Chen Signed-off-by: Zhenzhong Duan --- include/sysemu/iommufd.h | 7 +++++ backends/iommufd.c | 61 ++++++++++++++++++++++++++++++++++++++++ backends/trace-events | 2 ++ 3 files changed, 70 insertions(+) diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h index 9af27ebd6c..ab6c382081 100644 --- a/include/sysemu/iommufd.h +++ b/include/sysemu/iommufd.h @@ -33,4 +33,11 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova, ram_addr_t size, void *vaddr, bool readonly); int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova, ram_addr_t size); +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id, + uint32_t pt_id, uint32_t flags, + uint32_t data_type, uint32_t data_len, + void *data_ptr, uint32_t *out_hwpt); +int iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t hwpt_id, + uint32_t data_type, uint32_t entry_len, + uint32_t *entry_num, void *data_ptr); #endif diff --git a/backends/iommufd.c b/backends/iommufd.c index 1ef683c7b0..9f920e08d3 100644 --- a/backends/iommufd.c +++ b/backends/iommufd.c @@ -211,6 +211,67 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, return ret; } +int iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id, + uint32_t pt_id, uint32_t flags, + uint32_t data_type, uint32_t data_len, + void *data_ptr, uint32_t *out_hwpt) +{ + int ret, fd = be->fd; + struct iommu_hwpt_alloc alloc_hwpt = { + .size = sizeof(struct iommu_hwpt_alloc), + .flags = flags, + .dev_id = dev_id, + .pt_id = pt_id, + .data_type = data_type, + .data_len = data_len, + .data_uptr = (uintptr_t)data_ptr, + .__reserved = 0, + }; + + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); + if (ret) { + ret = -errno; + error_report("IOMMU_HWPT_ALLOC 
failed: %m"); + } else { + *out_hwpt = alloc_hwpt.out_hwpt_id; + } + + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type, + data_len, (uint64_t)data_ptr, + alloc_hwpt.out_hwpt_id, ret); + return ret; +} + +int iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t hwpt_id, + uint32_t data_type, uint32_t entry_len, + uint32_t *entry_num, void *data_ptr) +{ + int ret, fd = be->fd; + struct iommu_hwpt_invalidate cache = { + .size = sizeof(cache), + .hwpt_id = hwpt_id, + .data_type = data_type, + .entry_len = entry_len, + .entry_num = *entry_num, + .data_uptr = (uintptr_t)data_ptr, + }; + + ret = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cache); + + trace_iommufd_backend_invalidate_cache(fd, hwpt_id, data_type, entry_len, + *entry_num, cache.entry_num, + (uintptr_t)data_ptr, ret); + if (ret) { + *entry_num = cache.entry_num; + error_report("IOMMU_HWPT_INVALIDATE failed: %s", strerror(errno)); + ret = -errno; + } else { + g_assert(*entry_num == cache.entry_num); + } + + return ret; +} + static const TypeInfo iommufd_backend_info = { .name = TYPE_IOMMUFD_BACKEND, .parent = TYPE_OBJECT, diff --git a/backends/trace-events b/backends/trace-events index d45c6e31a6..3df48bfb08 100644 --- a/backends/trace-events +++ b/backends/trace-events @@ -15,3 +15,5 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, u iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)" iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)" +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)" 
+iommufd_backend_invalidate_cache(int iommufd, uint32_t hwpt_id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d hwpt_id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"

From patchwork Mon Jan 15 10:37:15 2024
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Subject: [PATCH rfcv1 03/23] backends/iommufd_device: introduce IOMMUFDDevice
 targeted interface
Date: Mon, 15 Jan 2024 18:37:15 +0800
Message-Id: <20240115103735.132209-4-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>

With IOMMUFDDevice passed to vIOMMU, we can query hw IOMMU information
and allocate hwpt for a device, but still need an extensible interface
for vIOMMU usage. This introduces an IOMMUFDDevice targeted interface
for vIOMMU. Currently this interface includes two callbacks
attach_hwpt/detach_hwpt for vIOMMU to attach to or detach from hwpt on
host side.

Signed-off-by: Yi Liu
Signed-off-by: Zhenzhong Duan
---
 include/sysemu/iommufd_device.h | 11 ++++++++++-
 backends/iommufd_device.c       | 16 +++++++++++++++-
 hw/vfio/iommufd.c               |  3 ++-
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/iommufd_device.h b/include/sysemu/iommufd_device.h
index 795630324b..799c1345fd 100644
--- a/include/sysemu/iommufd_device.h
+++ b/include/sysemu/iommufd_device.h
@@ -17,15 +17,24 @@
 
 typedef struct IOMMUFDDevice IOMMUFDDevice;
 
+typedef struct IOMMUFDDeviceOps {
+    int (*attach_hwpt)(IOMMUFDDevice *idev, uint32_t hwpt_id);
+    int (*detach_hwpt)(IOMMUFDDevice *idev);
+} IOMMUFDDeviceOps;
+
 /* This is an abstraction of host IOMMUFD device */
 struct IOMMUFDDevice {
     IOMMUFDBackend *iommufd;
     uint32_t dev_id;
+    IOMMUFDDeviceOps *ops;
 };
 
+int iommufd_device_attach_hwpt(IOMMUFDDevice *idev, uint32_t hwpt_id);
+int iommufd_device_detach_hwpt(IOMMUFDDevice *idev);
 int iommufd_device_get_info(IOMMUFDDevice *idev,
                             enum iommu_hw_info_type *type,
                             uint32_t len, void *data);
 void iommufd_device_init(void *_idev, size_t instance_size,
-                         IOMMUFDBackend *iommufd, uint32_t dev_id);
+                         IOMMUFDBackend *iommufd, uint32_t dev_id,
+                         IOMMUFDDeviceOps *ops);
 #endif
diff --git a/backends/iommufd_device.c b/backends/iommufd_device.c
index f6e7ca1dbf..26f69252d2 100644
--- a/backends/iommufd_device.c
+++ b/backends/iommufd_device.c
@@ -14,6 +14,18 @@
 #include "qemu/error-report.h"
 #include "sysemu/iommufd_device.h"
 
+int
iommufd_device_attach_hwpt(IOMMUFDDevice *idev, uint32_t hwpt_id) +{ + g_assert(idev->ops->attach_hwpt); + return idev->ops->attach_hwpt(idev, hwpt_id); +} + +int iommufd_device_detach_hwpt(IOMMUFDDevice *idev) +{ + g_assert(idev->ops->detach_hwpt); + return idev->ops->detach_hwpt(idev); +} + int iommufd_device_get_info(IOMMUFDDevice *idev, enum iommu_hw_info_type *type, uint32_t len, void *data) @@ -39,7 +51,8 @@ int iommufd_device_get_info(IOMMUFDDevice *idev, } void iommufd_device_init(void *_idev, size_t instance_size, - IOMMUFDBackend *iommufd, uint32_t dev_id) + IOMMUFDBackend *iommufd, uint32_t dev_id, + IOMMUFDDeviceOps *ops) { IOMMUFDDevice *idev = (IOMMUFDDevice *)_idev; @@ -47,4 +60,5 @@ void iommufd_device_init(void *_idev, size_t instance_size, idev->iommufd = iommufd; idev->dev_id = dev_id; + idev->ops = ops; } diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c index cbd035f148..1b174b71ee 100644 --- a/hw/vfio/iommufd.c +++ b/hw/vfio/iommufd.c @@ -429,7 +429,8 @@ found_container: QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next); QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next); - iommufd_device_init(idev, sizeof(*idev), container->be, vbasedev->devid); + iommufd_device_init(idev, sizeof(*idev), container->be, vbasedev->devid, + NULL); trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs, vbasedev->num_regions, vbasedev->flags); return 0; From patchwork Mon Jan 15 10:37:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519501 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E4930C47258 for ; Mon, 15 Jan 2024 10:41:21 
+0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPKNm-0000fJ-ON; Mon, 15 Jan 2024 05:39:54 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPKNk-0000f3-GT for qemu-devel@nongnu.org; Mon, 15 Jan 2024 05:39:52 -0500 Received: from mgamail.intel.com ([192.198.163.8]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPKNi-0002kC-TF for qemu-devel@nongnu.org; Mon, 15 Jan 2024 05:39:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705315191; x=1736851191; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=E8nl8QWoRgG2/jDdglZ/k5bjSREUF3iMIVNY6f/l+ik=; b=UQyAZKFDVzDGywyBYQpDJrtKilM7erv3C9ubWXj8vB4cXX0icvq/OmWz gNOI8MvqNFyNVnO0q9jX/EdRc05F5jJvcp+NLBZrjCtYLtI3wtGEaTXlZ tJKx2m9VJVStpqabRbBT2h5S/tiWz/WC1NNJSkXYHECCr9+kYVh4wVpVI RAiZ5vbG5MS6iJCPkkbVykksEJiherna8c/OYcaQq0BGQVRl4gole62a3 CjqTKXCp0GcSVNkach3VaogDyWCJ2Et/dT2j2RshZoMn+lPThcITWKTfg F3mx7PfEtBhcew3u6IvTu/MGID70tS9JfPirTt1Da68xjgOzmVWjYuftm Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10953"; a="13067484" X-IronPort-AV: E=Sophos;i="6.04,196,1695711600"; d="scan'208";a="13067484" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 02:39:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10953"; a="874065322" X-IronPort-AV: E=Sophos;i="6.04,196,1695711600"; d="scan'208";a="874065322" Received: from spr-s2600bt.bj.intel.com ([10.240.192.124]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 02:39:45 -0800 From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, 
Subject: [PATCH rfcv1 04/23] vfio: implement IOMMUFDDevice interface callbacks
Date: Mon, 15 Jan 2024 18:37:16 +0800
Message-Id: <20240115103735.132209-5-zhenzhong.duan@intel.com>

Implement the IOMMUFDDevice interface callbacks attach_hwpt/detach_hwpt for
vIOMMU usage. The vIOMMU uses them to attach to or detach from a hwpt on the
host side.
Signed-off-by: Yi Liu
Signed-off-by: Zhenzhong Duan
---
 hw/vfio/iommufd.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 1b174b71ee..c8c669c59a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -26,6 +26,8 @@
 #include "qemu/chardev_open.h"
 #include "pci.h"
 
+static IOMMUFDDeviceOps vfio_iommufd_device_ops;
+
 static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
                             ram_addr_t size, void *vaddr, bool readonly)
 {
@@ -430,7 +432,7 @@ found_container:
     QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
     iommufd_device_init(idev, sizeof(*idev), container->be, vbasedev->devid,
-                        NULL);
+                        &vfio_iommufd_device_ops);
     trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
                                    vbasedev->num_regions, vbasedev->flags);
     return 0;
@@ -642,3 +644,35 @@ static const TypeInfo types[] = {
 };
 
 DEFINE_TYPES(types)
+
+static int vfio_iommufd_device_attach_hwpt(IOMMUFDDevice *idev,
+                                           uint32_t hwpt_id)
+{
+    VFIODevice *vbasedev = container_of(idev, VFIODevice, idev);
+    Error *err = NULL;
+    int ret;
+
+    ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt_id, &err);
+    if (err) {
+        error_report_err(err);
+    }
+    return ret;
+}
+
+static int vfio_iommufd_device_detach_hwpt(IOMMUFDDevice *idev)
+{
+    VFIODevice *vbasedev = container_of(idev, VFIODevice, idev);
+    Error *err = NULL;
+    int ret;
+
+    ret = iommufd_cdev_detach_ioas_hwpt(vbasedev, &err);
+    if (err) {
+        error_report_err(err);
+    }
+    return ret;
+}
+
+static IOMMUFDDeviceOps vfio_iommufd_device_ops = {
+    .attach_hwpt = vfio_iommufd_device_attach_hwpt,
+    .detach_hwpt = vfio_iommufd_device_detach_hwpt,
+};

From patchwork Mon Jan 15 10:37:17 2024
From: Zhenzhong Duan
Subject: [PATCH rfcv1 05/23] intel_iommu: add a placeholder variable for scalable modern mode
Date: Mon, 15 Jan 2024 18:37:17 +0800
Message-Id: <20240115103735.132209-6-zhenzhong.duan@intel.com>

Add a new element scalable_modern in IntelIOMMUState to mark scalable modern
mode; this element will eventually be exposed as an intel_iommu property. For
now, it is only a placeholder used for cap/ecap initialization, parameter
compatibility checks, etc.
No need to zero this element separately as IntelIOMMUState is zeroed on
creation.

Signed-off-by: Yi Liu
Signed-off-by: Zhenzhong Duan
---
 hw/i386/intel_iommu_internal.h |  3 +++
 include/hw/i386/intel_iommu.h  |  1 +
 hw/i386/intel_iommu.c          | 13 +++++++++++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..ee4a784a35 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -192,9 +192,11 @@
 #define VTD_ECAP_SC                 (1ULL << 7)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_EAFS               (1ULL << 34)
 #define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
 #define VTD_ECAP_SLTS               (1ULL << 46)
+#define VTD_ECAP_FLTS               (1ULL << 47)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
@@ -211,6 +213,7 @@
 #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
 #define VTD_CAP_DRAIN_WRITE         (1ULL << 54)
 #define VTD_CAP_DRAIN_READ          (1ULL << 55)
+#define VTD_CAP_FL1GP               (1ULL << 56)
 #define VTD_CAP_DRAIN               (VTD_CAP_DRAIN_READ | VTD_CAP_DRAIN_WRITE)
 #define VTD_CAP_CM                  (1ULL << 7)
 #define VTD_PASID_ID_SHIFT          20
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b8abbcce12..006cec116b 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -270,6 +270,7 @@ struct IntelIOMMUState {
 
     bool caching_mode;              /* RO - is cap CM enabled? */
     bool scalable_mode;             /* RO - is Scalable Mode supported? */
+    bool scalable_modern;           /* RO - is modern SM supported? */
     bool snoop_control;             /* RO - is SNP filed supported? */
 
     dma_addr_t root;                /* Current root table pointer */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index be03fcbf52..1d007c33a8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4095,8 +4095,11 @@ static void vtd_cap_init(IntelIOMMUState *s)
     }
 
     /* TODO: read cap/ecap from host to decide which cap to be exposed.
     */
-    if (s->scalable_mode) {
+    if (s->scalable_mode && !s->scalable_modern) {
         s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+    } else if (s->scalable_mode && s->scalable_modern) {
+        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_EAFS | VTD_ECAP_FLTS;
+        s->cap |= VTD_CAP_FL1GP;
     }
 
     if (s->snoop_control) {
@@ -4271,12 +4274,18 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 
     /* Currently only address widths supported are 39 and 48 bits */
     if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-        (s->aw_bits != VTD_HOST_AW_48BIT)) {
+        (s->aw_bits != VTD_HOST_AW_48BIT) &&
+        !s->scalable_modern) {
         error_setg(errp, "Supported values for aw-bits are: %d, %d",
                    VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
         return false;
     }
 
+    if ((s->aw_bits != VTD_HOST_AW_48BIT) && s->scalable_modern) {
+        error_setg(errp, "Supported values for aw-bits are: %d",
+                   VTD_HOST_AW_48BIT);
+        return false;
+    }
+
     if (s->scalable_mode && !s->dma_drain) {
         error_setg(errp, "Need to set dma_drain for scalable mode");
         return false;

From patchwork Mon Jan 15 10:37:18 2024
From: Zhenzhong Duan
Subject: [PATCH rfcv1 06/23] intel_iommu: check and sync host IOMMU cap/ecap in scalable modern mode
Date: Mon, 15 Jan 2024
18:37:18 +0800
Message-Id: <20240115103735.132209-7-zhenzhong.duan@intel.com>

When the vIOMMU is configured in scalable modern mode, a stage-1 page table
is supported, so we need to check and sync the host-side cap/ecap with the
vIOMMU cap/ecap. This happens when a PCIe device (i.e., the VFIO case) sets
its IOMMUFDDevice on the vIOMMU.

Some of the bits in cap/ecap are user controllable; the user setting is
compared with the host cap/ecap for compatibility, i.e., if intel_iommu is
configured in scalable modern mode but VTD_ECAP_NEST isn't set in the host
ecap, that device will fail to attach. For other bits not controlled by the
user, i.e. the VTD_CAP/ECAP_MASK bits, the host cap/ecap is picked.

Below is the sequence to initialize and finalize vIOMMU cap/ecap:

vtd_cap_init() initializes iommu->cap/ecap.
    ---- vtd_cap_init()
iommu->host_cap/ecap is initialized as iommu->cap/ecap.
    ---- vtd_init()
iommu->host_cap/ecap is updated for some bits (VTD_CAP/ECAP_MASK) with the
host setting.
    ---- vtd_sync_hw_info()
iommu->cap/ecap is finalized as iommu->host_cap/ecap.
    ---- vtd_machine_done_hook()

iommu->host_cap/ecap is temporary storage holding the intermediate value
while the host cap/ecap is synthesized with the vIOMMU's initially configured
cap/ecap.

Signed-off-by: Yi Liu
Signed-off-by: Zhenzhong Duan
---
 hw/i386/intel_iommu_internal.h | 10 ++++
 hw/i386/intel_iommu.c          | 83 ++++++++++++++++++++++++++++++----
 2 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index ee4a784a35..6d881adf9b 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -191,13 +191,19 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_SC                 (1ULL << 7)
 #define VTD_ECAP_MHMV               (15ULL << 20)
+#define VTD_ECAP_NEST               (1ULL << 26)
 #define VTD_ECAP_SRS                (1ULL << 31)
 #define VTD_ECAP_EAFS               (1ULL << 34)
+#define VTD_ECAP_PSS(val)           (((val) & 0x1fULL) << 35)
 #define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
 #define VTD_ECAP_SLTS               (1ULL << 46)
 #define VTD_ECAP_FLTS               (1ULL << 47)
+#define VTD_ECAP_MASK               (VTD_ECAP_SRS | VTD_ECAP_EAFS)
+#define VTD_GET_PSS(val)            (((val) >> 35) & 0x1f)
+#define VTD_ECAP_PSS_MASK           (0x1fULL << 35)
+
 /* CAP_REG */
 /* (offset >> 4) << 24 */
 #define VTD_CAP_FRO                 (DMAR_FRCD_REG_OFFSET << 20)
@@ -214,11 +220,15 @@
 #define VTD_CAP_DRAIN_WRITE         (1ULL << 54)
 #define VTD_CAP_DRAIN_READ          (1ULL << 55)
 #define VTD_CAP_FL1GP               (1ULL << 56)
+#define VTD_CAP_FL5LP               (1ULL << 60)
 #define VTD_CAP_DRAIN               (VTD_CAP_DRAIN_READ | VTD_CAP_DRAIN_WRITE)
 #define VTD_CAP_CM                  (1ULL << 7)
 #define VTD_PASID_ID_SHIFT          20
 #define VTD_PASID_ID_MASK           ((1ULL << VTD_PASID_ID_SHIFT) - 1)
+
+#define VTD_CAP_MASK                (VTD_CAP_FL1GP | VTD_CAP_FL5LP)
+
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT         8
 #define VTD_CAP_SAGAW_MASK          (0x1fULL << VTD_CAP_SAGAW_SHIFT)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1d007c33a8..c0973aaccb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3819,19 +3819,82 @@ VTDAddressSpace
*vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
     return vtd_dev_as;
 }
 
+static bool vtd_check_hw_info(IntelIOMMUState *s, struct iommu_hw_info_vtd *vtd,
+                              Error **errp)
+{
+    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
+        error_setg(errp, "Need nested translation on host in modern mode");
+        return false;
+    }
+
+    return true;
+}
+
+/* cap/ecap are readonly after vIOMMU finalized */
+static bool vtd_check_hw_info_finalized(IntelIOMMUState *s,
+                                        struct iommu_hw_info_vtd *vtd,
+                                        Error **errp)
+{
+    if (s->cap & ~vtd->cap_reg & VTD_CAP_MASK) {
+        error_setg(errp, "vIOMMU cap %lx isn't compatible with host %llx",
+                   s->cap, vtd->cap_reg);
+        return false;
+    }
+
+    if (s->ecap & ~vtd->ecap_reg & VTD_ECAP_MASK) {
+        error_setg(errp, "vIOMMU ecap %lx isn't compatible with host %llx",
+                   s->ecap, vtd->ecap_reg);
+        return false;
+    }
+
+    if (s->ecap & vtd->ecap_reg & VTD_ECAP_PASID &&
+        VTD_GET_PSS(s->ecap) > VTD_GET_PSS(vtd->ecap_reg)) {
+        error_setg(errp, "vIOMMU pasid bits %lu > host pasid bits %llu",
+                   VTD_GET_PSS(s->ecap), VTD_GET_PSS(vtd->ecap_reg));
+        return false;
+    }
+
+    return true;
+}
+
 static bool vtd_sync_hw_info(IntelIOMMUState *s, struct iommu_hw_info_vtd *vtd,
                              Error **errp)
 {
-    uint64_t addr_width;
+    uint64_t cap, ecap, addr_width, pasid_bits;
 
-    addr_width = (vtd->cap_reg >> 16) & 0x3fULL;
-    if (s->aw_bits > addr_width) {
-        error_setg(errp, "User aw-bits: %u > host address width: %lu",
-                   s->aw_bits, addr_width);
+    if (!s->scalable_modern) {
+        addr_width = (vtd->cap_reg >> 16) & 0x3fULL;
+        if (s->aw_bits > addr_width) {
+            error_setg(errp, "User aw-bits: %u > host address width: %lu",
+                       s->aw_bits, addr_width);
+            return false;
+        }
+        return true;
+    }
+
+    if (!vtd_check_hw_info(s, vtd, errp)) {
         return false;
     }
 
-    /* TODO: check and sync host cap/ecap into vIOMMU cap/ecap */
+    if (s->cap_finalized) {
+        return vtd_check_hw_info_finalized(s, vtd, errp);
+    }
+
+    /* sync host cap/ecap to vIOMMU */
+
+    cap = s->host_cap & vtd->cap_reg & VTD_CAP_MASK;
+    s->host_cap &= ~VTD_CAP_MASK;
+    s->host_cap |= cap;
+    ecap = s->host_ecap & vtd->ecap_reg & VTD_ECAP_MASK;
+    s->host_ecap &= ~VTD_ECAP_MASK;
+    s->host_ecap |= ecap;
+
+    pasid_bits = VTD_GET_PSS(vtd->ecap_reg);
+    if (s->host_ecap & VTD_ECAP_PASID &&
+        VTD_GET_PSS(s->host_ecap) > pasid_bits) {
+        s->host_ecap &= ~VTD_ECAP_PSS_MASK;
+        s->host_ecap |= VTD_ECAP_PSS(pasid_bits);
+    }
 
     return true;
 }
@@ -3873,9 +3936,13 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int32_t devfn,
 
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
 
-    /* None IOMMUFD case */
-    if (!idev) {
+    if (!s->scalable_modern && !idev) {
+        /* Legacy vIOMMU and non-IOMMUFD backend */
         return 0;
+    } else if (!idev) {
+        /* Modern vIOMMU and non-IOMMUFD backend */
+        error_setg(errp, "Need IOMMUFD backend to setup nested page table");
+        return -1;
     }
 
     if (!vtd_check_idev(s, idev, errp)) {

From patchwork Mon Jan 15 10:37:19 2024
From: Zhenzhong Duan
Subject: [PATCH rfcv1 07/23] intel_iommu: process PASID cache invalidation
Date: Mon, 15 Jan 2024 18:37:19 +0800
Message-Id: <20240115103735.132209-8-zhenzhong.duan@intel.com>
From: Yi Liu

This adds PASID cache invalidation handling. When the guest updates a PASID
entry in scalable mode, guest software should issue a proper PASID cache
invalidation when caching-mode is exposed. This can happen even when PASID is
disabled, as rid_pasid will still be used.

This only adds a basic framework for handling PASID cache invalidation;
detailed handling will be added in subsequent patches.
Signed-off-by: Yi Liu
Signed-off-by: Zhenzhong Duan
---
 hw/i386/intel_iommu_internal.h | 12 ++++++++++
 hw/i386/intel_iommu.c          | 40 +++++++++++++++++++++++++++++-----
 hw/i386/trace-events           |  3 +++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 6d881adf9b..10117e2f25 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -444,6 +444,18 @@ typedef union VTDInvDesc VTDInvDesc;
     (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM | VTD_SL_TM)) : \
     (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
+#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_VAL0  0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL1  0xffffffffffffffffULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL2  0xffffffffffffffffULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL3  0xffffffffffffffffULL
+
+#define VTD_INV_DESC_PASIDC_DSI        (0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+#define VTD_INV_DESC_PASIDC_GLOBAL     (3ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c0973aaccb..effbeed8a3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2635,6 +2635,37 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static bool vtd_process_pasid_desc(IntelIOMMUState *s,
+                                   VTDInvDesc *inv_desc)
+{
+    if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
+        (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
+        (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
+        (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
+        error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64
+                          " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
+    case VTD_INV_DESC_PASIDC_DSI:
+        break;
+
+    case VTD_INV_DESC_PASIDC_PASID_SI:
+        break;
+
+    case VTD_INV_DESC_PASIDC_GLOBAL:
+        break;
+
+    default:
+        error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64
+                          " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+        return false;
+    }
+
+    return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
                                      VTDInvDesc *inv_desc)
 {
@@ -2736,12 +2767,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
-    /*
-     * TODO: the entity of below two cases will be implemented in future series.
-     * To make guest (which integrates scalable mode support patch set in
-     * iommu driver) work, just return true is enough so far.
-     */
     case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]);
+        if (!vtd_process_pasid_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     case VTD_INV_DESC_PIOTLB:
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 53c02d7ac8..e54799ee82 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -24,6 +24,9 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
 vtd_inv_qi_tail(uint16_t head) "write tail %d"
 vtd_inv_qi_fetch(void) ""
 vtd_context_cache_reset(void) ""
+vtd_pasid_cache_gsi(void) ""
+vtd_pasid_cache_dsi(uint16_t domain) "Domain selective PC invalidation domain 0x%"PRIx16
+vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID selective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16

From patchwork Mon Jan 15 10:37:20 2024
From: Zhenzhong Duan
Subject: [PATCH rfcv1 08/23] intel_iommu: add PASID cache management infrastructure
Date: Mon, 15 Jan 2024 18:37:20 +0800
Message-Id: <20240115103735.132209-9-zhenzhong.duan@intel.com>

From: Yi Liu

This adds a PASID cache management infrastructure based on the newly added
structure VTDPASIDAddressSpace, which is used to track PASID usage and future
PASID tagged DMA address translation support in vIOMMU. struct VTDPASIDAddressSpace { PCIBus *bus; uint8_t devfn; AddressSpace as; uint32_t pasid; IntelIOMMUState *iommu_state; VTDContextCacheEntry context_cache_entry; QLIST_ENTRY(VTDPASIDAddressSpace) next; VTDPASIDCacheEntry pasid_cache_entry; }; The implementation manages VTDPASIDAddressSpace instances per PASID+BDF (lookup and insert will use PASID and BDF) since Intel VT-d spec allows per-BDF PASID Table. A VTDPASIDAddressSpace instance is created/destroyed per the guest pasid entry set up/destroy for passthrough devices. While for emulated devices, VTDPASIDAddressSpace instance is created in the PASID tagged DMA translation and be destroyed per guest PASID cache invalidation. This focuses on the PASID cache management for passthrough devices as there is no PASID-capable emulated devices yet. When guest modifies a PASID entry, QEMU will capture the guest pasid selective pasid cache invalidation, allocate or remove a VTDPASIDAddressSpace instance per the invalidation reasons: *) a present pasid entry moved to non-present *) a present pasid entry to be a present entry *) a non-present pasid entry moved to present vIOMMU emulator could figure out the reason by fetching latest guest pasid entry and compare it with the PASID cache. 
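The three cases above boil down to a small decision: compare the cached PASID entry with the entry re-fetched from guest memory, and derive the bind/update/unbind operation to send to the host. The sketch below illustrates this; it is not QEMU code — `PasidEntry`, `pasid_sync_op` and the `PASID_*` names are invented for the example, and only the present bit being bit 0 of the first quadword follows the VT-d scalable-mode PASID entry layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a 512-bit scalable-mode PASID entry. */
typedef struct { uint64_t val[8]; } PasidEntry;

#define PASID_ENTRY_P 1ULL  /* Present bit: bit 0 of val[0] */

typedef enum { PASID_NOP, PASID_BIND, PASID_UPDATE, PASID_UNBIND } PasidOp;

static bool pe_present(const PasidEntry *pe)
{
    return pe->val[0] & PASID_ENTRY_P;
}

/*
 * Decide what to tell the host, given the cached copy and the entry
 * re-fetched from guest memory after a PASID cache invalidation.
 */
static PasidOp pasid_sync_op(const PasidEntry *cached, const PasidEntry *fresh)
{
    if (!pe_present(cached) && pe_present(fresh)) {
        return PASID_BIND;              /* case: non-present -> present */
    }
    if (pe_present(cached) && !pe_present(fresh)) {
        return PASID_UNBIND;            /* case: present -> non-present */
    }
    if (pe_present(cached) && memcmp(cached, fresh, sizeof(*cached))) {
        return PASID_UPDATE;            /* case: present -> modified present */
    }
    return PASID_NOP;                   /* cache already holds the latest entry */
}
```

In the patch itself the same comparison is done by `vtd_pasid_entry_compare()` plus the re-fetch in `vtd_flush_pasid()`; the sketch only isolates the decision logic.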
Signed-off-by: Yi Liu
Signed-off-by: Yi Sun
Signed-off-by: Zhenzhong Duan
---
 hw/i386/intel_iommu_internal.h |  21 ++
 include/hw/i386/intel_iommu.h  |  26 ++
 hw/i386/intel_iommu.c          | 468 +++++++++++++++++++++++++++++++++
 hw/i386/trace-events           |   1 +
 4 files changed, 516 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 10117e2f25..16dc712e94 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -325,6 +325,7 @@ typedef enum VTDFaultReason {
     VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
 
     VTD_FR_PASID_TABLE_INV = 0x58,  /*Invalid PASID table entry */
+    VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */
 
     /* Output address in the interrupt address range for scalable mode */
     VTD_FR_SM_INTERRUPT_ADDR = 0x87,
@@ -512,10 +513,29 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_CTX_ENTRY_LEGACY_SIZE     16
 #define VTD_CTX_ENTRY_SCALABLE_SIZE   32
 
+#define VTD_SM_CONTEXT_ENTRY_PDTS(val)      (((val) >> 9) & 0x7)
 #define VTD_SM_CONTEXT_ENTRY_RID2PASID_MASK 0xfffff
 #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw)  (0x1e0ULL | ~VTD_HAW_MASK(aw))
 #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1      0xffffffffffe00000ULL
 
+typedef enum VTDPCInvType {
+    /* force reset all */
+    VTD_PASID_CACHE_FORCE_RESET = 0,
+    /* pasid cache invalidation rely on guest PASID entry */
+    VTD_PASID_CACHE_GLOBAL_INV,
+    VTD_PASID_CACHE_DOMSI,
+    VTD_PASID_CACHE_PASIDSI,
+} VTDPCInvType;
+
+struct VTDPASIDCacheInfo {
+    VTDPCInvType type;
+    uint16_t domain_id;
+    uint32_t pasid;
+    PCIBus *bus;
+    uint16_t devfn;
+};
+typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
+
 /* PASID Table Related Definitions */
 #define VTD_PASID_DIR_BASE_ADDR_MASK  (~0xfffULL)
 #define VTD_PASID_TABLE_BASE_ADDR_MASK (~0xfffULL)
@@ -527,6 +547,7 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_PASID_TABLE_BITS_MASK     (0x3fULL)
 #define VTD_PASID_TABLE_INDEX(pasid)  ((pasid) & VTD_PASID_TABLE_BITS_MASK)
 #define VTD_PASID_ENTRY_FPD           (1ULL << 1) /* Fault Processing Disable */
+#define VTD_PASID_TBL_ENTRY_NUM       (1ULL << 6)
 
 /* PASID Granular Translation Type Mask */
 #define VTD_PASID_ENTRY_P              1ULL
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 006cec116b..c7b707a3d5 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -63,6 +63,8 @@ typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
 typedef struct VTDPASIDEntry VTDPASIDEntry;
 typedef struct VTDIOMMUFDDevice VTDIOMMUFDDevice;
+typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry;
+typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -95,6 +97,25 @@ struct VTDPASIDEntry {
     uint64_t val[8];
 };
 
+struct pasid_key {
+    uint32_t pasid;
+    uint16_t sid;
+};
+
+struct VTDPASIDCacheEntry {
+    struct VTDPASIDEntry pasid_entry;
+};
+
+struct VTDPASIDAddressSpace {
+    PCIBus *bus;
+    uint8_t devfn;
+    uint32_t pasid;
+    IntelIOMMUState *iommu_state;
+    VTDContextCacheEntry context_cache_entry;
+    QLIST_ENTRY(VTDPASIDAddressSpace) next;
+    VTDPASIDCacheEntry pasid_cache_entry;
+};
+
 struct VTDAddressSpace {
     PCIBus *bus;
     uint8_t devfn;
@@ -154,6 +175,7 @@ struct VTDIOMMUFDDevice {
     uint8_t devfn;
     IOMMUFDDevice *idev;
     IntelIOMMUState *iommu_state;
+    QLIST_ENTRY(VTDIOMMUFDDevice) next;
 };
 
 struct VTDIOTLBEntry {
@@ -301,9 +323,13 @@ struct IntelIOMMUState {
     GHashTable *vtd_address_spaces;             /* VTD address spaces */
     VTDAddressSpace *vtd_as_cache[VTD_PCI_BUS_MAX]; /* VTD address space cache */
+    GHashTable *vtd_pasid_as;   /* VTDPASIDAddressSpace instances */
     /* list of registered notifiers */
     QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
+    /* list of VTDIOMMUFDDevices */
+    QLIST_HEAD(, VTDIOMMUFDDevice) vtd_idev_list;
+
     GHashTable *vtd_iommufd_dev;                /* VTDIOMMUFDDevice */
 
     /* interrupt remapping */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index effbeed8a3..a1a1f23246 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -39,6 +39,7 @@
 #include "kvm/kvm_i386.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "qemu/jhash.h"
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -71,6 +72,8 @@ struct vtd_iotlb_key {
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
+static void vtd_pasid_cache_reset(IntelIOMMUState *s);
+
 static void vtd_panic_require_caching_mode(void)
 {
     error_report("We need to set caching-mode=on for intel-iommu to enable "
@@ -326,6 +329,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
     vtd_iommu_lock(s);
     vtd_reset_iotlb_locked(s);
     vtd_reset_context_cache_locked(s);
+    vtd_pasid_cache_reset(s);
     vtd_iommu_unlock(s);
 }
 
@@ -757,6 +761,16 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
     return true;
 }
 
+static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
+{
+    return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
+}
+
+static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
+{
+    return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7);
+}
+
 static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
 {
     return pdire->val & 1;
@@ -2635,9 +2649,443 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static inline void vtd_init_pasid_key(uint32_t pasid,
+                                      uint16_t sid,
+                                      struct pasid_key *key)
+{
+    key->pasid = pasid;
+    key->sid = sid;
+}
+
+static guint vtd_pasid_as_key_hash(gconstpointer v)
+{
+    struct pasid_key *key = (struct pasid_key *)v;
+    uint32_t a, b, c;
+
+    /* Jenkins hash */
+    a = b = c = JHASH_INITVAL + sizeof(*key);
+    a += key->sid;
+    b += extract32(key->pasid, 0, 16);
+    c += extract32(key->pasid, 16, 16);
+
+    __jhash_mix(a, b, c);
+    __jhash_final(a, b, c);
+
+    return c;
+}
+
+static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
+{
+    const struct pasid_key *k1 = v1;
+    const struct pasid_key *k2 = v2;
+
+    return (k1->pasid == k2->pasid) && (k1->sid == k2->sid);
+}
+
+static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
+                                            uint8_t bus_num,
+                                            uint8_t devfn,
+                                            uint32_t pasid,
+                                            VTDPASIDEntry *pe)
+{
+    VTDContextEntry ce;
+    int ret;
+    dma_addr_t pasid_dir_base;
+
+    if (!s->root_scalable) {
+        return -VTD_FR_PASID_TABLE_INV;
+    }
+
+    ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
+    if (ret) {
+        return ret;
+    }
+
+    pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce);
+    ret = vtd_get_pe_from_pasid_table(s,
+                                      pasid_dir_base, pasid, pe);
+
+    return ret;
+}
+
+static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
+{
+    return !memcmp(p1, p2, sizeof(*p1));
+}
+
+/*
+ * This function fills in the pasid entry in &vtd_pasid_as. Caller
+ * of this function should hold iommu_lock.
+ */
+static void vtd_fill_pe_in_cache(IntelIOMMUState *s,
+                                 VTDPASIDAddressSpace *vtd_pasid_as,
+                                 VTDPASIDEntry *pe)
+{
+    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
+
+    if (vtd_pasid_entry_compare(pe, &pc_entry->pasid_entry)) {
+        /* No need to go further as cached pasid entry is latest */
+        return;
+    }
+
+    pc_entry->pasid_entry = *pe;
+    /*
+     * TODO:
+     * - send pasid bind to host for passthru devices
+     */
+}
+
+/*
+ * This function is used to clear cached pasid entry in vtd_pasid_as
+ * instances. Caller of this function should hold iommu_lock.
+ */
+static gboolean vtd_flush_pasid(gpointer key, gpointer value,
+                                gpointer user_data)
+{
+    VTDPASIDCacheInfo *pc_info = user_data;
+    VTDPASIDAddressSpace *vtd_pasid_as = value;
+    IntelIOMMUState *s = vtd_pasid_as->iommu_state;
+    VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
+    PCIBus *bus = vtd_pasid_as->bus;
+    VTDPASIDEntry pe;
+    uint16_t did;
+    uint32_t pasid;
+    uint16_t devfn;
+    int ret;
+
+    did = vtd_pe_get_domain_id(&pc_entry->pasid_entry);
+    pasid = vtd_pasid_as->pasid;
+    devfn = vtd_pasid_as->devfn;
+
+    switch (pc_info->type) {
+    case VTD_PASID_CACHE_FORCE_RESET:
+        goto remove;
+    case VTD_PASID_CACHE_PASIDSI:
+        if (pc_info->pasid != pasid) {
+            return false;
+        }
+        /* Fall through */
+    case VTD_PASID_CACHE_DOMSI:
+        if (pc_info->domain_id != did) {
+            return false;
+        }
+        /* Fall through */
+    case VTD_PASID_CACHE_GLOBAL_INV:
+        break;
+    default:
+        error_report("invalid pc_info->type");
+        abort();
+    }
+
+    /*
+     * pasid cache invalidation may indicate a present pasid
+     * entry to present pasid entry modification. To cover such
+     * case, vIOMMU emulator needs to fetch latest guest pasid
+     * entry and check cached pasid entry, then update pasid
+     * cache and send pasid bind/unbind to host properly.
+     */
+    ret = vtd_dev_get_pe_from_pasid(s, pci_bus_num(bus),
+                                    devfn, pasid, &pe);
+    if (ret) {
+        /*
+         * No valid pasid entry in guest memory. e.g. pasid entry
+         * was modified to be either all-zero or non-present. Either
+         * case means existing pasid cache should be removed.
+         */
+        goto remove;
+    }
+
+    vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe);
+    /*
+     * TODO:
+     * - when pasid-base-iotlb(piotlb) infrastructure is ready,
+     *   should invalidate QEMU piotlb together with this change.
+     */
+    return false;
+remove:
+    /*
+     * TODO:
+     * - send pasid unbind to host for passthru devices
+     * - when pasid-base-iotlb(piotlb) infrastructure is ready,
+     *   should invalidate QEMU piotlb together with this change.
+     */
+    return true;
+}
+
+/*
+ * This function finds or adds a VTDPASIDAddressSpace for a device
+ * when it is bound to a pasid. Caller of this function should hold
+ * iommu_lock.
+ */
+static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
+                                                   PCIBus *bus,
+                                                   int devfn,
+                                                   uint32_t pasid)
+{
+    struct pasid_key key;
+    struct pasid_key *new_key;
+    VTDPASIDAddressSpace *vtd_pasid_as;
+    uint16_t sid;
+
+    sid = PCI_BUILD_BDF(pci_bus_num(bus), devfn);
+    vtd_init_pasid_key(pasid, sid, &key);
+    vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, &key);
+
+    if (!vtd_pasid_as) {
+        new_key = g_malloc0(sizeof(*new_key));
+        vtd_init_pasid_key(pasid, sid, new_key);
+        /*
+         * Initialize the vtd_pasid_as structure.
+         *
+         * This structure here is used to track the guest pasid
+         * binding and also serves as pasid-cache management entry.
+         *
+         * TODO: in future, to support SVA-aware DMA emulation, the
+         *       vtd_pasid_as should include an AddressSpace to
+         *       support DMA emulation.
+         */
+        vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace));
+        vtd_pasid_as->iommu_state = s;
+        vtd_pasid_as->bus = bus;
+        vtd_pasid_as->devfn = devfn;
+        vtd_pasid_as->pasid = pasid;
+        g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as);
+    }
+    return vtd_pasid_as;
+}
+
+/* Caller of this function should hold iommu_lock. */
+static void vtd_sm_pasid_table_walk_one(IntelIOMMUState *s,
+                                        dma_addr_t pt_base,
+                                        int start,
+                                        int end,
+                                        VTDPASIDCacheInfo *info)
+{
+    VTDPASIDEntry pe;
+    int pasid = start;
+    int pasid_next;
+    VTDPASIDAddressSpace *vtd_pasid_as;
+
+    while (pasid < end) {
+        pasid_next = pasid + 1;
+
+        if (!vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe)
+            && vtd_pe_present(&pe)) {
+            vtd_pasid_as = vtd_add_find_pasid_as(s,
+                                                 info->bus, info->devfn, pasid);
+            if ((info->type == VTD_PASID_CACHE_DOMSI ||
+                 info->type == VTD_PASID_CACHE_PASIDSI) &&
+                !(info->domain_id == vtd_pe_get_domain_id(&pe))) {
+                /*
+                 * VTD_PASID_CACHE_DOMSI and VTD_PASID_CACHE_PASIDSI
+                 * require a domain ID check. If the domain ID check
+                 * fails, go to the next pasid.
+                 */
+                pasid = pasid_next;
+                continue;
+            }
+            vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe);
+        }
+        pasid = pasid_next;
+    }
+}
+
+/*
+ * Currently, VT-d scalable mode pasid table is a two level table,
+ * this function aims to loop a range of PASIDs in a given pasid
+ * table to identify the pasid config in guest.
+ * Caller of this function should hold iommu_lock.
+ */
+static void vtd_sm_pasid_table_walk(IntelIOMMUState *s,
+                                    dma_addr_t pdt_base,
+                                    int start,
+                                    int end,
+                                    VTDPASIDCacheInfo *info)
+{
+    VTDPASIDDirEntry pdire;
+    int pasid = start;
+    int pasid_next;
+    dma_addr_t pt_base;
+
+    while (pasid < end) {
+        pasid_next = ((end - pasid) > VTD_PASID_TBL_ENTRY_NUM) ?
+                     (pasid + VTD_PASID_TBL_ENTRY_NUM) : end;
+        if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire)
+            && vtd_pdire_present(&pdire)) {
+            pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK;
+            vtd_sm_pasid_table_walk_one(s, pt_base, pasid, pasid_next, info);
+        }
+        pasid = pasid_next;
+    }
+}
+
+static void vtd_replay_pasid_bind_for_dev(IntelIOMMUState *s,
+                                          int start, int end,
+                                          VTDPASIDCacheInfo *info)
+{
+    VTDContextEntry ce;
+    int bus_n, devfn;
+
+    bus_n = pci_bus_num(info->bus);
+    devfn = info->devfn;
+
+    if (!vtd_dev_to_context_entry(s, bus_n, devfn, &ce)) {
+        uint32_t max_pasid;
+
+        max_pasid = vtd_sm_ce_get_pdt_entry_num(&ce) * VTD_PASID_TBL_ENTRY_NUM;
+        if (end > max_pasid) {
+            end = max_pasid;
+        }
+        vtd_sm_pasid_table_walk(s,
+                                VTD_CE_GET_PASID_DIR_TABLE(&ce),
+                                start,
+                                end,
+                                info);
+    }
+}
+
+/*
+ * This function replays the guest pasid bindings to host by
+ * walking the guest PASID table. This ensures host will have
+ * latest guest pasid bindings. Caller should hold iommu_lock.
+ */
+static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s,
+                                            VTDPASIDCacheInfo *pc_info)
+{
+    VTDIOMMUFDDevice *vtd_idev;
+    int start = 0, end = 1; /* only rid2pasid is supported */
+    VTDPASIDCacheInfo walk_info;
+
+    switch (pc_info->type) {
+    case VTD_PASID_CACHE_PASIDSI:
+        start = pc_info->pasid;
+        end = pc_info->pasid + 1;
+        /*
+         * PASID selective invalidation is within domain,
+         * thus fall through.
+         */
+    case VTD_PASID_CACHE_DOMSI:
+    case VTD_PASID_CACHE_GLOBAL_INV:
+        /* loop all assigned devices */
+        break;
+    case VTD_PASID_CACHE_FORCE_RESET:
+        /* For force reset, no need to go further replay */
+        return;
+    default:
+        error_report("invalid pc_info->type for replay");
+        abort();
+    }
+
+    /*
+     * In this replay, we only need to care about the devices which
+     * are backed by host IOMMU. For such devices, their vtd_idev
+     * instances are in the s->vtd_idev_list. For devices which
+     * are not backed by host IOMMU, it is not necessary to replay
+     * the bindings since their cache could be re-created in the future
+     * DMA address translation.
+     */
+    walk_info = *pc_info;
+    QLIST_FOREACH(vtd_idev, &s->vtd_idev_list, next) {
+        /* bus|devfn fields are not identical with pc_info */
+        walk_info.bus = vtd_idev->bus;
+        walk_info.devfn = vtd_idev->devfn;
+        vtd_replay_pasid_bind_for_dev(s, start, end, &walk_info);
+    }
+}
+
+/*
+ * This function syncs the pasid bindings between guest and host.
+ * It includes updating the pasid cache in vIOMMU and updating the
+ * pasid bindings per guest's latest pasid entry presence.
+ */
+static void vtd_pasid_cache_sync(IntelIOMMUState *s,
+                                 VTDPASIDCacheInfo *pc_info)
+{
+    if (!s->scalable_modern || !s->root_scalable || !s->dmar_enabled) {
+        return;
+    }
+
+    /*
+     * Regarding a pasid cache invalidation, e.g. a PSI,
+     * it could be any of the cases below:
+     * a) a present pasid entry moved to non-present
+     * b) a present pasid entry to be a present entry
+     * c) a non-present pasid entry moved to present
+     *
+     * Different invalidation granularity may affect different device
+     * scope and pasid scope. But for each invalidation granularity,
+     * it needs to do two steps to sync host and guest pasid binding.
+     *
+     * Here is the handling of a PSI:
+     * 1) loop all the existing vtd_pasid_as instances to update them
+     *    according to the latest guest pasid entry in pasid table.
+     *    this will make sure affected existing vtd_pasid_as instances
+     *    cached the latest pasid entries. Also, during the loop, the
+     *    host should be notified if needed. e.g. pasid unbind or pasid
+     *    update. Should be able to cover case a) and case b).
+     *
+     * 2) loop all devices to cover case c)
+     *    - For devices which have IOMMUFDDevice instances,
+     *      we loop them and check if guest pasid entry exists. If yes,
+     *      it is case c), we update the pasid cache and also notify
+     *      host.
+     *    - For devices which have no IOMMUFDDevice, it is not
+     *      necessary to create pasid cache at this phase since it
+     *      could be created when vIOMMU does DMA address translation.
+     *      This is not yet implemented since there is no emulated
+     *      pasid-capable devices today. If we have such devices in
+     *      future, the pasid cache shall be created there.
+     * Other granularities follow the same steps, just with different
+     * scope.
+     */
+
+    vtd_iommu_lock(s);
+    /* Step 1: loop all the existing vtd_pasid_as instances */
+    g_hash_table_foreach_remove(s->vtd_pasid_as,
+                                vtd_flush_pasid, pc_info);
+
+    /*
+     * Step 2: loop all the existing vtd_idev instances.
+     * Ideally, we would loop all devices to find if there is any new
+     * PASID binding regarding the PASID cache invalidation request.
+     * But it is enough to loop the devices which are backed by host
+     * IOMMU. For devices backed by vIOMMU (a.k.a emulated devices),
+     * if a new PASID happened on them, their vtd_pasid_as instance
+     * could be created during future vIOMMU DMA translation.
+     */
+    vtd_replay_guest_pasid_bindings(s, pc_info);
+    vtd_iommu_unlock(s);
+}
+
+/* Caller of this function should hold iommu_lock */
+static void vtd_pasid_cache_reset(IntelIOMMUState *s)
+{
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_reset();
+
+    pc_info.type = VTD_PASID_CACHE_FORCE_RESET;
+
+    /*
+     * Reset pasid cache is a big hammer, so use
+     * g_hash_table_foreach_remove which will free
+     * the vtd_pasid_as instances. Also, as a big
+     * hammer, use VTD_PASID_CACHE_FORCE_RESET to
+     * ensure all the vtd_pasid_as instances are
+     * dropped, meanwhile the change will be passed
+     * to host if IOMMUFDDevice is available.
+     */
+    g_hash_table_foreach_remove(s->vtd_pasid_as,
+                                vtd_flush_pasid, &pc_info);
+}
+
 static bool vtd_process_pasid_desc(IntelIOMMUState *s,
                                    VTDInvDesc *inv_desc)
 {
+    uint16_t domain_id;
+    uint32_t pasid;
+    VTDPASIDCacheInfo pc_info;
+
     if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
         (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
         (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
@@ -2647,14 +3095,27 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
         return false;
     }
 
+    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]);
+    pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]);
+
     switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
     case VTD_INV_DESC_PASIDC_DSI:
+        trace_vtd_pasid_cache_dsi(domain_id);
+        pc_info.type = VTD_PASID_CACHE_DOMSI;
+        pc_info.domain_id = domain_id;
         break;
 
     case VTD_INV_DESC_PASIDC_PASID_SI:
+        /* PASID selective implies a DID selective */
+        trace_vtd_pasid_cache_psi(domain_id, pasid);
+        pc_info.type = VTD_PASID_CACHE_PASIDSI;
+        pc_info.domain_id = domain_id;
+        pc_info.pasid = pasid;
         break;
 
     case VTD_INV_DESC_PASIDC_GLOBAL:
+        trace_vtd_pasid_cache_gsi();
+        pc_info.type = VTD_PASID_CACHE_GLOBAL_INV;
         break;
 
     default:
@@ -2663,6 +3124,7 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
         return false;
     }
 
+    vtd_pasid_cache_sync(s, &pc_info);
     return true;
 }
 
@@ -3997,6 +4459,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int32_t devfn,
     vtd_idev->devfn = (uint8_t)devfn;
     vtd_idev->iommu_state = s;
     vtd_idev->idev = idev;
+    QLIST_INSERT_HEAD(&s->vtd_idev_list, vtd_idev, next);
 
     g_hash_table_insert(s->vtd_iommufd_dev, new_key, vtd_idev);
 
@@ -4024,6 +4487,7 @@ static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int32_t devfn)
         return;
     }
 
+    QLIST_REMOVE(vtd_idev, next);
     g_hash_table_remove(s->vtd_iommufd_dev, &key);
 
     vtd_iommu_unlock(s);
@@ -4460,6 +4924,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     }
 
     QLIST_INIT(&s->vtd_as_with_notifiers);
+    QLIST_INIT(&s->vtd_idev_list);
     qemu_mutex_init(&s->iommu_lock);
     s->cap_finalized = false;
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
@@ -4487,6 +4952,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
                                               g_free, g_free);
     s->vtd_iommufd_dev = g_hash_table_new_full(vtd_as_hash, vtd_as_idev_equal,
                                                g_free, g_free);
+    s->vtd_pasid_as = g_hash_table_new_full(vtd_pasid_as_key_hash,
+                                            vtd_pasid_as_key_equal,
+                                            g_free, g_free);
     vtd_init(s);
     pci_setup_iommu(bus, &vtd_iommu_ops, dev);
     /* Pseudo address space under root PCI bus. */
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index e54799ee82..91d6c400b4 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -25,6 +25,7 @@ vtd_inv_qi_tail(uint16_t head) "write tail %d"
 vtd_inv_qi_fetch(void) ""
 vtd_context_cache_reset(void) ""
 vtd_pasid_cache_gsi(void) ""
+vtd_pasid_cache_reset(void) ""
 vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16
 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"

From patchwork Mon Jan 15 10:37:21 2024
X-Patchwork-Submitter: "Duan, Zhenzhong"
X-Patchwork-Id: 13519508
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
    peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
    nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com,
    yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com,
    Zhenzhong Duan
Subject: [PATCH rfcv1 09/23] vfio/iommufd_device: Add ioas_id in IOMMUFDDevice and pass to vIOMMU
Date: Mon, 15 Jan 2024 18:37:21 +0800
Message-Id: <20240115103735.132209-10-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>

Sometimes the vIOMMU needs to re-attach a device to the IOAS id of VFIO,
e.g. when the vIOMMU is disabled by the guest. This is a prerequisite
patch for the following one.
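The plumbing this patch adds can be pictured with a small standalone sketch: the device struct grows an `ioas_id` field, and the init helper gains a matching parameter so the VFIO caller can pass `container->ioas_id` through. The types below (`IOMMUFDBackend`, `IOMMUFDDeviceOps`, `device_init`) are stand-ins for the example; the real change lives in `backends/iommufd_device.c` and `hw/vfio/iommufd.c`.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real QEMU types. */
typedef struct IOMMUFDBackend IOMMUFDBackend;
typedef struct IOMMUFDDeviceOps IOMMUFDDeviceOps;

typedef struct IOMMUFDDevice {
    IOMMUFDBackend *iommufd;
    uint32_t dev_id;
    uint32_t ioas_id;   /* new: remembers the VFIO IOAS for later re-attach */
    IOMMUFDDeviceOps *ops;
} IOMMUFDDevice;

/*
 * Mirrors the extended iommufd_device_init() signature: the caller
 * now also supplies the IOAS id of the VFIO container.
 */
static void device_init(IOMMUFDDevice *idev, IOMMUFDBackend *iommufd,
                        uint32_t dev_id, uint32_t ioas_id,
                        IOMMUFDDeviceOps *ops)
{
    idev->iommufd = iommufd;
    idev->dev_id = dev_id;
    idev->ioas_id = ioas_id;
    idev->ops = ops;
}
```

With the id stored on the device, the vIOMMU can later re-attach to the original IOAS without reaching back into VFIO internals.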
Signed-off-by: Zhenzhong Duan
---
 include/sysemu/iommufd_device.h | 3 ++-
 backends/iommufd_device.c       | 3 ++-
 hw/vfio/iommufd.c               | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/iommufd_device.h b/include/sysemu/iommufd_device.h
index 799c1345fd..7aeec9b980 100644
--- a/include/sysemu/iommufd_device.h
+++ b/include/sysemu/iommufd_device.h
@@ -26,6 +26,7 @@ typedef struct IOMMUFDDeviceOps {
 struct IOMMUFDDevice {
     IOMMUFDBackend *iommufd;
     uint32_t dev_id;
+    uint32_t ioas_id;
     IOMMUFDDeviceOps *ops;
 };
 
@@ -36,5 +37,5 @@ int iommufd_device_get_info(IOMMUFDDevice *idev,
                             uint32_t len, void *data);
 void iommufd_device_init(void *_idev, size_t instance_size,
                          IOMMUFDBackend *iommufd, uint32_t dev_id,
-                         IOMMUFDDeviceOps *ops);
+                         uint32_t ioas_id, IOMMUFDDeviceOps *ops);
 #endif
diff --git a/backends/iommufd_device.c b/backends/iommufd_device.c
index 26f69252d2..f93a201453 100644
--- a/backends/iommufd_device.c
+++ b/backends/iommufd_device.c
@@ -52,7 +52,7 @@ int iommufd_device_get_info(IOMMUFDDevice *idev,
 
 void iommufd_device_init(void *_idev, size_t instance_size,
                          IOMMUFDBackend *iommufd, uint32_t dev_id,
-                         IOMMUFDDeviceOps *ops)
+                         uint32_t ioas_id, IOMMUFDDeviceOps *ops)
 {
     IOMMUFDDevice *idev = (IOMMUFDDevice *)_idev;
 
@@ -60,5 +60,6 @@ void iommufd_device_init(void *_idev, size_t instance_size,
 
     idev->iommufd = iommufd;
     idev->dev_id = dev_id;
+    idev->ioas_id = ioas_id;
     idev->ops = ops;
 }
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c8c669c59a..3aabe41043 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -432,7 +432,7 @@ found_container:
     QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
 
     iommufd_device_init(idev, sizeof(*idev), container->be, vbasedev->devid,
-                        &vfio_iommufd_device_ops);
+                        container->ioas_id, &vfio_iommufd_device_ops);
     trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
                                    vbasedev->num_regions, vbasedev->flags);
     return 0;

From patchwork Mon Jan 15 10:37:22 2024 Content-Type:
text/plain; charset="utf-8"
X-Patchwork-Submitter: "Duan, Zhenzhong"
X-Patchwork-Id: 13519504
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
    peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
    nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com,
    yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com,
    Zhenzhong Duan, Yi Sun, Marcel Apfelbaum, Paolo Bonzini,
    Richard Henderson, Eduardo Habkost
Subject: [PATCH rfcv1 10/23] intel_iommu: bind/unbind guest page table to host
Date: Mon, 15 Jan 2024 18:37:22 +0800
Message-Id: <20240115103735.132209-11-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>

This captures the guest PASID table entry modifications and propagates
the changes to the host to attach a hwpt whose type is determined by
the guest PGTT configuration.

When PGTT is Pass-through (100b), the hwpt on the host side is a
stage-2 page table (GPA->HPA). When PGTT is First-stage Translation
only (001b), the hwpt on the host side is a nested page table. The
guest page table is configured as a stage-1 page table (gIOVA->GPA)
whose translation result further goes through the host VT-d stage-2
page table (GPA->HPA) under nested translation mode. This is the key
to supporting gIOVA over a stage-1 page table for Intel VT-d in a
virtualization environment.

A stage-2 page table can be shared by different devices if there is
no conflict and the devices link to the same iommufd object, i.e.
devices under the same host IOMMU can share the same stage-2 page
table. If there is a conflict, i.e. one device is in non-cache-coherency
mode while the others are not, it requires a separate stage-2 page
table in non-CC mode. See the diagram below:

                      IntelIOMMUState
                             |
                             V
                    .------------------.    .------------------.
                    | VTDIOASContainer |--->| VTDIOASContainer |--->...
                    | (iommufd0)       |    | (iommufd1)       |
                    .------------------.    .------------------.
                             |                       |
                             |                       .-->...
                             V
                  .-------------------.    .-------------------.
                  | VTDS2Hwpt(CC)     |--->| VTDS2Hwpt(non-CC) |-->...
                  .-------------------.    .-------------------.
                      |            |               |
                      |            |               |
                .-----------.  .-----------.  .------------.
                | IOMMUFD   |  | IOMMUFD   |  | IOMMUFD    |
                | Device(CC)|  | Device(CC)|  | Device     |
                | (iommufd0)|  | (iommufd0)|  | (non-CC)   |
                |           |  |           |  | (iommufd0) |
                .-----------.  .-----------.  .------------.
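The PGTT-to-hwpt mapping described above can be condensed into a tiny selector. This is an illustrative sketch, not QEMU code: `HwptKind` and `hwpt_kind_for_pgtt` are invented names; only the two PGTT encodings (001b first-stage only, 100b pass-through) follow the VT-d scalable-mode PASID entry definition quoted in the message.

```c
#include <stdint.h>

/* PGTT encodings from the VT-d scalable-mode PASID table entry. */
#define VTD_SM_PASID_ENTRY_FLT 0x1 /* 001b: first-stage translation only */
#define VTD_SM_PASID_ENTRY_PT  0x4 /* 100b: pass-through */

typedef enum { HWPT_STAGE2_ONLY, HWPT_NESTED, HWPT_UNSUPPORTED } HwptKind;

/*
 * Hypothetical helper: pick the kind of host hwpt to attach for a
 * guest PASID entry, per the rules in the commit message above.
 */
static HwptKind hwpt_kind_for_pgtt(uint8_t pgtt)
{
    switch (pgtt) {
    case VTD_SM_PASID_ENTRY_PT:
        /* Guest does no translation: attach the GPA->HPA stage-2 hwpt. */
        return HWPT_STAGE2_ONLY;
    case VTD_SM_PASID_ENTRY_FLT:
        /* Guest stage-1 (gIOVA->GPA) nested on host stage-2 (GPA->HPA). */
        return HWPT_NESTED;
    default:
        /* Other PGTT values are not handled by this series' path. */
        return HWPT_UNSUPPORTED;
    }
}
```

The sharing rule in the diagram is orthogonal to this choice: once the kind is known, the stage-2 hwpt is looked up in the per-iommufd container, splitting only on cache-coherency mode.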
Co-Authored-by: Yi Liu Signed-off-by: Yi Liu Signed-off-by: Yi Sun Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu_internal.h | 16 + include/hw/i386/intel_iommu.h | 30 ++ hw/i386/intel_iommu.c | 641 ++++++++++++++++++++++++++++++++- hw/i386/trace-events | 8 + 4 files changed, 677 insertions(+), 18 deletions(-) diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 16dc712e94..e33c9f54b5 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -199,6 +199,7 @@ #define VTD_ECAP_SMTS (1ULL << 43) #define VTD_ECAP_SLTS (1ULL << 46) #define VTD_ECAP_FLTS (1ULL << 47) +#define VTD_ECAP_RPS (1ULL << 49) #define VTD_ECAP_MASK (VTD_ECAP_SRS | VTD_ECAP_EAFS) #define VTD_GET_PSS(val) (((val) >> 35) & 0x1f) @@ -518,6 +519,14 @@ typedef struct VTDRootEntry VTDRootEntry; #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw) (0x1e0ULL | ~VTD_HAW_MASK(aw)) #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL +enum VTDPASIDOp { + VTD_PASID_BIND, + VTD_PASID_UPDATE, + VTD_PASID_UNBIND, + VTD_OP_NUM +}; +typedef enum VTDPASIDOp VTDPASIDOp; + typedef enum VTDPCInvType { /* force reset all */ VTD_PASID_CACHE_FORCE_RESET = 0, @@ -533,6 +542,7 @@ struct VTDPASIDCacheInfo { uint32_t pasid; PCIBus *bus; uint16_t devfn; + bool error_happened; }; typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; @@ -560,6 +570,12 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; #define VTD_SM_PASID_ENTRY_AW 7ULL /* Adjusted guest-address-width */ #define VTD_SM_PASID_ENTRY_DID(val) ((val) & VTD_DOMAIN_ID_MASK) +#define VTD_SM_PASID_ENTRY_FLPM 3ULL +#define VTD_SM_PASID_ENTRY_FLPTPTR (~0xfffULL) +#define VTD_SM_PASID_ENTRY_SRE_BIT(val) (!!((val) & 1ULL)) +#define VTD_SM_PASID_ENTRY_WPE_BIT(val) (!!(((val) >> 4) & 1ULL)) +#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL)) + /* Second Level Page Translation Pointer*/ #define VTD_SM_PASID_ENTRY_SLPTPTR (~0xfffULL) diff --git a/include/hw/i386/intel_iommu.h 
b/include/hw/i386/intel_iommu.h index c7b707a3d5..d3122cf699 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -65,6 +65,9 @@ typedef struct VTDPASIDEntry VTDPASIDEntry; typedef struct VTDIOMMUFDDevice VTDIOMMUFDDevice; typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry; typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace; +typedef struct VTDHwpt VTDHwpt; +typedef struct VTDIOASContainer VTDIOASContainer; +typedef struct VTDS2Hwpt VTDS2Hwpt; /* Context-Entry */ struct VTDContextEntry { @@ -102,14 +105,37 @@ struct pasid_key { uint16_t sid; }; +struct VTDIOASContainer { + IOMMUFDBackend *iommufd; + uint32_t ioas_id; + MemoryListener listener; + QLIST_HEAD(, VTDS2Hwpt) s2_hwpt_list; + QLIST_ENTRY(VTDIOASContainer) next; + Error *error; +}; + +struct VTDS2Hwpt { + uint32_t users; + uint32_t hwpt_id; + VTDIOASContainer *container; + QLIST_ENTRY(VTDS2Hwpt) next; +}; + +struct VTDHwpt { + uint32_t hwpt_id; + VTDS2Hwpt *s2_hwpt; +}; + struct VTDPASIDCacheEntry { struct VTDPASIDEntry pasid_entry; + bool cache_filled; }; struct VTDPASIDAddressSpace { PCIBus *bus; uint8_t devfn; uint32_t pasid; + VTDHwpt hwpt; IntelIOMMUState *iommu_state; VTDContextCacheEntry context_cache_entry; QLIST_ENTRY(VTDPASIDAddressSpace) next; @@ -330,8 +356,12 @@ struct IntelIOMMUState { /* list of VTDIOMMUFDDevices */ QLIST_HEAD(, VTDIOMMUFDDevice) vtd_idev_list; + QLIST_HEAD(, VTDIOASContainer) containers; + GHashTable *vtd_iommufd_dev; /* VTDIOMMUFDDevice */ + VTDHwpt *s2_hwpt; + /* interrupt remapping */ bool intr_enabled; /* Whether guest enabled IR */ dma_addr_t intr_root; /* Interrupt remapping table pointer */ diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index a1a1f23246..df93fcacd8 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -40,6 +40,7 @@ #include "migration/vmstate.h" #include "trace.h" #include "qemu/jhash.h" +#include "sysemu/iommufd.h" /* context entry operations */ #define VTD_CE_GET_RID2PASID(ce) \ @@ -771,6 
+772,24 @@ static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce) return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7); } +static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe) +{ + return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9; +} + +static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe) +{ + return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR; +} + +static inline void pasid_cache_info_set_error(VTDPASIDCacheInfo *pc_info) +{ + if (pc_info->error_happened) { + return; + } + pc_info->error_happened = true; +} + static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire) { return pdire->val & 1; @@ -1631,6 +1650,17 @@ static int vtd_address_space_sync(VTDAddressSpace *vtd_as) return vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX); } +static bool vtd_pe_pgtt_is_pt(VTDPASIDEntry *pe) +{ + return (VTD_PE_GET_TYPE(pe) == VTD_SM_PASID_ENTRY_PT); +} + +/* check if pgtt is first stage translation */ +static bool vtd_pe_pgtt_is_flt(VTDPASIDEntry *pe) +{ + return (VTD_PE_GET_TYPE(pe) == VTD_SM_PASID_ENTRY_FLT); +} + /* * Check if specific device is configured to bypass address * translation for DMA requests. In Scalable Mode, bypass @@ -1652,7 +1682,7 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce, */ return false; } - return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT); + return vtd_pe_pgtt_is_pt(&pe); } return (vtd_ce_get_type(ce) == VTD_CONTEXT_TT_PASS_THROUGH); @@ -2091,6 +2121,543 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) vtd_iommu_replay_all(s); } +static bool iommufd_listener_skipped_section(MemoryRegionSection *section) +{ + return !memory_region_is_ram(section->mr) || + memory_region_is_protected(section->mr) || + /* + * Sizing an enabled 64-bit BAR can cause spurious mappings to + * addresses in the upper part of the 64-bit address space. These + * are never accessed by the CPU and beyond the address width of + * some IOMMU hardware. 
TODO: VFIO should tell us the IOMMU width. + */ + section->offset_within_address_space & (1ULL << 63); +} + +static void iommufd_listener_region_add_s2domain(MemoryListener *listener, + MemoryRegionSection *section) +{ + VTDIOASContainer *container = container_of(listener, + VTDIOASContainer, listener); + IOMMUFDBackend *iommufd = container->iommufd; + uint32_t ioas_id = container->ioas_id; + hwaddr iova; + Int128 llend, llsize; + void *vaddr; + Error *err = NULL; + int ret; + + if (iommufd_listener_skipped_section(section)) { + return; + } + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); + llend = int128_make64(section->offset_within_address_space); + llend = int128_add(llend, section->size); + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask())); + llsize = int128_sub(llend, int128_make64(iova)); + vaddr = memory_region_get_ram_ptr(section->mr) + + section->offset_within_region + + (iova - section->offset_within_address_space); + + memory_region_ref(section->mr); + + ret = iommufd_backend_map_dma(iommufd, ioas_id, iova, int128_get64(llsize), + vaddr, section->readonly); + if (!ret) { + return; + } + + error_setg(&err, + "iommufd_listener_region_add_s2domain(%p, 0x%"HWADDR_PRIx", " + "0x%"HWADDR_PRIx", %p) = %d (%s)", + container, iova, int128_get64(llsize), vaddr, ret, + strerror(-ret)); + + if (memory_region_is_ram_device(section->mr)) { + /* Allow unexpected mappings not to be fatal for RAM devices */ + error_report_err(err); + return; + } + + if (!container->error) { + error_propagate_prepend(&container->error, err, "Region %s: ", + memory_region_name(section->mr)); + } else { + error_free(err); + } +} + +static void iommufd_listener_region_del_s2domain(MemoryListener *listener, + MemoryRegionSection *section) +{ + VTDIOASContainer *container = container_of(listener, + VTDIOASContainer, listener); + IOMMUFDBackend *iommufd = container->iommufd; + uint32_t ioas_id = container->ioas_id; + hwaddr iova; + Int128 llend, llsize; + 
int ret; + + if (iommufd_listener_skipped_section(section)) { + return; + } + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); + llend = int128_make64(section->offset_within_address_space); + llend = int128_add(llend, section->size); + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask())); + llsize = int128_sub(llend, int128_make64(iova)); + + ret = iommufd_backend_unmap_dma(iommufd, ioas_id, + iova, int128_get64(llsize)); + if (ret) { + error_report("iommufd_listener_region_del_s2domain(%p, " + "0x%"HWADDR_PRIx", 0x%"HWADDR_PRIx") = %d (%s)", + container, iova, int128_get64(llsize), ret, + strerror(-ret)); + } + + memory_region_unref(section->mr); +} + +static const MemoryListener iommufd_s2domain_memory_listener = { + .name = "iommufd_s2domain", + .priority = 1000, + .region_add = iommufd_listener_region_add_s2domain, + .region_del = iommufd_listener_region_del_s2domain, +}; + +static void vtd_init_s1_hwpt_data(struct iommu_hwpt_vtd_s1 *vtd, + VTDPASIDEntry *pe) +{ + memset(vtd, 0, sizeof(*vtd)); + + vtd->flags = (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? + IOMMU_VTD_S1_SRE : 0) | + (VTD_SM_PASID_ENTRY_WPE_BIT(pe->val[2]) ? + IOMMU_VTD_S1_WPE : 0) | + (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? 
+ IOMMU_VTD_S1_EAFE : 0); + vtd->addr_width = vtd_pe_get_fl_aw(pe); + vtd->pgtbl_addr = (uint64_t)vtd_pe_get_flpt_base(pe); +} + +static int vtd_create_s1_hwpt(IOMMUFDDevice *idev, + VTDS2Hwpt *s2_hwpt, VTDHwpt *hwpt, + VTDPASIDEntry *pe, Error **errp) +{ + struct iommu_hwpt_vtd_s1 vtd; + uint32_t hwpt_id, s2_hwpt_id = s2_hwpt->hwpt_id; + int ret; + + vtd_init_s1_hwpt_data(&vtd, pe); + + ret = iommufd_backend_alloc_hwpt(idev->iommufd, idev->dev_id, + s2_hwpt_id, 0, IOMMU_HWPT_DATA_VTD_S1, + sizeof(vtd), &vtd, &hwpt_id); + if (ret) { + error_setg(errp, "Failed to allocate stage-1 page table, dev_id %d", + idev->dev_id); + return ret; + } + + hwpt->hwpt_id = hwpt_id; + + return 0; +} + +static void vtd_destroy_s1_hwpt(IOMMUFDDevice *idev, VTDHwpt *hwpt) +{ + iommufd_backend_free_id(idev->iommufd, hwpt->hwpt_id); +} + +static VTDS2Hwpt *vtd_ioas_container_get_s2_hwpt(VTDIOASContainer *container, + uint32_t hwpt_id) +{ + VTDS2Hwpt *s2_hwpt; + + QLIST_FOREACH(s2_hwpt, &container->s2_hwpt_list, next) { + if (s2_hwpt->hwpt_id == hwpt_id) { + return s2_hwpt; + } + } + + s2_hwpt = g_malloc0(sizeof(*s2_hwpt)); + + s2_hwpt->hwpt_id = hwpt_id; + s2_hwpt->container = container; + QLIST_INSERT_HEAD(&container->s2_hwpt_list, s2_hwpt, next); + + return s2_hwpt; +} + +static void vtd_ioas_container_put_s2_hwpt(VTDS2Hwpt *s2_hwpt) +{ + VTDIOASContainer *container = s2_hwpt->container; + + if (s2_hwpt->users) { + return; + } + + QLIST_REMOVE(s2_hwpt, next); + iommufd_backend_free_id(container->iommufd, s2_hwpt->hwpt_id); + g_free(s2_hwpt); +} + +static void vtd_ioas_container_destroy(VTDIOASContainer *container) +{ + if (!QLIST_EMPTY(&container->s2_hwpt_list)) { + return; + } + + QLIST_REMOVE(container, next); + memory_listener_unregister(&container->listener); + iommufd_backend_free_id(container->iommufd, container->ioas_id); + g_free(container); +} + +static int vtd_device_attach_hwpt(VTDIOMMUFDDevice *vtd_idev, + uint32_t rid_pasid, VTDPASIDEntry *pe, + VTDS2Hwpt *s2_hwpt, VTDHwpt 
*hwpt, + Error **errp) +{ + IOMMUFDDevice *idev = vtd_idev->idev; + int ret; + + if (vtd_pe_pgtt_is_flt(pe)) { + ret = vtd_create_s1_hwpt(vtd_idev->idev, s2_hwpt, + hwpt, pe, errp); + if (ret) { + return ret; + } + } else { + hwpt->hwpt_id = s2_hwpt->hwpt_id; + } + + ret = iommufd_device_attach_hwpt(idev, hwpt->hwpt_id); + trace_vtd_device_attach_hwpt(idev->dev_id, rid_pasid, hwpt->hwpt_id, ret); + if (ret) { + if (vtd_pe_pgtt_is_flt(pe)) { + vtd_destroy_s1_hwpt(idev, hwpt); + } + hwpt->hwpt_id = 0; + error_setg(errp, "dev_id %d pasid %d failed to attach hwpt %d", + idev->dev_id, rid_pasid, hwpt->hwpt_id); + return ret; + } + + s2_hwpt->users++; + hwpt->s2_hwpt = s2_hwpt; + + return 0; +} + +static void vtd_device_detach_hwpt(VTDIOMMUFDDevice *vtd_idev, + uint32_t rid_pasid, VTDPASIDEntry *pe, + VTDHwpt *hwpt, Error **errp) +{ + IOMMUFDDevice *idev = vtd_idev->idev; + int ret; + + if (vtd_idev->iommu_state->dmar_enabled) { + ret = iommufd_device_detach_hwpt(idev); + trace_vtd_device_detach_hwpt(idev->dev_id, rid_pasid, ret); + } else { + ret = iommufd_device_attach_hwpt(idev, idev->ioas_id); + trace_vtd_device_reattach_def_ioas(idev->dev_id, rid_pasid, + idev->ioas_id, ret); + } + + if (ret) { + error_setg(errp, "dev_id %d pasid %d failed to attach hwpt %d", + idev->dev_id, rid_pasid, hwpt->hwpt_id); + } + + if (vtd_pe_pgtt_is_flt(pe)) { + vtd_destroy_s1_hwpt(idev, hwpt); + } + + hwpt->s2_hwpt->users--; + hwpt->s2_hwpt = NULL; + hwpt->hwpt_id = 0; +} + +static int vtd_device_attach_container(VTDIOMMUFDDevice *vtd_idev, + VTDIOASContainer *container, + uint32_t rid_pasid, + VTDPASIDEntry *pe, + VTDHwpt *hwpt, + Error **errp) +{ + IOMMUFDDevice *idev = vtd_idev->idev; + IOMMUFDBackend *iommufd = idev->iommufd; + VTDS2Hwpt *s2_hwpt; + uint32_t s2_hwpt_id; + Error *err = NULL; + int ret; + + /* try to attach to an existing hwpt in this container */ + QLIST_FOREACH(s2_hwpt, &container->s2_hwpt_list, next) { + ret = vtd_device_attach_hwpt(vtd_idev, rid_pasid, pe, + 
s2_hwpt, hwpt, &err); + if (ret) { + const char *msg = error_get_pretty(err); + + trace_vtd_device_fail_attach_existing_hwpt(msg); + error_free(err); + err = NULL; + } else { + goto found_hwpt; + } + } + + ret = iommufd_backend_alloc_hwpt(iommufd, idev->dev_id, + container->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + IOMMU_HWPT_DATA_NONE, + 0, NULL, &s2_hwpt_id); + if (ret) { + error_setg_errno(errp, errno, "error alloc parent hwpt"); + return ret; + } + + s2_hwpt = vtd_ioas_container_get_s2_hwpt(container, s2_hwpt_id); + + /* Attach vtd device to a new allocated hwpt within iommufd */ + ret = vtd_device_attach_hwpt(vtd_idev, rid_pasid, pe, s2_hwpt, hwpt, &err); + if (ret) { + goto err_attach_hwpt; + } + +found_hwpt: + trace_vtd_device_attach_container(iommufd->fd, idev->dev_id, rid_pasid, + container->ioas_id, hwpt->hwpt_id); + return 0; + +err_attach_hwpt: + vtd_ioas_container_put_s2_hwpt(s2_hwpt); + return ret; +} + +static void vtd_device_detach_container(VTDIOMMUFDDevice *vtd_idev, + uint32_t rid_pasid, + VTDPASIDEntry *pe, + VTDHwpt *hwpt, + Error **errp) +{ + IOMMUFDDevice *idev = vtd_idev->idev; + IOMMUFDBackend *iommufd = idev->iommufd; + VTDS2Hwpt *s2_hwpt = hwpt->s2_hwpt; + + trace_vtd_device_detach_container(iommufd->fd, idev->dev_id, rid_pasid); + vtd_device_detach_hwpt(vtd_idev, rid_pasid, pe, hwpt, errp); + vtd_ioas_container_put_s2_hwpt(s2_hwpt); +} + +static int vtd_device_attach_iommufd(VTDIOMMUFDDevice *vtd_idev, + uint32_t rid_pasid, + VTDPASIDEntry *pe, + VTDHwpt *hwpt, + Error **errp) +{ + IntelIOMMUState *s = vtd_idev->iommu_state; + VTDIOASContainer *container; + IOMMUFDBackend *iommufd = vtd_idev->idev->iommufd; + Error *err = NULL; + uint32_t ioas_id; + int ret; + + /* try to attach to an existing container in this space */ + QLIST_FOREACH(container, &s->containers, next) { + if (container->iommufd != iommufd) { + continue; + } + + if (vtd_device_attach_container(vtd_idev, container, + rid_pasid, pe, hwpt, &err)) { + const char *msg = 
error_get_pretty(err); + + trace_vtd_device_fail_attach_existing_container(msg); + error_free(err); + err = NULL; + } else { + return 0; + } + } + + /* Need to allocate a new dedicated container */ + ret = iommufd_backend_alloc_ioas(iommufd, &ioas_id, errp); + if (ret < 0) { + return ret; + } + + trace_vtd_device_alloc_ioas(iommufd->fd, ioas_id); + + container = g_malloc0(sizeof(*container)); + container->iommufd = iommufd; + container->ioas_id = ioas_id; + QLIST_INIT(&container->s2_hwpt_list); + + if (vtd_device_attach_container(vtd_idev, container, + rid_pasid, pe, hwpt, errp)) { + goto err_attach_container; + } + + container->listener = iommufd_s2domain_memory_listener; + memory_listener_register(&container->listener, &address_space_memory); + + if (container->error) { + ret = -1; + error_propagate_prepend(errp, container->error, + "memory listener initialization failed: "); + goto err_listener_register; + } + + QLIST_INSERT_HEAD(&s->containers, container, next); + + return 0; + +err_listener_register: + vtd_device_detach_container(vtd_idev, rid_pasid, pe, hwpt, errp); +err_attach_container: + iommufd_backend_free_id(iommufd, container->ioas_id); + g_free(container); + return ret; +} + +static void vtd_device_detach_iommufd(VTDIOMMUFDDevice *vtd_idev, + uint32_t rid_pasid, + VTDPASIDEntry *pe, + VTDHwpt *hwpt, + Error **errp) +{ + VTDIOASContainer *container = hwpt->s2_hwpt->container; + + vtd_device_detach_container(vtd_idev, rid_pasid, pe, hwpt, errp); + vtd_ioas_container_destroy(container); +} + +static int vtd_device_attach_pgtbl(VTDIOMMUFDDevice *vtd_idev, + VTDPASIDEntry *pe, + VTDPASIDAddressSpace *vtd_pasid_as, + uint32_t rid_pasid) +{ + /* + * If pe->pgtt != FLT, we should go ahead to do bind as host only + * accepts guest FLT under nesting. If pe->pgtt == PT, should setup + * the pasid with GPA page table. Otherwise should return failure.
+ */ + if (!vtd_pe_pgtt_is_flt(pe) && !vtd_pe_pgtt_is_pt(pe)) { + return -EINVAL; + } + + /* Should fail if the FLPT base is 0 */ + if (vtd_pe_pgtt_is_flt(pe) && !vtd_pe_get_flpt_base(pe)) { + return -EINVAL; + } + + return vtd_device_attach_iommufd(vtd_idev, rid_pasid, pe, + &vtd_pasid_as->hwpt, &error_abort); +} + +static int vtd_device_detach_pgtbl(VTDIOMMUFDDevice *vtd_idev, + VTDPASIDAddressSpace *vtd_pasid_as, + uint32_t rid_pasid) +{ + VTDPASIDEntry *cached_pe = vtd_pasid_as->pasid_cache_entry.cache_filled ? + &vtd_pasid_as->pasid_cache_entry.pasid_entry : NULL; + + if (!cached_pe || + (!vtd_pe_pgtt_is_flt(cached_pe) && !vtd_pe_pgtt_is_pt(cached_pe))) { + return 0; + } + + vtd_device_detach_iommufd(vtd_idev, rid_pasid, cached_pe, + &vtd_pasid_as->hwpt, &error_abort); + + return 0; +} + +static int vtd_dev_get_rid2pasid(IntelIOMMUState *s, uint8_t bus_num, + uint8_t devfn, uint32_t *rid_pasid) +{ + VTDContextEntry ce; + int ret; + + /* + * Currently, ECAP.RPS bit is likely to be reported as "Clear". + * And per VT-d 3.1 spec, it will use PASID #0 as RID2PASID when + * RPS bit is reported as "Clear". + */ + if (likely(!(s->ecap & VTD_ECAP_RPS))) { + *rid_pasid = 0; + return 0; + } + + /* + * In future, to improve performance, could try to fetch context + * entry from cache firstly. + */ + ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce); + if (!ret) { + *rid_pasid = VTD_CE_GET_RID2PASID(&ce); + } + + return ret; +} + +/** + * Caller should hold iommu_lock. + */ +static int vtd_bind_guest_pasid(VTDPASIDAddressSpace *vtd_pasid_as, + VTDPASIDEntry *pe, VTDPASIDOp op) +{ + IntelIOMMUState *s = vtd_pasid_as->iommu_state; + VTDIOMMUFDDevice *vtd_idev; + uint32_t rid_pasid; + int devfn = vtd_pasid_as->devfn; + int ret = -EINVAL; + struct vtd_as_key key = { + .bus = vtd_pasid_as->bus, + .devfn = devfn, + }; + + vtd_idev = g_hash_table_lookup(s->vtd_iommufd_dev, &key); + if (!vtd_idev || !vtd_idev->idev) { + /* means no need to go further, e.g. 
for emulated devices */ + return 0; + } + + if (vtd_dev_get_rid2pasid(s, pci_bus_num(vtd_pasid_as->bus), + devfn, &rid_pasid)) { + error_report("Unable to get rid_pasid for devfn: %d!", devfn); + return ret; + } + + if (vtd_pasid_as->pasid != rid_pasid) { + error_report("Non-rid_pasid %d not supported yet", vtd_pasid_as->pasid); + return ret; + } + + switch (op) { + case VTD_PASID_UPDATE: + case VTD_PASID_BIND: + { + ret = vtd_device_attach_pgtbl(vtd_idev, pe, vtd_pasid_as, rid_pasid); + break; + } + case VTD_PASID_UNBIND: + { + ret = vtd_device_detach_pgtbl(vtd_idev, vtd_pasid_as, rid_pasid); + break; + } + default: + error_report_once("Unknown VTDPASIDOp!!!\n"); + break; + } + + return ret; +} + /* Do a context-cache device-selective invalidation. * @func_mask: FM field after shifting */ @@ -2717,22 +3284,30 @@ static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2) * This function fills in the pasid entry in &vtd_pasid_as. Caller * of this function should hold iommu_lock. 
*/ -static void vtd_fill_pe_in_cache(IntelIOMMUState *s, - VTDPASIDAddressSpace *vtd_pasid_as, - VTDPASIDEntry *pe) +static int vtd_fill_pe_in_cache(IntelIOMMUState *s, + VTDPASIDAddressSpace *vtd_pasid_as, + VTDPASIDEntry *pe) { VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + int ret; - if (vtd_pasid_entry_compare(pe, &pc_entry->pasid_entry)) { - /* No need to go further as cached pasid entry is latest */ - return; + if (pc_entry->cache_filled) { + if (vtd_pasid_entry_compare(pe, &pc_entry->pasid_entry)) { + /* No need to go further as cached pasid entry is latest */ + return 0; + } + ret = vtd_bind_guest_pasid(vtd_pasid_as, + pe, VTD_PASID_UPDATE); + } else { + ret = vtd_bind_guest_pasid(vtd_pasid_as, + pe, VTD_PASID_BIND); } - pc_entry->pasid_entry = *pe; - /* - * TODO: - * - send pasid bind to host for passthru devices - */ + if (!ret) { + pc_entry->pasid_entry = *pe; + pc_entry->cache_filled = true; + } + return ret; } /* @@ -2795,7 +3370,11 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, goto remove; } - vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe); + if (vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe)) { + pasid_cache_info_set_error(pc_info); + return true; + } + /* * TODO: * - when pasid-base-iotlb(piotlb) infrastructure is ready, @@ -2805,10 +3384,14 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, remove: /* * TODO: - * - send pasid bind to host for passthru devices * - when pasid-base-iotlb(piotlb) infrastructure is ready, * should invalidate QEMU piotlb togehter with this change. */ + if (vtd_bind_guest_pasid(vtd_pasid_as, + NULL, VTD_PASID_UNBIND)) { + pasid_cache_info_set_error(pc_info); + } + return true; } @@ -2854,6 +3437,22 @@ static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s, return vtd_pasid_as; } +/* Caller of this function should hold iommu_lock. 
*/ +static void vtd_remove_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as) +{ + IntelIOMMUState *s = vtd_pasid_as->iommu_state; + PCIBus *bus = vtd_pasid_as->bus; + struct pasid_key key; + int devfn = vtd_pasid_as->devfn; + uint32_t pasid = vtd_pasid_as->pasid; + uint16_t sid; + + sid = PCI_BUILD_BDF(pci_bus_num(bus), devfn); + vtd_init_pasid_key(pasid, sid, &key); + + g_hash_table_remove(s->vtd_pasid_as, &key); +} + /* Caller of this function should hold iommu_lock. */ static void vtd_sm_pasid_table_walk_one(IntelIOMMUState *s, dma_addr_t pt_base, @@ -2884,7 +3483,10 @@ static void vtd_sm_pasid_table_walk_one(IntelIOMMUState *s, pasid = pasid_next; continue; } - vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe); + if (vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe)) { + vtd_remove_pasid_as(vtd_pasid_as); + pasid_cache_info_set_error(info); + } } pasid = pasid_next; } @@ -2991,6 +3593,9 @@ static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s, walk_info.devfn = vtd_idev->devfn; vtd_replay_pasid_bind_for_dev(s, start, end, &walk_info); } + if (walk_info.error_happened) { + pasid_cache_info_set_error(pc_info); + } } /* @@ -3060,7 +3665,7 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s, /* Caller of this function should hold iommu_lock */ static void vtd_pasid_cache_reset(IntelIOMMUState *s) { - VTDPASIDCacheInfo pc_info; + VTDPASIDCacheInfo pc_info = { .error_happened = false, }; trace_vtd_pasid_cache_reset(); @@ -3082,9 +3687,9 @@ static void vtd_pasid_cache_reset(IntelIOMMUState *s) static bool vtd_process_pasid_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { + VTDPASIDCacheInfo pc_info = { .error_happened = false, }; uint16_t domain_id; uint32_t pasid; - VTDPASIDCacheInfo pc_info; if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) || (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) || @@ -3125,7 +3730,7 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, } vtd_pasid_cache_sync(s, &pc_info); - return true; + return !pc_info.error_happened ? 
true : false; } static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 91d6c400b4..17e7191696 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -72,6 +72,14 @@ vtd_frr_new(int index, uint64_t hi, uint64_t lo) "index %d high 0x%"PRIx64" low vtd_warn_invalid_qi_tail(uint16_t tail) "tail 0x%"PRIx16 vtd_warn_ir_vector(uint16_t sid, int index, int vec, int target) "sid 0x%"PRIx16" index %d vec %d (should be: %d)" vtd_warn_ir_trigger(uint16_t sid, int index, int trig, int target) "sid 0x%"PRIx16" index %d trigger %d (should be: %d)" +vtd_device_attach_hwpt(uint32_t dev_id, uint32_t pasid, uint32_t hwpt_id, int ret) "dev_id %d pasid %d hwpt_id %d, ret: %d" +vtd_device_detach_hwpt(uint32_t dev_id, uint32_t pasid, int ret) "dev_id %d pasid %d ret: %d" +vtd_device_reattach_def_ioas(uint32_t dev_id, uint32_t pasid, uint32_t ioas_id, int ret) "dev_id %d pasid %d ioas_id %d, ret: %d" +vtd_device_fail_attach_existing_hwpt(const char *msg) " %s" +vtd_device_attach_container(int fd, uint32_t dev_id, uint32_t pasid, uint32_t ioas_id, uint32_t hwpt_id) "iommufd %d dev_id %d pasid %d ioas_id %d hwpt_id %d" +vtd_device_detach_container(int fd, uint32_t dev_id, uint32_t pasid) "iommufd %d dev_id %d pasid %d" +vtd_device_fail_attach_existing_container(const char *msg) " %s" +vtd_device_alloc_ioas(int fd, uint32_t ioas_id) "iommufd %d ioas_id %d" # amd_iommu.c amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" + offset 0x%"PRIx32 From patchwork Mon Jan 15 10:37:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519506
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost
Subject: [PATCH rfcv1 11/23] intel_iommu: ERRATA_772415 workaround
Date: Mon, 15 Jan 2024 18:37:23 +0800
Message-Id: <20240115103735.132209-12-zhenzhong.duan@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>

On a system affected by ERRATA_772415, IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is reported by IOMMU_DEVICE_GET_HW_INFO. Due to this errata, even a read-only range mapped on the stage-2 page table could still be written. Reference: 4th Gen Intel Xeon Processor Scalable Family Specification Update, Errata Details, SPR17.
[0] https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/eagle-stream/sapphire-rapids-specification-update

We utilize the newly added IOMMUFD container/ioas/hwpt management framework in VT-d. Add a check to create a new VTDIOASContainer to hold the RW-only mappings; this VTDIOASContainer can then be used as the backend for devices with ERRATA_772415. See the diagram below for details:

                      IntelIOMMUState
                             |
                             V
 .------------------.    .------------------.    .-------------------.
 | VTDIOASContainer |--->| VTDIOASContainer |--->| VTDIOASContainer |-->...
 | (iommufd0,RW&RO) |    | (iommufd1,RW&RO) |    | (iommufd0,RW only)|
 .------------------.    .------------------.    .-------------------.
          |                       |                        |
          |                       .-->...                  |
          V                                                V
 .-------------------.    .-------------------.      .---------------.
 |   VTDS2Hwpt(CC)   |--->| VTDS2Hwpt(non-CC) |-->...| VTDS2Hwpt(CC) |-->...
 .-------------------.    .-------------------.      .---------------.
     |           |                  |                        |
     |           |                  |                        |
 .-----------.  .-----------.  .------------.  .------------.
 | IOMMUFD   |  | IOMMUFD   |  | IOMMUFD    |  | IOMMUFD    |
 | Device(CC)|  | Device(CC)|  | Device     |  | Device(CC) |
 | (iommufd0)|  | (iommufd0)|  | (non-CC)   |  | (errata)   |
 |           |  |           |  | (iommufd0) |  | (iommufd0) |
 .-----------.  .-----------.  .------------.  .------------.
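The container-selection rule this workaround relies on can be illustrated with a tiny stand-alone C sketch. This is not the patch code: the struct names and fields below are simplified stand-ins for MemoryRegionSection and VTDIOASContainer, and only the new read-only check is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; 'errata' is nonzero when
 * IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 was reported for the device
 * that backs this container. */
struct section_info   { bool readonly; };
struct container_info { uint32_t errata; };

/* Models the check added to iommufd_listener_skipped_section(): on an
 * ERRATA_772415 system the stage-2 page table does not honor read-only
 * permission, so read-only sections are simply not mapped into the
 * container used by affected devices. */
static bool skip_readonly_for_errata(const struct container_info *c,
                                     const struct section_info *s)
{
    return c->errata && s->readonly;
}
```

Devices with and without the errata then end up in different VTDIOASContainers, because vtd_device_attach_iommufd() also compares container->errata when looking for an existing container to reuse.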
Suggested-by: Yi Liu Signed-off-by: Zhenzhong Duan --- include/hw/i386/intel_iommu.h | 2 ++ hw/i386/intel_iommu.c | 27 +++++++++++++++++++-------- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index d3122cf699..72702e10a2 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -108,6 +108,7 @@ struct pasid_key { struct VTDIOASContainer { IOMMUFDBackend *iommufd; uint32_t ioas_id; + uint32_t errata; MemoryListener listener; QLIST_HEAD(, VTDS2Hwpt) s2_hwpt_list; QLIST_ENTRY(VTDIOASContainer) next; @@ -200,6 +201,7 @@ struct VTDIOMMUFDDevice { PCIBus *bus; uint8_t devfn; IOMMUFDDevice *idev; + uint32_t errata; IntelIOMMUState *iommu_state; QLIST_ENTRY(VTDIOMMUFDDevice) next; }; diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index df93fcacd8..8f9a59ae6f 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2121,7 +2121,8 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) vtd_iommu_replay_all(s); } -static bool iommufd_listener_skipped_section(MemoryRegionSection *section) +static bool iommufd_listener_skipped_section(VTDIOASContainer *container, + MemoryRegionSection *section) { return !memory_region_is_ram(section->mr) || memory_region_is_protected(section->mr) || @@ -2131,7 +2132,8 @@ static bool iommufd_listener_skipped_section(MemoryRegionSection *section) * are never accessed by the CPU and beyond the address width of * some IOMMU hardware. TODO: VFIO should tell us the IOMMU width. 
*/ - section->offset_within_address_space & (1ULL << 63); + section->offset_within_address_space & (1ULL << 63) || + (container->errata && section->readonly); } static void iommufd_listener_region_add_s2domain(MemoryListener *listener, @@ -2147,7 +2149,7 @@ static void iommufd_listener_region_add_s2domain(MemoryListener *listener, Error *err = NULL; int ret; - if (iommufd_listener_skipped_section(section)) { + if (iommufd_listener_skipped_section(container, section)) { return; } iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); @@ -2198,7 +2200,7 @@ static void iommufd_listener_region_del_s2domain(MemoryListener *listener, Int128 llend, llsize; int ret; - if (iommufd_listener_skipped_section(section)) { + if (iommufd_listener_skipped_section(container, section)) { return; } iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); @@ -2468,7 +2470,8 @@ static int vtd_device_attach_iommufd(VTDIOMMUFDDevice *vtd_idev, /* try to attach to an existing container in this space */ QLIST_FOREACH(container, &s->containers, next) { - if (container->iommufd != iommufd) { + if (container->iommufd != iommufd || + container->errata != vtd_idev->errata) { continue; } @@ -2495,6 +2498,7 @@ static int vtd_device_attach_iommufd(VTDIOMMUFDDevice *vtd_idev, container = g_malloc0(sizeof(*container)); container->iommufd = iommufd; container->ioas_id = ioas_id; + container->errata = vtd_idev->errata; QLIST_INIT(&container->s2_hwpt_list); if (vtd_device_attach_container(vtd_idev, container, @@ -5002,10 +5006,11 @@ static bool vtd_sync_hw_info(IntelIOMMUState *s, struct iommu_hw_info_vtd *vtd, * could bind guest page table to host. 
*/ static bool vtd_check_idev(IntelIOMMUState *s, IOMMUFDDevice *idev, - Error **errp) + uint32_t *flags, Error **errp) { struct iommu_hw_info_vtd vtd; enum iommu_hw_info_type type = IOMMU_HW_INFO_TYPE_INTEL_VTD; + bool passed; if (iommufd_device_get_info(idev, &type, sizeof(vtd), &vtd)) { error_setg(errp, "Failed to get IOMMU capability!!!"); @@ -5017,7 +5022,11 @@ static bool vtd_check_idev(IntelIOMMUState *s, IOMMUFDDevice *idev, return false; } - return vtd_sync_hw_info(s, &vtd, errp); + passed = vtd_sync_hw_info(s, &vtd, errp); + if (passed) { + *flags = vtd.flags; + } + return passed; } static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int32_t devfn, @@ -5030,6 +5039,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int32_t devfn, .devfn = devfn, }; struct vtd_as_key *new_key; + uint32_t flags; assert(0 <= devfn && devfn < PCI_DEVFN_MAX); @@ -5042,7 +5052,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int32_t devfn, return -1; } - if (!vtd_check_idev(s, idev, errp)) { + if (!vtd_check_idev(s, idev, &flags, errp)) { return -1; } @@ -5064,6 +5074,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int32_t devfn, vtd_idev->devfn = (uint8_t)devfn; vtd_idev->iommu_state = s; vtd_idev->idev = idev; + vtd_idev->errata = flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17; QLIST_INSERT_HEAD(&s->vtd_idev_list, vtd_idev, next); g_hash_table_insert(s->vtd_iommufd_dev, new_key, vtd_idev);

From patchwork Mon Jan 15 10:37:24 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519503 From: Zhenzhong Duan To: qemu-devel@nongnu.org Subject: [PATCH rfcv1 12/23] intel_iommu: replay pasid binds after context cache invalidation Date: Mon, 15 Jan 2024 18:37:24 +0800 Message-Id: <20240115103735.132209-13-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>

From: Yi Liu This replays guest PASID attachments after a context cache invalidation. This is defensive behavior to ensure safety; in practice, software should issue a PASID cache invalidation with the proper granularity after issuing a context cache invalidation.
Signed-off-by: Yi Liu Signed-off-by: Yi Sun Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu_internal.h | 1 + hw/i386/intel_iommu.c | 49 ++++++++++++++++++++++++++++++++++ hw/i386/trace-events | 1 + 3 files changed, 51 insertions(+) diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index e33c9f54b5..65fe07c13b 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -532,6 +532,7 @@ typedef enum VTDPCInvType { VTD_PASID_CACHE_FORCE_RESET = 0, /* pasid cache invalidation rely on guest PASID entry */ VTD_PASID_CACHE_GLOBAL_INV, + VTD_PASID_CACHE_DEVSI, VTD_PASID_CACHE_DOMSI, VTD_PASID_CACHE_PASIDSI, } VTDPCInvType; diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 8f9a59ae6f..9058be9efd 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -74,6 +74,10 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); static void vtd_pasid_cache_reset(IntelIOMMUState *s); +static void vtd_pasid_cache_sync(IntelIOMMUState *s, + VTDPASIDCacheInfo *pc_info); +static void vtd_pasid_cache_devsi(IntelIOMMUState *s, + PCIBus *bus, uint16_t devfn); static void vtd_panic_require_caching_mode(void) { @@ -2102,6 +2106,8 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s) static void vtd_context_global_invalidate(IntelIOMMUState *s) { + VTDPASIDCacheInfo pc_info = { .error_happened = false, }; + trace_vtd_inv_desc_cc_global(); /* Protects context cache */ vtd_iommu_lock(s); @@ -2119,6 +2125,9 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) * VT-d emulation codes. */ vtd_iommu_replay_all(s); + + pc_info.type = VTD_PASID_CACHE_GLOBAL_INV; + vtd_pasid_cache_sync(s, &pc_info); } static bool iommufd_listener_skipped_section(VTDIOASContainer *container, @@ -2720,6 +2729,21 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s, * happened. 
*/ vtd_address_space_sync(vtd_as); + /* + * Per spec, a context flush should also be followed by PASID + * cache and iotlb flushes. With regard to a device selective + * context cache invalidation: + * if (emulated_device) + * invalidate pasid cache and pasid-based iotlb + * else if (assigned_device) + * check if the device has been bound to any pasid + * invoke pasid_unbind for each bound pasid + * Here, we have vtd_pasid_cache_devsi() to invalidate pasid + * caches, while for piotlb in QEMU, we don't have it yet, so + * no handling. For an assigned device, the host iommu driver + * flushes the piotlb when a pasid unbind is passed down to it. + */ + vtd_pasid_cache_devsi(s, vtd_as->bus, devfn); } } } @@ -3351,6 +3375,12 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, /* Fall through */ case VTD_PASID_CACHE_GLOBAL_INV: break; + case VTD_PASID_CACHE_DEVSI: + if (pc_info->bus != bus || + pc_info->devfn != devfn) { + return false; + } + break; default: error_report("invalid pc_info->type"); abort(); @@ -3574,6 +3604,11 @@ static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s, case VTD_PASID_CACHE_GLOBAL_INV: /* loop all assigned devices */ break; + case VTD_PASID_CACHE_DEVSI: + walk_info.bus = pc_info->bus; + walk_info.devfn = pc_info->devfn; + vtd_replay_pasid_bind_for_dev(s, start, end, &walk_info); + return; case VTD_PASID_CACHE_FORCE_RESET: /* For force reset, no need to go further replay */ return; @@ -3666,6 +3701,20 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s, vtd_iommu_unlock(s); } +static void vtd_pasid_cache_devsi(IntelIOMMUState *s, + PCIBus *bus, uint16_t devfn) +{ + VTDPASIDCacheInfo pc_info = { .error_happened = false, }; + + trace_vtd_pasid_cache_devsi(devfn); + + pc_info.type = VTD_PASID_CACHE_DEVSI; + pc_info.bus = bus; + pc_info.devfn = devfn; + + vtd_pasid_cache_sync(s, &pc_info); +} + /* Caller of this function should hold iommu_lock */ static void vtd_pasid_cache_reset(IntelIOMMUState *s) { diff --git
a/hw/i386/trace-events b/hw/i386/trace-events index 17e7191696..66f7c1ba59 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -28,6 +28,7 @@ vtd_pasid_cache_gsi(void) "" vtd_pasid_cache_reset(void) "" vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 +vtd_pasid_cache_devsi(uint16_t devfn) "Dev selective PC invalidation dev: 0x%"PRIx16 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present" vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16

From patchwork Mon Jan 15 10:37:25 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519521 From: Zhenzhong Duan To: qemu-devel@nongnu.org Subject: [PATCH rfcv1 13/23] intel_iommu: process PASID-based iotlb invalidation Date: Mon, 15 Jan 2024 18:37:25 +0800 Message-Id: <20240115103735.132209-14-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>

From: Yi Liu The PASID-based IOTLB (piotlb) is used when walking the Intel VT-d stage-1 page table. This adds the basic framework for piotlb invalidation; detailed handling will be added in the next patch.
Signed-off-by: Yi Liu Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu_internal.h | 13 +++++++++ hw/i386/intel_iommu.c | 52 ++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 65fe07c13b..40361de207 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -458,6 +458,19 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) #define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4) +#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID (2ULL << 4) +#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID (3ULL << 4) + +#define VTD_INV_DESC_PIOTLB_RSVD_VAL0 0xfff000000000ffc0ULL +#define VTD_INV_DESC_PIOTLB_RSVD_VAL1 0xf80ULL + +#define VTD_INV_DESC_PIOTLB_PASID(val) (((val) >> 32) & 0xfffffULL) +#define VTD_INV_DESC_PIOTLB_DID(val) (((val) >> 16) & \ + VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PIOTLB_ADDR(val) ((val) & ~0xfffULL) +#define VTD_INV_DESC_PIOTLB_AM(val) ((val) & 0x3fULL) +#define VTD_INV_DESC_PIOTLB_IH(val) (((val) >> 6) & 0x1) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 9058be9efd..6aa44b80d6 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3786,6 +3786,54 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return !pc_info.error_happened ? 
true : false; } +static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, + uint16_t domain_id, uint32_t pasid) +{ +} + +static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, + uint32_t pasid, hwaddr addr, uint8_t am, + bool ih) +{ +} + +static bool vtd_process_piotlb_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + uint16_t domain_id; + uint32_t pasid; + uint8_t am; + hwaddr addr; + + if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) || + (inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) { + error_report_once("non-zero-field-in-piotlb_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]); + pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]); + switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) { + case VTD_INV_DESC_PIOTLB_ALL_IN_PASID: + vtd_piotlb_pasid_invalidate(s, domain_id, pasid); + break; + + case VTD_INV_DESC_PIOTLB_PSI_IN_PASID: + am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]); + addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]); + vtd_piotlb_page_invalidate(s, domain_id, pasid, addr, am, + VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1])); + break; + + default: + error_report_once("Invalid granularity in P-IOTLB desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + return true; +} + static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -3895,6 +3943,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) break; case VTD_INV_DESC_PIOTLB: + trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]); + if (!vtd_process_piotlb_desc(s, &inv_desc)) { + return false; + } break; case VTD_INV_DESC_WAIT:

From patchwork Mon Jan 15 10:37:26 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519522 From: Zhenzhong Duan To: qemu-devel@nongnu.org Subject: [PATCH rfcv1 14/23] intel_iommu: propagate PASID-based iotlb invalidation to host Date: Mon, 15 Jan 2024 18:37:26 +0800 Message-Id: <20240115103735.132209-15-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>

From: Yi Liu This traps the guest PASID-based iotlb invalidation request and propagates it to the host. Intel VT-d 3.0 supports nested translation at PASID granularity. Guest SVA support can be implemented by configuring nested translation for a specific PASID.
This is also known as dual stage DMA translation. Under such configuration, guest owns the GVA->GPA translation which is configured as stage-1 page table in host side for a specific pasid, and host owns GPA->HPA translation. As guest owns stage-1 translation table, piotlb invalidation should be propagated to host since host IOMMU will cache first level page table related mappings during DMA address translation. Signed-off-by: Yi Liu Signed-off-by: Yi Sun Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu_internal.h | 7 +++ hw/i386/intel_iommu.c | 103 +++++++++++++++++++++++++++++++++ 2 files changed, 110 insertions(+) diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 40361de207..ed0d5cd99b 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -560,6 +560,13 @@ struct VTDPASIDCacheInfo { }; typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; +struct VTDPIOTLBInvInfo { + uint16_t domain_id; + uint32_t pasid; + struct iommu_hwpt_vtd_s1_invalidate *inv_data; +}; +typedef struct VTDPIOTLBInvInfo VTDPIOTLBInvInfo; + /* PASID Table Related Definitions */ #define VTD_PASID_DIR_BASE_ADDR_MASK (~0xfffULL) #define VTD_PASID_TABLE_BASE_ADDR_MASK (~0xfffULL) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 6aa44b80d6..2912fc6b88 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3786,15 +3786,118 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return !pc_info.error_happened ? true : false; } +/** + * Caller of this function should hold iommu_lock. 
+ */ +static void vtd_invalidate_piotlb(VTDPASIDAddressSpace *vtd_pasid_as, + struct iommu_hwpt_vtd_s1_invalidate *cache) +{ + VTDIOMMUFDDevice *vtd_idev; + VTDHwpt *hwpt = &vtd_pasid_as->hwpt; + int devfn = vtd_pasid_as->devfn; + struct vtd_as_key key = { + .bus = vtd_pasid_as->bus, + .devfn = devfn, + }; + IntelIOMMUState *s = vtd_pasid_as->iommu_state; + uint32_t entry_num = 1; /* Only implement one request for simplicity */ + + if (!hwpt) { + return; + } + + vtd_idev = g_hash_table_lookup(s->vtd_iommufd_dev, &key); + if (!vtd_idev || !vtd_idev->idev) { + return; + } + if (iommufd_backend_invalidate_cache(vtd_idev->idev->iommufd, hwpt->hwpt_id, + IOMMU_HWPT_INVALIDATE_DATA_VTD_S1, + sizeof(*cache), &entry_num, cache)) { + error_report("Cache flush failed, entry_num %d", entry_num); + } +} + +/** + * This function is a loop function for the s->vtd_pasid_as + * list with VTDPIOTLBInvInfo as execution filter. It propagates + * the piotlb invalidation to host. Caller of this function + * should hold iommu_lock. + */ +static void vtd_flush_pasid_iotlb(gpointer key, gpointer value, + gpointer user_data) +{ + VTDPIOTLBInvInfo *piotlb_info = user_data; + VTDPASIDAddressSpace *vtd_pasid_as = value; + VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + uint16_t did; + + if (!vtd_pe_pgtt_is_flt(&pc_entry->pasid_entry)) { + return; + } + + did = vtd_pe_get_domain_id(&pc_entry->pasid_entry); + + if ((piotlb_info->domain_id == did) && + (piotlb_info->pasid == vtd_pasid_as->pasid)) { + vtd_invalidate_piotlb(vtd_pasid_as, + piotlb_info->inv_data); + } + + /* + * TODO: needs to add QEMU piotlb flush when QEMU piotlb + * infrastructure is ready. For now, it is enough for passthru + * devices. 
+ */ +} + static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid) { + struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 }; + VTDPIOTLBInvInfo piotlb_info; + + cache_info.addr = 0; + cache_info.npages = (uint64_t)-1; + + piotlb_info.domain_id = domain_id; + piotlb_info.pasid = pasid; + piotlb_info.inv_data = &cache_info; + + vtd_iommu_lock(s); + /* + * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as + * to find out the affected devices since piotlb invalidation + * should check pasid cache per architecture point of view. + */ + g_hash_table_foreach(s->vtd_pasid_as, + vtd_flush_pasid_iotlb, &piotlb_info); + vtd_iommu_unlock(s); } static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid, hwaddr addr, uint8_t am, bool ih) { + struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 }; + VTDPIOTLBInvInfo piotlb_info; + + cache_info.addr = addr; + cache_info.npages = 1 << am; + cache_info.flags = ih ? IOMMU_VTD_INV_FLAGS_LEAF : 0; + + piotlb_info.domain_id = domain_id; + piotlb_info.pasid = pasid; + piotlb_info.inv_data = &cache_info; + + vtd_iommu_lock(s); + /* + * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as + * to find out the affected devices since piotlb invalidation + * should check pasid cache per architecture point of view. 
+ */ + g_hash_table_foreach(s->vtd_pasid_as, + vtd_flush_pasid_iotlb, &piotlb_info); + vtd_iommu_unlock(s); } static bool vtd_process_piotlb_desc(IntelIOMMUState *s, From patchwork Mon Jan 15 10:37:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519518 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32945C3DA79 for ; Mon, 15 Jan 2024 10:43:24 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPKOf-00035v-5O; Mon, 15 Jan 2024 05:40:49 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPKOe-000347-Cy for qemu-devel@nongnu.org; Mon, 15 Jan 2024 05:40:48 -0500 Received: from mgamail.intel.com ([192.198.163.8]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPKOc-0003GV-Cq for qemu-devel@nongnu.org; Mon, 15 Jan 2024 05:40:48 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705315246; x=1736851246; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rzv4rACBfIq9ROOfL34OgDBmAl9ewK2Tn0IPg/RHhrA=; b=HRa/+nS+4B7spO4WKxgQRoB5+h314UpYpShBG0AcuEF47ekYYAbR8P3t vcOeeRSTqFRqCV94avFjnPYf7ZAgtFfx5FSdCHovA5En9Umb+TGEEePH0 OSanmgFNzmKnUuEowdOYPS7oGl/97G57Q0et8/peR/pnKmkVL7xkifwxQ gJ4Kc/Bw2yaumYjE6xSpYsbn6RjAohWF6YxHhra8HuKkWCkP0rhvZ1Aog wujhTaop3XKDwaUj7PcQrP6dSU50yITo16ZWy+bPQBJM7FcEq2YY3kk51 
WqbBVAJldkCNG4ik/9n8Z5RL7uBNzs6koowsDjIGmZUKZraGFrL0qwJTO A==; X-IronPort-AV: E=McAfee;i="6600,9927,10953"; a="13067850" X-IronPort-AV: E=Sophos;i="6.04,196,1695711600"; d="scan'208";a="13067850" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 02:40:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10953"; a="874065425" X-IronPort-AV: E=Sophos;i="6.04,196,1695711600"; d="scan'208";a="874065425" Received: from spr-s2600bt.bj.intel.com ([10.240.192.124]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2024 02:40:40 -0800 From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Zhenzhong Duan , Paolo Bonzini , Richard Henderson , Eduardo Habkost , Marcel Apfelbaum Subject: [PATCH rfcv1 15/23] intel_iommu: process PASID-based Device-TLB invalidation Date: Mon, 15 Jan 2024 18:37:27 +0800 Message-Id: <20240115103735.132209-16-zhenzhong.duan@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com> References: <20240115103735.132209-1-zhenzhong.duan@intel.com> MIME-Version: 1.0 Received-SPF: pass client-ip=192.198.163.8; envelope-from=zhenzhong.duan@intel.com; helo=mgamail.intel.com X-Spam_score_int: -48 X-Spam_score: -4.9 X-Spam_bar: ---- X-Spam_report: (-4.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-2.758, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
From: Yi Liu

This adds empty handling for PASID-based Device-TLB invalidation. For now this is sufficient, as there is no need to propagate it to the host for passthrough devices.

The reason an empty Device-TLB invalidation handler works is that iommufd's Intel VT-d cache invalidation uAPI invalidates all caches related to a mapping, so no separate Device-TLB invalidation needs to be sent. The spec requires that an IOTLB invalidation be issued before the corresponding Device-TLB invalidation, which means the host has already received a cache invalidation for the guest IOTLB invalidation; sending another one for the Device-TLB invalidation would be redundant.

Chapter 6.5.2.5: "Since translation requests-without-PASID from a device may be serviced by hardware from the IOTLB, software must always request IOTLB invalidation (iotlb_inv_dsc) before requesting corresponding Device-TLB (dev_tlb_inv_dsc) invalidation."
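The ordering argument above can be sketched as a tiny model. This is an illustrative sketch only, not QEMU code; the class and descriptor names are hypothetical, chosen to mirror the spec's iotlb_inv_dsc / dev_tlb_inv_dsc descriptor types.

```python
# Simplified model of why an empty PASID-based Device-TLB handler is
# safe: the VT-d spec requires software to queue an IOTLB invalidation
# before the corresponding Device-TLB one, so by the time the
# Device-TLB descriptor is processed, the host has already been asked
# to drop all caches for the mapping.

class HostIOMMU:
    """Hypothetical stand-in for the host-side (iommufd) state."""

    def __init__(self):
        self.cached_mappings = {"dom1": {0x1000: 0x8000}}

    def invalidate(self, domain):
        # iommufd's cache-invalidation uAPI drops *all* caches for the
        # mapping, device TLBs included.
        self.cached_mappings.pop(domain, None)

def process_queue(host, descriptors):
    for desc in descriptors:
        if desc["type"] == "iotlb_inv_dsc":
            host.invalidate(desc["domain"])
        elif desc["type"] == "dev_tlb_inv_dsc":
            # Empty handling: the preceding iotlb_inv_dsc has already
            # invalidated everything on the host side.
            assert desc["domain"] not in host.cached_mappings

host = HostIOMMU()
process_queue(host, [
    {"type": "iotlb_inv_dsc", "domain": "dom1"},    # must come first per spec
    {"type": "dev_tlb_inv_dsc", "domain": "dom1"},  # safe no-op
])
print(host.cached_mappings)  # {}
```

If a guest violated the spec and sent dev_tlb_inv_dsc first, the assertion in the model would fire; in real hardware the Device-TLB could still hold a stale translation.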
Signed-off-by: Yi Liu
Signed-off-by: Zhenzhong Duan
---
 hw/i386/intel_iommu_internal.h |  1 +
 hw/i386/intel_iommu.c          | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index ed0d5cd99b..dcf1410fcf 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -380,6 +380,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_PIOTLB             0x6 /* PASID-IOTLB Invalidate Desc */
 #define VTD_INV_DESC_PC                 0x7 /* PASID-cache Invalidate Desc */
+#define VTD_INV_DESC_DEV_PIOTLB         0x8 /* PASID-based-DIOTLB inv_desc*/
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */

 /* Masks for Invalidation Wait Descriptor*/
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2912fc6b88..5e7d445d98 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3950,6 +3950,17 @@ static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
     return true;
 }

+static bool vtd_process_device_piotlb_desc(IntelIOMMUState *s,
+                                           VTDInvDesc *inv_desc)
+{
+    /*
+     * no need to handle it for passthru device, for emulated
+     * devices with device tlb, it may be required, but for now,
+     * return is enough
+     */
+    return true;
+}
+
 static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
                                           VTDInvDesc *inv_desc)
 {
@@ -4066,6 +4077,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;

+    case VTD_INV_DESC_DEV_PIOTLB:
+        trace_vtd_inv_desc("device-piotlb", inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_device_piotlb_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_DEVICE:
         trace_vtd_inv_desc("device", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_device_iotlb_desc(s, &inv_desc)) {

From patchwork Mon Jan 15 10:37:28 2024
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yi Sun, Zhenzhong Duan, Paolo Bonzini, Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
Subject: [PATCH rfcv1 16/23] intel_iommu: rename slpte in iotlb_entry to pte
Date: Mon, 15 Jan 2024 18:37:28 +0800
Message-Id: <20240115103735.132209-17-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>

From: Yi Liu

Because we support both first-stage (FST) and second-stage (SST) translation, rename slpte in iotlb_entry to pte to make it generic.
Signed-off-by: Yi Liu
Signed-off-by: Yi Sun
Signed-off-by: Zhenzhong Duan
---
 include/hw/i386/intel_iommu.h |  2 +-
 hw/i386/intel_iommu.c         | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 72702e10a2..dedaab5ac9 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -210,7 +210,7 @@ struct VTDIOTLBEntry {
     uint64_t gfn;
     uint16_t domain_id;
     uint32_t pasid;
-    uint64_t slpte;
+    uint64_t pte;
     uint64_t mask;
     uint8_t access_flags;
 };
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5e7d445d98..7c24f8f677 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -384,7 +384,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     entry->gfn = gfn;
     entry->domain_id = domain_id;
-    entry->slpte = slpte;
+    entry->pte = slpte;
     entry->access_flags = access_flags;
     entry->mask = vtd_slpt_level_page_mask(level);
     entry->pasid = pasid;
@@ -1949,9 +1949,9 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     if (!rid2pasid) {
         iotlb_entry = vtd_lookup_iotlb(s, source_id, pasid, addr);
         if (iotlb_entry) {
-            trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+            trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->pte,
                                      iotlb_entry->domain_id);
-            slpte = iotlb_entry->slpte;
+            slpte = iotlb_entry->pte;
             access_flags = iotlb_entry->access_flags;
             page_mask = iotlb_entry->mask;
             goto out;
@@ -2027,9 +2027,9 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     if (rid2pasid) {
         iotlb_entry = vtd_lookup_iotlb(s, source_id, pasid, addr);
         if (iotlb_entry) {
-            trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+            trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->pte,
                                      iotlb_entry->domain_id);
-            slpte = iotlb_entry->slpte;
+            slpte = iotlb_entry->pte;
             access_flags = iotlb_entry->access_flags;
             page_mask = iotlb_entry->mask;
             goto out;

From patchwork Mon Jan 15 10:37:29 2024
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yi Sun, Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost
Subject: [PATCH rfcv1 17/23] intel_iommu: implement first level translation
Date: Mon, 15 Jan 2024 18:37:29 +0800
Message-Id: <20240115103735.132209-18-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>

From: Yi Liu

This adds stage-1 page table walking to support stage-1 only translation
in scalable mode.

Signed-off-by: Yi Liu
Signed-off-by: Yi Sun
Signed-off-by: Zhenzhong Duan
---
 hw/i386/intel_iommu_internal.h |  16 +++
 hw/i386/intel_iommu.c          | 242 ++++++++++++++++++++++++++++++++-
 hw/i386/trace-events           |   2 +
 3 files changed, 258 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index dcf1410fcf..41b958cd5d 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -598,6 +598,22 @@ typedef struct VTDPIOTLBInvInfo VTDPIOTLBInvInfo;
 #define VTD_SM_PASID_ENTRY_WPE_BIT(val)  (!!(((val) >> 4) & 1ULL))
 #define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL))

+#define VTD_PASID_IOTLB_MAX_SIZE  1024 /* Max size of the hash table */
+
+/* Paging Structure common */
+#define VTD_FL_PT_PAGE_SIZE_MASK  (1ULL << 7)
+/* Bits to decide the offset for each level */
+#define VTD_FL_LEVEL_BITS         9
+
+/* First Level Paging Structure */
+#define VTD_FL_PT_LEVEL           1
+#define VTD_FL_PT_ENTRY_NR        512
+
+/* Masks for First Level Paging Entry */
+#define VTD_FL_RW_MASK            (1ULL << 1)
+#define VTD_FL_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
+#define VTD_PASID_ENTRY_FPD       (1ULL << 1) /* Fault Processing Disable */
+
 /* Second Level Page Translation Pointer*/
 #define VTD_SM_PASID_ENTRY_SLPTPTR       (~0xfffULL)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7c24f8f677..1c21f40ccd 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -78,6 +78,10 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s,
                                  VTDPASIDCacheInfo *pc_info);
 static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
                                   PCIBus *bus, uint16_t devfn);
+static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
+                                                   PCIBus *bus,
+                                                   int devfn,
+                                                   uint32_t pasid);

 static void vtd_panic_require_caching_mode(void)
 {
@@ -1888,6 +1892,114 @@ out:
     trace_vtd_pt_enable_fast_path(source_id, success);
 }

+/* The shift of an addr for a certain level of paging structure */
+static inline uint32_t vtd_flpt_level_shift(uint32_t level)
+{
+    assert(level != 0);
+    return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_FL_LEVEL_BITS;
+}
+
+static inline uint64_t vtd_flpt_level_page_mask(uint32_t level)
+{
+    return ~((1ULL << vtd_flpt_level_shift(level)) - 1);
+}
+
+static inline dma_addr_t vtd_pe_get_flpt_level(VTDPASIDEntry *pe)
+{
+    return 4 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM);
+}
+
+/*
+ * Given an iova and the level of paging structure, return the offset
+ * of current level.
+ */
+static inline uint32_t vtd_iova_fl_level_offset(uint64_t iova, uint32_t level)
+{
+    return (iova >> vtd_flpt_level_shift(level)) &
+           ((1ULL << VTD_FL_LEVEL_BITS) - 1);
+}
+
+/* Get the content of a flpte located in @base_addr[@index] */
+static uint64_t vtd_get_flpte(dma_addr_t base_addr, uint32_t index)
+{
+    uint64_t flpte;
+
+    assert(index < VTD_FL_PT_ENTRY_NR);
+
+    if (dma_memory_read(&address_space_memory,
+                        base_addr + index * sizeof(flpte), &flpte,
+                        sizeof(flpte), MEMTXATTRS_UNSPECIFIED)) {
+        flpte = (uint64_t)-1;
+        return flpte;
+    }
+    flpte = le64_to_cpu(flpte);
+    return flpte;
+}
+
+static inline bool vtd_flpte_present(uint64_t flpte)
+{
+    return !!(flpte & 0x1);
+}
+
+/* Whether the pte indicates the address of the page frame */
+static inline bool vtd_is_last_flpte(uint64_t flpte, uint32_t level)
+{
+    return level == VTD_FL_PT_LEVEL || (flpte & VTD_FL_PT_PAGE_SIZE_MASK);
+}
+
+static inline uint64_t vtd_get_flpte_addr(uint64_t flpte, uint8_t aw)
+{
+    return flpte & VTD_FL_PT_BASE_ADDR_MASK(aw);
+}
+
+/*
+ * Given the @iova, get relevant @flptep. @flpte_level will be the last level
+ * of the translation, can be used for deciding the size of large page.
+ */
+static int vtd_iova_to_flpte(VTDPASIDEntry *pe, uint64_t iova, bool is_write,
+                             uint64_t *flptep, uint32_t *flpte_level,
+                             bool *reads, bool *writes, uint8_t aw_bits)
+{
+    dma_addr_t addr = vtd_pe_get_flpt_base(pe);
+    uint32_t level = vtd_pe_get_flpt_level(pe);
+    uint32_t offset;
+    uint64_t flpte;
+
+    while (true) {
+        offset = vtd_iova_fl_level_offset(iova, level);
+        flpte = vtd_get_flpte(addr, offset);
+        if (flpte == (uint64_t)-1) {
+            if (level == VTD_PE_GET_LEVEL(pe)) {
+                /* Invalid programming of context-entry */
+                return -VTD_FR_CONTEXT_ENTRY_INV;
+            } else {
+                return -VTD_FR_PAGING_ENTRY_INV;
+            }
+        }
+
+        if (!vtd_flpte_present(flpte)) {
+            *reads = false;
+            *writes = false;
+            return -VTD_FR_PAGING_ENTRY_INV;
+        }
+
+        *reads = true;
+        *writes = (*writes) && (flpte & VTD_FL_RW_MASK);
+        if (is_write && !(flpte & VTD_FL_RW_MASK)) {
+            return -VTD_FR_WRITE;
+        }
+
+        if (vtd_is_last_flpte(flpte, level)) {
+            *flptep = flpte;
+            *flpte_level = level;
+            return 0;
+        }
+
+        addr = vtd_get_flpte_addr(flpte, aw_bits);
+        level--;
+    }
+}
+
 static void vtd_report_fault(IntelIOMMUState *s,
                              int err, bool is_fpd_set,
                              uint16_t source_id,
@@ -1904,6 +2016,105 @@ static void vtd_report_fault(IntelIOMMUState *s,
     }
 }

+/*
+ * Map dev to pasid-entry then do a paging-structures walk to do a iommu
+ * translation.
+ *
+ * Called from RCU critical section.
+ *
+ * @vtd_as: The untranslated address space
+ * @bus_num: The bus number
+ * @devfn: The devfn, which is the combined of device and function number
+ * @is_write: The access is a write operation
+ * @entry: IOMMUTLBEntry that contain the addr to be translated and result
+ *
+ * Returns true if translation is successful, otherwise false.
+ */
+static bool vtd_do_iommu_fl_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
+                                      uint8_t devfn, hwaddr addr, bool is_write,
+                                      IOMMUTLBEntry *entry)
+{
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    VTDContextEntry ce;
+    VTDPASIDEntry pe;
+    uint8_t bus_num = pci_bus_num(bus);
+    uint64_t flpte, page_mask;
+    uint32_t level;
+    uint16_t source_id = PCI_BUILD_BDF(bus_num, devfn);
+    int ret;
+    bool is_fpd_set = false;
+    bool reads = true;
+    bool writes = true;
+    uint8_t access_flags;
+
+    /*
+     * We have standalone memory region for interrupt addresses, we
+     * should never receive translation requests in this region.
+     */
+    assert(!vtd_is_interrupt_addr(addr));
+
+    ret = vtd_dev_to_context_entry(s, pci_bus_num(bus), devfn, &ce);
+    if (ret) {
+        error_report_once("%s: detected translation failure 1 "
+                          "(dev=%02x:%02x:%02x, iova=0x%" PRIx64 ")",
+                          __func__, pci_bus_num(bus),
+                          VTD_PCI_SLOT(devfn),
+                          VTD_PCI_FUNC(devfn),
+                          addr);
+        return false;
+    }
+
+    vtd_iommu_lock(s);
+
+    ret = vtd_ce_get_rid2pasid_entry(s, &ce, &pe, PCI_NO_PASID);
+    is_fpd_set = pe.val[0] & VTD_PASID_ENTRY_FPD;
+    if (ret) {
+        vtd_report_fault(s, -ret, is_fpd_set, source_id, addr, is_write,
+                         false, PCI_NO_PASID);
+        goto error;
+    }
+
+    /*
+     * We don't need to translate for pass-through context entries.
+     * Also, let's ignore IOTLB caching as well for PT devices.
+     */
+    if (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT) {
+        entry->iova = addr & VTD_PAGE_MASK_4K;
+        entry->translated_addr = entry->iova;
+        entry->addr_mask = ~VTD_PAGE_MASK_4K;
+        entry->perm = IOMMU_RW;
+        vtd_iommu_unlock(s);
+        return true;
+    }
+
+    ret = vtd_iova_to_flpte(&pe, addr, is_write, &flpte, &level,
+                            &reads, &writes, s->aw_bits);
+    if (ret) {
+        vtd_report_fault(s, -ret, is_fpd_set, source_id, addr, is_write,
+                         false, PCI_NO_PASID);
+        goto error;
+    }
+
+    page_mask = vtd_flpt_level_page_mask(level);
+    access_flags = IOMMU_ACCESS_FLAG(reads, writes);
+
+    vtd_iommu_unlock(s);
+
+    entry->iova = addr & page_mask;
+    entry->translated_addr = vtd_get_flpte_addr(flpte, s->aw_bits) & page_mask;
+    entry->addr_mask = ~page_mask;
+    entry->perm = access_flags;
+    return true;
+
+error:
+    vtd_iommu_unlock(s);
+    entry->iova = 0;
+    entry->translated_addr = 0;
+    entry->addr_mask = 0;
+    entry->perm = IOMMU_NONE;
+    return false;
+}
+
 /* Map dev to context-entry then do a paging-structures walk to do a iommu
  * translation.
  *
@@ -4516,10 +4727,37 @@ static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
         .target_as = &address_space_memory,
     };
     bool success;
+    VTDContextEntry ce;
+    VTDPASIDEntry pe;
+    int ret = 0;

     if (likely(s->dmar_enabled)) {
-        success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
-                                         addr, flag & IOMMU_WO, &iotlb);
+        if (s->root_scalable) {
+            ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+                                           vtd_as->devfn, &ce);
+            ret = vtd_ce_get_rid2pasid_entry(s, &ce, &pe, PCI_NO_PASID);
+            if (ret) {
+                error_report_once("%s: detected translation failure 1 "
+                                  "(dev=%02x:%02x:%02x, iova=0x%" PRIx64 ")",
+                                  __func__, pci_bus_num(vtd_as->bus),
+                                  VTD_PCI_SLOT(vtd_as->devfn),
+                                  VTD_PCI_FUNC(vtd_as->devfn),
+                                  addr);
+                return iotlb;
+            }
+            if (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_FLT) {
+                success = vtd_do_iommu_fl_translate(vtd_as, vtd_as->bus,
+                                                    vtd_as->devfn, addr,
+                                                    flag & IOMMU_WO, &iotlb);
+            } else {
+                success = vtd_do_iommu_translate(vtd_as, vtd_as->bus,
+                                                 vtd_as->devfn, addr,
+                                                 flag & IOMMU_WO, &iotlb);
+            }
+        } else {
+            success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
+                                             addr, flag & IOMMU_WO, &iotlb);
+        }
     } else {
         /* DMAR disabled, passthrough, use 4k-page*/
         iotlb.iova = addr & VTD_PAGE_MASK_4K;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 66f7c1ba59..00b27bc5b1 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -33,6 +33,8 @@ vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
 vtd_iotlb_page_update(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page update sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
+vtd_iotlb_pe_hit(uint32_t pasid, uint64_t val0, uint32_t gen) "IOTLB pasid hit pasid %"PRIu32" val[0] 0x%"PRIx64" gen %"PRIu32
+vtd_iotlb_pe_update(uint32_t pasid, uint64_t val0, uint32_t gen1, uint32_t gen2) "IOTLB pasid update pasid %"PRIu32" val[0] 0x%"PRIx64" gen %"PRIu32" -> gen %"PRIu32
 vtd_iotlb_cc_hit(uint8_t bus, uint8_t devfn, uint64_t high, uint64_t low, uint32_t gen) "IOTLB context hit bus 0x%"PRIx8" devfn 0x%"PRIx8" high 0x%"PRIx64" low 0x%"PRIx64" gen %"PRIu32
 vtd_iotlb_cc_update(uint8_t bus, uint8_t devfn, uint64_t high, uint64_t low, uint32_t gen1, uint32_t gen2) "IOTLB context update bus 0x%"PRIx8" devfn 0x%"PRIx8" high 0x%"PRIx64" low 0x%"PRIx64" gen %"PRIu32" -> gen %"PRIu32
 vtd_iotlb_reset(const char *reason) "IOTLB reset (reason: %s)"
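The stage-1 walk added by this patch (vtd_iova_to_flpte and its helpers) boils down to a 4-level table walk with 9 index bits per level and 4 KiB base pages, where bit 7 of a non-leaf entry marks a large-page leaf. The following is a simplified sketch of that arithmetic, not QEMU code; the dict-based page tables and addresses are invented for illustration.

```python
# Simplified model of the VT-d first-level (stage-1) page-table walk:
# 4 levels, 9 index bits per level, 4KiB pages.

PAGE_SHIFT_4K = 12
FL_LEVEL_BITS = 9
FL_PT_LEVEL = 1
PRESENT = 1 << 0        # bit 0: present
RW = 1 << 1             # bit 1: writable
PAGE_SIZE_BIT = 1 << 7  # set in a non-leaf level => large-page leaf

def level_shift(level):
    # Mirrors vtd_flpt_level_shift(): 12, 21, 30, 39 for levels 1..4.
    return PAGE_SHIFT_4K + (level - 1) * FL_LEVEL_BITS

def level_offset(iova, level):
    # Mirrors vtd_iova_fl_level_offset(): 9-bit index at this level.
    return (iova >> level_shift(level)) & ((1 << FL_LEVEL_BITS) - 1)

def walk(tables, root, iova, level=4):
    """tables: {table_addr: {index: entry}}; low entry bits are flags,
    the rest is the next-level table (or page-frame) address."""
    addr = root
    while True:
        entry = tables[addr].get(level_offset(iova, level), 0)
        if not (entry & PRESENT):
            raise LookupError("non-present entry at level %d" % level)
        if level == FL_PT_LEVEL or (entry & PAGE_SIZE_BIT):
            page_mask = ~((1 << level_shift(level)) - 1)
            return (entry & ~0xFFF & page_mask) | (iova & ~page_mask)
        addr = entry & ~0xFFF
        level -= 1

# One 4KiB mapping: iova 0x1000 -> pa 0x8000_2000 (addresses invented).
tables = {
    0x1000_0000: {0: 0x2000_0000 | PRESENT | RW},  # level 4
    0x2000_0000: {0: 0x3000_0000 | PRESENT | RW},  # level 3
    0x3000_0000: {0: 0x4000_0000 | PRESENT | RW},  # level 2
    0x4000_0000: {1: 0x8000_2000 | PRESENT | RW},  # level 1, index 1
}
print(hex(walk(tables, 0x1000_0000, 0x1234)))  # 0x80002234
```

The real code additionally checks read/write permission at every level and converts failures into VT-d fault reasons instead of raising.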
From patchwork Mon Jan 15 10:37:30 2024
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yu Zhang, Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost
Subject: [PATCH rfcv1 18/23] intel_iommu: fix the fault reason report
Date: Mon, 15 Jan 2024 18:37:30 +0800
Message-Id: <20240115103735.132209-19-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>
envelope-from=zhenzhong.duan@intel.com; helo=mgamail.intel.com X-Spam_score_int: -48 X-Spam_score: -4.9 X-Spam_bar: ---- X-Spam_report: (-4.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-2.758, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Yu Zhang Currently we use only VTD_FR_PASID_TABLE_INV as fault reaon. Fix this with correct fault reasons listed in VT-d spec 7.2.3. Signed-off-by: Yu Zhang Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu_internal.h | 8 ++++++- hw/i386/intel_iommu.c | 42 +++++++++++++++++++++------------- 2 files changed, 33 insertions(+), 17 deletions(-) diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 41b958cd5d..21fa767740 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -325,8 +325,14 @@ typedef enum VTDFaultReason { * request while disabled */ VTD_FR_IR_SID_ERR = 0x26, /* Invalid Source-ID */ - VTD_FR_PASID_TABLE_INV = 0x58, /*Invalid PASID table entry */ + VTD_FR_RTADDR_INV_TTM = 0x31, /* Invalid TTM in RTADDR */ + /* PASID directory entry access failure */ + VTD_FR_PASID_DIR_ACCESS_ERR = 0x50, + /* The Present(P) field of pasid directory entry is 0 */ + VTD_FR_PASID_DIR_ENTRY_P = 0x51, + VTD_FR_PASID_TABLE_ACCESS_ERR = 0x58, /* PASID table entry access failure */ VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */ + VTD_FR_PASID_TABLE_ENTRY_INV = 0x5b, /*Invalid PASID table entry */ /* Output address in the interrupt address range for scalable mode */ VTD_FR_SM_INTERRUPT_ADDR = 0x87, diff 
--git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 1c21f40ccd..1e87383a41 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -819,7 +819,7 @@ static int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base, addr = pasid_dir_base + index * entry_size; if (dma_memory_read(&address_space_memory, addr, pdire, entry_size, MEMTXATTRS_UNSPECIFIED)) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_PASID_DIR_ACCESS_ERR; } pdire->val = le64_to_cpu(pdire->val); @@ -832,6 +832,11 @@ static inline bool vtd_pe_present(VTDPASIDEntry *pe) return pe->val[0] & VTD_PASID_ENTRY_P; } +static inline uint32_t vtd_pe_get_flpt_level(VTDPASIDEntry *pe) +{ + return 4 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM); +} + static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t pasid, dma_addr_t addr, @@ -840,13 +845,14 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t index; dma_addr_t entry_size; X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); + uint8_t pgtt; index = VTD_PASID_TABLE_INDEX(pasid); entry_size = VTD_PASID_ENTRY_SIZE; addr = addr + index * entry_size; if (dma_memory_read(&address_space_memory, addr, pe, entry_size, MEMTXATTRS_UNSPECIFIED)) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_PASID_TABLE_ACCESS_ERR; } for (size_t i = 0; i < ARRAY_SIZE(pe->val); i++) { pe->val[i] = le64_to_cpu(pe->val[i]); @@ -854,12 +860,17 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, /* Do translation type check */ if (!vtd_pe_type_check(x86_iommu, pe)) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_PASID_TABLE_ENTRY_INV; } - if (!vtd_is_level_supported(s, VTD_PE_GET_LEVEL(pe))) { - return -VTD_FR_PASID_TABLE_INV; - } + pgtt = VTD_PE_GET_TYPE(pe); + if (pgtt == VTD_SM_PASID_ENTRY_SLT && + !vtd_is_level_supported(s, VTD_PE_GET_LEVEL(pe))) + return -VTD_FR_PASID_TABLE_ENTRY_INV; + + if (pgtt == VTD_SM_PASID_ENTRY_FLT && + vtd_pe_get_flpt_level(pe) != 4) + return -VTD_FR_PASID_TABLE_ENTRY_INV; return 0; } 
@@ -899,7 +910,7 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s, } if (!vtd_pdire_present(&pdire)) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_PASID_DIR_ENTRY_P; } ret = vtd_get_pe_from_pdire(s, pasid, &pdire, pe); @@ -908,7 +919,7 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s, } if (!vtd_pe_present(pe)) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_PASID_ENTRY_P; } return 0; @@ -961,7 +972,7 @@ static int vtd_ce_get_pasid_fpd(IntelIOMMUState *s, } if (!vtd_pdire_present(&pdire)) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_PASID_DIR_ENTRY_P; } /* @@ -1829,7 +1840,11 @@ static const bool vtd_qualified_faults[] = { [VTD_FR_ROOT_ENTRY_RSVD] = false, [VTD_FR_PAGING_ENTRY_RSVD] = true, [VTD_FR_CONTEXT_ENTRY_TT] = true, - [VTD_FR_PASID_TABLE_INV] = false, + [VTD_FR_PASID_DIR_ACCESS_ERR] = false, + [VTD_FR_PASID_DIR_ENTRY_P] = true, + [VTD_FR_PASID_TABLE_ACCESS_ERR] = false, + [VTD_FR_PASID_ENTRY_P] = true, + [VTD_FR_PASID_TABLE_ENTRY_INV] = true, [VTD_FR_SM_INTERRUPT_ADDR] = true, [VTD_FR_MAX] = false, }; @@ -1904,11 +1919,6 @@ static inline uint64_t vtd_flpt_level_page_mask(uint32_t level) return ~((1ULL << vtd_flpt_level_shift(level)) - 1); } -static inline dma_addr_t vtd_pe_get_flpt_level(VTDPASIDEntry *pe) -{ - return 4 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM); -} - /* * Given an iova and the level of paging structure, return the offset * of current level. 
@@ -3499,7 +3509,7 @@ static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s, dma_addr_t pasid_dir_base; if (!s->root_scalable) { - return -VTD_FR_PASID_TABLE_INV; + return -VTD_FR_RTADDR_INV_TTM; } ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
From patchwork Mon Jan 15 10:37:31 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519519
From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yi Sun , Zhenzhong Duan , Marcel Apfelbaum , Paolo Bonzini , Richard Henderson , Eduardo Habkost Subject: [PATCH rfcv1 19/23] intel_iommu: introduce pasid iotlb cache Date: Mon, 15 Jan 2024 18:37:31 +0800 Message-Id: <20240115103735.132209-20-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com> References: <20240115103735.132209-1-zhenzhong.duan@intel.com>
From: Yi Liu To accelerate stage-1 translation, introduce pasid iotlb cache. Signed-off-by: Yi Liu Signed-off-by: Yi Sun Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu_internal.h | 1 + include/hw/i386/intel_iommu.h | 1 + hw/i386/intel_iommu.c | 126 +++++++++++++++++++++++++++++++-- hw/i386/trace-events | 1 + 4 files changed, 124 insertions(+), 5 deletions(-) diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 21fa767740..08701f5457 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -480,6 +480,7 @@ typedef union VTDInvDesc VTDInvDesc; /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { + bool is_piotlb; uint16_t domain_id; uint32_t pasid; uint64_t addr; diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index dedaab5ac9..f3e75263b7 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -348,6 +348,7 @@ struct IntelIOMMUState { uint32_t context_cache_gen; /* Should be in [1,MAX] */ GHashTable *iotlb; /* IOTLB */ + GHashTable *p_iotlb; /* pasid based IOTLB */ GHashTable *vtd_address_spaces; /* VTD address spaces */ VTDAddressSpace *vtd_as_cache[VTD_PCI_BUS_MAX]; /* VTD address space cache */ diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 1e87383a41..e9480608a5 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -82,6 +82,8 @@ static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s, PCIBus *bus, int devfn, uint32_t pasid); +static int vtd_dev_get_rid2pasid(IntelIOMMUState *s, uint8_t bus_num, + uint8_t devfn, uint32_t *rid_pasid); static void vtd_panic_require_caching_mode(void) { @@ -297,6 +299,7 @@ static gboolean vtd_hash_remove_by_page(gpointer
key, gpointer value, uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask; uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K; return (entry->domain_id == info->domain_id) && + (info->is_piotlb ? (entry->pasid == info->pasid) : 1) && (((entry->gfn & info->mask) == gfn) || (entry->gfn == gfn_tlb)); } @@ -333,12 +336,19 @@ static void vtd_reset_iotlb(IntelIOMMUState *s) vtd_iommu_unlock(s); } +static void vtd_reset_piotlb(IntelIOMMUState *s) +{ + assert(s->p_iotlb); + g_hash_table_remove_all(s->p_iotlb); +} + static void vtd_reset_caches(IntelIOMMUState *s) { vtd_iommu_lock(s); vtd_reset_iotlb_locked(s); vtd_reset_context_cache_locked(s); vtd_pasid_cache_reset(s); + vtd_reset_piotlb(s); vtd_iommu_unlock(s); } @@ -2026,6 +2036,63 @@ static void vtd_report_fault(IntelIOMMUState *s, } } +static uint64_t vtd_get_piotlb_gfn(hwaddr addr, uint32_t level) +{ + return (addr & vtd_flpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K; +} + +static int vtd_get_piotlb_key(char *key, int key_size, uint64_t gfn, + uint32_t pasid, uint32_t level, + uint16_t source_id) +{ + return snprintf(key, key_size, + "rsv%010dsid%06dpasid%010dgfn%017lldlevel%01d", + 0, source_id, pasid, (unsigned long long int)gfn, level); +} + +static VTDIOTLBEntry *vtd_lookup_piotlb(IntelIOMMUState *s, uint32_t pasid, + hwaddr addr, uint16_t source_id) +{ + VTDIOTLBEntry *entry; + char key[64]; + int level; + + for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) { + vtd_get_piotlb_key(&key[0], 64, vtd_get_piotlb_gfn(addr, level), + pasid, level, source_id); + entry = g_hash_table_lookup(s->p_iotlb, &key[0]); + if (entry) { + goto out; + } + } + +out: + return entry; +} + +static void vtd_update_piotlb(IntelIOMMUState *s, uint32_t pasid, + uint16_t domain_id, hwaddr addr, uint64_t flpte, + uint8_t access_flags, uint32_t level, + uint16_t source_id) +{ + VTDIOTLBEntry *entry = g_malloc(sizeof(*entry)); + char *key = g_malloc(64); + uint64_t gfn = vtd_get_piotlb_gfn(addr, 
level); + + if (g_hash_table_size(s->p_iotlb) >= VTD_PASID_IOTLB_MAX_SIZE) { + vtd_reset_piotlb(s); + } + + entry->gfn = gfn; + entry->domain_id = domain_id; + entry->pte = flpte; + entry->pasid = pasid; + entry->access_flags = access_flags; + entry->mask = vtd_flpt_level_page_mask(level); + vtd_get_piotlb_key(key, 64, gfn, pasid, level, source_id); + g_hash_table_replace(s->p_iotlb, key, entry); +} + /* * Map dev to pasid-entry then do a paging-structures walk to do a iommu * translation. @@ -2056,6 +2123,8 @@ static bool vtd_do_iommu_fl_translate(VTDAddressSpace *vtd_as, PCIBus *bus, bool reads = true; bool writes = true; uint8_t access_flags; + uint32_t pasid; + VTDIOTLBEntry *piotlb_entry; /* * We have standalone memory region for interrupt addresses, we @@ -2074,8 +2143,30 @@ static bool vtd_do_iommu_fl_translate(VTDAddressSpace *vtd_as, PCIBus *bus, return false; } + /* For emulated device IOVA translation, use RID2PASID. */ + if (vtd_dev_get_rid2pasid(s, pci_bus_num(bus), devfn, &pasid)) { + error_report_once("%s: detected translation failure 2 " + "(dev=%02x:%02x:%02x, iova=0x%" PRIx64 ")", + __func__, pci_bus_num(bus), + VTD_PCI_SLOT(devfn), + VTD_PCI_FUNC(devfn), + addr); + return false; + } + vtd_iommu_lock(s); + /* Try to fetch flpte from IOTLB */ + piotlb_entry = vtd_lookup_piotlb(s, pasid, addr, source_id); + if (piotlb_entry) { + trace_vtd_piotlb_page_hit(source_id, pasid, addr, piotlb_entry->pte, + piotlb_entry->domain_id); + flpte = piotlb_entry->pte; + access_flags = piotlb_entry->access_flags; + page_mask = piotlb_entry->mask; + goto out; + } + ret = vtd_ce_get_rid2pasid_entry(s, &ce, &pe, PCI_NO_PASID); is_fpd_set = pe.val[0] & VTD_PASID_ENTRY_FPD; if (ret) { @@ -2108,6 +2199,9 @@ static bool vtd_do_iommu_fl_translate(VTDAddressSpace *vtd_as, PCIBus *bus, page_mask = vtd_flpt_level_page_mask(level); access_flags = IOMMU_ACCESS_FLAG(reads, writes); + vtd_update_piotlb(s, pasid, vtd_pe_get_domain_id(&pe), addr, flpte, + access_flags, level,
source_id); +out: vtd_iommu_unlock(s); entry->iova = addr & page_mask; @@ -3080,6 +3174,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, trace_vtd_inv_desc_iotlb_pages(domain_id, addr, am); assert(am <= VTD_MAMV); + info.is_piotlb = false; info.domain_id = domain_id; info.addr = addr; info.mask = ~((1 << am) - 1); @@ -4063,12 +4158,16 @@ static void vtd_flush_pasid_iotlb(gpointer key, gpointer value, vtd_invalidate_piotlb(vtd_pasid_as, piotlb_info->inv_data); } +} - /* - * TODO: needs to add QEMU piotlb flush when QEMU piotlb - * infrastructure is ready. For now, it is enough for passthru - * devices. - */ +static gboolean vtd_hash_remove_by_pasid(gpointer key, gpointer value, + gpointer user_data) +{ + VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value; + VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data; + + return ((entry->domain_id == info->domain_id) && + (entry->pasid == info->pasid)); } static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, @@ -4076,6 +4175,7 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, { struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 }; VTDPIOTLBInvInfo piotlb_info; + VTDIOTLBPageInvInfo info; cache_info.addr = 0; cache_info.npages = (uint64_t)-1; @@ -4084,6 +4184,9 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, piotlb_info.pasid = pasid; piotlb_info.inv_data = &cache_info; + info.domain_id = domain_id; + info.pasid = pasid; + vtd_iommu_lock(s); /* * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as @@ -4092,6 +4195,8 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, */ g_hash_table_foreach(s->vtd_pasid_as, vtd_flush_pasid_iotlb, &piotlb_info); + g_hash_table_foreach_remove(s->p_iotlb, vtd_hash_remove_by_pasid, + &info); vtd_iommu_unlock(s); } @@ -4101,6 +4206,7 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, { struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 }; VTDPIOTLBInvInfo piotlb_info; + 
VTDIOTLBPageInvInfo info; cache_info.addr = addr; cache_info.npages = 1 << am; @@ -4110,6 +4216,12 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, piotlb_info.pasid = pasid; piotlb_info.inv_data = &cache_info; + info.is_piotlb = true; + info.domain_id = domain_id; + info.pasid = pasid; + info.addr = addr; + info.mask = ~((1 << am) - 1); + vtd_iommu_lock(s); /* * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as @@ -4118,6 +4230,8 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, */ g_hash_table_foreach(s->vtd_pasid_as, vtd_flush_pasid_iotlb, &piotlb_info); + g_hash_table_foreach_remove(s->p_iotlb, + vtd_hash_remove_by_page, &info); vtd_iommu_unlock(s); } @@ -6034,6 +6148,8 @@ static void vtd_realize(DeviceState *dev, Error **errp) /* No corresponding destroy */ s->iotlb = g_hash_table_new_full(vtd_iotlb_hash, vtd_iotlb_equal, g_free, g_free); + s->p_iotlb = g_hash_table_new_full(&g_str_hash, &g_str_equal, + g_free, g_free); s->vtd_address_spaces = g_hash_table_new_full(vtd_as_hash, vtd_as_equal, g_free, g_free); s->vtd_iommufd_dev = g_hash_table_new_full(vtd_as_hash, vtd_as_idev_equal, diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 00b27bc5b1..7c36f34ae8 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -31,6 +31,7 @@ vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalida vtd_pasid_cache_devsi(uint16_t devfn) "Dev selective PC invalidation dev: 0x%"PRIx16 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present" +vtd_piotlb_page_hit(uint16_t sid, uint32_t pasid, uint64_t addr, uint64_t pte, uint16_t domain) "PIOTLB page hit sid 0x%"PRIx16" pasid %"PRIu32" iova 0x%"PRIx64" pte 0x%"PRIx64" domain 0x%"PRIx16 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 
0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16 vtd_iotlb_page_update(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page update sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16 vtd_iotlb_pe_hit(uint32_t pasid, uint64_t val0, uint32_t gen) "IOTLB pasid hit pasid %"PRIu32" val[0] 0x%"PRIx64" gen %"PRIu32
From patchwork Mon Jan 15 10:37:32 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519514
From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yi Sun , Zhenzhong Duan , Marcel Apfelbaum , Paolo Bonzini , Richard Henderson , Eduardo Habkost Subject: [PATCH rfcv1 20/23] intel_iommu: piotlb invalidation should notify unmap Date: Mon, 15 Jan 2024 18:37:32 +0800 Message-Id: <20240115103735.132209-21-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com> References: <20240115103735.132209-1-zhenzhong.duan@intel.com>
From: Yi Sun This is used by some emulated devices which cache address translation results. When a piotlb invalidation is issued in the guest, those caches should be refreshed. Signed-off-by: Yi Sun Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu.c | 56 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index e9480608a5..6a6478e865 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -4176,6 +4176,9 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 }; VTDPIOTLBInvInfo piotlb_info; VTDIOTLBPageInvInfo info; + VTDAddressSpace *vtd_as; + VTDContextEntry ce; + int ret; cache_info.addr = 0; cache_info.npages = (uint64_t)-1; @@ -4198,6 +4201,33 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, g_hash_table_foreach_remove(s->p_iotlb, vtd_hash_remove_by_pasid, &info); vtd_iommu_unlock(s); + + QLIST_FOREACH(vtd_as, &(s->vtd_as_with_notifiers), next) { + uint32_t rid2pasid = 0; + vtd_dev_get_rid2pasid(s, pci_bus_num(vtd_as->bus), vtd_as->devfn, + &rid2pasid); + ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus), + vtd_as->devfn, &ce); + if (!ret && s->root_scalable && likely(s->dmar_enabled) && + domain_id == vtd_get_domain_id(s, &ce, pasid) && + pasid == rid2pasid && !vtd_as_has_map_notifier(vtd_as)) { + IOMMUNotifier *notifier; + + IOMMU_NOTIFIER_FOREACH(notifier, &vtd_as->iommu) { + IOMMUTLBEvent event; + + event.type = IOMMU_NOTIFIER_UNMAP | + IOMMU_NOTIFIER_DEVIOTLB_UNMAP; + event.entry.target_as = &address_space_memory; +
event.entry.iova = notifier->start; + event.entry.perm = IOMMU_NONE; + event.entry.addr_mask = notifier->end - notifier->start; + event.entry.translated_addr = 0; + + memory_region_notify_iommu_one(notifier, &event); + } + } + } } static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, @@ -4207,6 +4237,10 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 }; VTDPIOTLBInvInfo piotlb_info; VTDIOTLBPageInvInfo info; + VTDAddressSpace *vtd_as; + VTDContextEntry ce; + hwaddr size = (1 << am) * VTD_PAGE_SIZE; + int ret; cache_info.addr = addr; cache_info.npages = 1 << am; @@ -4233,6 +4267,28 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, g_hash_table_foreach_remove(s->p_iotlb, vtd_hash_remove_by_page, &info); vtd_iommu_unlock(s); + + QLIST_FOREACH(vtd_as, &(s->vtd_as_with_notifiers), next) { + uint32_t rid2pasid = 0; + vtd_dev_get_rid2pasid(s, pci_bus_num(vtd_as->bus), vtd_as->devfn, + &rid2pasid); + ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus), + vtd_as->devfn, &ce); + if (!ret && s->root_scalable && likely(s->dmar_enabled) && + domain_id == vtd_get_domain_id(s, &ce, pasid) && + pasid == rid2pasid && !vtd_as_has_map_notifier(vtd_as)) { + IOMMUTLBEvent event; + + event.type = IOMMU_NOTIFIER_UNMAP | IOMMU_NOTIFIER_DEVIOTLB_UNMAP; + event.entry.target_as = &address_space_memory; + event.entry.iova = addr; + event.entry.perm = IOMMU_NONE; + event.entry.addr_mask = size - 1; + event.entry.translated_addr = 0; + + memory_region_notify_iommu(&vtd_as->iommu, 0, event); + } + } } static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
From patchwork Mon Jan 15 10:37:33 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519511
From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yi Sun , Zhenzhong Duan , Marcel Apfelbaum , Paolo Bonzini , Richard Henderson , Eduardo Habkost Subject: [PATCH rfcv1 21/23] intel_iommu: invalidate piotlb when flush pasid Date: Mon, 15 Jan 2024 18:37:33 +0800 Message-Id: <20240115103735.132209-22-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com> References: <20240115103735.132209-1-zhenzhong.duan@intel.com>
From: Yi Sun When binding/unbinding emulated devices, we should invalidate the QEMU piotlb. The host flushes the piotlb for passthrough devices, so we don't need to handle them here.
Signed-off-by: Yi Sun Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 6a6478e865..2f3d3a28b0 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -84,6 +84,8 @@ static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s, uint32_t pasid); static int vtd_dev_get_rid2pasid(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn, uint32_t *rid_pasid); +static gboolean vtd_hash_remove_by_pasid(gpointer key, gpointer value, + gpointer user_data); static void vtd_panic_require_caching_mode(void) { @@ -3667,14 +3669,21 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; PCIBus *bus = vtd_pasid_as->bus; VTDPASIDEntry pe; + VTDIOMMUFDDevice *vtd_idev; + VTDIOTLBPageInvInfo info; uint16_t did; uint32_t pasid; uint16_t devfn; int ret; + struct vtd_as_key as_key = { + .bus = vtd_pasid_as->bus, + .devfn = vtd_pasid_as->devfn, + }; did = vtd_pe_get_domain_id(&pc_entry->pasid_entry); pasid = vtd_pasid_as->pasid; devfn = vtd_pasid_as->devfn; + vtd_idev = g_hash_table_lookup(s->vtd_iommufd_dev, &as_key); switch (pc_info->type) { case VTD_PASID_CACHE_FORCE_RESET: @@ -3702,6 +3711,13 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, abort(); } + info.domain_id = did; + info.pasid = pasid; + /* For passthrough device, we don't need invalidate QEMU piotlb */ + if (s->root_scalable && likely(s->dmar_enabled) && !vtd_idev) + g_hash_table_foreach_remove(s->p_iotlb, vtd_hash_remove_by_pasid, + &info); + /* * pasid cache invalidation may indicate a present pasid * entry to present pasid entry modification. 
To cover such @@ -3725,18 +3741,9 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, return true; } - /* - * TODO: - * - when pasid-base-iotlb(piotlb) infrastructure is ready, - * should invalidate QEMU piotlb togehter with this change. - */ return false; + remove: - /* - * TODO: - * - when pasid-base-iotlb(piotlb) infrastructure is ready, - * should invalidate QEMU piotlb togehter with this change. - */ if (vtd_bind_guest_pasid(vtd_pasid_as, NULL, VTD_PASID_UNBIND)) { pasid_cache_info_set_error(pc_info);
From patchwork Mon Jan 15 10:37:34 2024 X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519509
From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Zhenzhong Duan , Marcel Apfelbaum , Paolo Bonzini , Richard Henderson , Eduardo Habkost Subject: [PATCH rfcv1 22/23] intel_iommu: refresh pasid bind after pasid cache force reset Date: Mon, 15 Jan 2024 18:37:34 +0800 Message-Id: <20240115103735.132209-23-zhenzhong.duan@intel.com> In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com> References: <20240115103735.132209-1-zhenzhong.duan@intel.com>
From: Yi Liu The force reset clears vtd_pasid_as and also unbinds the PASIDs on the host side. This is problematic when the reset is triggered after some PASID bindings have been set up, e.g. enabling gcmd.TE resets the caches but should keep the PASID #0 (gIOVA) binding. So we need to refresh the PASID bindings from the guest PASID table accordingly. Without this, an issue has been observed with legacy device passthrough (e.g. a NIC without PASID support). Signed-off-by: Yi Liu Signed-off-by: Zhenzhong Duan --- hw/i386/intel_iommu.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 2f3d3a28b0..e418305f6e 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -72,6 +72,7 @@ struct vtd_iotlb_key { static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); +static void vtd_refresh_pasid_bind(IntelIOMMUState *s); static void vtd_pasid_cache_reset(IntelIOMMUState *s); static void vtd_pasid_cache_sync(IntelIOMMUState *s, @@ -3292,6 +3293,7 @@ static void vtd_handle_gcmd_srtp(IntelIOMMUState *s) vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS); vtd_reset_caches(s); vtd_address_space_refresh_all(s); + vtd_refresh_pasid_bind(s); } /* Set Interrupt Remap Table Pointer */ @@ -3326,6 +3328,7 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) vtd_reset_caches(s); vtd_address_space_refresh_all(s); +
vtd_refresh_pasid_bind(s); } /* Handle Interrupt Remap Enable/Disable */ @@ -3960,6 +3963,28 @@ static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s, } } +static void vtd_refresh_pasid_bind(IntelIOMMUState *s) +{ + VTDPASIDCacheInfo pc_info = { .error_happened = false, + .type = VTD_PASID_CACHE_GLOBAL_INV }; + + /* + * Only when dmar is enabled, should pasid bindings replayed, + * otherwise no need to replay. + */ + if (!s->dmar_enabled) { + return; + } + + if (!s->scalable_modern || !s->root_scalable) { + return; + } + + vtd_iommu_lock(s); + vtd_replay_guest_pasid_bindings(s, &pc_info); + vtd_iommu_unlock(s); +} + /* * This function syncs the pasid bindings between guest and host. * It includes updating the pasid cache in vIOMMU and updating the @@ -6051,6 +6076,7 @@ static void vtd_reset(DeviceState *dev) vtd_init(s); vtd_address_space_refresh_all(s); + vtd_refresh_pasid_bind(s); } static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) From patchwork Mon Jan 15 10:37:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Duan, Zhenzhong" X-Patchwork-Id: 13519510 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 60877C3DA79 for ; Mon, 15 Jan 2024 10:42:10 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPKPp-0006Cu-PN; Mon, 15 Jan 2024 05:42:03 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPKPX-0005QA-QT for qemu-devel@nongnu.org; Mon, 15 Jan 2024 05:41:48 -0500 Received: from 
From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Yi Sun, Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost
Subject: [PATCH rfcv1 23/23] intel_iommu: modify x-scalable-mode to be string option
Date: Mon, 15 Jan 2024 18:37:35 +0800
Message-Id: <20240115103735.132209-24-zhenzhong.duan@intel.com>
In-Reply-To: <20240115103735.132209-1-zhenzhong.duan@intel.com>
References: <20240115103735.132209-1-zhenzhong.duan@intel.com>

From: Yi Liu

Intel VT-d 3.0 introduces scalable mode, which comes with a number of capabilities related to scalable-mode translation, so many combinations are possible. This vIOMMU implementation simplifies the choice for the user by offering typical combinations, selectable through the "x-scalable-mode" option.
The usage is as below:

"-device intel-iommu,x-scalable-mode=["legacy"|"modern"|"off"]"

- "legacy": support for a stage-2 page table
- "modern": support for a stage-1 page table
- "off": no scalable mode support
- if the option is not configured, scalable mode is not supported; an improperly configured value results in an error

Signed-off-by: Yi Liu
Signed-off-by: Yi Sun
Signed-off-by: Zhenzhong Duan
---
 include/hw/i386/intel_iommu.h |  1 +
 hw/i386/intel_iommu.c         | 25 +++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index f3e75263b7..9cbd568171 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -320,6 +320,7 @@ struct IntelIOMMUState {
 
     bool caching_mode;              /* RO - is cap CM enabled? */
     bool scalable_mode;             /* RO - is Scalable Mode supported? */
+    char *scalable_mode_str;        /* RO - admin's Scalable Mode config */
    bool scalable_modern;           /* RO - is modern SM supported? */
     bool snoop_control;             /* RO - is SNP filed supported? */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e418305f6e..b507112069 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -5111,7 +5111,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
                       VTD_HOST_ADDRESS_WIDTH),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
-    DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+    DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode_str),
     DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
     DEFINE_PROP_BOOL("x-pasid-mode", IntelIOMMUState, pasid, false),
     DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
@@ -6122,7 +6122,28 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
         }
     }
 
-    /* Currently only address widths supported are 39 and 48 bits */
+    if (s->scalable_mode_str &&
+        (strcmp(s->scalable_mode_str, "off") &&
+         strcmp(s->scalable_mode_str, "modern") &&
+         strcmp(s->scalable_mode_str, "legacy"))) {
+        error_setg(errp, "Invalid x-scalable-mode config,"
+                         "Please use \"modern\", \"legacy\" or \"off\"");
+        return false;
+    }
+
+    if (s->scalable_mode_str &&
+        !strcmp(s->scalable_mode_str, "legacy")) {
+        s->scalable_mode = true;
+        s->scalable_modern = false;
+    } else if (s->scalable_mode_str &&
+               !strcmp(s->scalable_mode_str, "modern")) {
+        s->scalable_mode = true;
+        s->scalable_modern = true;
+    } else {
+        s->scalable_mode = false;
+        s->scalable_modern = false;
+    }
+
     if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
         (s->aw_bits != VTD_HOST_AW_48BIT) &&
         !s->scalable_modern) {