From patchwork Mon Feb 26 14:35:18 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572285
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan , SU Hang , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Jonathan Corbet , linux-doc@vger.kernel.org
Subject: [RFC PATCH 01/73] KVM: Documentation: Add the specification for PVM
Date: Mon, 26 Feb 2024 22:35:18 +0800
Message-Id: <20240226143630.33643-2-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>
References: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

Add the specification describing the PVM ABI. PVM is a new lightweight
software-based virtualization for x86.

Signed-off-by: Lai Jiangshan
Signed-off-by: SU Hang
Signed-off-by: Hou Wenlong
---
 Documentation/virt/kvm/x86/pvm-spec.rst | 989 ++++++++++++++++++++++++
 1 file changed, 989 insertions(+)
 create mode 100644 Documentation/virt/kvm/x86/pvm-spec.rst

diff --git a/Documentation/virt/kvm/x86/pvm-spec.rst b/Documentation/virt/kvm/x86/pvm-spec.rst
new file mode 100644
index 000000000000..04d3cf93d99f
--- /dev/null
+++ b/Documentation/virt/kvm/x86/pvm-spec.rst
@@ -0,0 +1,989 @@
+.. SPDX-License-Identifier: GPL-2.0 + +===================== +X86 PVM Specification +===================== + +Underlying states +----------------- + +**The PVM guest only runs on the underlying CPU with underlying +CPL=3.** + +The term ``underlying`` refers to the actual hardware architecture +state. For example, the ``underlying CR3`` is the physical ``CR3`` of the +architecture. By contrast, ``CR3`` or ``PVM CR3`` is the virtualized +``PVM CR3`` register. Any x86 state or register in this document is a +PVM state or register unless "underlying", "physical", or +"hardware" is used to describe it. This document mostly uses +"underlying" to describe the actual hardware architecture state. + +When the PVM guest is running on the underlying CPU, it not only +runs with underlying CPL=3 but also with the following underlying states +and registers: + ++-------------------+--------------------------------------------------+ +| Registers | Values | ++===================+==================================================+ +| Underlying RFLAGS | IOPL=VM=VIF=VIP=0, IF=1, fixed-bit1=1. | ++-------------------+--------------------------------------------------+ +| Underlying CR3 | implementation-defined value, typically | +| | shadows the ``PVM CR3`` with extra pages | +| | mapped including the switcher.
| ++-------------------+--------------------------------------------------+ +| Underlying CR0 | PE=PG=WP=ET=NE=AM=MP=1, CD=NW=EM=TS=0 | ++-------------------+--------------------------------------------------+ +| Underlying CR4 | VME=PVI=0, PAE=FSGSBASE=1, | +| | others=implementation-defined | ++-------------------+--------------------------------------------------+ +| Underlying EFER | SCE=LMA=LME=1, NXE=implementation-defined. | ++-------------------+--------------------------------------------------+ +| Underlying GDTR | All Entries with DPL<3 in the table are | +| | hypervisor-defined values. The table must | +| | have entries with DPL=3 for the selectors: | +| | ``__USER32_CS``, ``__USER_CS``, | +| | ``__USER_DS`` (``__USER32_CS`` is | +| | implementation-defined value, | +| | ``__USER_CS``\ =\ ``__USER32_CS``\ +8, | +| | ``__USER_DS``\ =\ ``__USER32_CS``\ +16) | +| | and may have other hypervisor-defined | +| | DPL=3 data entries. Typically a | +| | host-defined CPUNODE entry is also in the | +| | underlying ``GDT`` table for each host CPU | +| | and its content (segment limit) can be | +| | visible to the PVM guest. | ++-------------------+--------------------------------------------------+ +| Underlying TR | implementation-defined, no IOPORT is | +| | allowed. | ++-------------------+--------------------------------------------------+ +| Underlying LDTR | must be NULL | ++-------------------+--------------------------------------------------+ +| Underlying IDT | implementation-defined. All gate entries | +| | are with DPL=0, except for the entries for | +| | vector=3,4 and a vector>32 (for legacy | +| | syscall, normally 0x80) with DPL=3. | ++-------------------+--------------------------------------------------+ +| Underlying CS | loaded with the selector ``__USER_CS`` or | +| | ``__USER32_CS``. | ++-------------------+--------------------------------------------------+ +| Underlying SS | loaded with the selector ``__USER_DS``. | ++-------------------+--------------------------------------------------+ +| Underlying | loaded with the selector NULL or | +| DS/ES/FS/GS | ``__USER_DS`` or other DPL=3 data entries | +| | in the underlying ``GDT`` table. | ++-------------------+--------------------------------------------------+ +| Underlying DR6 | 0xFFFF0FF0, until a hardware #DB is | +| | delivered and the hardware exits the guest | ++-------------------+--------------------------------------------------+ +| Underlying DR7 | ``DR7_GD``\ =0; illegitimate linear | +| | address (see address space separation) in | +| | ``DR0-DR3`` causes the corresponding bits | +| | in the ``underlying DR7`` cleared. | ++-------------------+--------------------------------------------------+ + +In summary, the underlying states are typical x86 states to run +unprivileged software with stricter limitations. + +PVM modes and states +-------------------- + +PVM has three PVM modes and they are modified IA32-e mode with PVM ABI. + +- PVM 64bit supervisor mode: modified X86 64bit supervisor mode with + PVM ABI + +- PVM 64bit user mode: X86 64bit user mode with PVM event handling + +- PVM 32bit compatible user mode: x86 compatible user mode with PVM + event handling + +| A VMM or hypervisor may also support non-PVM mode. They are non-IA32-e + mode or IA32-e compatible kernel mode. +| The specification has nothing to do with any non-PVM mode. But if the + hypervisor or the VMM can not boot the software directly into PVM + mode, the hypervisor can implement non-PVM mode as bootstrap. 
+| Bootstrapping is implementation-defined. Bootstrapping in non-PVM mode + should conform to pure X86 ABI until it enters X86 64bit supervisor + mode and then the PVM hypervisor changes privilege registers(``CR0``, + ``CR4,`` ``EFER``, ``MSR_STAR``) to conform to PVM mode and transits + it into PVM 64bit supervisor mode. + +States or registers on PVM modes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ++-----------------------+----------------------------------------------+ +| Register | Values | ++=======================+==============================================+ +| ``CR3`` and | PVM switches ``CR3`` with | +| MSR_PVM_SWITCH_CR3 | MSR_PVM_SWITCH_CR3 when switching | +| | supervisor/user mode. Hypercall | +| | HC_LOAD_PGTBL can load ``CR3`` and | +| | MSR_PVM_SWITCH_CR3 in one call. It | +| | is recommended software to use | +| | different ``CR3`` for supervisor | +| | and user modes like KPTI. | ++-----------------------+----------------------------------------------+ +| ``CR0`` | PE=PG=WP=ET=NE=AM=MP=1, | +| | CD=NW=EM=TS=0 | ++-----------------------+----------------------------------------------+ +| ``CR4`` | VME/PVI=0; PAE/FSGSBASE=1; | +| | UMIP/PKE/OSXSAVE/OSXMMEXCPT/OSFXSR=host. | +| | PCID is recommended to be set even | +| | if the ``underlying CR4.PCID`` is | +| | not set. SMAP=SMEP=0 and the | +| | corresponding features are | +| | disabled in CPUID leaves. | ++-----------------------+----------------------------------------------+ +| ``EFER`` | SCE=LMA=LME=1; NXE=underlying; | ++-----------------------+----------------------------------------------+ +| ``RFLAGS`` | Mapped to the underlying RFLAGS except for | +| | the RFLAGS.IF. (The underlying RFLAGS.IF | +| | is always 1.) | +| | | +| | IOPL=VM=VIF=VIP=0, fixed-bit1=1. | +| | AC is not recommended to be set in | +| | the supervisor mode. | +| | | +| | The PVM interrupt flag is defined as: | +| | | +| | - the bit 9 of the PVCS::event_flags when in | +| | supervisor mode. | +| | - 1, when in user mode. | +| | - 0, when in supervisor mode if | +| | MSR_PVM_VCPU_CTRL_STRUCT=0. | ++-----------------------+----------------------------------------------+ +| ``GDTR`` | ignored (can be written and read | +| | to get the last written value but | +| | take no effect). The effective PVM | +| | ``GDT`` table can be considered to | +| | be a read-only table consisting of | +| | entries: emulated supervisor mode | +| | ``CS/SS`` and entries in | +| | ``underlying GDT`` with DPL=3. The | +| | hypercall PVM_HC_LOAD_TLS can | +| | modify part of the | +| | ``underlying GDT``. | ++-----------------------+----------------------------------------------+ +| ``TR`` | ignored. Replaced by PVM event | +| | handling | ++-----------------------+----------------------------------------------+ +| ``IDT`` | ignored. Replaced by PVM event | +| | handling | ++-----------------------+----------------------------------------------+ +| ``LDTR`` | ignored. No replacement so it can | +| | be considered disabled. | ++-----------------------+----------------------------------------------+ +| ``CS`` in | emulated. the ``underlying CS`` is | +| supervisor mode | ``__USER_CS``. | ++-----------------------+----------------------------------------------+ +| ``CS`` in | mapped to the ``underlying CS`` | +| user mode | which is ``__USER_CS`` or | +| | ``__USER32_CS`` | ++-----------------------+----------------------------------------------+ +| ``SS`` in | emulated. the ``underlying SS`` is | +| supervisor mode | ``__USER_DS``. 
| ++-----------------------+----------------------------------------------+ +| ``SS`` in | mapped to the ``underlying SS`` | +| user mode | whose value is ``__USER_DS``. | ++-----------------------+----------------------------------------------+ +| DS/ES/FS/GS | mapped to the | +| | ``underlying DS/ES/FS/GS``, loaded | +| | with the selector NULL or | +| | ``__USER_DS`` or other DPL=3 data | +| | entries in the ``underlying GDT`` | +| | table. | ++-----------------------+----------------------------------------------+ +| interrupt shadow mask | no interrupt shadow mask | ++-----------------------+----------------------------------------------+ +| NMI shadow mask | set when #NMI is delivered and | +| | cleared when and only when | +| | EVENT_RETURN_USER or | +| | EVENT_RETURN_SUPERVISOR | ++-----------------------+----------------------------------------------+ + +MSR_PVM_VCPU_CTRL_STRUCT +~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: + + struct pvm_vcpu_struct { + u64 event_flags; + u32 event_errcode; + u32 event_vector; + u64 cr2; + u64 reserved0[5]; + + u16 user_cs, user_ss; + u32 reserved1; + u64 reserved2; + u64 user_gsbase; + u32 eflags; + u32 pkru; + u64 rip; + u64 rsp; + u64 rcx; + u64 r11; + } + +PVCS::event_flags +^^^^^^^^^^^^^^^^^ + +| ``PVCS::event_flags.IF``\ (bit 9): interrupt enable flag: The flag + is set to respond to maskable external interrupts; and cleared to + inhibit maskable external interrupts. +| The flag works only in supervisor mode. The VCPU always responds to + maskable external interrupts regardless of the value of this flag in + user mode. The flag is unchanged when the VCPU switches + user/supervisor modes, even when handling the synthetic instruction + EVENT_RETURN_USER. The guest is responsible for clearing the flag + before switching to user mode (issuing EVENT_RETURN_USER) to ensure + that the external interrupt is disabled when the VCPU is switched back + from user mode later. + +| ``PVCS::event_flags.IP``\ (bit 8): interrupt pending flag: The + hypervisor sets it if it fails to inject a maskable event to the VCPU + due to the interrupt-enable flag being cleared in supervisor mode. +| The guest is responsible for issuing a hypercall PVM_HC_IRQ_WIN when + the guest sees this bit after setting the PVCS::event_flags.IF. + The hypervisor clears this bit in handling + PVM_HC_IRQ_WIN/IRQ_HLT/EVENT_RETURN_USER/EVENT_RETURN_HYPERVISOR. + +Other bits are reserved (Software should set them to zero). + +PVCS::event_vector, PVCS::event_errcode +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If the vector event being delivered is from user mode or with vector >= 32 +from supervisor mode ``PVCS::event_vector`` is set to the vector number. And +if the event has an error code, ``PVCS::event_errcode`` is set to the code. + +PVCS::cr2 +^^^^^^^^^ + +If the event being delivered is a page fault (#PF), ``PVCS::cr2`` is set +to be ``CR2`` (the faulting linear address). 
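+
+As an illustration (not part of the ABI), a guest-side "enable
+interrupts" helper built on the IF/IP protocol described under
+``PVCS::event_flags`` above could look like the sketch below. The names
+``pvcs``, ``pvm_hypercall0()`` and ``pvm_local_irq_enable()`` are
+hypothetical; only the bit positions and the PVM_HC_IRQ_WIN hypercall
+come from this specification (the constants are those of the uapi
+header added later in this series).
+
+.. code::
+
+    /* pvcs: guest mapping of the page programmed into MSR_PVM_VCPU_CTRL_STRUCT. */
+    static struct pvm_vcpu_struct *pvcs;
+
+    static inline long pvm_hypercall0(unsigned long nr)
+    {
+            long ret;
+
+            /*
+             * SYSCALL in supervisor mode is a hypercall: RAX holds the
+             * request number; RCX, R10 and R11 are clobbered.
+             */
+            asm volatile("syscall"
+                         : "=a" (ret)
+                         : "a" (nr)
+                         : "rcx", "r10", "r11", "memory");
+            return ret;
+    }
+
+    static inline void pvm_local_irq_enable(void)
+    {
+            /* Set PVCS::event_flags.IF (bit 9) to allow maskable interrupts. */
+            set_bit(PVM_EVENT_FLAGS_IF_BIT, (unsigned long *)&pvcs->event_flags);
+
+            /*
+             * If the hypervisor failed to inject an interrupt while IF was
+             * clear, it has set IP (bit 8); open an interrupt window now.
+             * The hypervisor clears IP while handling PVM_HC_IRQ_WIN.
+             */
+            if (test_bit(PVM_EVENT_FLAGS_IP_BIT, (unsigned long *)&pvcs->event_flags))
+                    pvm_hypercall0(PVM_HC_IRQ_WIN);
+    }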
+ +PVCS::user_cs, PVCS::user_ss, PVCS::user_gsbase, PVCS::pkru, PVCS::rsp, PVCS::eflags, PVCS::rip, PVCS::rcx, PVCS::r11 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +| ``CS``, ``SS``, ``GSBASE``, ``PKRU``, ``RSP``, ``EFLAGS``, ``RIP``, + ``RCX``, and ``R11`` are saved to ``PVCS::user_cs``, + ``PVCS::user_ss``, ``PVCS::user_gsbase``, ``PVCS::pkru``, + ``PVCS::rsp``, ``PVCS::eflags``, ``PVCS::rip``, ``PVCS::rcx``, + ``PVCS::r11`` correspondingly when handling the synthetic instruction + EVENT_RETURN_USER or vice vers when the architecture is switching to + supervisor mode on any event in user mode. +| The value of ``PVCS::user_gsbase`` is semi-canonicalized before being + set to the ``underlying GSBASE`` by adjusting bits 63:N to get the + value of bit N–1, where N is the host’s linear address width (48 or + 57). +| The value of ``PVCS::eflags`` is standardized before setting to the + ``underlying RFLAGS``. IOPL, VM, VIF, and VIP are cleared, and IF and + FIXED1 are set. +| If an event with vector>=32 happens in supervisor mode, ``RSP``, + ``EFLAGS``, ``RIP``, ``RCX``, and ``R11`` are saved to ``PVCS::rsp``, + ``PVCS::eflags``, ``PVCS::rip``, ``PVCS::rcx``, ``PVCS::r11`` + correspondingly. + +TSC MSRs +~~~~~~~~ + +TSC ABI is not settled down yet. + +X86 MSR +~~~~~~~ + +MSR_GS_BASE/MSR_KERNEL_GS_BASE +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``MSR_GS_BASE`` is mapped to the ``underlying GSBASE``. + +The ``MSR_KERNEL_GS_BASE`` is recommended to be synced with +``MSR_GS_BASE`` when in supervisor mode, and supervisor software is +recommended to maintain its version of ``MSR_KERNEL_GS_BASE``, and +``PVCS::user_gsbase`` is recommended to be used on this purpose. + +When the CPU is switching from user mode to supervisor mode, +``PVCS::user_gsbase`` is updated as the value of ``MSR_GS_BASE`` (the +``underlying GSBASE``), and the value of ``MSR_GS_BASE`` is reset to +``MSR_KERNEL_GS_BASE`` atomically at the same time. + +When the CPU is switching from supervisor mode to user mode, +``MSR_KERNEL_GS_BASE`` is normally set with the value of +``MSR_GS_BASE`` (but the hypervisor is allowed to omit this operation +because ``MSR_GS_BASE`` and ``MSR_KERNEL_GS_BASE`` are expected to be +the same when in supervisor), and the ``MSR_GS_BASE`` is loaded with +``PVCS::user_gsbase``. + +WRGSBASE is not recommended to be used in supervisor mode. + +MSR_SYSCALL_MASK +^^^^^^^^^^^^^^^^ + +Ignored, when syscall, ``RFLAGS`` is set to a default value. + +MSR_STAR +^^^^^^^^ + +| ``__USER_CS,`` ``__USER_DS`` derived from it must be the same as + host's ``__USER_CS,`` ``__USER_DS`` and have RPL=3. ``__KERNEL_CS``, + ``__KERNEL_DS`` derived from it must have RPL=0 and be the same value + as the current PVM ``CS`` ``SS`` registers hold respectively. + Otherwise #GP. +| X86 forces RPL for derived ``__USER_CS,`` ``__USER_DS``, + ``__USER32_CS``, ``__KERNEL_CS``, (not ``__KERNEL_DS``) when using + them, so the RPLs can be an arbitrary value. + +MSR_CSTAR, MSR_IA32_SYSENTER_CS/EIP/ESP +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Ignored, the software should use INTn instead for compatibility +syscalls. + +MSR_IA32_PKRS +^^^^^^^^^^^^^ + +See "`Protection Keys <#protection-keys>`__". + +PVM MSRs +~~~~~~~~ + +MSR_PVM_SWITCH_CR3 +^^^^^^^^^^^^^^^^^^ + +Switched with ``CR3`` when mode switching. No TLB request is issued when +mode switching. + +MSR_PVM_EVENT_ENTRY +^^^^^^^^^^^^^^^^^^^ + +| The value is the entry point for vector events from the PVM user mode. 
+| The value+256 is the entry point for vector events (vector < 32) from + the PVM supervisor mode. +| The value+512 is the entry point for vector events (vector >= 32) from + the PVM supervisor mode. + +MSR_PVM_SUPERVISOR_RSP +^^^^^^^^^^^^^^^^^^^^^^ + +When switching from supervisor mode to user mode, this MSR is +automatically saved with ``RSP`` which is restored from it when +switching back from user mode. + +MSR_PVM_SUPERVISOR_REDZONE +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When delivering the event from supervisor mode, a fixed-size area +is reserved below the current ``RSP`` and can be safely used by +guest. The size is specified in this MSR. + +MSR_PVM_LINEAR_ADDRESS_RANGE +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +See "`Paging <#paging>`__". + +PML4_INDEX_START, PML4_INDEX_END, PML5_INDEX_START, and PML5_INDEX_END +are encoded in the MSR and they are all 9 bits value with the most +significant bit set: + +- bit 57-63 are all set; bit 48-56: PML5_INDEX_END, bit 56 must be set. +- bit 41-47 are all set; bit 32-40: PML5_INDEX_START, bit 40 must be set. +- bit 25-31 are all set; bit 16-24:PML4_INDEX_END, bit 24 must be set. +- bit 9-15 are all set; bit 0-8:PML4_INDEX_START, bit 8 must be set. + +constraints: + +- 256 <= PML5_INDEX_START < PML5_INDEX_END < 511 +- 256 <= PML4_INDEX_START < PML4_INDEX_END < 511 +- PML5_INDEX_START = PML5_INDEX_END = 0x1FF if the + ``underlying CR4.LA57`` is not set. + +The three legitimate address ranges for PVM virtual addresses: + +:: + + [ (1UL << 48) * (0xFE00 | PML5_INDEX_START), (1UL << 48) * (0xFE00 | PML5_INDEX_END) ) + [ (1UL << 39) * (0x1FFFE00 | PML4_INDEX_START), (1UL << 39) * (0x1FFFE00 | PML4_INDEX_END) ) + Lower half address (canonical address with bit63=0) + +The MSR is initialized as the widest ranges when the CPU is reset. The +ranges should be sub-ranges of these initialized ranges when writing to +the MSR or migration. + +| Pagetable walking is confined to these legitimate address ranges. +| Note: + +- the top 2G is not in the range, so the guest supervisor software should + be PIE kernel. +- Breakpoints (``DR0-DR3``) out of these ranges are not activated in the + underlying DR7. + +MSR_PVM_RETU_RIP, MSR_PVM_RETS_RIP +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The bare SYSCALL instruction staring at ``MSR_PVM_RETU_RIP`` or +``MSR_PVM_RETS_RIP`` is synthetic instructions to return to +user/supervisor mode. See "`PVM Synthetic +Instructions <#pvm-synthetic-instructions>`__" and "`Events and Mode +Changing <#events-and-mode-changing>`__". + +.. pvm-synthetic-instructions: + +PVM Synthetic Instructions +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +PVM_SYNTHETIC_CPUID: invlpg 0xffffffffff4d5650;cpuid +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Works the same as the bare CPUID instruction generally, but it is +ensured to be handled by the PVM hypervisor and reports the corresponding +CPUID results for PVM. + +PVM_SYNTHETIC_CPUID is supposed to not trigger any trap in the real or virtual +x86 kernel mode and is also guaranteed to trigger a trap in the underlying +hardware user mode for the hypervisor emulating it. The hypervisor emulates +both of the basic instructions, while the INVLPG is often emulated as an NOP +since 0xffffffffff4d5650 is normally out of the allowed linear address ranges. + +EVENT_RETURN_SUPERVISOR: SYSCALL instruction starting at MSR_PVM_RETS_RIP +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +EVENT_RETURN_SUPERVISOR instruction returns from supervisor mode to +supervisor mode with the return state on the stack. 
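+
+For concreteness, the bit layout of ``MSR_PVM_LINEAR_ADDRESS_RANGE``
+described earlier can be produced by a small helper such as the sketch
+below. The helper is illustrative and not part of the ABI; only the bit
+layout and the constraints come from this specification.
+
+.. code::
+
+    /* Each index is a 9-bit top-level page-table slot in [256, 511) (bit 8 set). */
+    static inline u64 pvm_linear_address_range(u64 pml4_start, u64 pml4_end,
+                                               u64 pml5_start, u64 pml5_end)
+    {
+            return (0x7fULL << 57) | (pml5_end   << 48) |   /* bits 63:57 all set */
+                   (0x7fULL << 41) | (pml5_start << 32) |   /* bits 47:41 all set */
+                   (0x7fULL << 25) | (pml4_end   << 16) |   /* bits 31:25 all set */
+                   (0x7fULL <<  9) |  pml4_start;           /* bits 15:9  all set */
+    }
+
+    /*
+     * Without LA57, PML5_INDEX_START and PML5_INDEX_END must both be 0x1ff,
+     * e.g. pvm_linear_address_range(0x100, 0x1fe, 0x1ff, 0x1ff).
+     */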
+ +EVENT_RETURN_USER: SYSCALL instruction starting at MSR_PVM_RETU_RIP +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +EVENT_RETURN_USER instruction returns from supervisor mode to user +mode with the return state on the PVCS. + +X86 Instructions with changed behavior +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +CPUID +^^^^^ + +Guest CPUID instruction would get the host's CPUID information normally +(when CPUID faulting is not enabled), and the synthetic instruction +KVM_CPUID is recommended to be used instead in guest supervisor +software. + +SGDT/SIDT/SLDT/STR/SMSW +^^^^^^^^^^^^^^^^^^^^^^^ + +Guest SGDT/SIDT/SLDT/STR/SMSW instructions would get the host's +information. ``CR4.UMIP`` is in effect for guests only when the host +enables it. + +LAR/LSL/VERR/VERW +^^^^^^^^^^^^^^^^^ + +Guest LAR/LSL/VERR/VERW instructions would get segment information from +host ``GDT``. + +STAC/CLAC, SWAPGS, SYSEXIT, SYSRET +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These instructions are not allowed for PVM supervisor software, using +them would result in unexpected behavior for the guest. + +SYSENTER +^^^^^^^^ + +Results in #GP. + +INT n +^^^^^ + +Only 0x80 and 0x3 are allowed in guests. Other INT n results in #GP. + +RDPKRU/WRPKRU +^^^^^^^^^^^^^ + +When the guest is in supervisor mode, RDPKRU/WRPKRU would access the +``underlying PKRU`` register which is effectively PVM's +``MSR_IA32_PKRS``, so the guest supervisor software should access user +``PKRU`` via ``PVCS::pkru``. + +CPUID leaf +~~~~~~~~~~ + +- Features disabled in the host are also disabled in the guest except for + some specially handled features such as PCID and PKS. + + - PCID can be enabled even host PCID is disabled or the hardware doesn't + support PCID. + - PKS can be enabled if the host ``CR4.PKE`` is set because guest PKS is + handled via hardware PKE. + +- Features that require the hypervisor's handling but are not yet + implemented are disabled in the guest. + +- Some features that require hardware-privileged instructions are + disabled in the guest. + + - XSAVES/XRESTORES/MSR_IA32_XSS is not enabled. + +- Features that require distinguishing U/S pages are disabled in the + guest. + + - SMEP/SMAP is disabled. LASS is also disabled. + +KVM and PVM specific CPUID leafs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- When CPUID.EAX = KVM_CPUID_SIGNATURE (0x40000000) is entered, the + output CPUID.EAX will be at least 0x40000002 which is + KVM_CPUID_VENDOR_FEATURES (iff the hypervisor is a PVM hypervisor). +- When CPUID.EAX = KVM_CPUID_VENDOR_FEATURES(0x40000002) is entered, + the output CPUID.EAX is PVM features; CPUID.EBX is 0x6d7670 ("pvm"); + CPUID.ECX and CPUID.EDX are reserved (0). + +PVM booting sequence +^^^^^^^^^^^^^^^^^^^^ + +The PVM supervisor software has to relocate itself to conform its +allowed address ranges (See MSR_PVM_LINEAR_ADDRESS_RANGE) and prepare +itself for its special event handling mechanism on booting. + +PVM software can be booted via linux general booting entry points, so +the software must detect whether itself is PVM as early as possible. + +Booting sequence for detecting PVM in 64 bit linux general booting entry: + +- check if the underlying EFLAGS.IF is 1 +- check if the underlying CS.CPL is 3 +- use the synthetic instruction KVM_CPUID to check KVM_CPUID_SIGNATURE + and KVM_CPUID_VENDOR_FEATURES including checking the signature. 
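+
+A minimal detection sketch following the sequence above is shown below.
+It is illustrative only: ``pvm_detect()`` is a hypothetical name, the
+CPUID leaf numbers are the KVM ones quoted above, and
+``PVM_CPUID_SIGNATURE`` is the vendor signature constant from the uapi
+header added later in this series.
+
+.. code::
+
+    static bool pvm_detect(void)
+    {
+            unsigned long flags;
+            unsigned int cs, eax, ebx, ecx, edx;
+
+            /* PVM guests always see the underlying EFLAGS.IF set... */
+            asm volatile("pushfq; popq %0" : "=r" (flags));
+            if (!(flags & X86_EFLAGS_IF))
+                    return false;
+
+            /* ...and always run with underlying CPL=3 (low two bits of CS). */
+            asm volatile("mov %%cs, %0" : "=r" (cs));
+            if ((cs & 3) != 3)
+                    return false;
+
+            /* PVM_SYNTHETIC_CPUID: "invlpg 0xffffffffff4d5650; cpuid". */
+            eax = 0x40000000;       /* KVM_CPUID_SIGNATURE */
+            ecx = 0;
+            asm volatile(".byte 0x0f,0x01,0x3c,0x25,0x50,0x56,0x4d,0xff,0x0f,0xa2"
+                         : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
+            if (eax < 0x40000002)   /* KVM_CPUID_VENDOR_FEATURES */
+                    return false;
+
+            eax = 0x40000002;
+            ecx = 0;
+            asm volatile(".byte 0x0f,0x01,0x3c,0x25,0x50,0x56,0x4d,0xff,0x0f,0xa2"
+                         : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
+            return ebx == PVM_CPUID_SIGNATURE;      /* vendor signature check */
+    }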
+ +PVM is the first to define such booting sequence, so any later paravirt +hypervisor that can boot a 64 bit linux guest with underlying +EFLAGS.IF==1 and CS.CPL == 3 from the linux general booting entry points +should support the synthetic instruction KVM_CPUID for compatibility. + +.. paging: + +Paging +------ + +PVM MMU has two registers for pagetables: ``CR3`` and ``MSR_PVM_SWITCH_CR3`` +and they are automatically switched on switching user/supervisor modes. +When in supervisor mode, ``CR3`` holds the kernel pagetable and +``MSR_PVM_SWITCH_CR3`` holds the user pagetable. These two pagetables work +in the same way as the two pagetables for KPTI. + +The U/S bit in the paging struct is not always honored in PVM and is +sometimes ignored. User mode software may or may not access the final +page even if it is a supervisor page (in the view of X86). In fact, due +to the lack of legacy segment-based isolation, both the user page and +kernel page in PVM are shadowed as user pages in the underlying +pagetable with only hypervisor pages with the U bit cleared in the +underlying pagetable. + +It is recommended to have no supervisor pages in the user pagetable. (To +make more use of the existing KPTI code, this rule can be relaxed as "it +is recommended that any paging tree should be all supervisor pages or +all user pages in the user pagetable except for the root PGD +pagetable.") + +And the lack of legacy segment-based isolation is also the reason why +PVM has two registers for pagetables and the automatically switching +feature. + +Due to the ignoring U/S bit, some features are disabled in PVM. + +- SMEP is disabled and ``CR4.SMEP`` can not be set. The guest can use + the NX bit for the user pages in the supervisor pagetable to regain + the protection. + +- SMAP is disabled and ``CR4.SMAP`` can not be set. The guest can + emulate it via PKS. + +- PKS feature is changed. Protection Key protection doesn't consider + the U/S bit, it protects all the data access based on the key. The + software should distribute different keys for supervisor pages and + user pages. + +TLB +~~~ + +| TLB entries are considered to be tagged by the root page table (PGD) + pointer. + +- Hypercall HC_TLB_FLUSH_CURENT, HC_TLB_FLUSH, and HC_TLB_LOAD_PGTBL + flush TLB entries based on the tags (PGD of ``CR3`` and + ``MSR_PVM_SWITCH_CR3``). +- ``CR3`` and ``MSR_PVM_SWITCH_CR3`` are swapped on switching + user/supervisor mode but no TLB flushing is performed. +- Writing to ``CR3`` may not flush TLB for ``MSR_PVM_SWITCH_CR3``. +- WRMSR or HC_WRMSR to ``MSR_PVM_SWITCH_CR3`` doesn't flush TLB. +- ``CR4.PCID`` bit is recommended to be set even if the + ``underlying CR4.PCID`` is cleared so that the PVM TLB can be flushed + only on demand. + +Exclusive address ranges +~~~~~~~~~~~~~~~~~~~~~~~~ + +A portion of the upper half of the linear address is separated from +the host kernel and the host doesn't use this separated portion. Only +the address in this separated portion and the lower half is the +guest-allowed linear address. + +.. protection-keys: + +Protection Keys +~~~~~~~~~~~~~~~ + +There are no distinctions between PVM user pages and PVM supervisor +pages in the real hardware. Protection Keys protection protects all data +accesses if enabled. ``CR4.PKE`` enables Protection Keys protection in +user mode while ``CR4.PKS`` enables Protection Keys protection in +supervisor mode. + +``CR4.PKS`` can only be enabled when ``CR4.PKE`` is enabled and +``CR4.PKE`` can only be enabled when the underlying ``CR4.PKE`` is +enabled. 
+ +The ``underlying PKRU`` is the effective protection key register in both +supervisor mode and user mode. + +The supervisor software should distribute different keys for supervisor +mode and user mode so that the PVM ``PKRU`` and ``MSR_IA32_PKRS``\ (in +guest supervisor view) are mapped to the different parts of the +``underlying PKRU`` at the same time. With distributed different keys, +``SUPERVISOR_KEYS_MASK`` can be defined in the guest supervisor. + +- The ``MSR_IA32_PKRS`` (in guest supervisor view) is the + ``underlying PKRU`` masked with ``SUPERVISOR_KEYS_MASK``, and it is + invisible to the hypervisor since ``SUPERVISOR_KEYS_MASK`` is + invisible to the hypervisor. +- ``MSR_IA32_PKRS`` (in hypervisor view) is recommended to be set as the + same as ``MSR_IA32_PKRS`` (in guest supervisor view) before returning + to the user mode so that after the next switchback, the user part of + the ``underlying PKRU`` is access-denied and the supervisor part is + already set properly. + +If host/hardware ``CR4.PKE`` is set: the hypervisor/switcher will do +these no matter what the value of ``CR4.PKE`` or ``CR4.PKS:`` + +- supervisor -> user switching: load the ``underlying PKRU`` with + ``PVCS::pkru`` + +- user -> supervisor switching: save the ``underlying PKRU`` to + ``PVCS::pkru``\ , load the ``underlying PKRU`` with a default value + (0 or ``MSR_IA32_PKRS`` if ``CR4.PKS``). + +SMAP +~~~~ + +| PVM doesn't support SMAP, if the guest supervisor wants to protect + user access, it should use ``CR4.PKS``. + +- The software should distribute different keys for supervisor mode and + user mode. +- ``MSR_IA32_PKRS`` should be set with the user keys as access-denied. +- Events handlers in supervisor mode + + - Save the old ``underlying PKRU`` and set it to ``MSR_IA32_PKRS`` on entry + so that the user part of the ``underlying PKRU`` is access-denied. + - Restore the ``underlying PKRU`` on exit. + +- When accessing to 'PVM user page' in supervisor mode + + - Set the ``underlying PKRU`` to (``MSR_IA32_PKRS`` & + ``SUPERVISOR_KEYS_MASK``) \| ``PVCS::pkru`` + - Restore the ``underlying PKRU`` when after it finishes the access. + + +Events and Mode Changing +------------------------ + +Special Events +~~~~~~~~~~~~~~ + +No DoubleFault +^^^^^^^^^^^^^^ + +#DF is always promoted to TripleFault and brings down the PVM instance. + +Discarded #DB +^^^^^^^^^^^^^ + +When MOV/POP SS from a watched address is followed by any +instruction-trap-induced supervisor mode entries, the MOV/POP SS that +hits the watchpoint will be discarded instead. + +Vector events in user mode +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When vector events happen in user mode, the hypervisor is responsible +for saving guest registers into ``PVCS``, including ``SS``, ``CS``, +``PKRU``, ``GSBASE``, ``RSP``, ``RFLAGS``, ``RIP``, ``RCX``, and +``R11``. + +The PVM hypervisor should also save the event vector into +``PVCS::event_vector`` and the error code in ``PVCS::event_errcode``, +and ``CR2`` into ``PVCS::cr2`` if it is pagefault event. + +No change to ``PVCS::event_flags.IF``\ (bit 9) during delivering any +event in user mode, and the supervisor software is recommended to ensure +it unset. + +Before returning to the guest supervisor, the PVM hypervisor will also +load values to vCPU with the following actions: + +- Inexplicitly load ``CS/SS`` with the value the supervisor expects + from ``MSR_STAR``. + + - The ``underlying CS/SS`` is loaded with host-defined ``__USER_CS`` + and ``__USER_DS``. 
+ +- Switch ``CR3`` with ``MSR_PVM_SWITCH_CR3`` without flushing TLB + + - The ``underlying CR3`` is the actual shadow root page table for + the new ``PVM CR3``. + +- Load ``GSBASE`` with ``MSR_KERNEL_GS_BASE``. + +- Load ``RSP`` with ``MSR_PVM_KERNEL_RSP``. + +- Load ``RIP/RCX`` with ``MSR_PVM_EVENT_ENTRY``. + +- Load ``R11`` with (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). + +- Load ``RFLAGS`` with ``X86_EFLAGS_FIXED``. + + - The ``underlying RFLAGS`` is the same as ``R11`` which is + (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). + +Vector events in supervisor mode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The hypervisor handles vector events differently based on the vector +and there is no IST stacks. + +The hypervisor handles vector events occurring in supervisor mode with +vector number < 32 as these uninterruptible steps: + +- Subtract the fixed size (MSR_PVM_SUPERVISOR_REDZONE) from RSP. +- Align RSP down to a 16-byte boundary. +- Push R11 +- Push Rcx +- Push SS +- Push original RSP +- Push RFLAGS + + - ``RFLAGS.IF`` comes from ``PVCS::event_flags.IF`` (bit 9), + which means the pushed ``RFLAGS`` is ``(underlying RFLAGS ~ + X86_EFLAGS_IF) | (PVCS::event_flags & X86_EFLAGS_IF)`` + +- Push CS +- Push RIP +- Push vector (4 bytes), ERRCODE (4 bytes) +- If it is pagefault, save CR2 into PVCS:cr2 +- No change to ``CS/SS.`` +- Load ``RSP`` with the result after the last push as described above. +- Load ``R11`` with (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). +- Load ``RFLAGS`` with ``X86_EFLAGS_FIXED``. + + - The ``underlying RFLAGS`` is the same as ``R11`` which is + (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). + - PVCS::event_flags.IF will be cleared if it is previously set. + +- Load ``RIP/RCX`` with ``MSR_PVM_EVENT_ENTRY``\ +256 + +The hypervisor handles vector events occurring in supervisor mode with +vector number => 32 as these uninterruptible steps: + +- Save R11,RCX,RSP,EFLAGS,RIP to PVCS. +- Save the vector number to PVCS:event_vector. +- No change to ``CS/SS.`` +- Subtract the fixed size (MSR_PVM_SUPERVISOR_REDZONE) from RSP. +- Load RSP with the current RSP value aligned down to a 16-byte boundary. +- Load ``R11`` with (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). +- Load ``RFLAGS`` with ``X86_EFLAGS_FIXED`` + + - The ``underlying RFLAGS`` is the same as ``R11`` which is + (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). + - PVCS::event_flags.IF will be cleared if it is previously set. + +- Load ``RIP/RCX`` with ``MSR_PVM_EVENT_ENTRY``\ +512 + +User SYSCALL event +~~~~~~~~~~~~~~~~~~ + +SYSCALL instruction in PVM user mode is a user SYSCALL event and the +hypervisor handles it almost as the same as vector events in user mode +except that no change to ``PVCS::event_vector``, ``PVCS::event_errcode`` +and ``PVCS::cr2`` and ``RIP/RCX`` is loaded with ``MSR_LSTAR``. + +Specifically, the hypervisor saves guest registers into ``PVCS``, +including ``SS``, ``CS``, ``PKRU``, ``GSBASE``, ``RSP``, ``RFLAGS``, +RIP, ``RCX``, and ``R11``, and loads values to vCPU with the following +actions: + +- Inexplicitly load ``CS/SS`` with the value the supervisor expects + from ``MSR_STAR``. + + - The ``underlying CS/SS`` is loaded with host-defined ``__USER_CS`` + and ``__USER_DS``. + +- Switch ``CR3`` with ``MSR_PVM_SWITCH_CR3`` without flushing TLB + + - The ``underlying CR3`` is the actual shadow root page table for + the new ``PVM CR3``. + +- Load ``GSBASE`` with ``MSR_KERNEL_GS_BASE``. +- Load ``RSP`` with ``MSR_PVM_KERNEL_RSP``. +- Load ``RIP/RCX`` with ``MSR_LSTAR``. 
+- Load ``R11`` with (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). +- Load ``RFLAGS`` with ``X86_EFLAGS_FIXED``. + + - The ``underlying RFLAGS`` is the same as ``R11`` which is + (``X86_EFLAGS_IF`` \| ``X86_EFLAGS_FIXED``). + - No change to ``PVCS::event_flags.IF``\ (bit 9) during delivering + the SYSCALL event, and the supervisor software is recommended to + ensure it unset. + + +Synthetic Instruction: EVENT_RETURN_USER +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This synthetic instruction is the only way for the PVM supervisor to +switch to user mode. + +It works as the opposite operations of the event in user mode: load +``CS``, ``SS``, ``GSBASE``, ``PKRU``, ``RSP``, ``RFLAGS``, RIP, ``RCX``, +and ``R11`` from the ``PVCS`` respectively with some conversions to +``GSBASE`` and ``RFLAGS``; switch ``CR3`` and ``MSR_PVM_SWITCH_CR3`` and +return to user mode. The origian ``RSP`` is saved into +``MSR_PVM_SUPERVISOR_RSP``. + +No change to ``PVCS::event_flags.IF``\ (bit 9) during handling it +and the supervisor software is recommended to ensure it unset. + +Synthetic Instruction: EVENT_RETURN_SUPERVISOR +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +| Return to PVM supervisor mode. +| Work almost the same as IRETQ instruction except for ``RCX``, ``R11`` and + ``ERRCODE`` are also in the stack. + +It expects the stack frame: + +.. code:: + + R11 + RCX + SS + RSP + RFLAGS + CS + RIP + ERRCODE + +Return to the context with RIP, RFLAGS, RSP, RCX, and R11 restored from the +stack. + +The ``CS/SS`` and ``ERRCODE`` in the stack are ignored and the current PVM +``CS/SS`` are unchanged. + +Hypercall event in supervisor mode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Except for the synthetic instructions, SYSCALL instructions in PVM +supervisor mode is a HYPERCALL. + +``RAX`` is the request number of the HYPERCALL. Some hypercall request +numbers are PVM-specific HYPERCALLs. Other values are KVM-specific +HYPERCALL. + +HYPERCALL be issued in supervisor software +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +PVM supervisor software saves ``R10``, ``R11`` onto the stack and copies +``RCX`` into ``R10``, and then invokes the SYSCALL instruction. After +the HYPERCALL(SYSCALL instruction) returns, the software should get +``RCX`` from ``R10`` and restore ``R10`` and ``R11`` from the stack. + +Hypercall's behavior should treat ``R10`` as ``RCX`` (in PVM +hypervisor): + +.. code:: + + RCX := R10 + pvm or kvm hypercall handling. + R10 := RCX + +If not specific, the return result is in ``RAX``. + +PVM_HC_LOAD_PGTBL +^^^^^^^^^^^^^^^^^ + +| Parameters: *flags*, *supervisor_pgd*, *user_pgd*. +| Loads the pagetables +| \* flags bit0: flush the new supervisor_pgd and user_pgd. +| \* flags bit1: 4-level(bit1=0) or 5-level(bit1=1 && LA57 is supported + in the VCPU's cpuid features) pagetable, the ``CR4.LA57`` bit is also + changed correspondingly. +| \* supervisor_pgd: set to ``CR3`` +| \* user_pgd: set to ``MSR_PVM_SWITCH_CR3`` + +PVM_HC_IRQ_WIN +^^^^^^^^^^^^^^ + +| No parameters. +| Infos the hypervisor that IRQ is enabled. + +PVM_HC_IRQ_HLT +^^^^^^^^^^^^^^ + +| No parameters. +| Emulates the combination of X86 instructions "STI; HLT;". + +PVM_HC_TLB_FLUSH +^^^^^^^^^^^^^^^^ + +| No parameters. +| Flush all TLB + +PVM_HC_TLB_FLUSH_CURRENT +^^^^^^^^^^^^^^^^^^^^^^^^ + +| No parameters. +| Flush the TLB associated with the current ``PVM CR3`` and + ``MSR_PVM_SWITCH_CR3``. + +PVM_HC_TLB_INVLPG +^^^^^^^^^^^^^^^^^ + +| Parameters: *addr*. +| Emulates INVLPG and Flush the TLB entries of the address. 
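+
+As an illustration of the R10-for-RCX rule above, a guest wrapper for a
+two-argument KVM-defined hypercall could look like the sketch below. It
+assumes the usual KVM hypercall argument registers (RBX, RCX, RDX, RSI),
+with the RCX argument carried in R10 as described in "Hypercall event in
+supervisor mode"; the wrapper name is illustrative.
+
+.. code::
+
+    static inline long pvm_kvm_hypercall2(unsigned long nr, unsigned long p1,
+                                          unsigned long p2)
+    {
+            /*
+             * SYSCALL destroys RCX and R11, so the argument that the KVM
+             * hypercall ABI would place in RCX travels in R10 instead; the
+             * hypervisor does "RCX := R10" before dispatching and
+             * "R10 := RCX" afterwards.
+             */
+            register unsigned long r10 asm("r10") = p2;
+            long ret;
+
+            asm volatile("syscall"
+                         : "=a" (ret), "+r" (r10)
+                         : "a" (nr), "b" (p1)
+                         : "rcx", "r11", "memory");
+            return ret;
+    }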
+ +PVM_HC_LOAD_GS +^^^^^^^^^^^^^^ + +| Parameters: *gs_sel*. +| Load ``GS`` with the selector *gs_sel*; if that fails, load ``GS`` with the + NULL selector. +| Return the resulting GS_BASE. + +PVM_HC_RDMSR +^^^^^^^^^^^^ + +| Parameters: *msr_index*. +| Return the MSR value, or zero if the MSR index is invalid. + +PVM_HC_WRMSR +^^^^^^^^^^^^ + +| Parameters: *msr_index*, *msr_value*. +| Return 0 or -EINVAL. + +PVM_HC_LOAD_TLS +^^^^^^^^^^^^^^^ + +| Parameters: *gdt_entry0*, *gdt_entry1*, *gdt_entry2*. +| Rectify *gdt_entry0*, *gdt_entry1*, and *gdt_entry2* and set them + contiguously in the host ``GDT``. +| Return the host ``GDT`` index for *gdt_entry0*.
From patchwork Mon Feb 26 14:35:19 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572286
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin"
Subject: [RFC PATCH 02/73] x86/ABI/PVM: Add PVM-specific ABI header file
Date: Mon, 26 Feb 2024 22:35:19 +0800
Message-Id: <20240226143630.33643-3-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>
References: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

Add a PVM-specific ABI header file to describe the ABI between the guest
and the hypervisor. It contains the hypercall numbers, the virtual MSR
indexes, and the event data structure definitions. This is in
preparation for PVM.

Signed-off-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
---
 arch/x86/include/uapi/asm/pvm_para.h | 131 +++++++++++++++++++++++++++
 include/uapi/Kbuild                  |   4 +
 2 files changed, 135 insertions(+)
 create mode 100644 arch/x86/include/uapi/asm/pvm_para.h

diff --git a/arch/x86/include/uapi/asm/pvm_para.h b/arch/x86/include/uapi/asm/pvm_para.h
new file mode 100644
index 000000000000..36aedfa2cabd
--- /dev/null
+++ b/arch/x86/include/uapi/asm/pvm_para.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_ASM_X86_PVM_PARA_H +#define _UAPI_ASM_X86_PVM_PARA_H + +#include + +/* + * The CPUID instruction in PVM guest can't be trapped and emulated, + * so PVM guest should use the following two instructions instead: + * "invlpg 0xffffffffff4d5650; cpuid;" + * + * PVM_SYNTHETIC_CPUID is supposed to not trigger any trap in the real or + * virtual x86 kernel mode and is also guaranteed to trigger a trap in the + * underlying hardware user mode for the hypervisor emulating it. The + * hypervisor emulates both of the basic instructions, while the INVLPG is + * often emulated as an NOP since 0xffffffffff4d5650 is normally out of the + * allowed linear address ranges.
+ */ +#define PVM_SYNTHETIC_CPUID 0x0f,0x01,0x3c,0x25,0x50, \ + 0x56,0x4d,0xff,0x0f,0xa2 +#define PVM_SYNTHETIC_CPUID_ADDRESS 0xffffffffff4d5650 + +/* + * The vendor signature 'PVM' is returned in ebx. It should be used to + * determine that a VM is running under PVM. + */ +#define PVM_CPUID_SIGNATURE 0x4d5650 + +/* + * PVM virtual MSRS falls in the range 0x4b564df0-0x4b564dff, and it should not + * conflict with KVM, see arch/x86/include/uapi/asm/kvm_para.h + */ +#define PVM_VIRTUAL_MSR_MAX_NR 15 +#define PVM_VIRTUAL_MSR_BASE 0x4b564df0 +#define PVM_VIRTUAL_MSR_MAX (PVM_VIRTUAL_MSR_BASE+PVM_VIRTUAL_MSR_MAX_NR) + +#define MSR_PVM_LINEAR_ADDRESS_RANGE 0x4b564df0 +#define MSR_PVM_VCPU_STRUCT 0x4b564df1 +#define MSR_PVM_SUPERVISOR_RSP 0x4b564df2 +#define MSR_PVM_SUPERVISOR_REDZONE 0x4b564df3 +#define MSR_PVM_EVENT_ENTRY 0x4b564df4 +#define MSR_PVM_RETU_RIP 0x4b564df5 +#define MSR_PVM_RETS_RIP 0x4b564df6 +#define MSR_PVM_SWITCH_CR3 0x4b564df7 + +#define PVM_HC_SPECIAL_MAX_NR (256) +#define PVM_HC_SPECIAL_BASE (0x17088200) +#define PVM_HC_SPECIAL_MAX (PVM_HC_SPECIAL_BASE+PVM_HC_SPECIAL_MAX_NR) + +#define PVM_HC_LOAD_PGTBL (PVM_HC_SPECIAL_BASE+0) +#define PVM_HC_IRQ_WIN (PVM_HC_SPECIAL_BASE+1) +#define PVM_HC_IRQ_HALT (PVM_HC_SPECIAL_BASE+2) +#define PVM_HC_TLB_FLUSH (PVM_HC_SPECIAL_BASE+3) +#define PVM_HC_TLB_FLUSH_CURRENT (PVM_HC_SPECIAL_BASE+4) +#define PVM_HC_TLB_INVLPG (PVM_HC_SPECIAL_BASE+5) +#define PVM_HC_LOAD_GS (PVM_HC_SPECIAL_BASE+6) +#define PVM_HC_RDMSR (PVM_HC_SPECIAL_BASE+7) +#define PVM_HC_WRMSR (PVM_HC_SPECIAL_BASE+8) +#define PVM_HC_LOAD_TLS (PVM_HC_SPECIAL_BASE+9) + +/* + * PVM_EVENT_FLAGS_IP + * - Interrupt enable flag. The flag is set to respond to maskable + * external interrupts; and cleared to inhibit maskable external + * interrupts. + * + * PVM_EVENT_FLAGS_IF + * - interrupt pending flag. The hypervisor sets it if it fails to inject + * a maskable event to the VCPU due to the interrupt-enable flag being + * cleared in supervisor mode. + */ +#define PVM_EVENT_FLAGS_IP_BIT 8 +#define PVM_EVENT_FLAGS_IP _BITUL(PVM_EVENT_FLAGS_IP_BIT) +#define PVM_EVENT_FLAGS_IF_BIT 9 +#define PVM_EVENT_FLAGS_IF _BITUL(PVM_EVENT_FLAGS_IF_BIT) + +#ifndef __ASSEMBLY__ + +/* + * PVM event delivery saves the information about the event and the old context + * into the PVCS structure if the event is from user mode or from supervisor + * mode with vector >=32. And ERETU synthetic instruction reads the return + * state from the PVCS structure to restore the old context. + */ +struct pvm_vcpu_struct { + /* + * This flag is only used in supervisor mode, with only bit 8 and + * bit 9 being valid. The other bits are reserved. + */ + u64 event_flags; + u32 event_errcode; + u32 event_vector; + u64 cr2; + u64 reserved0[5]; + + /* + * For the event from supervisor mode with vector >=32, only eflags, + * rip, rsp, rcx and r11 are saved, and others keep untouched. + */ + u16 user_cs, user_ss; + u32 reserved1; + u64 reserved2; + u64 user_gsbase; + u32 eflags; + u32 pkru; + u64 rip; + u64 rsp; + u64 rcx; + u64 r11; +}; + +/* + * PVM event delivery saves the information about the event and the old context + * on the stack with the following frame format if the event is from supervisor + * mode with vector <32. And ERETS synthetic instruction reads the return state + * with the following frame format from the stack to restore the old context. 
+ */ +struct pvm_supervisor_event { + unsigned long errcode; // vector in high32 + unsigned long rip; + unsigned long cs; + unsigned long rflags; + unsigned long rsp; + unsigned long ss; + unsigned long rcx; + unsigned long r11; +}; + +#endif /* __ASSEMBLY__ */ + +#endif /* _UAPI_ASM_X86_PVM_PARA_H */ diff --git a/include/uapi/Kbuild b/include/uapi/Kbuild index 61ee6e59c930..991848db246b 100644 --- a/include/uapi/Kbuild +++ b/include/uapi/Kbuild @@ -12,3 +12,7 @@ ifeq ($(wildcard $(objtree)/arch/$(SRCARCH)/include/generated/uapi/asm/kvm_para. no-export-headers += linux/kvm_para.h endif endif + +ifeq ($(wildcard $(srctree)/arch/$(SRCARCH)/include/uapi/asm/pvm_para.h),) +no-export-headers += pvm_para.h +endif
From patchwork Mon Feb 26 14:35:20 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572287
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Andy Lutomirski , Dave Hansen , "H. Peter Anvin" , Oleg Nesterov , "Mike Rapoport (IBM)" , Rick Edgecombe , Arnd Bergmann , Brian Gerst , Mateusz Guzik , "Kirill A. Shutemov" , Jacob Pan
Subject: [RFC PATCH 03/73] x86/entry: Implement switcher for PVM VM enter/exit
Date: Mon, 26 Feb 2024 22:35:20 +0800
Message-Id: <20240226143630.33643-4-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>
References: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

Since the PVM guest runs in hardware CPL3, host/guest world switching is
similar to userspace/kernelspace switching. Therefore, PVM reuses the
host entries for host/guest world switching.

In order to differentiate PVM guests from normal userspace processes, a
new flag is introduced to mark that a guest is active, and the host
entries are modified to use this flag to decide where to forward the
handling. The modified host entries and the VM enter path are
collectively called the "switcher".

In the host entries, if the entry comes from CPL3 and the flag is set,
it is regarded as a VM exit and the handling is forwarded to the
hypervisor. Otherwise, the handling belongs to the host as before. If
the entry comes from CPL0, the handling also belongs to the host.

Paranoid entries need to save and restore the guest CR3, similar to the
save and restore procedure for the user CR3 in KPTI. So the switcher is
not compatible with KPTI currently.
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/entry/Makefile | 3 + arch/x86/entry/calling.h | 47 ++++++++++- arch/x86/entry/entry_64.S | 68 ++++++++++++++- arch/x86/entry/entry_64_switcher.S | 127 +++++++++++++++++++++++++++++ arch/x86/include/asm/processor.h | 5 ++ arch/x86/include/asm/ptrace.h | 3 + arch/x86/include/asm/switcher.h | 59 ++++++++++++++ arch/x86/kernel/asm-offsets_64.c | 8 ++ arch/x86/kernel/traps.c | 3 + 9 files changed, 315 insertions(+), 8 deletions(-) create mode 100644 arch/x86/entry/entry_64_switcher.S create mode 100644 arch/x86/include/asm/switcher.h diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile index ca2fe186994b..55dd3f193d99 100644 --- a/arch/x86/entry/Makefile +++ b/arch/x86/entry/Makefile @@ -21,3 +21,6 @@ obj-$(CONFIG_PREEMPTION) += thunk_$(BITS).o obj-$(CONFIG_IA32_EMULATION) += entry_64_compat.o syscall_32.o obj-$(CONFIG_X86_X32_ABI) += syscall_x32.o +ifeq ($(CONFIG_X86_64),y) + obj-y += entry_64_switcher.o +endif diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index c99f36339236..83758019162d 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -142,6 +142,10 @@ For 32-bit we have the following conventions - kernel is built with .endif .endm +.macro SET_NOFLUSH_BIT reg:req + bts $X86_CR3_PCID_NOFLUSH_BIT, \reg +.endm + #ifdef CONFIG_PAGE_TABLE_ISOLATION /* @@ -154,10 +158,6 @@ For 32-bit we have the following conventions - kernel is built with #define PTI_USER_PCID_MASK (1 << PTI_USER_PCID_BIT) #define PTI_USER_PGTABLE_AND_PCID_MASK (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK) -.macro SET_NOFLUSH_BIT reg:req - bts $X86_CR3_PCID_NOFLUSH_BIT, \reg -.endm - .macro ADJUST_KERNEL_CR3 reg:req ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID /* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */ @@ -284,6 +284,45 @@ For 32-bit we have the following conventions - kernel is built with #endif +#define TSS_extra(field) PER_CPU_VAR(cpu_tss_rw+TSS_EX_##field) + +/* + * Switcher would be disabled when KPTI is enabled. + * + * Ideally, switcher would switch to HOST_CR3 in IST before gsbase is fixed, + * in which case it would use the offset from the IST stack top to the TSS + * in CEA to get the pointer of the TSS. But SEV guest modifies TSS.IST on + * the fly and makes the code non-workable in SEV guest even the switcher + * is not used. + * + * So switcher is marked disabled when KPTI is enabled rather than when + * in SEV guest. + * + * To enable switcher with KPTI, something like Integrated Entry code with + * atomic-IST-entry has to be introduced beforehand. + * + * The current SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3 is called after gsbase + * is fixed. + */ +.macro SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3 scratch_reg:req save_reg:req + ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_PTI + cmpq $0, TSS_extra(host_rsp) + jz .Lend_\@ + movq %cr3, \save_reg + movq TSS_extra(host_cr3), \scratch_reg + movq \scratch_reg, %cr3 +.Lend_\@: +.endm + +.macro SWITCHER_RESTORE_CR3 scratch_reg:req save_reg:req + ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_PTI + cmpq $0, TSS_extra(host_rsp) + jz .Lend_\@ + ALTERNATIVE "", "SET_NOFLUSH_BIT \save_reg", X86_FEATURE_PCID + movq \save_reg, %cr3 +.Lend_\@: +.endm + /* * IBRS kernel mitigation for Spectre_v2. 
* diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 57fae15b3136..65bfebebeab6 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -278,10 +278,11 @@ SYM_CODE_END(xen_error_entry) /** * idtentry_body - Macro to emit code calling the C function + * @vector: Vector number * @cfunc: C function to be called * @has_error_code: Hardware pushed error code on stack */ -.macro idtentry_body cfunc has_error_code:req +.macro idtentry_body vector cfunc has_error_code:req /* * Call error_entry() and switch to the task stack if from userspace. @@ -297,6 +298,10 @@ SYM_CODE_END(xen_error_entry) ENCODE_FRAME_POINTER UNWIND_HINT_REGS + cmpq $0, TSS_extra(host_rsp) + jne .Lpvm_idtentry_body_\@ +.L_host_idtenrty_\@: + movq %rsp, %rdi /* pt_regs pointer into 1st argument*/ .if \has_error_code == 1 @@ -310,6 +315,25 @@ SYM_CODE_END(xen_error_entry) REACHABLE jmp error_return + +.Lpvm_idtentry_body_\@: + testb $3, CS(%rsp) + /* host exception nested in IST handler while the switcher is active */ + jz .L_host_idtenrty_\@ + + .if \vector < 256 + movl $\vector, ORIG_RAX+4(%rsp) + .else // X86_TRAP_OTHER + /* + * Here are the macros for common_interrupt(), spurious_interrupt(), + * and XENPV entries with the titular vector X86_TRAP_OTHER. XENPV + * entries can't reach here while common_interrupt() and + * spurious_interrupt() have the real vector at ORIG_RAX. + */ + movl ORIG_RAX(%rsp), %eax + movl %eax, ORIG_RAX+4(%rsp) + .endif + jmp switcher_return_from_guest .endm /** @@ -354,7 +378,7 @@ SYM_CODE_START(\asmsym) .Lfrom_usermode_no_gap_\@: .endif - idtentry_body \cfunc \has_error_code + idtentry_body \vector \cfunc \has_error_code _ASM_NOKPROBE(\asmsym) SYM_CODE_END(\asmsym) @@ -427,7 +451,7 @@ SYM_CODE_START(\asmsym) /* Switch to the regular task stack and use the noist entry point */ .Lfrom_usermode_switch_stack_\@: - idtentry_body noist_\cfunc, has_error_code=0 + idtentry_body \vector, noist_\cfunc, has_error_code=0 _ASM_NOKPROBE(\asmsym) SYM_CODE_END(\asmsym) @@ -507,7 +531,7 @@ SYM_CODE_START(\asmsym) /* Switch to the regular task stack */ .Lfrom_usermode_switch_stack_\@: - idtentry_body user_\cfunc, has_error_code=1 + idtentry_body \vector, user_\cfunc, has_error_code=1 _ASM_NOKPROBE(\asmsym) SYM_CODE_END(\asmsym) @@ -919,6 +943,16 @@ SYM_CODE_START(paranoid_entry) FENCE_SWAPGS_KERNEL_ENTRY .Lparanoid_gsbase_done: + /* + * Switch back to kernel cr3 when switcher is active. + * Switcher can't be used when KPTI is enabled by far, so only one of + * SAVE_AND_SWITCH_TO_KERNEL_CR3 and SWITCHER_SAVE_AND_SWITCH_TO_KERNEL_CR3 + * takes effect. SWITCHER_SAVE_AND_SWITCH_TO_KERNEL_CR3 requires + * kernel GSBASE. + * See the comments above SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3. + */ + SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3 scratch_reg=%rax save_reg=%r14 + /* * Once we have CR3 and %GS setup save and set SPEC_CTRL. Just like * CR3 above, keep the old value in a callee saved register. @@ -970,6 +1004,15 @@ SYM_CODE_START_LOCAL(paranoid_exit) */ RESTORE_CR3 scratch_reg=%rax save_reg=%r14 + /* + * Switch back to origin cr3 when switcher is active. + * Switcher can't be used when KPTI is enabled by far, so only + * one of RESTORE_CR3 and SWITCHER_RESTORE_CR3 takes effect. + * + * See the comments above SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3. 
+ */ + SWITCHER_RESTORE_CR3 scratch_reg=%rax save_reg=%r14 + /* Handle the three GSBASE cases */ ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE @@ -1158,6 +1201,8 @@ SYM_CODE_START(asm_exc_nmi) FENCE_SWAPGS_USER_ENTRY SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx movq %rsp, %rdx + cmpq $0, TSS_extra(host_rsp) + jne .Lnmi_from_pvm_guest movq PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp UNWIND_HINT_IRET_REGS base=%rdx offset=8 pushq 5*8(%rdx) /* pt_regs->ss */ @@ -1188,6 +1233,21 @@ SYM_CODE_START(asm_exc_nmi) */ jmp swapgs_restore_regs_and_return_to_usermode +.Lnmi_from_pvm_guest: + movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp + UNWIND_HINT_IRET_REGS base=%rdx offset=8 + pushq 5*8(%rdx) /* pt_regs->ss */ + pushq 4*8(%rdx) /* pt_regs->rsp */ + pushq 3*8(%rdx) /* pt_regs->flags */ + pushq 2*8(%rdx) /* pt_regs->cs */ + pushq 1*8(%rdx) /* pt_regs->rip */ + UNWIND_HINT_IRET_REGS + pushq $0 /* pt_regs->orig_ax */ + movl $2, 4(%rsp) /* pt_regs->orig_ax, pvm vector */ + PUSH_AND_CLEAR_REGS rdx=(%rdx) + ENCODE_FRAME_POINTER + jmp switcher_return_from_guest + .Lnmi_from_kernel: /* * Here's what our stack frame will look like: diff --git a/arch/x86/entry/entry_64_switcher.S b/arch/x86/entry/entry_64_switcher.S new file mode 100644 index 000000000000..2b99a46421cc --- /dev/null +++ b/arch/x86/entry/entry_64_switcher.S @@ -0,0 +1,127 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "calling.h" + +.code64 +.section .entry.text, "ax" + +.macro MITIGATION_EXIT + /* Same as user entry. */ + IBRS_EXIT +.endm + +.macro MITIGATION_ENTER + /* + * IMPORTANT: RSB filling and SPEC_CTRL handling must be done before + * the first unbalanced RET after vmexit! + * + * For retpoline or IBRS, RSB filling is needed to prevent poisoned RSB + * entries and (in some cases) RSB underflow. + * + * eIBRS has its own protection against poisoned RSB, so it doesn't + * need the RSB filling sequence. But it does need to be enabled, and a + * single call to retire, before the first unbalanced RET. + */ + FILL_RETURN_BUFFER %rcx, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT, \ + X86_FEATURE_RSB_VMEXIT_LITE + + IBRS_ENTER +.endm + +/* + * switcher_enter_guest - Do a transition to guest mode + * + * Called with guest registers on the top of the sp0 stack and the switcher + * states on cpu_tss_rw.tss_ex. + * + * Returns: + * pointer to pt_regs (on top of sp0 or IST stack) with guest registers. + */ +SYM_FUNC_START(switcher_enter_guest) + pushq %rbp + pushq %r15 + pushq %r14 + pushq %r13 + pushq %r12 + pushq %rbx + + /* Save host RSP and mark the switcher active */ + movq %rsp, TSS_extra(host_rsp) + + /* Switch to host sp0 */ + movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rdi + subq $FRAME_SIZE, %rdi + movq %rdi, %rsp + + UNWIND_HINT_REGS + + MITIGATION_EXIT + + /* switch to guest cr3 on sp0 stack */ + movq TSS_extra(enter_cr3), %rax + movq %rax, %cr3 + /* Load guest registers. 
*/ + POP_REGS + addq $8, %rsp + + /* Switch to guest GSBASE and return to guest */ + swapgs + jmp native_irq_return_iret + +SYM_INNER_LABEL(switcher_return_from_guest, SYM_L_GLOBAL) + /* switch back to host cr3 when still on sp0/ist stack */ + movq TSS_extra(host_cr3), %rax + movq %rax, %cr3 + + MITIGATION_ENTER + + /* Restore to host RSP and mark the switcher inactive */ + movq %rsp, %rax + movq TSS_extra(host_rsp), %rsp + movq $0, TSS_extra(host_rsp) + + popq %rbx + popq %r12 + popq %r13 + popq %r14 + popq %r15 + popq %rbp + RET +SYM_FUNC_END(switcher_enter_guest) +EXPORT_SYMBOL_GPL(switcher_enter_guest) + +SYM_CODE_START(entry_SYSCALL_64_switcher) + UNWIND_HINT_ENTRY + ENDBR + + swapgs + /* tss.sp2 is scratch space. */ + movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2) + movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp + +SYM_INNER_LABEL(entry_SYSCALL_64_switcher_safe_stack, SYM_L_GLOBAL) + ANNOTATE_NOENDBR + + /* Construct struct pt_regs on stack */ + pushq $__USER_DS /* pt_regs->ss */ + pushq PER_CPU_VAR(cpu_tss_rw + TSS_sp2) /* pt_regs->sp */ + pushq %r11 /* pt_regs->flags */ + pushq $__USER_CS /* pt_regs->cs */ + pushq %rcx /* pt_regs->ip */ + + pushq $0 /* pt_regs->orig_ax */ + movl $SWITCH_EXIT_REASONS_SYSCALL, 4(%rsp) + + PUSH_AND_CLEAR_REGS + jmp switcher_return_from_guest +SYM_CODE_END(entry_SYSCALL_64_switcher) +EXPORT_SYMBOL_GPL(entry_SYSCALL_64_switcher) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 83dc4122c38d..4115267e7a3e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -29,6 +29,7 @@ struct vm86; #include #include #include +#include #include #include @@ -382,6 +383,10 @@ struct tss_struct { */ struct x86_hw_tss x86_tss; +#ifdef CONFIG_X86_64 + struct tss_extra tss_ex; +#endif + struct x86_io_bitmap io_bitmap; } __aligned(PAGE_SIZE); diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index f4db78b09c8f..9eeeb5fdd387 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -5,6 +5,7 @@ #include #include #include +#include #ifndef __ASSEMBLY__ #ifdef __i386__ @@ -194,6 +195,8 @@ static __always_inline bool ip_within_syscall_gap(struct pt_regs *regs) ret = ret || (regs->ip >= (unsigned long)entry_SYSRETL_compat_unsafe_stack && regs->ip < (unsigned long)entry_SYSRETL_compat_end); #endif + ret = ret || (regs->ip >= (unsigned long)entry_SYSCALL_64_switcher && + regs->ip < (unsigned long)entry_SYSCALL_64_switcher_safe_stack); return ret; } diff --git a/arch/x86/include/asm/switcher.h b/arch/x86/include/asm/switcher.h new file mode 100644 index 000000000000..dbf1970ca62f --- /dev/null +++ b/arch/x86/include/asm/switcher.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SWITCHER_H +#define _ASM_X86_SWITCHER_H + +#ifdef CONFIG_X86_64 +#include + +#define SWITCH_EXIT_REASONS_SYSCALL 1024 +#define SWITCH_EXIT_REASONS_FAILED_VMETNRY 1025 + +/* Bits allowed to be set in the underlying eflags */ +#define SWITCH_ENTER_EFLAGS_ALLOWED (X86_EFLAGS_FIXED | X86_EFLAGS_IF |\ + X86_EFLAGS_TF | X86_EFLAGS_RF |\ + X86_EFLAGS_AC | X86_EFLAGS_OF | \ + X86_EFLAGS_DF | X86_EFLAGS_SF | \ + X86_EFLAGS_ZF | X86_EFLAGS_AF | \ + X86_EFLAGS_PF | X86_EFLAGS_CF | \ + X86_EFLAGS_ID | X86_EFLAGS_NT) + +/* Bits must be set in the underlying eflags */ +#define SWITCH_ENTER_EFLAGS_FIXED (X86_EFLAGS_FIXED | X86_EFLAGS_IF) + +#ifndef __ASSEMBLY__ +#include + +struct pt_regs; + +/* + * Extra per CPU control structure lives in the struct tss_struct. 
+ * + * The page-size-aligned struct tss_struct has enough room to accommodate + * this extra data without increasing its size. + * + * The extra data is also in the first page of struct tss_struct whose + * read-write mapping (percpu cpu_tss_rw) is in the KPTI's user pagetable, + * so that it can even be accessible via cpu_tss_rw in the entry code. + */ +struct tss_extra { + /* Saved host CR3 to be loaded after VM exit. */ + unsigned long host_cr3; + /* + * Saved host stack to be loaded after VM exit. This also serves as a + * flag to indicate that it is entering the guest world in the switcher + * or has been in the guest world in the host entries. + */ + unsigned long host_rsp; + /* Prepared guest CR3 to be loaded before VM enter. */ + unsigned long enter_cr3; +} ____cacheline_aligned; + +extern struct pt_regs *switcher_enter_guest(void); +extern const char entry_SYSCALL_64_switcher[]; +extern const char entry_SYSCALL_64_switcher_safe_stack[]; +extern const char entry_SYSRETQ_switcher_unsafe_stack[]; +#endif /* __ASSEMBLY__ */ + +#endif /* CONFIG_X86_64 */ + +#endif /* _ASM_X86_SWITCHER_H */ diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c index f39baf90126c..1485cbda6dc4 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -60,5 +60,13 @@ int main(void) OFFSET(FIXED_stack_canary, fixed_percpu_data, stack_canary); BLANK(); #endif + +#define ENTRY(entry) OFFSET(TSS_EX_ ## entry, tss_struct, tss_ex.entry) + ENTRY(host_cr3); + ENTRY(host_rsp); + ENTRY(enter_cr3); + BLANK(); +#undef ENTRY + return 0; } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index c876f1d36a81..c4f2b629b422 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -773,6 +773,9 @@ DEFINE_IDTENTRY_RAW(exc_int3) asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs) { struct pt_regs *regs = (struct pt_regs *)this_cpu_read(pcpu_hot.top_of_stack) - 1; + + if (this_cpu_read(cpu_tss_rw.tss_ex.host_rsp)) + return eregs; if (regs != eregs) *regs = *eregs; return regs; From patchwork Mon Feb 26 14:35:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572288 Received: from mail-pf1-f170.google.com (mail-pf1-f170.google.com [209.85.210.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8E4A060B87; Mon, 26 Feb 2024 14:34:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958089; cv=none; b=Pt1JXvcaUa6nAap340HrnBIRFZwHSkOciEq5Np7aVKgPDLq1Nwm1fnsNeABmhSsIYfvHhYvMb4iQfcf5dknNYUnsQ48c7yFHXfXHH2oJ370A+4QX3xweeZzqhpQ+mYhA7b93qg9Yy6w1RX+zo3gqFeanmh5FuUID5wbdKkBtqN4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958089; c=relaxed/simple; bh=/GUkCZ302+GMyG/K8t3wfSAJoYB7n1waPF59X7S+x7w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=YiZhdn6winhrvufzbFAOsrzXzveUXa/mmzIXkk1Vrqx0s3bYxtt68cG30KmYwJ0Ec5dKsiVUvLUgOpN4XCo+IJvRE1kfxP9ZK4PEU59ngBSDfz6JLbSLOvAa7lmwohKPS7IXH0/OQJd//2X9tTVyuO4jMH4SPtUBueHUdqHRhW0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Andy Lutomirski , Dave Hansen , "H.
Peter Anvin" , Oleg Nesterov , Brian Gerst Subject: [RFC PATCH 04/73] x86/entry: Implement direct switching for the switcher Date: Mon, 26 Feb 2024 22:35:21 +0800 Message-Id: <20240226143630.33643-5-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan During VM running, all VM exits in the switcher will be forwarded to the hypervisor and then returned to the switcher to re-enter the VM after handling the VM exit. In some situations, the switcher can handle the VM exit directly without involving the hypervisor. This is referred to as direct switching, and it can reduce the overhead of guest/host state switching. Currently, for simplicity, only the syscall event from user mode and ERETU synthetic instruction are allowed for direct switching. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/entry/entry_64_switcher.S | 145 ++++++++++++++++++++++++++++- arch/x86/include/asm/ptrace.h | 2 + arch/x86/include/asm/switcher.h | 60 ++++++++++++ arch/x86/kernel/asm-offsets_64.c | 23 +++++ 4 files changed, 229 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/entry_64_switcher.S b/arch/x86/entry/entry_64_switcher.S index 2b99a46421cc..6f166d15635c 100644 --- a/arch/x86/entry/entry_64_switcher.S +++ b/arch/x86/entry/entry_64_switcher.S @@ -75,7 +75,7 @@ SYM_FUNC_START(switcher_enter_guest) /* Switch to guest GSBASE and return to guest */ swapgs - jmp native_irq_return_iret + jmp .L_switcher_return_to_guest SYM_INNER_LABEL(switcher_return_from_guest, SYM_L_GLOBAL) /* switch back to host cr3 when still on sp0/ist stack */ @@ -99,6 +99,23 @@ SYM_INNER_LABEL(switcher_return_from_guest, SYM_L_GLOBAL) SYM_FUNC_END(switcher_enter_guest) EXPORT_SYMBOL_GPL(switcher_enter_guest) +.macro canonical_rcx + /* + * If width of "canonical tail" ever becomes variable, this will need + * to be updated to remain correct on both old and new CPUs. + * + * Change top bits to match most significant bit (47th or 56th bit + * depending on paging mode) in the address. 
+ */ +#ifdef CONFIG_X86_5LEVEL + ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \ + "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57 +#else + shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx + sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx +#endif +.endm + SYM_CODE_START(entry_SYSCALL_64_switcher) UNWIND_HINT_ENTRY ENDBR @@ -117,7 +134,133 @@ SYM_INNER_LABEL(entry_SYSCALL_64_switcher_safe_stack, SYM_L_GLOBAL) pushq %r11 /* pt_regs->flags */ pushq $__USER_CS /* pt_regs->cs */ pushq %rcx /* pt_regs->ip */ + pushq %rdi /* put rdi on ORIG_RAX */ + + /* check if it can do direct switch from umod to smod */ + testq $SWITCH_FLAGS_NO_DS_TO_SMOD, TSS_extra(switch_flags) + jnz .L_switcher_check_return_umod_instruction + + /* Now it must be umod, start to do direct switch from umod to smod */ + movq TSS_extra(pvcs), %rdi + movl %r11d, PVCS_eflags(%rdi) + movq %rcx, PVCS_rip(%rdi) + movq %rcx, PVCS_rcx(%rdi) + movq %r11, PVCS_r11(%rdi) + movq RSP-ORIG_RAX(%rsp), %rcx + movq %rcx, PVCS_rsp(%rdi) + + /* switch umod to smod (switch_flags & cr3) */ + xorb $SWITCH_FLAGS_MOD_TOGGLE, TSS_extra(switch_flags) + movq TSS_extra(smod_cr3), %rcx + movq %rcx, %cr3 + + /* load smod registers from TSS_extra to sp0 stack or %r11 */ + movq TSS_extra(smod_rsp), %rcx + movq %rcx, RSP-ORIG_RAX(%rsp) + movq TSS_extra(smod_entry), %rcx + movq %rcx, RIP-ORIG_RAX(%rsp) + movq TSS_extra(smod_gsbase), %r11 + + /* switch host gsbase to guest gsbase, TSS_extra can't be use afterward */ + swapgs + + /* save guest gsbase as user_gsbase and switch to smod_gsbase */ + rdgsbase %rcx + movq %rcx, PVCS_user_gsbase(%rdi) + wrgsbase %r11 + + /* restore umod rdi and smod rflags/r11, rip/rcx and rsp for sysretq */ + popq %rdi + movq $SWITCH_ENTER_EFLAGS_FIXED, %r11 + movq RIP-RIP(%rsp), %rcx + +.L_switcher_sysretq: + UNWIND_HINT_IRET_REGS + /* now everything is ready for sysretq except for %rsp */ + movq RSP-RIP(%rsp), %rsp + /* No instruction can be added between seting the guest %rsp and doing sysretq */ +SYM_INNER_LABEL(entry_SYSRETQ_switcher_unsafe_stack, SYM_L_GLOBAL) + sysretq + +.L_switcher_check_return_umod_instruction: + UNWIND_HINT_IRET_REGS offset=8 + + /* check if it can do direct switch from smod to umod */ + testq $SWITCH_FLAGS_NO_DS_TO_UMOD, TSS_extra(switch_flags) + jnz .L_switcher_return_to_hypervisor + + /* + * Now it must be smod, check if it is the return-umod instruction. + * Switcher and the PVM specification defines a SYSCALL instrucion + * at TSS_extra(retu_rip) - 2 in smod as the return-umod instruction. 
+ */ + cmpq %rcx, TSS_extra(retu_rip) + jne .L_switcher_return_to_hypervisor + + /* only handle for the most common cs/ss */ + movq TSS_extra(pvcs), %rdi + cmpl $((__USER_DS << 16) | __USER_CS), PVCS_user_cs(%rdi) + jne .L_switcher_return_to_hypervisor + + /* Switcher and the PVM specification requires the smod RSP to be saved */ + movq RSP-ORIG_RAX(%rsp), %rcx + movq %rcx, TSS_extra(smod_rsp) + + /* switch smod to umod (switch_flags & cr3) */ + xorb $SWITCH_FLAGS_MOD_TOGGLE, TSS_extra(switch_flags) + movq TSS_extra(umod_cr3), %rcx + movq %rcx, %cr3 + + /* switch host gsbase to guest gsbase, TSS_extra can't be use afterward */ + swapgs + + /* write umod gsbase */ + movq PVCS_user_gsbase(%rdi), %rcx + canonical_rcx + wrgsbase %rcx + + /* load sp, flags, ip to sp0 stack and cx, r11, rdi to registers */ + movq PVCS_rsp(%rdi), %rcx + movq %rcx, RSP-ORIG_RAX(%rsp) + movl PVCS_eflags(%rdi), %r11d + movq %r11, EFLAGS-ORIG_RAX(%rsp) + movq PVCS_rip(%rdi), %rcx + movq %rcx, RIP-ORIG_RAX(%rsp) + movq PVCS_rcx(%rdi), %rcx + movq PVCS_r11(%rdi), %r11 + popq %rdi // saved rdi (on ORIG_RAX) + +.L_switcher_return_to_guest: + /* + * Now the RSP points to an IRET frame with guest state on the + * top of the sp0 stack. Check if it can do sysretq. + */ + UNWIND_HINT_IRET_REGS + + andq $SWITCH_ENTER_EFLAGS_ALLOWED, EFLAGS-RIP(%rsp) + orq $SWITCH_ENTER_EFLAGS_FIXED, EFLAGS-RIP(%rsp) + testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), EFLAGS-RIP(%rsp) + jnz native_irq_return_iret + cmpq %r11, EFLAGS-RIP(%rsp) + jne native_irq_return_iret + + cmpq %rcx, RIP-RIP(%rsp) + jne native_irq_return_iret + /* + * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP + * in kernel space. This essentially lets the guest take over + * the host, since guest controls RSP. + */ + canonical_rcx + cmpq %rcx, RIP-RIP(%rsp) + je .L_switcher_sysretq + + /* RCX matches for RIP only before RCX is canonicalized, restore RCX and do IRET. */ + movq RIP-RIP(%rsp), %rcx + jmp native_irq_return_iret +.L_switcher_return_to_hypervisor: + popq %rdi /* saved rdi */ pushq $0 /* pt_regs->orig_ax */ movl $SWITCH_EXIT_REASONS_SYSCALL, 4(%rsp) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 9eeeb5fdd387..322697877a2d 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -198,6 +198,8 @@ static __always_inline bool ip_within_syscall_gap(struct pt_regs *regs) ret = ret || (regs->ip >= (unsigned long)entry_SYSCALL_64_switcher && regs->ip < (unsigned long)entry_SYSCALL_64_switcher_safe_stack); + ret = ret || (regs->ip == (unsigned long)entry_SYSRETQ_switcher_unsafe_stack); + return ret; } #endif diff --git a/arch/x86/include/asm/switcher.h b/arch/x86/include/asm/switcher.h index dbf1970ca62f..35a60f4044c4 100644 --- a/arch/x86/include/asm/switcher.h +++ b/arch/x86/include/asm/switcher.h @@ -8,6 +8,40 @@ #define SWITCH_EXIT_REASONS_SYSCALL 1024 #define SWITCH_EXIT_REASONS_FAILED_VMETNRY 1025 +/* + * SWITCH_FLAGS control the way how the switcher code works, + * mostly dictate whether it should directly do the guest ring + * switch or just go back to hypervisor. + * + * SMOD and UMOD + * Current vcpu mode. Use two parity bits to simplify direct-switch + * flags checking. + * + * NO_DS_CR3 + * Not to direct switch due to smod_cr3 or umod_cr3 not having been + * prepared. 
+ */ +#define SWITCH_FLAGS_SMOD _BITULL(0) +#define SWITCH_FLAGS_UMOD _BITULL(1) +#define SWITCH_FLAGS_NO_DS_CR3 _BITULL(2) + +#define SWITCH_FLAGS_MOD_TOGGLE (SWITCH_FLAGS_SMOD | SWITCH_FLAGS_UMOD) + +/* + * Direct switching disabling bits are all the bits other than + * SWITCH_FLAGS_SMOD or SWITCH_FLAGS_UMOD. Bits 8-64 are defined by the driver + * using the switcher. Direct switching is enabled if all the disabling bits + * are cleared. + * + * SWITCH_FLAGS_NO_DS_TO_SMOD: not to direct switch to smod due to any + * disabling bit or smod bit being set. + * + * SWITCH_FLAGS_NO_DS_TO_UMOD: not to direct switch to umod due to any + * disabling bit or umod bit being set. + */ +#define SWITCH_FLAGS_NO_DS_TO_SMOD (~SWITCH_FLAGS_UMOD) +#define SWITCH_FLAGS_NO_DS_TO_UMOD (~SWITCH_FLAGS_SMOD) + /* Bits allowed to be set in the underlying eflags */ #define SWITCH_ENTER_EFLAGS_ALLOWED (X86_EFLAGS_FIXED | X86_EFLAGS_IF |\ X86_EFLAGS_TF | X86_EFLAGS_RF |\ @@ -24,6 +58,7 @@ #include struct pt_regs; +struct pvm_vcpu_struct; /* * Extra per CPU control structure lives in the struct tss_struct. @@ -46,6 +81,31 @@ struct tss_extra { unsigned long host_rsp; /* Prepared guest CR3 to be loaded before VM enter. */ unsigned long enter_cr3; + + /* + * Direct switching flag indicates whether direct switching + * is allowed. + */ + unsigned long switch_flags ____cacheline_aligned; + /* + * Guest supervisor mode hardware CR3 for direct switching of guest + * user mode syscall. + */ + unsigned long smod_cr3; + /* + * Guest user mode hardware CR3 for direct switching of guest ERETU + * synthetic instruction. + */ + unsigned long umod_cr3; + /* + * The current PVCS for saving and restoring guest user mode context + * in direct switching. + */ + struct pvm_vcpu_struct *pvcs; + unsigned long retu_rip; + unsigned long smod_entry; + unsigned long smod_gsbase; + unsigned long smod_rsp; } ____cacheline_aligned; extern struct pt_regs *switcher_enter_guest(void); diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c index 1485cbda6dc4..8230bd27f0b3 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -4,6 +4,7 @@ #endif #include +#include #if defined(CONFIG_KVM_GUEST) #include @@ -65,6 +66,28 @@ int main(void) ENTRY(host_cr3); ENTRY(host_rsp); ENTRY(enter_cr3); + ENTRY(switch_flags); + ENTRY(smod_cr3); + ENTRY(umod_cr3); + ENTRY(pvcs); + ENTRY(retu_rip); + ENTRY(smod_entry); + ENTRY(smod_gsbase); + ENTRY(smod_rsp); + BLANK(); +#undef ENTRY + +#define ENTRY(entry) OFFSET(PVCS_ ## entry, pvm_vcpu_struct, entry) + ENTRY(event_flags); + ENTRY(event_errcode); + ENTRY(user_cs); + ENTRY(user_ss); + ENTRY(user_gsbase); + ENTRY(rsp); + ENTRY(eflags); + ENTRY(rip); + ENTRY(rcx); + ENTRY(r11); BLANK(); #undef ENTRY From patchwork Mon Feb 26 14:35:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572289 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B66D12DDA7; Mon, 26 Feb 2024 14:34:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958091; cv=none; 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter
Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 05/73] KVM: x86: Set 'vcpu->arch.exception.injected' as true before vendor callback Date: Mon, 26 Feb 2024 22:35:22 +0800 Message-Id: <20240226143630.33643-6-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan For PVM, the exception is injected and delivered directly in the callback before VM enter, so it will clear 'vcpu->arch.exception.injected'. Therefore, if 'vcpu->arch.exception.injected' is set to true after the vendor callback, it may inject the same exception repeatedly in PVM. To address this, move the setting of 'vcpu->arch.exception.injected' to true before the vendor callback in kvm_inject_exception(). This adjustment has no influence on VMX/SVM, as they don't change it in their callbacks. No functional change. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1a3aaa7dafae..35ad6dd5eaf6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10137,6 +10137,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu) vcpu->arch.exception.error_code, vcpu->arch.exception.injected); + vcpu->arch.exception.injected = true; static_call(kvm_x86_inject_exception)(vcpu); } @@ -10288,7 +10289,6 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu, kvm_inject_exception(vcpu); vcpu->arch.exception.pending = false; - vcpu->arch.exception.injected = true; can_inject = false; } From patchwork Mon Feb 26 14:35:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572290 Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5E5612EBE4; Mon, 26 Feb 2024 14:34:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958099; cv=none; b=d4ttrkBHZz8M8kSTc4ahwu6oOv+N/jIQyDGTyUNl0i29efuJnd6rcPI/zBLEuxcxs9I78+VsEmnsH/uT8CYf8PB5L8NhFHMJ1DN4psYJ5YmowhkXOBlxdWwXdBdC0SuElPNGsKrQDqSWHckMhThC4+LqsBFMYRl9T6iVN8hGRsY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958099; c=relaxed/simple; bh=v3oWR+P39CKvhNF+pOmX56rjrg+9Z/2sF3YtAzYOJl8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=qYHTNQGD6VV5n5htWP/OSITzCPlOaLnYhCxhuo+Q6/mqsqj9fDuC6ibJoKW+ZqXFBFEm7QESJ+AGGYdHBDhu2HhGaOB8Matz0q74C6tLhMOadUh89IONSD/iHc64/6O28y+jMy/H9Cl/cwFLh280b6m2n67Z5Ul2rUJBgg/jaiM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=bKDaXNJq; arc=none smtp.client-ip=209.85.210.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bKDaXNJq" Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-6e4f49c5632so593126b3a.0; Mon, 26 Feb 2024 06:34:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958097; x=1709562897; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KvdApW2xM6ZV/9SNM/xsyM+VYex7z3fSCZ3wCP7fJoU=; b=bKDaXNJq7EZijQfePY8pdbChonxnJZrUtjA8Eg6xXe7t3LL3/2HtTyKrnxBuUBFjB/ oWpAkCamh1/obaWlHfXuL/lbAQi4LRxYlHpBdVmU78Wm7WLdKu99Js9Ehqs8POIS9ziM 2Xf9K/reTpKvk5IJBvGBZYbdYknJlsupmVExwdXH+mKh8Z4FVXItg8hq9veOz6cV2CPI DsAmm7P+47UPU62opQNM9w2V1E0+BiX9ECD8Q6Y3x9li1OP32xEUx/kl7VClvvfL89pJ NXrEX5EV1iTPL4LSKnhP7txqWnyTDdsJ2/JhU6Q+uFMtlAoZdK/9nnO7IKkr0T+Ih9hh ktGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958097; x=1709562897; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KvdApW2xM6ZV/9SNM/xsyM+VYex7z3fSCZ3wCP7fJoU=; b=L4WDLmKr/0XylKU/9bH7fVYR/FWb/OcwR0HHK1LOwC2NgGKiR9AllFwHteY3pi9w2c qRQAm2Ds2VES7dHYdU1kUYW2joWS80tESUPSxWvOPFxnkGqG1JJqzYNKr87vqqUxnYYE t5IImaCN4IZt0MWAIg6GBCM4rtj1kaVB8HqaG0gukYu3GPUgYD2o0cU5J0voZLRYpu1e JEMSpwtKKyURVJht1BaPE48D/aas8NTyPj4zuft8HIPimI9xEdCAalKhQIr92G/6QBZg m6nFZ5CzUusesQZ9kTXI4MDsB0Qc5uEvK4lbX38mjrSlBUSMC5EwuushxvZbJ4+9c7Lh SZCA== X-Forwarded-Encrypted: i=1; AJvYcCXewQWiTFJTUbU8uaC8B7JsGESz/WbFOo/vYGL6/4Cx86QlzR0H7w1N+5tAqJ8w+N8oQFXxnFu4gakY8qmvaXScWCWE X-Gm-Message-State: AOJu0Yw5UA/p5XpUEKFFT5Cpvp1lBsmvxm6sgm+s3Qg9MOYoes7xeo00 VGySRCKtdW0bPzlPLYR9OB3NArdjJlXKuOnlH1tP+y6fCfYP3TNy12gpKXHJ X-Google-Smtp-Source: AGHT+IF1HWQpwmNIokh+etIWHwc/ZM1A5RDLvvVbDiJng+VTsHDOjSVxGJxiTaxlg0eYbwbQ7MaxYw== X-Received: by 2002:a05:6a20:d80f:b0:1a0:a882:950c with SMTP id iv15-20020a056a20d80f00b001a0a882950cmr7295333pzb.18.1708958096561; Mon, 26 Feb 2024 06:34:56 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id t5-20020a170902dcc500b001dc6b99af70sm4008687pll.108.2024.02.26.06.34.55 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:34:56 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , "Mike Rapoport (IBM)" , Yu-cheng Yu , Rick Edgecombe , "Paul E. McKenney" , Mark Rutland Subject: [RFC PATCH 06/73] KVM: x86: Move VMX interrupt/nmi handling into kvm.ko Date: Mon, 26 Feb 2024 22:35:23 +0800 Message-Id: <20240226143630.33643-7-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Similar to VMX, hardware interrupts/NMI during guest running in PVM will trigger VM exit and should be handled by host interrupt/NMI handlers. 
Therefore, move VMX interrupt/NMI handling into kvm.ko for common usage. Signed-off-by: Lai Jiangshan Co-developed-by: Hou Wenlong Signed-off-by: Hou Wenlong --- arch/x86/include/asm/idtentry.h | 12 ++++---- arch/x86/kernel/nmi.c | 8 +++--- arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/host_entry.S | 50 +++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmenter.S | 43 ---------------------------- arch/x86/kvm/vmx/vmx.c | 14 ++------- arch/x86/kvm/x86.c | 3 ++ arch/x86/kvm/x86.h | 18 ++++++++++++ 8 files changed, 85 insertions(+), 65 deletions(-) create mode 100644 arch/x86/kvm/host_entry.S diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 13639e57e1f8..8aab0b50431a 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -586,14 +586,14 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check); /* NMI */ -#if IS_ENABLED(CONFIG_KVM_INTEL) +#if IS_ENABLED(CONFIG_KVM) /* - * Special entry point for VMX which invokes this on the kernel stack, even for - * 64-bit, i.e. without using an IST. asm_exc_nmi() requires an IST to work - * correctly vs. the NMI 'executing' marker. Used for 32-bit kernels as well - * to avoid more ifdeffery. + * Special entry point for VMX/PVM which invokes this on the kernel stack, even + * for 64-bit, i.e. without using an IST. asm_exc_nmi() requires an IST to + * work correctly vs. the NMI 'executing' marker. Used for 32-bit kernels as + * well to avoid more ifdeffery. */ -DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_kvm_vmx); +DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_kvm); #endif DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi); diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 17e955ab69fe..265e6b38cc58 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -568,13 +568,13 @@ DEFINE_IDTENTRY_RAW(exc_nmi) mds_user_clear_cpu_buffers(); } -#if IS_ENABLED(CONFIG_KVM_INTEL) -DEFINE_IDTENTRY_RAW(exc_nmi_kvm_vmx) +#if IS_ENABLED(CONFIG_KVM) +DEFINE_IDTENTRY_RAW(exc_nmi_kvm) { exc_nmi(regs); } -#if IS_MODULE(CONFIG_KVM_INTEL) -EXPORT_SYMBOL_GPL(asm_exc_nmi_kvm_vmx); +#if IS_MODULE(CONFIG_KVM) +EXPORT_SYMBOL_GPL(asm_exc_nmi_kvm); #endif #endif diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 80e3fe184d17..97bad203b1b1 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -9,7 +9,7 @@ endif include $(srctree)/virt/kvm/Makefile.kvm -kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ +kvm-y += x86.o emulate.o i8259.o irq.o lapic.o host_entry.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \ mmu/spte.o diff --git a/arch/x86/kvm/host_entry.S b/arch/x86/kvm/host_entry.S new file mode 100644 index 000000000000..6bdf0df06eb0 --- /dev/null +++ b/arch/x86/kvm/host_entry.S @@ -0,0 +1,50 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include +#include + +.macro KVM_DO_EVENT_IRQOFF call_insn call_target + /* + * Unconditionally create a stack frame, getting the correct RSP on the + * stack (for x86-64) would take two instructions anyways, and RBP can + * be used to restore RSP to make objtool happy (see below). + */ + push %_ASM_BP + mov %_ASM_SP, %_ASM_BP + +#ifdef CONFIG_X86_64 + /* + * Align RSP to a 16-byte boundary (to emulate CPU behavior) before + * creating the synthetic interrupt stack frame for the IRQ/NMI. 
+ */ + and $-16, %rsp + push $__KERNEL_DS + push %rbp +#endif + pushf + push $__KERNEL_CS + \call_insn \call_target + + /* + * "Restore" RSP from RBP, even though IRET has already unwound RSP to + * the correct value. objtool doesn't know the callee will IRET and, + * without the explicit restore, thinks the stack is getting walloped. + * Using an unwind hint is problematic due to x86-64's dynamic alignment. + */ + mov %_ASM_BP, %_ASM_SP + pop %_ASM_BP + RET +.endm + +.section .noinstr.text, "ax" + +SYM_FUNC_START(kvm_do_host_nmi_irqoff) + KVM_DO_EVENT_IRQOFF call asm_exc_nmi_kvm +SYM_FUNC_END(kvm_do_host_nmi_irqoff) + +.section .text, "ax" + +SYM_FUNC_START(kvm_do_host_interrupt_irqoff) + KVM_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 +SYM_FUNC_END(kvm_do_host_interrupt_irqoff) diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 906ecd001511..12b7b99a9dd8 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -31,39 +31,6 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif -.macro VMX_DO_EVENT_IRQOFF call_insn call_target - /* - * Unconditionally create a stack frame, getting the correct RSP on the - * stack (for x86-64) would take two instructions anyways, and RBP can - * be used to restore RSP to make objtool happy (see below). - */ - push %_ASM_BP - mov %_ASM_SP, %_ASM_BP - -#ifdef CONFIG_X86_64 - /* - * Align RSP to a 16-byte boundary (to emulate CPU behavior) before - * creating the synthetic interrupt stack frame for the IRQ/NMI. - */ - and $-16, %rsp - push $__KERNEL_DS - push %rbp -#endif - pushf - push $__KERNEL_CS - \call_insn \call_target - - /* - * "Restore" RSP from RBP, even though IRET has already unwound RSP to - * the correct value. objtool doesn't know the callee will IRET and, - * without the explicit restore, thinks the stack is getting walloped. - * Using an unwind hint is problematic due to x86-64's dynamic alignment. 
- */ - mov %_ASM_BP, %_ASM_SP - pop %_ASM_BP - RET -.endm - .section .noinstr.text, "ax" /** @@ -299,10 +266,6 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL) SYM_FUNC_END(__vmx_vcpu_run) -SYM_FUNC_START(vmx_do_nmi_irqoff) - VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx -SYM_FUNC_END(vmx_do_nmi_irqoff) - #ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT /** @@ -354,9 +317,3 @@ SYM_FUNC_START(vmread_error_trampoline) RET SYM_FUNC_END(vmread_error_trampoline) #endif - -.section .text, "ax" - -SYM_FUNC_START(vmx_do_interrupt_irqoff) - VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 -SYM_FUNC_END(vmx_do_interrupt_irqoff) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index be20a60047b1..fca47304506e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6920,9 +6920,6 @@ static void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu) memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir)); } -void vmx_do_interrupt_irqoff(unsigned long entry); -void vmx_do_nmi_irqoff(void); - static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) { /* @@ -6968,9 +6965,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) "unexpected VM-Exit interrupt info: 0x%x", intr_info)) return; - kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); - vmx_do_interrupt_irqoff(gate_offset(desc)); - kvm_after_interrupt(vcpu); + kvm_do_interrupt_irqoff(vcpu, gate_offset(desc)); vcpu->arch.at_instruction_boundary = true; } @@ -7260,11 +7255,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && - is_nmi(vmx_get_intr_info(vcpu))) { - kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); - vmx_do_nmi_irqoff(); - kvm_after_interrupt(vcpu); - } + is_nmi(vmx_get_intr_info(vcpu))) + kvm_do_nmi_irqoff(vcpu); out: guest_state_exit_irqoff(); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 35ad6dd5eaf6..96f3913f7fc5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13784,6 +13784,9 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size, } EXPORT_SYMBOL_GPL(kvm_sev_es_string_io); +EXPORT_SYMBOL_GPL(kvm_do_host_nmi_irqoff); +EXPORT_SYMBOL_GPL(kvm_do_host_interrupt_irqoff); + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 5184fde1dc54..4d1430f8874b 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -491,6 +491,24 @@ static inline void kvm_machine_check(void) #endif } +void kvm_do_host_nmi_irqoff(void); +void kvm_do_host_interrupt_irqoff(unsigned long entry); + +static __always_inline void kvm_do_nmi_irqoff(struct kvm_vcpu *vcpu) +{ + kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); + kvm_do_host_nmi_irqoff(); + kvm_after_interrupt(vcpu); +} + +static inline void kvm_do_interrupt_irqoff(struct kvm_vcpu *vcpu, + unsigned long entry) +{ + kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); + kvm_do_host_interrupt_irqoff(entry); + kvm_after_interrupt(vcpu); +} + void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu); void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu); int kvm_spec_ctrl_test_value(u64 value); From patchwork Mon Feb 26 14:35:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572291 Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0860012EBF1; Mon, 26 Feb 2024 14:35:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958102; cv=none; b=ks3hdiBlvHT22cXVkHdYNAsHsabmTlbXwmZ5IZCtNfM9MPcYcnIKVX5Ne07Rm/39b6grTC7GSDlddXvhBVhiFm2IXlfZoS9ii47f+G+MhKv55QlHArcyAgah71SLj1QWg21bB0iOfR1e0E5dtZq3m+a3l8iTNYHN+8uwXHpukiE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958102; c=relaxed/simple; bh=mp4bIEVXnnIUzxsDHcmVT4EyPtcJSElBPfc+LZawrHk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Dn8IPm82lxPy417X+GYEPZd0/o/Ubr72U9C0+6rw2asD+HEm2UNixuUgxKOMNLBvaP1CxV+gBclETNNDqlJbKUEBuyOnl0Y+rAYcE70z7mfIf/GCfJ3L6f08yaUSax/6GL8S22PMCbFsB0IvKClnpALFZNcT4ltpNa1hwrdH0Pk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JaiUA9Os; arc=none smtp.client-ip=209.85.210.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JaiUA9Os" Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-6e4f49c5632so593160b3a.0; Mon, 26 Feb 2024 06:35:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958100; x=1709562900; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xGcizR5RhJyAZaF61CVdloVSq2PxTGm8Bnu+7WmTNJg=; b=JaiUA9OstKah3JAP40jBp3Ftu3TGLqF+Xr1kXv3+SKImIaqe+BvycZ1DKWjH/Y0kiJ gfU8bvoNd3SyawEMImClrHqiZqTQ1TF5IhAJmjFy1bz4ABJ/KL7goTSbtb6FBbOFxWgr pnF3Xb76va9t633C4l0qiWJZlhT/NyYv+7mWDseUEsq5Kznb/ZbAxlR+NBodD1dnDKX8 vmz5Kzwm2ipDH0sA2q65s7yZIJ17aRdJ/KqnPX79PnMUTJPz2ErdCtA/G56I0gJtijYw 8FJjSzcuLq1zf0iWvJ4hdt6xgzjrSzV2wUpFegEK1XgFMb4Y5sY1Q+//oBwsRoL0oF4x yBiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958100; x=1709562900; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xGcizR5RhJyAZaF61CVdloVSq2PxTGm8Bnu+7WmTNJg=; b=te1+KLHPNfOK+NXIF/i8XuZPszyp1pKoWzSkglIVds+DwuUe2l84CDgJVVAi5XEqyd OrgsyJkaoCS3QvztLrtRor9A9L8DLMXafvkQgiQ2tPP8+rraujppSU/AfufEQNt1oVHC O9/ghD7nP2J5KQny6ae2Xqg+ApMeg1/a7FTI8Tkx2b5h48TQGMX8SLxH3xalYylhuXxR jZ/cGHRigytSd6thCWfzf0zxEMnpY4oLTN8+Pv7zMNTHC5S7DIYnba8bBSpvWDQa541H ykvjHKl2tYa1IMDh456iXKdRh3y5pcvPPuCviir0pJOC0EzqC9LDmCzclntyGpDkm+rg mh2g== X-Forwarded-Encrypted: i=1; AJvYcCXxHT2Cd1zhnERzs/7i0Sk9iI+OMqVLLZkClItcp6kVDteAn4xQTuW8nS2GfCWox4Z6aIbBp09YdinjwpGuOEdz5o4L X-Gm-Message-State: AOJu0Yx/X6xL+ZmblyVv5SMPjtUAr8IY5Hpy0SNvlpDuRTPYWszgNoQ4 WGMu9uU3sV8G3bos0ArObBbJtYOM/VE3VekMT2TOiVe74HcKzdbeJQkzU6d/ X-Google-Smtp-Source: AGHT+IEQ5ZAAPokPrEcDDW70e6Lag98en0XxkSEwYRvhdcHIwCUTOJilf0F9mb76AEaNtv2jcqpTtw== X-Received: by 2002:a05:6a00:1397:b0:6e4:f761:1a4c with SMTP id t23-20020a056a00139700b006e4f7611a4cmr6338374pfg.12.1708958099979; Mon, 26 Feb 2024 06:34:59 -0800 
(PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id it8-20020a056a00458800b006e05c801748sm4103087pfb.199.2024.02.26.06.34.59 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:34:59 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 07/73] KVM: x86/mmu: Adapt shadow MMU for PVM Date: Mon, 26 Feb 2024 22:35:24 +0800 Message-Id: <20240226143630.33643-8-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In PVM, shadow MMU is used for guest MMU virtualization. However, it needs some changes to adapt for PVM: 1. In PVM, hardware CR4.LA57 is not changed, so the paging level of shadow MMU should be same as host. If the guest paging level is 4 and host paging level is 5, then it performs like shadow NPT MMU and 'root_role.passthrough' is set as true. 2. PVM guest needs to access the host switcher, so some host mapping PGD entries will be cloned into the guest shadow paging table during the root SP allocation. These cloned host PGD entries are not marked as MMU present, so they can't be cleared by write-protecting. Additionally, in order to avoid modifying those cloned host PGD entries in the #PF handling path, a new callback is introduced to check the fault of the guest virtual address before walking the guest page table. This ensures that the guest cannot overwrite the host entries in the root SP. 3. If the guest paging level is 4 and the host paging level is 5, then the last PGD entry in the root SP is allowed to be overwritten if the guest tries to build a new allowed mapping under this PGD entry. In this case, the host P4D entries in the table pointed to by the last PGD entry should also be cloned during the new P4D SP allocation. These cloned P4D entries are also not marked as MMU present. A new bit in the 'kvm_mmu_page_role' is used to mark this special SP. When zapping this SP, its parent PTE will be set to the original host PGD PTEs instead of clearing it. 4. The user bit in the SPTE of guest mapping should be forced to be set for PVM, as the guest is always running in hardware CPL3. 
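As a rough illustration of point 2 above, the following self-contained C sketch models how a newly allocated root shadow page inherits the host PGD entries when PVM is active. PT_ENTRIES, SPTE_MMU_PRESENT and init_root_sp() are illustrative placeholders rather than the kernel's definitions; the patch itself does this with a memcpy() from kvm->arch.host_mmu_root_pgd in kvm_mmu_alloc_shadow_page().

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PT_ENTRIES	512
/* Placeholder for KVM's software "MMU present" bit (illustration only). */
#define SPTE_MMU_PRESENT	(1ULL << 11)

/*
 * Toy model: when PVM is active (host_mmu_root_pgd != NULL) and a root
 * shadow page at the host paging level is allocated, the host PGD entries
 * are copied in verbatim.  Because they never get the software MMU-present
 * bit, the usual write-protection and zapping paths leave them alone.
 */
static void init_root_sp(uint64_t *sp_spt, const uint64_t *host_mmu_root_pgd)
{
	if (host_mmu_root_pgd)
		memcpy(sp_spt, host_mmu_root_pgd, PT_ENTRIES * sizeof(*sp_spt));
	else
		memset(sp_spt, 0, PT_ENTRIES * sizeof(*sp_spt));
}

int main(void)
{
	uint64_t host_pgd[PT_ENTRIES] = { 0 };
	uint64_t root_spt[PT_ENTRIES];

	host_pgd[511] = 0x1234000ULL;	/* e.g. the PGD entry covering the switcher */
	init_root_sp(root_spt, host_pgd);

	printf("entry 511: %#llx, MMU-present: %s\n",
	       (unsigned long long)root_spt[511],
	       (root_spt[511] & SPTE_MMU_PRESENT) ? "yes" : "no");
	return 0;
}

The same idea extends to point 3: when the guest uses 4-level paging on a 5-level host, the P4D table referenced by the last host PGD entry is cloned in the same not-MMU-present fashion, so zapping that special shadow page restores the original host entry instead of clearing it.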
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 6 ++++- arch/x86/kvm/mmu/mmu.c | 35 +++++++++++++++++++++++++++++- arch/x86/kvm/mmu/paging_tmpl.h | 3 +++ arch/x86/kvm/mmu/spte.c | 4 ++++ 5 files changed, 47 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 26b628d84594..32e5473b499d 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -93,6 +93,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) +KVM_X86_OP_OPTIONAL_RET0(disallowed_va) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d7036982332e..c76bafe9c7e2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -346,7 +346,8 @@ union kvm_mmu_page_role { unsigned ad_disabled:1; unsigned guest_mode:1; unsigned passthrough:1; - unsigned :5; + unsigned host_mmu_la57_top_p4d:1; + unsigned :4; /* * This is left at the top of the word so that @@ -1429,6 +1430,7 @@ struct kvm_arch { * the thread holds the MMU lock in write mode. */ spinlock_t tdp_mmu_pages_lock; + u64 *host_mmu_root_pgd; #endif /* CONFIG_X86_64 */ /* @@ -1679,6 +1681,8 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); + bool (*disallowed_va)(struct kvm_vcpu *vcpu, u64 la); + bool (*has_wbinvd_exit)(void); u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index c57e181bba21..80406666d7da 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1745,6 +1745,18 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn) return hash_64(gfn, KVM_MMU_HASH_SHIFT); } +#define HOST_ROOT_LEVEL (pgtable_l5_enabled() ? 
PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL) + +static inline bool pvm_mmu_p4d_at_la57_pgd511(struct kvm *kvm, u64 *sptep) +{ + if (!pgtable_l5_enabled()) + return false; + if (!kvm->arch.host_mmu_root_pgd) + return false; + + return sptep_to_sp(sptep)->role.level == 5 && spte_index(sptep) == 511; +} + static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache, struct kvm_mmu_page *sp, u64 *parent_pte) { @@ -1764,7 +1776,10 @@ static void drop_parent_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *parent_pte) { mmu_page_remove_parent_pte(kvm, sp, parent_pte); - mmu_spte_clear_no_track(parent_pte); + if (!unlikely(sp->role.host_mmu_la57_top_p4d)) + mmu_spte_clear_no_track(parent_pte); + else + __update_clear_spte_fast(parent_pte, kvm->arch.host_mmu_root_pgd[511]); } static void mark_unsync(u64 *spte); @@ -2253,6 +2268,15 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, list_add(&sp->link, &kvm->arch.active_mmu_pages); kvm_account_mmu_page(kvm, sp); + /* install host mmu entries when PVM */ + if (kvm->arch.host_mmu_root_pgd && role.level == HOST_ROOT_LEVEL) { + memcpy(sp->spt, kvm->arch.host_mmu_root_pgd, PAGE_SIZE); + } else if (role.host_mmu_la57_top_p4d) { + u64 *p4d = __va(kvm->arch.host_mmu_root_pgd[511] & SPTE_BASE_ADDR_MASK); + + memcpy(sp->spt, p4d, PAGE_SIZE); + } + sp->gfn = gfn; sp->role = role; hlist_add_head(&sp->hash_link, sp_list); @@ -2354,6 +2378,9 @@ static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, return ERR_PTR(-EEXIST); role = kvm_mmu_child_role(sptep, direct, access); + if (unlikely(pvm_mmu_p4d_at_la57_pgd511(vcpu->kvm, sptep))) + role.host_mmu_la57_top_p4d = 1; + return kvm_mmu_get_shadow_page(vcpu, gfn, role); } @@ -5271,6 +5298,12 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, /* KVM uses PAE paging whenever the guest isn't using 64-bit paging. */ root_role.level = max_t(u32, root_role.level, PT32E_ROOT_LEVEL); + /* Shadow MMU level should be the same as host for PVM */ + if (vcpu->kvm->arch.host_mmu_root_pgd && root_role.level != HOST_ROOT_LEVEL) { + root_role.level = HOST_ROOT_LEVEL; + root_role.passthrough = 1; + } + /* * KVM forces EFER.NX=1 when TDP is disabled, reflect it in the MMU role. * KVM uses NX when TDP is disabled to handle a variety of scenarios, diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index c85255073f67..8ea3dca940ad 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -336,6 +336,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, goto error; --walker->level; } + + if (static_call(kvm_x86_disallowed_va)(vcpu, addr)) + goto error; #endif walker->max_level = walker->level; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 4a599130e9c9..e302f7b5c696 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -186,6 +186,10 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, if (pte_access & ACC_USER_MASK) spte |= shadow_user_mask; + /* PVM guest is always running in hardware CPL3. 
*/ + if (vcpu->kvm->arch.host_mmu_root_pgd) + spte |= shadow_user_mask; + if (level > PG_LEVEL_4K) spte |= PT_PAGE_SIZE_MASK; From patchwork Mon Feb 26 14:35:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572292 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2F2F912F374; Mon, 26 Feb 2024 14:35:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958105; cv=none; b=rNiR6nffdCRX5WvccKVrCRvKXdRsptvNWHqG+b+JeviNoXZ5gV36zJuQfGUAlJFvTpxdbJbtw5faOj12TiKMebFziH+k/P7Uq0VF46z13BBVgyaQGt+zBBBrYuOJtoFtK33GyWSUv4Anv456W95G1rPYf5Z+2urE/UZRMlYh2us= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958105; c=relaxed/simple; bh=xwIh858WObmvtC725DseRlAcONsFGiQgeAooN8QJfLA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=kBtWYugZg09k0AMOF0MxsRp3w4/hDv0jH6+fjcL12AxchTK8txCbiu3ZtXpLgVhLvnIL1MU4RiVLBU7Owa0E84WLzSSq8HcAG3Smu18hZ0mhJzHRWqK+qc7iKRgs8VQKqBpzy0vSHq3weq6CNSoqYPMUeq/05meT0iJipNNLQYY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jcAbNFFq; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jcAbNFFq" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-1dcad814986so4522245ad.0; Mon, 26 Feb 2024 06:35:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958103; x=1709562903; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=b0atspnxIjLDF90X6FTNIM6Zc64/TGzPipqn5jcW5+A=; b=jcAbNFFqFWQf4TszNix7Wtee3nBY0rSo9QUNMj0eDj1CIvaqyM5WZSuInKdXcy7h+1 znYssp4BkOU9C46NzzmnFTeJOgdI1HU7mzN4Sn/SQDX05J87dFmVpISJ6hlmy6at8xgU GmNR3PJQ+3kKBfW71h0/xvXMZUiDXVAepgpWTc3ET3EiGAx6Tr5Ch7WbJynLy5vdKIeq ShmsMKaLwI10vdL2c9/yi5gqYat1B0Bhjz/95K1/V2dYHJPzGEvd/XZOo4y2IrtzEsd+ Sc2e8MktQ6kXy38wZJIjM8Vto/P9bXwPfP0OiGtAOmk2i12atOJK2xIXgetiEdJ1S01F fAXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958103; x=1709562903; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=b0atspnxIjLDF90X6FTNIM6Zc64/TGzPipqn5jcW5+A=; b=tWfjcAJhYggyD2dplFhIoJqhO+s+FyzZcNe47Tqx9u78PSRw+E6J3vvQkcZfheX7lX 7cPZdSBGR2I5OodcxcOMdB65GitDCWBbN4XChLi0kh9LPPvL0reeQ8bFlqVsQIEb96eW me3SKJkxxeIacp1Seme604+u1qDO4fL6oKRLrsbuM20MPNW69BLNQsb8V71F+YEQRgqC xDw/2AUbk2UDtIE7nEa/0JsrYupb8irIlCW2mxmlq/qeAlmP8E068e38nE4P92Z0MMsV J1qej+TsamwBRSdwTaWBNZ3cpvRAAPO/uoCkF/IK5sf+u4KmNn4kW9gFnVWy96ivXXpC skyA== X-Forwarded-Encrypted: i=1; 
AJvYcCVSf6V7jxnPfwmjqZWBsWjJQrrmsDv0tYurlItmlSgF3/ss6IpykjkqZxeXyTYMO1KV8ZyGSDS1fhvrO4ISSBRsNlXr X-Gm-Message-State: AOJu0YxtjyoTI5FkH9sKPKx9IZdnfOX5eqLqwVEw5+DczsNoI8bMABt9 yw5qWLjJqT8EdO/DAU0iDYzxMyJc9z7bnDJIzf6ShljkiOqd3+Rgi/nVM6wl X-Google-Smtp-Source: AGHT+IFZXkg7BuUXLXx7GpKe0W887fXA9MAv5WAGUy5HQDjLkpsO0IpvshxneGsKwVIG1iDVvBSLhw== X-Received: by 2002:a17:903:2342:b0:1dc:30d7:ff37 with SMTP id c2-20020a170903234200b001dc30d7ff37mr9570988plh.42.1708958103167; Mon, 26 Feb 2024 06:35:03 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id j5-20020a170902c3c500b001db608b54a9sm4011080plj.23.2024.02.26.06.35.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:02 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 08/73] KVM: x86: Allow hypercall handling to not skip the instruction Date: Mon, 26 Feb 2024 22:35:25 +0800 Message-Id: <20240226143630.33643-9-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong In PVM, the syscall instruction is used as the hypercall instruction. Since the syscall instruction is a trap that indicates the instruction has been executed, there is no need to skip the hypercall instruction. Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/kvm_host.h | 12 +++++++++++- arch/x86/kvm/x86.c | 10 +++++++--- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c76bafe9c7e2..d17d85106d6f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2077,7 +2077,17 @@ static inline void kvm_clear_apicv_inhibit(struct kvm *kvm, kvm_set_or_clear_apicv_inhibit(kvm, reason, false); } -int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); +int kvm_handle_hypercall(struct kvm_vcpu *vcpu, bool skip); + +static inline int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) +{ + return kvm_handle_hypercall(vcpu, true); +} + +static inline int kvm_emulate_hypercall_noskip(struct kvm_vcpu *vcpu) +{ + return kvm_handle_hypercall(vcpu, false); +} int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, void *insn, int insn_len); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 96f3913f7fc5..8ec7a36cdf3e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9933,7 +9933,7 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu) return kvm_skip_emulated_instruction(vcpu); } -int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) +int kvm_handle_hypercall(struct kvm_vcpu *vcpu, bool skip) { unsigned long nr, a0, a1, a2, a3, ret; int op_64_bit; @@ -10034,9 +10034,13 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) kvm_rax_write(vcpu, ret); ++vcpu->stat.hypercalls; - return kvm_skip_emulated_instruction(vcpu); + + if (skip) + return kvm_skip_emulated_instruction(vcpu); + + return 1; } -EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); +EXPORT_SYMBOL_GPL(kvm_handle_hypercall); static int 
emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) { From patchwork Mon Feb 26 14:35:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572293 Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58CF512F586; Mon, 26 Feb 2024 14:35:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958108; cv=none; b=eLxYE0ptXUef4QG5QguP1BSN+1jCfIineckgltKKLtSUqJoCh5TubYr2RXMACGfP0/S0EmFSoyz/YJw+OurK/G5VR0gPHWzRXQ3zMpM24zpwr9NXMUS5p9Uyjmq9Z6jWXcJC49tDynw2A9Zv/8bQZdsTR7JUsWzUGTifIz8vods= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958108; c=relaxed/simple; bh=vB6QlPmmK8kXwQBtLPQg8xVtiKf4ZcuKJbsohLPdMp8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=brdWIHY6xnyQsdNFouorpVGtaJZMqHxDN9bJZq/MM/cMfOl771UfMKmDBPIJ/C6m2xwkq0kngy14V6wIvACo1t4Xmh+EeD7cqmU+TaDqwgf+W8U0lDwfLt2mnP9goO3jKsEJ7KlQQQ497fkMZRvcGWYri0xw0K4gudeqjRVUsjA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FNECOgvY; arc=none smtp.client-ip=209.85.210.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FNECOgvY" Received: by mail-pf1-f178.google.com with SMTP id d2e1a72fcca58-6e4670921a4so1636112b3a.0; Mon, 26 Feb 2024 06:35:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958106; x=1709562906; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=v7Hh2Afn1QJ5FCuqDR4QoSv7xeyDbeCLAUAuNj6SkwU=; b=FNECOgvYh310vlvBfQ+tKGCeoPCzB8p2NsBa9PH2P2EC32kLiLaGC2u2LT9HDBdwtW Mh5GwmeO0NWsUzTE9QddK4BdQe2vBwehGMxu8hnb97FKWcYvlDWlUxsRUaBs6LmwcD4F Q8K5QFqGTm8p2TwgrNTk7Ti7E0QaeNfOFeiZl8kj3hMycHop5P7eGOLd1hRkfeiVsllp 6GFEmNJYZ1E8ldlyTjvmpKdLcgiAXA9LOEMzMUsf0uWt9Ew19cMtMCqAQLmRoC1bRqyX 4KQC0yEQV6Js1i12/M/bsK2ZwjumeuYZgEt0309DcfGTmhnuD2JyHmnAhoWlkpy6jV4w 30gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958106; x=1709562906; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=v7Hh2Afn1QJ5FCuqDR4QoSv7xeyDbeCLAUAuNj6SkwU=; b=nuQW4BGTO8nzewN8ywfbflW2F4MFeRiQUn1zmO/uv+g2Msf94q2akftCLb2/kOSqEE pf0DQi0oGSFDNNxRMi2asyg321pOhBRrjNdkGHfHlcXxynuw6y3dtcjvZ5mJ1HLBzFy/ eTORJ9UrJjnzx/72L2tYp3WEE/3PRNb3SS/skbpoqs/Xc4wOzRw9j6iOwNoJtDfSLhUy 2WT9FyvtWz1xbcsQbMrrzRTW89M51sotsAKfU07L/j6Q7wEupYirMx8BJg8XzQipPR5d IdZ7gqlWVTVpeNWi+lwDBiIwaZcHv8VeVrwi/EHdAm5aAbD1hQVnAeh2gZ0qkbob/DjO Ro/Q== X-Forwarded-Encrypted: i=1; AJvYcCU2dD9tkfIELgKn7N3f0q0QA+mFvc6ju4XJQV97OF7pYjj7SZL6IJuXW3oZ/dC8UzxIv59WUagO4VD0QIxhfzU86ZWS X-Gm-Message-State: 
AOJu0YwOXo4EFW0SuHB68uEU3yx+eF5H+TfB7iAaDOctLYU3iKO+BQwi W5PSiI9Kg84do2rZfhhd3Vil0SW3MAedQ4RO36VWURcEjya9Pwqd/GQIwZUW X-Google-Smtp-Source: AGHT+IEv7XU1rNRba9oMMqkV+6qN+XVjpodubAS8NJsRtUG59QJZkNQNkRTDYqCvVVP66O5TiXM94A== X-Received: by 2002:a05:6a21:3183:b0:1a0:ea31:c34f with SMTP id za3-20020a056a21318300b001a0ea31c34fmr9518737pzb.38.1708958106285; Mon, 26 Feb 2024 06:35:06 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id t30-20020a62d15e000000b006e375ac0d8dsm4276032pfl.138.2024.02.26.06.35.05 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:06 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 09/73] KVM: x86: Add PVM virtual MSRs into emulated_msrs_all[] Date: Mon, 26 Feb 2024 22:35:26 +0800 Message-Id: <20240226143630.33643-10-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Add PVM virtual MSRs to emulated_msrs_all[], enabling the saving and restoration of VM states. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/svm/svm.c | 4 ++++ arch/x86/kvm/vmx/vmx.c | 4 ++++ arch/x86/kvm/x86.c | 10 ++++++++++ 3 files changed, 18 insertions(+) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index f3bb30b40876..91ab7cbbe813 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -31,6 +31,7 @@ #include #include +#include #include #include #include @@ -4281,6 +4282,9 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index) case MSR_IA32_MCG_EXT_CTL: case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: return false; + case PVM_VIRTUAL_MSR_BASE ... PVM_VIRTUAL_MSR_MAX: + /* This is PVM only. */ + return false; case MSR_IA32_SMBASE: if (!IS_ENABLED(CONFIG_KVM_SMM)) return false; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index fca47304506e..e20a566f6d83 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include #include @@ -7004,6 +7005,9 @@ static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index) case MSR_AMD64_TSC_RATIO: /* This is AMD only. */ return false; + case PVM_VIRTUAL_MSR_BASE ... PVM_VIRTUAL_MSR_MAX: + /* This is PVM only. 
*/ + return false; default: return true; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8ec7a36cdf3e..be8fdae942d1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -84,6 +84,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -1525,6 +1526,15 @@ static const u32 emulated_msrs_all[] = { MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK, + MSR_PVM_LINEAR_ADDRESS_RANGE, + MSR_PVM_VCPU_STRUCT, + MSR_PVM_SUPERVISOR_RSP, + MSR_PVM_SUPERVISOR_REDZONE, + MSR_PVM_EVENT_ENTRY, + MSR_PVM_RETU_RIP, + MSR_PVM_RETS_RIP, + MSR_PVM_SWITCH_CR3, + MSR_IA32_TSC_ADJUST, MSR_IA32_TSC_DEADLINE, MSR_IA32_ARCH_CAPABILITIES, From patchwork Mon Feb 26 14:35:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572294 Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A233212B169; Mon, 26 Feb 2024 14:35:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958112; cv=none; b=BF7nK62yFlt8AQpvJaidXO1dYfof586caU2Sgh7h+JE8i8qJMeNWTs7u+S3i5jZpww7g4ps375pz9Oi52dzJ3HE4v9XrW6cGfx35OvtLkPLqzo6j8tMrhVlyWaVcVR9BdT8x723lj5FZBQSzmeuMWxPyfbmCyLBuFCNSnJtCR1o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958112; c=relaxed/simple; bh=Nr7CMfdJ9PM7G4Gfs8lhaw5MGkaNuGFz2WobPAAJvQ4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=n+B12oIDexozNrWLYZ4lurz33uWEhHoJVhSQmDtjFpMceqmkvBJQ7n/FFnMoWfelpYAwM00rjiZCjIG6Pw0AAGLaTSQTc7Cdtv/tO3LRitSifFZjsj7N4ohzKxJKfpXUU3eW8r3rdWeBLzpveb5uSt8IOj49JNRFPCTKM3pOVGg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=br/nDf4O; arc=none smtp.client-ip=209.85.210.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="br/nDf4O" Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-6e4c359e48aso1841974b3a.1; Mon, 26 Feb 2024 06:35:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958110; x=1709562910; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rO51ukeMK5niBnG3YPZsyc+qUDg8m1O0dxWi2twkaSQ=; b=br/nDf4OQd34dzP7Bdqlt01ex5TpWdMLSLLhzwe3b/AvKovf3BFt3mstethmk5QgYA e82R2WT9WGMbRyEgzLc5Vjn/ZGNcQ0dQtitlf57oK8V9XShv81OUAqJmS25MYb9xUTm2 uV/AxpROuYSzgISDPf8/aXiFnhkTnQDGMH/u+dTyRwsbvo27GsmuaKBOXABNUs+2zWyp SSgZyw+B1FmtRR1O8IQ50iWl9Fn3W/QLHWZNcK9HFAudiBlAnJlJstzsLo2hjTsq0jIY rPtUvXAYtT9lJqLIhbv0OhhiS1zI+/1k5G+UCVT9qcbPM1QWX6PUM7uRkwCHLbLC02JZ 3Tbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958110; x=1709562910; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rO51ukeMK5niBnG3YPZsyc+qUDg8m1O0dxWi2twkaSQ=; b=tIoeLJo09yZUUwJPd9AWOojqNNuspakmOaMM8AHu7oFBSiLTrNhjGho2TCuu8h1R/R SMy2Py6HGC8zXGIjvJtNsaB8uRNxLGr65Ih7lU2p2UK41bwDtzXHvnDT1nLEMP6K2sq8 n/w6mBWtUVYm5yKXyLNP8d8OFva5qkiHBng3LG1Y7MOERi1tu3Oo8sUAXE7hZD9tL1Ra nVVADqmiKLGmJQEf7M09dcDwxBDgnlFgnXz+ppjbTRPXDLsF75DRcAGYr8oR20kJ0waG WmaVYMvwHyjR+ZTZpxM+EpYFNrHSkAd8Vcs7XcZyoU+vIB+eNSj0yB2LXKFVo88+PWEC tu5Q== X-Forwarded-Encrypted: i=1; AJvYcCUXo0wcS623gi0l80aDyA0fOWacmJE7PUKH4cl31N0sPJ/lpP2AfYkYtgSvaCPeuir4mK1w3sAspFqmAwpr68Z3nmEG X-Gm-Message-State: AOJu0Yxnzj2OF9mAHwy3sm0gr0nAxRuQHujnw7I2J9ZB5h0yB+zA3KHk 8M3+GcqRPPnzP0ERuul+T/lEwNx5lyGVxkhHDuJUgxeiH+4yaQQj+rZRWIV6 X-Google-Smtp-Source: AGHT+IGJvzhfAqivfTGDTqz1/+5gHpSaGMtHtrKKQtMPmpZmCYitbZIOo7kqxHG7B9PqbaEmEEBYQg== X-Received: by 2002:a05:6a00:2d20:b0:6e5:3ec7:c068 with SMTP id fa32-20020a056a002d2000b006e53ec7c068mr882243pfb.24.1708958109539; Mon, 26 Feb 2024 06:35:09 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id it2-20020a056a00458200b006e543b59587sm118471pfb.126.2024.02.26.06.35.08 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:09 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Wanpeng Li , Vitaly Kuznetsov , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 10/73] KVM: x86: Introduce vendor feature to expose vendor-specific CPUID Date: Mon, 26 Feb 2024 22:35:27 +0800 Message-Id: <20240226143630.33643-11-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan For the PVM guest, it needs to detect PVM support early, even before IDT setup, so the cpuid instruction is used. Moreover, in order to differentiate PVM from VMX/SVM, a new CPUID is introduced to expose vendor-specific features. Currently, only PVM uses it. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/include/uapi/asm/kvm_para.h | 8 +++++++- arch/x86/kvm/cpuid.c | 26 +++++++++++++++++++++++++- arch/x86/kvm/cpuid.h | 3 +++ 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 6e64b27b2c1e..f999f1d32423 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -5,7 +5,9 @@ #include /* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx. It - * should be used to determine that a VM is running under KVM. + * should be used to determine that a VM is running under KVM. And it + * returns KVM_CPUID_FEATURES in eax if vendor feature is not enabled, + * otherwise KVM_CPUID_VENDOR_FEATURES. */ #define KVM_CPUID_SIGNATURE 0x40000000 #define KVM_SIGNATURE "KVMKVMKVM\0\0\0" @@ -16,6 +18,10 @@ * in edx. */ #define KVM_CPUID_FEATURES 0x40000001 +/* This CPUID returns the vendor feature bitmaps in eax and the vendor + * signature in ebx. 
+ */ +#define KVM_CPUID_VENDOR_FEATURES 0x40000002 #define KVM_FEATURE_CLOCKSOURCE 0 #define KVM_FEATURE_NOP_IO_DELAY 1 #define KVM_FEATURE_MMU_OP 2 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index dda6fc4cfae8..31ae843a6180 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -36,6 +36,16 @@ u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly; EXPORT_SYMBOL_GPL(kvm_cpu_caps); +u32 kvm_cpuid_vendor_features; +EXPORT_SYMBOL_GPL(kvm_cpuid_vendor_features); +u32 kvm_cpuid_vendor_signature; +EXPORT_SYMBOL_GPL(kvm_cpuid_vendor_signature); + +static inline bool has_kvm_cpuid_vendor_features(void) +{ + return !!kvm_cpuid_vendor_signature; +} + u32 xstate_required_size(u64 xstate_bv, bool compacted) { int feature_bit = 0; @@ -1132,7 +1142,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) break; case KVM_CPUID_SIGNATURE: { const u32 *sigptr = (const u32 *)KVM_SIGNATURE; - entry->eax = KVM_CPUID_FEATURES; + if (!has_kvm_cpuid_vendor_features()) + entry->eax = KVM_CPUID_FEATURES; + else + entry->eax = KVM_CPUID_VENDOR_FEATURES; entry->ebx = sigptr[0]; entry->ecx = sigptr[1]; entry->edx = sigptr[2]; @@ -1160,6 +1173,17 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) entry->ecx = 0; entry->edx = 0; break; + case KVM_CPUID_VENDOR_FEATURES: + if (!has_kvm_cpuid_vendor_features()) { + entry->eax = 0; + entry->ebx = 0; + } else { + entry->eax = kvm_cpuid_vendor_features; + entry->ebx = kvm_cpuid_vendor_signature; + } + entry->ecx = 0; + entry->edx = 0; + break; case 0x80000000: entry->eax = min(entry->eax, 0x80000022); /* diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 0b90532b6e26..b93e5fec4808 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -8,6 +8,9 @@ #include #include +extern u32 kvm_cpuid_vendor_features; +extern u32 kvm_cpuid_vendor_signature; + extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly; void kvm_set_cpu_caps(void); From patchwork Mon Feb 26 14:35:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572295 Received: from mail-pf1-f179.google.com (mail-pf1-f179.google.com [209.85.210.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B43A212FB24; Mon, 26 Feb 2024 14:35:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958115; cv=none; b=j6ADMZK8gXmd+nsuMvxdTSe6wafA03dYDenjEBBKlultfbw6DM0pG/lu+uhJ8mGcReO2/KSzUTCs4UjQ2SXctr8TVoml/IFwayxt+3YOxbwci6xK/2qYdWUwDUj7HE7cwzoU+Uo1d5CO00Qd8OKz4XBHVB+Ow+ZCFvR90GAGFgc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958115; c=relaxed/simple; bh=MIk0n5o4+C50ulodEC5+4A8eHcWwizVFpO2CWdaU2Yc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Hx8jy9ooHMR0Vl7BwSv37zg4BAF7dDAeJ8QQ7yeTlajiIh3kvFv1PKiQwA5YCThTmwGo+ntjRCdEysoMapmpTOoV1AyWbme/1tF+QUWwnHtXhqzUjDHXazGJ/w2c1ubjyG/RqPYBeOcbS9wmqfLI0dRUMJmf3pEqnyNwTKaDu0k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Bf9JCjqC; arc=none smtp.client-ip=209.85.210.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Bf9JCjqC" Received: by mail-pf1-f179.google.com with SMTP id d2e1a72fcca58-6e4f45d4369so744491b3a.0; Mon, 26 Feb 2024 06:35:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958113; x=1709562913; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9EXD5rGmwv6zXLURl1tisTGOYWsO2XTKjX2ulEK0E4M=; b=Bf9JCjqC5s2VRAAs7xxrHHc9ip6qSCwHMZmOFZegKXSkEG8o57J1gtnMcv2hEnVvfg KuhUDTnOqgf62BxCqUIMABoYYuTSAQTQi6Ie4fM8wXQhThuAJP9Gw85DFBQhZO7RA9/r fCKAZ9idDZrF03C9l9rEIqQZUQEze8lBdXgU8BY1HXgUTZIl6Yt4nySBKcHhBQnDHFbh lEkShKvuWrONiEN/HxHwOkTUPAk/Gu1iDBzu0b6YeJEZCDOv479NAsFqwcoKe76G73W8 2sa+DB+G5+rvU2pSBynp06LYrvE+L/cBLwrz9tWuDGzhygEIP0b2hVK9eH2X3VSzIha/ Oalg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958113; x=1709562913; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9EXD5rGmwv6zXLURl1tisTGOYWsO2XTKjX2ulEK0E4M=; b=YtrnMOKXncBoHxLzLiCpqqBLw43FqNhdUOGQBJ5qxYLv2o8x+g/y3bpHCeXi6Gilh7 huzm5LDy4cIUtX2ImJBzNdXSjzosS124Ba1n7k/2OiGLDO5nyCBh9K1bO1mTpTK47BjS BD04Ry1Dx3QUwtnYFM+/JfB3jyDWsUUwMZzTD9gEJScTvcLV8+4O2uINH7raFShod96G 939bBfiuBL3PrNh4LrfpcffTw61TW9DBbsMahrqvekp4TfMSyTEX4EHkdtOp6LMV2FW6 Y6aamfm646XBgUGUH0dufuvtNLT/gwcJGrY9xy+fO+VI1smCTxuneM38lV0F38mHE1T4 C8VA== X-Forwarded-Encrypted: i=1; AJvYcCXVYzu+PqL+oYllIjVITo/ePKPQKspuwo3+vAr1EPko3BX3qG1cmWX2f7nfUaM2V51VfEGKY7R+g+8Zy4/5wJ3b2qx2 X-Gm-Message-State: AOJu0Yy/HBqL9KxluxmG3zWt/hTr4JNGn04gLk9YOj+pxtkqhSPs8Xf7 Ni0YTyVbbBLD5cbEcFv45P4c6DR9drsVznnM4yn+vzgsAk6DVxfibl5D0Vq3 X-Google-Smtp-Source: AGHT+IFofjGPN31+0LX8dDXmoFM6o8f3Gqf5ulADBej6kpr2ajxjNY65fM6XeEamxOZIfV8ZpecV5Q== X-Received: by 2002:a05:6a00:23cb:b0:6e4:7b26:3f28 with SMTP id g11-20020a056a0023cb00b006e47b263f28mr8149200pfc.21.1708958112888; Mon, 26 Feb 2024 06:35:12 -0800 (PST) Received: from localhost ([198.11.178.15]) by smtp.gmail.com with ESMTPSA id g3-20020a62f943000000b006e537f3c487sm1241269pfm.127.2024.02.26.06.35.12 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:12 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 11/73] KVM: x86: Implement gpc refresh for guest usage Date: Mon, 26 Feb 2024 22:35:28 +0800 Message-Id: <20240226143630.33643-12-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong PVM uses pfncache to share the PVCS structure between the guest and host. 
The flag of pfncache of PVCS is initialized as KVM_GUEST_AND_HOST_USE_PFN because the PVCS is used inside the vcpu_run() callback, even in the switcher, where the vcpu is in guest mode. However, there is no real usage for GUEST_USE_PFN, so the request in mmu_notifier only kicks the vcpu out of guest mode and no refresh is done before the next vcpu_run(). Therefore, a new request type is introduced to request the refresh, and a new callback is used to service the request. Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 1 + arch/x86/kvm/x86.c | 3 +++ include/linux/kvm_host.h | 10 ++++++++++ virt/kvm/pfncache.c | 2 +- 6 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 32e5473b499d..0d9b21988943 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -94,6 +94,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) KVM_X86_OP_OPTIONAL_RET0(disallowed_va) +KVM_X86_OP_OPTIONAL(vcpu_gpc_refresh); KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d17d85106d6f..9223d34cb8e3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1683,6 +1683,8 @@ struct kvm_x86_ops { bool (*disallowed_va)(struct kvm_vcpu *vcpu, u64 la); + void (*vcpu_gpc_refresh)(struct kvm_vcpu *vcpu); + bool (*has_wbinvd_exit)(void); u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); @@ -1839,6 +1841,7 @@ static inline int kvm_arch_flush_remote_tlbs(struct kvm *kvm) } #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE +#define __KVM_HAVE_GUEST_USE_PFN_USAGE #define kvm_arch_pmi_in_guest(vcpu) \ ((vcpu) && (vcpu)->arch.handling_intr_from_guest) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 80406666d7da..7bd88f7ace51 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6741,6 +6741,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) { kvm_mmu_zap_all_fast(kvm); + kvm_make_all_cpus_request(kvm, KVM_REQ_GPC_REFRESH); } void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index be8fdae942d1..89bf368085a9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10786,6 +10786,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu)) static_call(kvm_x86_update_cpu_dirty_logging)(vcpu); + + if (kvm_check_request(KVM_REQ_GPC_REFRESH, vcpu)) + static_call_cond(kvm_x86_vcpu_gpc_refresh)(vcpu); } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win || diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4944136efaa2..b7c490e74704 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -167,6 +167,7 @@ static inline bool is_error_page(struct page *page) #define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_UNBLOCK 2 #define KVM_REQ_DIRTY_RING_SOFT_FULL 3 +#define KVM_REQ_GPC_REFRESH (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQUEST_ARCH_BASE 8 /* @@ -1367,6 +1368,15 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len); */ void kvm_gpc_deactivate(struct 
gfn_to_pfn_cache *gpc); +static inline unsigned int kvm_gpc_refresh_request(void) +{ +#ifdef __KVM_HAVE_GUEST_USE_PFN_USAGE + return KVM_REQ_GPC_REFRESH; +#else + return KVM_REQ_OUTSIDE_GUEST_MODE; +#endif +} + void kvm_sigset_activate(struct kvm_vcpu *vcpu); void kvm_sigset_deactivate(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index 2d6aba677830..f7b7a2f75ec7 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -59,7 +59,7 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start, * KVM needs to ensure the vCPU is fully out of guest context * before allowing the invalidation to continue. */ - unsigned int req = KVM_REQ_OUTSIDE_GUEST_MODE; + unsigned int req = kvm_gpc_refresh_request(); bool called; /* From patchwork Mon Feb 26 14:35:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572296 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29CE612FF72; Mon, 26 Feb 2024 14:35:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958118; cv=none; b=RZOj+k8UvyZpWTIRHnAQF14Dv4+eU/g8mHa5FWkkMdtfxqUKtpo7sWQMoPG54YlFxJQ/ec59fEv4eX948THc/U9GodgndzhRhzjrQAysSPiZINX8tGxDXmXzmhklRouAJfbl0uH/xgntQrHI7dwWPFaGBWHLyZm2M0Nhpz5LXmg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958118; c=relaxed/simple; bh=bVULcf1cUtmaXpPyRTkatm0ACsMJlUTYThVtsgkA4kk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=qs/7z44xZ9nOqSGgkME+Fi6Nz75EaeJWA+HgFp74O0MAf5yZ9MBsuxLgn3dsJQaF3LfhBV8g0N/cHDYNCFt+E6uL2NVuUjjNkga8hX9c/oqTMj9OM0HHEZTgRD5fuu3dPhkqB8552+DROE2ySOIVxyZdX02Udu8fKxpnYfPPql4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=evgEFPLY; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="evgEFPLY" Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-1db6e0996ceso23678875ad.2; Mon, 26 Feb 2024 06:35:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958116; x=1709562916; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Q55Xz98tQTUsOL6Z4q/HueBOFbtRKoeK1/Jth2H7uWk=; b=evgEFPLYiGYRqdH9fqKvZY7qFW/7RwsR+lzdGHxC4Y6hmy8v0FCOaC1YnLzC4slBcW bTJPict8UkCBh5uPsJujbBAkdHcqPjru4f/F3QuYkOy1tr+z+xqx+5jH97dJMbsrhtc9 +RJmMY/gkL/Fb6WXqVVJiALIx+4soDXYivHTyGG/AYin//QABGUE/7H0Xd7rmcYd8HPZ /VxCFQrWgdGY0sJ/lpchUh4M+QF7udB59/CRQSdeibsZkK1phMdSe+6j+z0T5QHjl5lh YU6vLA3rZBr0vEVpSUVqwlzFuMz9CnNGDOBCPtunGTtruqVkrIB3bBdP6SdlenPqRBGd Ki1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958116; x=1709562916; 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 12/73] KVM: x86: Add NR_VCPU_SREG in SREG enum Date: Mon, 26 Feb 2024 22:35:29 +0800 Message-Id: <20240226143630.33643-13-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan

Add NR_VCPU_SREG to describe the size of the SREG enum; this allows it to be used to define the size of an array.
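As a hypothetical usage sketch (not part of the patch), the new enumerator lets callers size a per-segment array without hard-coding the number of segment registers, for example together with KVM's existing kvm_get_segment() helper:

	/*
	 * Illustration only: snapshot every segment register into an array
	 * sized by the new NR_VCPU_SREG enumerator.
	 */
	static void snapshot_all_segments(struct kvm_vcpu *vcpu,
					  struct kvm_segment segs[NR_VCPU_SREG])
	{
		int i;

		for (i = 0; i < NR_VCPU_SREG; i++)
			kvm_get_segment(vcpu, &segs[i], i);
	}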
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/include/asm/kvm_host.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9223d34cb8e3..a90807f676b9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -204,6 +204,7 @@ enum { VCPU_SREG_GS, VCPU_SREG_TR, VCPU_SREG_LDTR, + NR_VCPU_SREG, }; enum exit_fastpath_completion { From patchwork Mon Feb 26 14:35:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572297 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53834130AC9; Mon, 26 Feb 2024 14:35:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958121; cv=none; b=oJzPqdfaBtzb0mKizJ+7Wj2m2peZ+fBr71m/6XmJNuf3Qz8ZrEP+KhSMKxtdEvSyjoUxNJpQo+GtX7ySUAKUa5Sr2jtKSQf8776ZzR4LB8QAOUnLisK0RIRPAFKvKkzyXSglbY7NUJLo3X41fogusfhrhrqEGezTi3P3JiH7Ecg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958121; c=relaxed/simple; bh=ZRftprPhA+Ho5+u0w8l6wtMzwvZBZSuQdUzsf46+o4E=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=WKUF33lsAaHY1wabWaVoEnyCfIExGx4itJYJZfw+TmZnhOh4+ZjuFAD4dGKQVvxov8CDLh/ejkWPE1jLQfJx07igAyXl1dIQ8DPX3AH52zZOrkiy+cIVXSMr84lyHXOtfXExXZ+F6s5Y3ldk0QYasG6DFrhMorQb818wJ18Uzcg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Ioavjcri; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Ioavjcri" Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-1dc09556599so27193095ad.1; Mon, 26 Feb 2024 06:35:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958119; x=1709562919; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TfQmPSzBj4Fo3ydJK18pQVJpu7jjOOVodiv0WFv/iog=; b=IoavjcriLGSpOwaBMKnP3g1xUQudQL7wBAPYtyBmN4ASdWI6YMyElDpfjFjEaS0Vuh F6ZI9TFneMSHVc6V+dyz2veZY+qyot83CLc4PFZkZvh8uxiNyVo29jRbSCONMaiJHsmw ganQv6YHifIcCkXjiP5rPPO1wYo4zRT2AP3t0ijciHYbtfAl2IZbhmSQ2otHwk9SzPt7 rhYehM7agczxcb6qC+nR6BN31dIKx/4dYQ9wnotMTSgC4KDlmGyKaIkL+fCnS81aJD2v lz5mzc1QCOcI0C+DwIHQVz/fyQ216ip1VDy+QXKH6ULn+ZaQ7s94YMPawyF6dzwvK7Oa WcVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958119; x=1709562919; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TfQmPSzBj4Fo3ydJK18pQVJpu7jjOOVodiv0WFv/iog=; b=a0m7U02+vaXgB1uUOyvIrA38v0AnsERiKYeA2sqtmmhhcppZnFOvxZl9DhIpDA/roY 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 13/73] KVM: x86/emulator: Reinject #GP if instruction emulation failed for PVM Date: Mon, 26 Feb 2024 22:35:30 +0800 Message-Id: <20240226143630.33643-14-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong

A privileged instruction in PVM guest supervisor mode triggers a #GP and is emulated by PVM. However, if a non-privileged instruction triggers a #GP in PVM guest supervisor mode and is not implemented in the emulator, the emulator will currently exit to userspace, and the VMM may not be able to handle it. This can be triggered by guest userspace, e.g., a guest userspace process can corrupt the XSTATE header in a signal frame and the XRSTOR in the guest kernel will trigger a #GP, but XRSTOR is not implemented in the emulator now. Therefore, a new emulation type for PVM is added to instruct the emulator to reinject the #GP into the guest.

Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/kvm_host.h | 8 ++++++++ arch/x86/kvm/x86.c | 5 +++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a90807f676b9..3e6f27865528 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1954,6 +1954,13 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); * the gfn, i.e. retrying the instruction will hit a * !PRESENT fault, which results in a new shadow page * and sends KVM back to square one. + * + * EMULTYPE_PVM_GP - Set when emulating an intercepted #GP for PVM. Privilege + * instruction in PVM guest supervisor mode will trigger a + * #GP and be emulated by PVM. But if a non-privilege + * instruction triggers a #GP in PVM guest supervisor mode + * and is not implemented in the emulator, the emulator + * should reinject the #GP into guest.
*/ #define EMULTYPE_NO_DECODE (1 << 0) #define EMULTYPE_TRAP_UD (1 << 1) @@ -1964,6 +1971,7 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); #define EMULTYPE_PF (1 << 6) #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7) #define EMULTYPE_WRITE_PF_TO_SP (1 << 8) +#define EMULTYPE_PVM_GP (1 << 9) int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type); int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 89bf368085a9..29413cb2f090 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8664,7 +8664,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type) ++vcpu->stat.insn_emulation_fail; trace_kvm_emulate_insn_failed(vcpu); - if (emulation_type & EMULTYPE_VMWARE_GP) { + if (emulation_type & (EMULTYPE_VMWARE_GP | EMULTYPE_PVM_GP)) { kvm_queue_exception_e(vcpu, GP_VECTOR, 0); return 1; } @@ -8902,7 +8902,8 @@ static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, * and without a prefix. */ if (emulation_type & (EMULTYPE_NO_DECODE | EMULTYPE_SKIP | - EMULTYPE_TRAP_UD | EMULTYPE_VMWARE_GP | EMULTYPE_PF)) + EMULTYPE_TRAP_UD | EMULTYPE_VMWARE_GP | + EMULTYPE_PVM_GP | EMULTYPE_PF)) return false; if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) && From patchwork Mon Feb 26 14:35:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572298 Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FF48130ADA; Mon, 26 Feb 2024 14:35:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958125; cv=none; b=JeLh5HVJyKICf5Pr5BjbxD1OVGDXl4ZSA5/hmS7uFX+4OV/zT5SdRC1GftLB5C/IT86urun7by4ToApdgcFM/lYm/9lvP6VQ+UzwwR/swPiCVxc3TZ/uNvAD30gMhqK+RA/0HE+3gOOmqY/zNQRNh/jKCG3MEfnudfdt1tjjKvA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958125; c=relaxed/simple; bh=Fc9fjKhu9+1SCbCYIk+yYMNj9ArhrQq5vfxkzOjYC2k=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=F5hKWLnj87I1wysMzfF/D2IdYEg6AbdASUq8wNUCYSjZHEd+k7Khm00EW7Hga4/Ql21cTJ/nJae0Z0Cx0yJrfd0eMTCT310ra+vGNeVXb8xsXx7EjqQoIYdGD7g2TepaaIEypAMRkaF3B6+4Cfkb2zScOUVnG5F8lW/9snpTBXE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=G1oT6GPZ; arc=none smtp.client-ip=209.85.210.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="G1oT6GPZ" Received: by mail-pf1-f181.google.com with SMTP id d2e1a72fcca58-6e459b39e2cso1729865b3a.1; Mon, 26 Feb 2024 06:35:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958123; x=1709562923; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=koLrT/8ZaHBP75M3jJkVq1itGq5s215EKD4aR1Xrs9Q=; b=G1oT6GPZ/aVd87JtJucn5V/abf3QP6KbbG83vB+1Ir1FY2c1zQaraSW0UoKfNT4Giz 1SCGPQgBSPAHDG32EMebHr2KBiED+M+M8GUbbkVn4wO3U4iQC1RjoAEeQdnIQgeb967d RCaSDKL9I32YpML20S2q150E1VU8/GzrTXCnpNwHnhVprdUf3h/8G44TJwcVdvkxGchm hpqNRYvqecK63zbAgS1tw51N+F1wpY9oHnxmMeLizm2Sp1e6HzK65+AP5rWuEppOAsJW YuYAvW2swzm5xx5lpYLbHK5+Ov2SEpASHevdKidPgiasEkavmbIMK7C+KWAYlUewE9bE epTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958123; x=1709562923; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=koLrT/8ZaHBP75M3jJkVq1itGq5s215EKD4aR1Xrs9Q=; b=NSvc+KnhB6BvVY+P5c+qPWaqGNmynB5UBBwfaEZU5BIyVDa5qJMzvAIGX1kCisqrvL iBRsbbKHcXJTJZAJvwJgtTA2TwTVMXJhmrErEcvkj2knoqXve448S4MtRED1YazIQQpo Vii5UO723DRpgRwu7PPI1kuXQUgV3MWeI3FvDCsJfxNoG0CydyqHH7lUjU2tnj9WMIhh Z+DWuPTYs/NbpvLv2+2ZZU6oblgb970s6vWpve/rwj0D+8WSX1PQEyqQ06VdAL66Pj/t pQWcsFmCSfHR6mC+0CHDw9oxPbW2GoZf0xgyITUXRJwTiuqXTjP1JLbP+3phe07WNnBo 9tzw== X-Forwarded-Encrypted: i=1; AJvYcCW8CxC/SIGjchO7vetrHeM1/2SPhGjvbjipgFn251hVZum/PUAtlDgzzcxPX7/97KuJZ9Al6dLgRt8dQpX/hb/5H1wL X-Gm-Message-State: AOJu0YzdRTXZcWXtbCGf6FDA6iUj6qOTn34yWpk8sqR64vGtT5Rp5YoB FpcN2WQrVG5lH6SgfitapNCVBPNQ5gCTeB1M0NXei9970zmNUFmMEdEXoxyM X-Google-Smtp-Source: AGHT+IGkIaU97O2tmt7HU52M8oE3r8e7BYbysBBvgkybW7v+NpBM415W6UhaXfuBCJz7jrhFnep1IA== X-Received: by 2002:a05:6a20:9593:b0:1a0:ecf0:640a with SMTP id iu19-20020a056a20959300b001a0ecf0640amr10170340pzb.9.1708958123223; Mon, 26 Feb 2024 06:35:23 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id w13-20020aa79a0d000000b006e02f4bb4e4sm4216825pfj.18.2024.02.26.06.35.22 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:22 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 14/73] KVM: x86: Create stubs for PVM module as a new vendor Date: Mon, 26 Feb 2024 22:35:31 +0800 Message-Id: <20240226143630.33643-15-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Add a new Kconfig option and create stub files for what will eventually be a new module named PVM (Pagetable-based PV Virtual Machine). PVM will function as a vendor module, similar to VMX/SVM for KVM, but it doesn't require hardware virtualization assistance. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/Kconfig | 9 +++++++++ arch/x86/kvm/Makefile | 3 +++ arch/x86/kvm/pvm/pvm.c | 26 ++++++++++++++++++++++++++ 3 files changed, 38 insertions(+) create mode 100644 arch/x86/kvm/pvm/pvm.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 950c12868d30..49a8b3489a0a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -118,6 +118,15 @@ config KVM_AMD_SEV Provides support for launching Encrypted VMs (SEV) and Encrypted VMs with Encrypted State (SEV-ES) on AMD processors. 
+config KVM_PVM + tristate "Pagetable-based PV Virtual Machine" + depends on KVM && X86_64 + help + Provides Pagetable-based PV Virtual Machine for KVM. + + To compile this as a module, choose M here: the module + will be called kvm-pvm. + config KVM_SMM bool "System Management Mode emulation" default y diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 97bad203b1b1..036458a27d5e 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -33,9 +33,12 @@ ifdef CONFIG_HYPERV kvm-amd-y += svm/svm_onhyperv.o endif +kvm-pvm-y += pvm/pvm.o + obj-$(CONFIG_KVM) += kvm.o obj-$(CONFIG_KVM_INTEL) += kvm-intel.o obj-$(CONFIG_KVM_AMD) += kvm-amd.o +obj-$(CONFIG_KVM_PVM) += kvm-pvm.o AFLAGS_svm/vmenter.o := -iquote $(obj) $(obj)/svm/vmenter.o: $(obj)/kvm-asm-offsets.h diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c new file mode 100644 index 000000000000..1dfa1ae57c8c --- /dev/null +++ b/arch/x86/kvm/pvm/pvm.c @@ -0,0 +1,26 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Pagetable-based Virtual Machine driver for Linux + * + * Copyright (C) 2020 Ant Group + * Copyright (C) 2020 Alibaba Group + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ +#include + +MODULE_AUTHOR("AntGroup"); +MODULE_LICENSE("GPL"); + +static void pvm_exit(void) +{ +} +module_exit(pvm_exit); + +static int __init pvm_init(void) +{ + return 0; +} +module_init(pvm_init); From patchwork Mon Feb 26 14:35:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572299 Received: from mail-pf1-f177.google.com (mail-pf1-f177.google.com [209.85.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A8F5130E34; Mon, 26 Feb 2024 14:35:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958128; cv=none; b=LwdDR6wcxNfPmYMWAOuF0Z8pXQtDDPCK20SKmnLhs9O6fFWIvNLeg1xZhWybgtaX6VP5H8AF3Uj+8wa4kLVKkPkMsD7vuMOwcB4eR/N85Cv0+glSOx1jZHyuICO22iyCEsOMYAGF571XkO84dBvVyWPZtZ4/zpeTd25BEsLENeM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958128; c=relaxed/simple; bh=bsV9BT402z/Bw0WTA6E9ywSofSFLcIXPfojxj3LNnZs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Y5gG6KcU9pTkELrI/5hgAd5sbmycI7C+ywrRp5pV3JhfesHD9pQ4JycbXtyXdCZmDe12bpKrYfkoQMyihdZ/ukVv0ORNvyhR+48x035BxyL66+gUozSwlWlHcv9kWO+7rHdJm+RWva0YwpG0DzdJ34bp3MZbpCzUimufKJiOcz4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QMqw5NCR; arc=none smtp.client-ip=209.85.210.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QMqw5NCR" Received: by mail-pf1-f177.google.com with SMTP id d2e1a72fcca58-6da9c834646so3175976b3a.3; Mon, 26 Feb 2024 06:35:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958126; x=1709562926; 
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong, Lai Jiangshan, Linus Torvalds, Peter Zijlstra,
 Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
 kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
 Juergen Gross, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
 Lorenzo Stoakes, linux-mm@kvack.org
Subject: [RFC PATCH 15/73] mm/vmalloc: Add a helper to reserve a contiguous and aligned kernel virtual area
Date: Mon, 26 Feb 2024 22:35:32 +0800
Message-Id: <20240226143630.33643-16-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>
References: <20240226143630.33643-1-jiangshanlai@gmail.com>
MIME-Version: 1.0

From: Hou Wenlong

PVM needs to reserve a contiguous and aligned kernel virtual area for the
guest kernel. Therefore, add a helper to achieve this. It is a temporary
method for now; a better approach is needed in the future.
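For illustration, here is a minimal caller sketch of the helper added below.
The sizes mirror how the follow-up PVM host-MMU patch uses it; the macro and
function names other than get_vm_area_align() are assumptions for this
example only.

/*
 * Sketch only (not part of the patch): reserve a PGD-aligned region so
 * the reservation covers whole PML4 entries (512 GiB each with 4-level
 * paging). Needs <linux/vmalloc.h>; all names here are illustrative.
 */
#define EXAMPLE_PT_L4_SIZE	(1UL << 39)			/* one PML4 entry */
#define EXAMPLE_RANGE_SIZE	(32 * EXAMPLE_PT_L4_SIZE)	/* 16 TiB         */

static struct vm_struct *example_reserve_guest_va(void)
{
	/* VM_ALLOC | VM_NO_GUARD matches the later host_mmu_init() caller. */
	return get_vm_area_align(EXAMPLE_RANGE_SIZE, EXAMPLE_PT_L4_SIZE,
				 VM_ALLOC | VM_NO_GUARD);
}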
Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- include/linux/vmalloc.h | 2 ++ mm/vmalloc.c | 10 ++++++++++ 2 files changed, 12 insertions(+) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index c720be70c8dd..1821494b51d6 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -204,6 +204,8 @@ static inline size_t get_vm_area_size(const struct vm_struct *area) } extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags); +extern struct vm_struct *get_vm_area_align(unsigned long size, unsigned long align, + unsigned long flags); extern struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags, const void *caller); extern struct vm_struct *__get_vm_area_caller(unsigned long size, diff --git a/mm/vmalloc.c b/mm/vmalloc.c index d12a17fc0c17..6e4b95f24bd8 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2642,6 +2642,16 @@ struct vm_struct *get_vm_area(unsigned long size, unsigned long flags) __builtin_return_address(0)); } +struct vm_struct *get_vm_area_align(unsigned long size, unsigned long align, + unsigned long flags) +{ + return __get_vm_area_node(size, align, PAGE_SHIFT, flags, + VMALLOC_START, VMALLOC_END, + NUMA_NO_NODE, GFP_KERNEL, + __builtin_return_address(0)); +} +EXPORT_SYMBOL_GPL(get_vm_area_align); + struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags, const void *caller) { From patchwork Mon Feb 26 14:35:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572300 Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 86CE812B159; Mon, 26 Feb 2024 14:35:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958135; cv=none; b=PZ2FnLFzRAyGEvlNY7qrxpEBF+RBMJlyxKkjZxSG25syI2gDzakoJrvoK4ROVmd5sCYclDUIQqouELZB7IXyLoscSWZxdfblzKYdQPfuTF1rUTGJNT88RrcTh7BwcNjE/ow7UYP2TtQwyMRmb3DXaJhN9/jMn8TgU2Kyqy45OQk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958135; c=relaxed/simple; bh=xauN1X+DnKCvc9vbegp66abdsujgHus+chSqx8LSMo4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=cU+yo7csTPoIREC1np41RTCOTZR+Fv1k/XyjvfRvxysaW0L7oK0C87yUhdk6xx4yyrJIdbyMoMHT8UAuICb3+8eSS581bmQixMZpY4a8gz4qITz+NDcX2jstWNsRGKpMBJ/MpgLpIdgZyrq+JZkImHyXYiPvErh3ffbzluWL5VA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UjjzPkEZ; arc=none smtp.client-ip=209.85.210.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UjjzPkEZ" Received: by mail-pf1-f181.google.com with SMTP id d2e1a72fcca58-6e4f45d4369so744848b3a.0; Mon, 26 Feb 2024 06:35:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958132; x=1709562932; darn=vger.kernel.org; 
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, Hou Wenlong, Linus Torvalds, Peter Zijlstra,
 Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
 kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
 Juergen Gross, Dave Hansen, "H. Peter Anvin"
Subject: [RFC PATCH 16/73] KVM: x86/PVM: Implement host mmu initialization
Date: Mon, 26 Feb 2024 22:35:33 +0800
Message-Id: <20240226143630.33643-17-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>
References: <20240226143630.33643-1-jiangshanlai@gmail.com>
MIME-Version: 1.0

From: Lai Jiangshan

PVM uses shadow paging for guest MMU virtualization. Because the switcher
supports guest/host world switches, the host kernel mapping must be cloned
into the guest shadow page table, similar to PTI. For simplicity, only the
PGD level is cloned. Additionally, the guest Linux kernel runs in the high
address space, so PVM reserves a kernel virtual area for the guest in the
host vmalloc area, also at the PGD level.
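Since both the reserved area and the cloned entries are tracked by their PML4
slots, the index arithmetic the patch relies on may be worth spelling out. A
minimal sketch under the usual 4-level paging layout (the helper name is
illustrative, not from the patch):

/*
 * Illustrative only: with 4-level paging the PML4 slot of a virtual
 * address is bits 47:39 (512 slots of 512 GiB each). This is what the
 * __PT_INDEX(addr, 4, 9) calls in the patch below compute for the start
 * and end of the reserved area.
 */
static inline unsigned int example_pml4_index(unsigned long addr)
{
	return (addr >> 39) & 0x1ff;
}

A 16 TiB reservation aligned to 512 GiB therefore occupies exactly 32
consecutive PML4 slots, and [pml4_index_start, pml4_index_end) describes the
hole that clone_host_mmu() skips when copying host PGD entries into the guest
shadow root.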
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/pvm/host_mmu.c | 119 ++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 23 +++++++ 3 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kvm/pvm/host_mmu.c create mode 100644 arch/x86/kvm/pvm/pvm.h diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 036458a27d5e..706ccf3eca45 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -33,7 +33,7 @@ ifdef CONFIG_HYPERV kvm-amd-y += svm/svm_onhyperv.o endif -kvm-pvm-y += pvm/pvm.o +kvm-pvm-y += pvm/pvm.o pvm/host_mmu.o obj-$(CONFIG_KVM) += kvm.o obj-$(CONFIG_KVM_INTEL) += kvm-intel.o diff --git a/arch/x86/kvm/pvm/host_mmu.c b/arch/x86/kvm/pvm/host_mmu.c new file mode 100644 index 000000000000..35e97f4f7055 --- /dev/null +++ b/arch/x86/kvm/pvm/host_mmu.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * PVM host mmu implementation + * + * Copyright (C) 2020 Ant Group + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include + +#include +#include +#include + +#include "mmu.h" +#include "mmu/spte.h" +#include "pvm.h" + +static struct vm_struct *pvm_va_range_l4; + +u32 pml4_index_start; +u32 pml4_index_end; +u32 pml5_index_start; +u32 pml5_index_end; + +static int __init guest_address_space_init(void) +{ + if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) { + pr_warn("CONFIG_KASAN_VMALLOC is not compatible with PVM"); + return -1; + } + + pvm_va_range_l4 = get_vm_area_align(DEFAULT_RANGE_L4_SIZE, PT_L4_SIZE, + VM_ALLOC|VM_NO_GUARD); + if (!pvm_va_range_l4) + return -1; + + pml4_index_start = __PT_INDEX((u64)pvm_va_range_l4->addr, 4, 9); + pml4_index_end = __PT_INDEX((u64)pvm_va_range_l4->addr + (u64)pvm_va_range_l4->size, 4, 9); + pml5_index_start = 0x1ff; + pml5_index_end = 0x1ff; + return 0; +} + +static __init void clone_host_mmu(u64 *spt, u64 *host, int index_start, int index_end) +{ + int i; + + for (i = PTRS_PER_PGD/2; i < PTRS_PER_PGD; i++) { + /* clone only the range that doesn't belong to guest */ + if (i >= index_start && i < index_end) + continue; + + /* remove userbit from host mmu, which also disable VSYSCALL page */ + spt[i] = host[i] & ~(_PAGE_USER | SPTE_MMU_PRESENT_MASK); + } +} + +u64 *host_mmu_root_pgd; +u64 *host_mmu_la57_top_p4d; + +int __init host_mmu_init(void) +{ + u64 *host_pgd; + + if (guest_address_space_init() < 0) + return -ENOMEM; + + if (!boot_cpu_has(X86_FEATURE_PTI)) + host_pgd = (void *)current->mm->pgd; + else + host_pgd = (void *)kernel_to_user_pgdp(current->mm->pgd); + + host_mmu_root_pgd = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); + + if (!host_mmu_root_pgd) { + host_mmu_destroy(); + return -ENOMEM; + } + if (pgtable_l5_enabled()) { + host_mmu_la57_top_p4d = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO); + if (!host_mmu_la57_top_p4d) { + host_mmu_destroy(); + return -ENOMEM; + } + + clone_host_mmu(host_mmu_root_pgd, host_pgd, pml5_index_start, pml5_index_end); + clone_host_mmu(host_mmu_la57_top_p4d, __va(host_pgd[511] & SPTE_BASE_ADDR_MASK), + pml4_index_start, pml4_index_end); + } else { + clone_host_mmu(host_mmu_root_pgd, host_pgd, pml4_index_start, pml4_index_end); + } + + if (pgtable_l5_enabled()) { + pr_warn("Supporting for LA57 host is not fully implemented yet.\n"); + host_mmu_destroy(); + return -EOPNOTSUPP; + } + + return 0; +} + +void host_mmu_destroy(void) +{ + if (pvm_va_range_l4) + 
free_vm_area(pvm_va_range_l4); + if (host_mmu_root_pgd) + free_page((unsigned long)(void *)host_mmu_root_pgd); + if (host_mmu_la57_top_p4d) + free_page((unsigned long)(void *)host_mmu_la57_top_p4d); + pvm_va_range_l4 = NULL; + host_mmu_root_pgd = NULL; + host_mmu_la57_top_p4d = NULL; +} diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h new file mode 100644 index 000000000000..7a3732986a6d --- /dev/null +++ b/arch/x86/kvm/pvm/pvm.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_PVM_H +#define __KVM_X86_PVM_H + +#define PT_L4_SHIFT 39 +#define PT_L4_SIZE (1UL << PT_L4_SHIFT) +#define DEFAULT_RANGE_L4_SIZE (32 * PT_L4_SIZE) + +#define PT_L5_SHIFT 48 +#define PT_L5_SIZE (1UL << PT_L5_SHIFT) +#define DEFAULT_RANGE_L5_SIZE (32 * PT_L5_SIZE) + +extern u32 pml4_index_start; +extern u32 pml4_index_end; +extern u32 pml5_index_start; +extern u32 pml5_index_end; + +extern u64 *host_mmu_root_pgd; + +void host_mmu_destroy(void); +int host_mmu_init(void); + +#endif /* __KVM_X86_PVM_H */ From patchwork Mon Feb 26 14:35:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572301 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0048412B15B; Mon, 26 Feb 2024 14:35:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958136; cv=none; b=gnGaL2Oh/8smFhZ3K6VEWTUiEznU5IVgVO2lSywG1jPmns3rggYZLkNOr1FNV3CFoH+jd46o/15UulAP+B2QhqWE8HlNyGuCtNxw0WH5Orcd6etOQTiIzu294Dp+P7uKdhvQIIvxOq/AomDguwdCIn4EktXaIgXFDk8J5tfSmOU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958136; c=relaxed/simple; bh=A91XOAPWlG4UQM4/y3mhJgJbexBv/5cbd9ybcfuxCMU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=BtzYmFQroxggWxInA49bk95iHynjORLYKCVCxvcGlznChpKdu7n532VDWiaVjNBa3YiCvuOSQwtMhZMYc83o3J2njo0pWmAiKgVhNr/etS/RKJF/IlBoEsOTi8JKq3kpG44vkfkZfBoWrnJplN4vBnIGWqSfFjuevWrrQrSC8eQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=lU9RktdC; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="lU9RktdC" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-1dc9222b337so12871425ad.2; Mon, 26 Feb 2024 06:35:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958134; x=1709562934; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=riQZ9ZpwPxz2QrssKIMSS0JXLxPFIoRiOKE+eD6KDsw=; b=lU9RktdCkXFgRdwP8zBrXTb5xpHvjpOyGWzjRHapsvywg/i3XXSBiqxmXXeM8v+o4Q PdhLDERYpqLpZUvTHUktdsinfoNtAkyDgKtqs8aI+s+a/k9BH0LwkCQPrFxzaHrnQFfV RoqWFiHdgEU2gx0rceEdp5IgbZGJzzZHkI9klPBg7vtu0QZ3Kg2dSM9cn9lxbSq6Mb+U 
bmiW8Z+3kjT1YwlRzzd2u0ZW4MDm6IEWKTx2z/eTzTCJIfAHAYWJiH2NIzabZiDN++iZ QqD7w5dKrrP6TWFNXVk9IYhssrsYXnxVO9LtTizjYwIqj4do3dRNGAMVyZWnifTkqPgS G/iw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958134; x=1709562934; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=riQZ9ZpwPxz2QrssKIMSS0JXLxPFIoRiOKE+eD6KDsw=; b=hRRagJP6FHWW8b6Ck9Bi37IUpMOB5oWsBWUlPvJSaEyK51AdFF/PVhBnLHRkZcMrQ6 EBZOG6jrcnouuZql9Qkj2bN/Lx4a3HgOGf6gFwFDrZg+M6gZ0S9JqHyrtxZuzmnGkjXa CwZab6larCDKTqkcVtDZZPiD89klUDEiaYBcywmBc5+Fy4CcYCYQSvimCgP/ZraurWin Nq9fpNPoJomQQlrzLbdy2j+5ct0qJai8c6cSeMPTRbnDUkxT3sU67aBciXnt9frikX3p 8ryO3Ykzg4VkbXyzJuwVtKTvYQ5VuwPihi4ZJLWdgZYXWUET0lrlKSwMA6k9aHzE7iSD yhCA== X-Forwarded-Encrypted: i=1; AJvYcCWbgvqPA2HnGrWMOUoaraEJRBWdIHJ1EdlPoS3P9/wDpNDCJwuK7vt6neeLH/X7Zj5rcGhupEFMevF6z874OYfGPQJK X-Gm-Message-State: AOJu0YwMY21xl5OYs+UTYfUSM7mttVR6LP46aZX8ReuoySZL+d6a8NGM tGYWepY3v/pH7FMe6zgwhNcLeOZZrdbkai5YV+r+3EMOPPPLhx3KZCtp/5XV X-Google-Smtp-Source: AGHT+IHECtnuYEhCc+LtTSqZFq0GvsQFgPnhhCjRj5C7penoqaGkEFA3+9fJ+eNcu8ruHubMNcBCaQ== X-Received: by 2002:a17:903:1111:b0:1db:9a7d:2e6 with SMTP id n17-20020a170903111100b001db9a7d02e6mr7701854plh.48.1708958133971; Mon, 26 Feb 2024 06:35:33 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id d20-20020a170902c19400b001db9fa23407sm4000701pld.195.2024.02.26.06.35.33 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:33 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 17/73] KVM: x86/PVM: Implement module initialization related callbacks Date: Mon, 26 Feb 2024 22:35:34 +0800 Message-Id: <20240226143630.33643-18-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Implement hardware enable/disable and setup/unsetup callbacks for PVM module initialization. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 226 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 20 ++++ 2 files changed, 246 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 1dfa1ae57c8c..83aa2c9f42f6 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -9,18 +9,244 @@ * the COPYING file in the top-level directory. * */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include +#include + +#include "cpuid.h" +#include "x86.h" +#include "pvm.h" + MODULE_AUTHOR("AntGroup"); MODULE_LICENSE("GPL"); +static bool __read_mostly is_intel; + +static unsigned long host_idt_base; + +static void pvm_setup_mce(struct kvm_vcpu *vcpu) +{ +} + +static bool pvm_has_emulated_msr(struct kvm *kvm, u32 index) +{ + switch (index) { + case MSR_IA32_MCG_EXT_CTL: + case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: + return false; + case MSR_AMD64_VIRT_SPEC_CTRL: + case MSR_AMD64_TSC_RATIO: + /* This is AMD SVM only. 
 */
+		return false;
+	case MSR_IA32_SMBASE:
+		/* Currently we only run the guest in long mode. */
+		return false;
+	default:
+		break;
+	}
+
+	return true;
+}
+
+static bool cpu_has_pvm_wbinvd_exit(void)
+{
+	return true;
+}
+
+static int hardware_enable(void)
+{
+	/* Nothing to do */
+	return 0;
+}
+
+static void hardware_disable(void)
+{
+	/* Nothing to do */
+}
+
+static int pvm_check_processor_compat(void)
+{
+	/* Nothing to do */
+	return 0;
+}
+
+static __init void pvm_set_cpu_caps(void)
+{
+	if (boot_cpu_has(X86_FEATURE_NX))
+		kvm_enable_efer_bits(EFER_NX);
+	if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
+		kvm_enable_efer_bits(EFER_FFXSR);
+
+	kvm_set_cpu_caps();
+
+	/* Unloading kvm-intel.ko doesn't clean up kvm_caps.supported_mce_cap. */
+	kvm_caps.supported_mce_cap = MCG_CTL_P | MCG_SER_P;
+
+	kvm_caps.supported_xss = 0;
+
+	/* PVM supervisor mode runs on hardware ring3, so no xsaves. */
+	kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+
+	/*
+	 * PVM supervisor mode runs on hardware ring3, so SMEP and SMAP cannot
+	 * be supported directly through hardware. But they can be emulated
+	 * through other hardware features when needed.
+	 */
+
+	/*
+	 * PVM doesn't support SMAP, but a similar protection might be
+	 * emulated via PKU in the future.
+	 */
+	kvm_cpu_cap_clear(X86_FEATURE_SMAP);
+
+	/*
+	 * PVM doesn't support SMEP. When NX is supported, the guest can
+	 * use NX on the user pagetable to emulate the same protection as SMEP.
+	 */
+	kvm_cpu_cap_clear(X86_FEATURE_SMEP);
+
+	/*
+	 * Unlike VMX/SVM, which can switch paging mode atomically, PVM
+	 * implements guest LA57 through host LA57 shadow paging.
+	 */
+	if (!pgtable_l5_enabled())
+		kvm_cpu_cap_clear(X86_FEATURE_LA57);
+
+	/*
+	 * Even if host PCID is not enabled, guest PCID can be enabled to
+	 * reduce heavy guest TLB flushing. Guest CR4.PCIDE is not directly
+	 * mapped to the hardware and is virtualized by PVM so that it can be
+	 * enabled unconditionally.
+	 */
+	kvm_cpu_cap_set(X86_FEATURE_PCID);
+
+	/* Don't expose MSR_IA32_SPEC_CTRL to the guest */
+	kvm_cpu_cap_clear(X86_FEATURE_SPEC_CTRL);
+	kvm_cpu_cap_clear(X86_FEATURE_AMD_STIBP);
+	kvm_cpu_cap_clear(X86_FEATURE_AMD_IBRS);
+	kvm_cpu_cap_clear(X86_FEATURE_AMD_SSBD);
+
+	/* PVM hypervisor hasn't implemented LAM so far */
+	kvm_cpu_cap_clear(X86_FEATURE_LAM);
+
+	/* Don't expose MSR_IA32_DEBUGCTLMSR related features.
*/ + kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT); +} + +static __init int hardware_setup(void) +{ + struct desc_ptr dt; + + store_idt(&dt); + host_idt_base = dt.address; + + pvm_set_cpu_caps(); + + kvm_configure_mmu(false, 0, 0, 0); + + enable_apicv = 0; + + return 0; +} + +static void hardware_unsetup(void) +{ +} + +struct kvm_x86_nested_ops pvm_nested_ops = {}; + +static struct kvm_x86_ops pvm_x86_ops __initdata = { + .name = KBUILD_MODNAME, + + .check_processor_compatibility = pvm_check_processor_compat, + + .hardware_unsetup = hardware_unsetup, + .hardware_enable = hardware_enable, + .hardware_disable = hardware_disable, + .has_emulated_msr = pvm_has_emulated_msr, + + .has_wbinvd_exit = cpu_has_pvm_wbinvd_exit, + + .nested_ops = &pvm_nested_ops, + + .setup_mce = pvm_setup_mce, +}; + +static struct kvm_x86_init_ops pvm_init_ops __initdata = { + .hardware_setup = hardware_setup, + + .runtime_ops = &pvm_x86_ops, +}; + static void pvm_exit(void) { + kvm_exit(); + kvm_x86_vendor_exit(); + host_mmu_destroy(); + allow_smaller_maxphyaddr = false; + kvm_cpuid_vendor_signature = 0; } module_exit(pvm_exit); +static int __init hardware_cap_check(void) +{ + /* + * switcher can't be used when KPTI. See the comments above + * SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3 + */ + if (boot_cpu_has(X86_FEATURE_PTI)) { + pr_warn("Support for host KPTI is not included yet.\n"); + return -EOPNOTSUPP; + } + if (!boot_cpu_has(X86_FEATURE_FSGSBASE)) { + pr_warn("FSGSBASE is required per PVM specification.\n"); + return -EOPNOTSUPP; + } + if (!boot_cpu_has(X86_FEATURE_RDTSCP)) { + pr_warn("RDTSCP is required to support for getcpu in guest vdso.\n"); + return -EOPNOTSUPP; + } + if (!boot_cpu_has(X86_FEATURE_CX16)) { + pr_warn("CMPXCHG16B is required for guest.\n"); + return -EOPNOTSUPP; + } + + return 0; +} + static int __init pvm_init(void) { + int r; + + r = hardware_cap_check(); + if (r) + return r; + + r = host_mmu_init(); + if (r) + return r; + + is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; + + r = kvm_x86_vendor_init(&pvm_init_ops); + if (r) + goto exit_host_mmu; + + r = kvm_init(sizeof(struct vcpu_pvm), __alignof__(struct vcpu_pvm), THIS_MODULE); + if (r) + goto exit_vendor; + + allow_smaller_maxphyaddr = true; + kvm_cpuid_vendor_signature = PVM_CPUID_SIGNATURE; + return 0; + +exit_vendor: + kvm_x86_vendor_exit(); +exit_host_mmu: + host_mmu_destroy(); + return r; } module_init(pvm_init); diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 7a3732986a6d..6149cf5975a4 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -2,6 +2,8 @@ #ifndef __KVM_X86_PVM_H #define __KVM_X86_PVM_H +#include + #define PT_L4_SHIFT 39 #define PT_L4_SIZE (1UL << PT_L4_SHIFT) #define DEFAULT_RANGE_L4_SIZE (32 * PT_L4_SIZE) @@ -20,4 +22,22 @@ extern u64 *host_mmu_root_pgd; void host_mmu_destroy(void); int host_mmu_init(void); +struct vcpu_pvm { + struct kvm_vcpu vcpu; +}; + +struct kvm_pvm { + struct kvm kvm; +}; + +static __always_inline struct kvm_pvm *to_kvm_pvm(struct kvm *kvm) +{ + return container_of(kvm, struct kvm_pvm, kvm); +} + +static __always_inline struct vcpu_pvm *to_pvm(struct kvm_vcpu *vcpu) +{ + return container_of(vcpu, struct vcpu_pvm, vcpu); +} + #endif /* __KVM_X86_PVM_H */ From patchwork Mon Feb 26 14:35:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572302 Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) (using TLSv1.2 with cipher 
(PST) Received: from localhost ([198.11.178.15]) by smtp.gmail.com with ESMTPSA id it8-20020a056a00458800b006e05c801748sm4103970pfb.199.2024.02.26.06.35.36 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:36 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 18/73] KVM: x86/PVM: Implement VM/VCPU initialization related callbacks Date: Mon, 26 Feb 2024 22:35:35 +0800 Message-Id: <20240226143630.33643-19-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In the vm_init() callback, the cloned host root page table is recorded into the 'kvm' structure, allowing for the cloning of host PGD entries during SP allocation. In the vcpu_create() callback, the pfn cache for 'PVCS' is initialized and deactivated in the vcpu_free() callback. Additionally, the vcpu_reset() callback needs to perform a common x86 reset and specific PVM reset. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 120 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 34 ++++++++++++ 2 files changed, 154 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 83aa2c9f42f6..d4cc52bf6b3f 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -55,6 +55,117 @@ static bool cpu_has_pvm_wbinvd_exit(void) return true; } +static void reset_segment(struct kvm_segment *var, int seg) +{ + memset(var, 0, sizeof(*var)); + var->limit = 0xffff; + var->present = 1; + + switch (seg) { + case VCPU_SREG_CS: + var->s = 1; + var->type = 0xb; /* Code Segment */ + var->selector = 0xf000; + var->base = 0xffff0000; + break; + case VCPU_SREG_LDTR: + var->s = 0; + var->type = DESC_LDT; + break; + case VCPU_SREG_TR: + var->s = 0; + var->type = DESC_TSS | 0x2; // TSS32 busy + break; + default: + var->s = 1; + var->type = 3; /* Read/Write Data Segment */ + break; + } +} + +static void __pvm_vcpu_reset(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (is_intel) + vcpu->arch.microcode_version = 0x100000000ULL; + else + vcpu->arch.microcode_version = 0x01000065; + + pvm->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED; +} + +static void pvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int i; + + kvm_gpc_deactivate(&pvm->pvcs_gpc); + + if (!init_event) + __pvm_vcpu_reset(vcpu); + + /* + * For PVM, cpuid faulting relies on hardware capability, but it is set + * as supported by default in kvm_arch_vcpu_create(). Therefore, it + * should be cleared if the host doesn't support it. 
+ */ + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT)) + vcpu->arch.msr_platform_info &= ~MSR_PLATFORM_INFO_CPUID_FAULT; + + // X86 resets + for (i = 0; i < ARRAY_SIZE(pvm->segments); i++) + reset_segment(&pvm->segments[i], i); + kvm_set_cr8(vcpu, 0); + pvm->idt_ptr.address = 0; + pvm->idt_ptr.size = 0xffff; + pvm->gdt_ptr.address = 0; + pvm->gdt_ptr.size = 0xffff; + + // PVM resets + pvm->switch_flags = SWITCH_FLAGS_INIT; + pvm->hw_cs = __USER_CS; + pvm->hw_ss = __USER_DS; + pvm->int_shadow = 0; + pvm->nmi_mask = false; + + pvm->msr_vcpu_struct = 0; + pvm->msr_supervisor_rsp = 0; + pvm->msr_event_entry = 0; + pvm->msr_retu_rip_plus2 = 0; + pvm->msr_rets_rip_plus2 = 0; + pvm->msr_switch_cr3 = 0; +} + +static int pvm_vcpu_create(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + BUILD_BUG_ON(offsetof(struct vcpu_pvm, vcpu) != 0); + + pvm->switch_flags = SWITCH_FLAGS_INIT; + kvm_gpc_init(&pvm->pvcs_gpc, vcpu->kvm, vcpu, KVM_GUEST_AND_HOST_USE_PFN); + + return 0; +} + +static void pvm_vcpu_free(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + kvm_gpc_deactivate(&pvm->pvcs_gpc); +} + +static void pvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ +} + +static int pvm_vm_init(struct kvm *kvm) +{ + kvm->arch.host_mmu_root_pgd = host_mmu_root_pgd; + return 0; +} + static int hardware_enable(void) { /* Nothing to do */ @@ -169,6 +280,15 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .has_wbinvd_exit = cpu_has_pvm_wbinvd_exit, + .vm_size = sizeof(struct kvm_pvm), + .vm_init = pvm_vm_init, + + .vcpu_create = pvm_vcpu_create, + .vcpu_free = pvm_vcpu_free, + .vcpu_reset = pvm_vcpu_reset, + + .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, + .nested_ops = &pvm_nested_ops, .setup_mce = pvm_setup_mce, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 6149cf5975a4..599bbbb284dc 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -3,6 +3,9 @@ #define __KVM_X86_PVM_H #include +#include + +#define SWITCH_FLAGS_INIT (SWITCH_FLAGS_SMOD) #define PT_L4_SHIFT 39 #define PT_L4_SIZE (1UL << PT_L4_SHIFT) @@ -24,6 +27,37 @@ int host_mmu_init(void); struct vcpu_pvm { struct kvm_vcpu vcpu; + + unsigned long switch_flags; + + u32 hw_cs, hw_ss; + + int int_shadow; + bool nmi_mask; + + struct gfn_to_pfn_cache pvcs_gpc; + + /* + * Only bits masked by msr_ia32_feature_control_valid_bits can be set in + * msr_ia32_feature_control. FEAT_CTL_LOCKED is always included + * in msr_ia32_feature_control_valid_bits. 
+ */ + u64 msr_ia32_feature_control; + u64 msr_ia32_feature_control_valid_bits; + + // PVM paravirt MSRs + unsigned long msr_vcpu_struct; + unsigned long msr_supervisor_rsp; + unsigned long msr_supervisor_redzone; + unsigned long msr_event_entry; + unsigned long msr_retu_rip_plus2; + unsigned long msr_rets_rip_plus2; + unsigned long msr_switch_cr3; + unsigned long msr_linear_address_range; + + struct kvm_segment segments[NR_VCPU_SREG]; + struct desc_ptr idt_ptr; + struct desc_ptr gdt_ptr; }; struct kvm_pvm { From patchwork Mon Feb 26 14:35:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572303 Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1042112BF0D; Mon, 26 Feb 2024 14:35:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958142; cv=none; b=MYu2Tq/wVe7njJ2s+l+92QTypwi+BlckuAsiWoOt+DdiMJm/H5UYJsqUQzS+0qR45iidpwyB6Ff+bDeCHF+97qYBCDUgRk/Wj0T6292KBH6F4nMzoMnilTNIdtCxoSsBpXlIxmm6DNBnbn9yVJTW6JZ+m316ZYM2QeQsFczZFZQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958142; c=relaxed/simple; bh=lkPWWNX4CqvxkABoFnNNUXtgtU86hs8RqI86OHed6gI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=sqx+QlnwCliJHknl4xR9xEQYOQdsoFZ2lmH4/EJ39JDyBDD37wVCQJfL3r38gtMPqfzdlt8cDqE1Dw8qp/I7LO9Hm15AQp2oPu64rFqkWRTZ9lsoRbhiZHn5z/sDGu+3Be+XKvrRE40FwfTCAuvj/7kHtmeqexzMAqOdvZJmV6A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=frsiMweg; arc=none smtp.client-ip=209.85.210.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="frsiMweg" Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-6e08dd0fa0bso2686770b3a.1; Mon, 26 Feb 2024 06:35:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958140; x=1709562940; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=BTFwRy/j+hGc3A0+vUN+d8c3IFpFYgzuXEpZOKFIa18=; b=frsiMwegqR48pYel71xccNr85n9qEBuLXgiIKvX3g4Qe1slOER1yHJqjC9SckgQaWT vhzCB8i6CsSNNy+crzUY4uZPRyENqpj1rpmWR0zPAin1UjnPn8qOY6rRMyw0ztXtyI/j lCXYVrf/A512sv5D4SIH6hNs/664YoFGBdQylFiAeVtc5wH441hCE5GpB2mdfvVkU9Un I8wgYinQXtCba4K0EU8vH3AG5WLqj7BDTZ1bhHNh5PU6+HzZW1YsTw7NsaSOIv9cuimH ddNp+e31ZepvBe/7EpPU/GyEr218qjtWFjlD3Ak2scQ6hW2lt7je2aA7KvGxezsnFB4z Yccg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958140; x=1709562940; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BTFwRy/j+hGc3A0+vUN+d8c3IFpFYgzuXEpZOKFIa18=; 
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, Hou Wenlong, Linus Torvalds, Peter Zijlstra,
 Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
 kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
 Juergen Gross, Andy Lutomirski, Dave Hansen, "H. Peter Anvin"
Subject: [RFC PATCH 19/73] x86/entry: Export 32-bit ignore syscall entry and __ia32_enabled variable
Date: Mon, 26 Feb 2024 22:35:36 +0800
Message-Id: <20240226143630.33643-20-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>
References: <20240226143630.33643-1-jiangshanlai@gmail.com>
MIME-Version: 1.0

From: Hou Wenlong

The PVM hypervisor currently ignores 32-bit syscalls from the guest.
Therefore, export the 32-bit ignore syscall entry and the __ia32_enabled
variable for the PVM module.
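For context, a minimal sketch of how the PVM module is expected to consume
these exports. The actual use appears in the later vcpu_load()/vcpu_put()
patch; the function name and the user-return MSR slot index (2) are
assumptions here.

/*
 * Illustrative only: with the symbols exported, a module can point the
 * 32-bit syscall MSR at the stub that simply returns -ENOSYS whenever
 * 32-bit emulation is compiled in and enabled.
 */
static void example_ignore_ia32_syscalls(void)
{
	if (ia32_enabled())
		kvm_set_user_return_msr(2, (u64)entry_SYSCALL32_ignore, -1ull);
}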
Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/entry/common.c | 1 + arch/x86/entry/entry_64.S | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 6356060caaf3..00ff701aa1be 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -141,6 +141,7 @@ static __always_inline int syscall_32_enter(struct pt_regs *regs) #ifdef CONFIG_IA32_EMULATION bool __ia32_enabled __ro_after_init = !IS_ENABLED(CONFIG_IA32_EMULATION_DEFAULT_DISABLED); +EXPORT_SYMBOL_GPL(__ia32_enabled); static int ia32_emulation_override_cmdline(char *arg) { diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 65bfebebeab6..5b25ea4a16ae 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1527,6 +1527,7 @@ SYM_CODE_START(entry_SYSCALL32_ignore) mov $-ENOSYS, %eax sysretl SYM_CODE_END(entry_SYSCALL32_ignore) +EXPORT_SYMBOL_GPL(entry_SYSCALL32_ignore) .pushsection .text, "ax" __FUNC_ALIGN From patchwork Mon Feb 26 14:35:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572304 Received: from mail-pf1-f171.google.com (mail-pf1-f171.google.com [209.85.210.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2526132489; Mon, 26 Feb 2024 14:35:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958146; cv=none; b=H5znE4WbrTxGD7OfMsR+saT9iaXyFBNnRNlm0K68V7GTMx9Iimx8hQLi8lj6XvCKZdxDt9+4cHVomRkKIuZIbfhhIolNeBIbnVa8OxyfpH4zEuIDnxjN5cdnnlaxjyKYouba24udrja603EgABF7XksCJ2OrWhyDADPfCKtM6OU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958146; c=relaxed/simple; bh=8SekTiXNHrHLsNEUj45LpjNSY2VTIm3w7VgqsWwXACc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=F3bAvni9pmDEJp/tdoIQVUfw8DD5dhKa1m4EtwSNiEhx4iiawhXHrPpJAVuhP0JRbN1T7KgPMQrl5HQKNZgOOiW4zU/3ormljC6FQfQpkc0fB61dkWRM6mbwnd3tP+pjMCFbGYIWo05saADQr66BUOkyPWfBdoU0nfy+QW83r+w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Bg/DjkPS; arc=none smtp.client-ip=209.85.210.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Bg/DjkPS" Received: by mail-pf1-f171.google.com with SMTP id d2e1a72fcca58-6e53f3f1f82so214840b3a.2; Mon, 26 Feb 2024 06:35:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958144; x=1709562944; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GlT8Oy8jepWXAfHbDfNjdqLeUT8NKyYg0Ors9fKJfGM=; b=Bg/DjkPSuasxBhfHr0aYkXRRiATzQFMJVyMcYH2rEwlo+FHaYs6Qoq3NGc3TxsNo5p J9GndacRAQgiYkv41ySkiqhYZ/TCN0BvxjUAS4mH7zEoQcxltDxhFWRK+tKZCChkuzgn boGKo/vSfx9pCMR2gPXuqLMpzGyCEu4BRd5jlywu5tXofr6etmM8zj8PI9gyySzOUBRo 
ZXCmuVgqaC4zDwk7VRweSN8cajdOervWU7s9gfOwVCPHx1NXABHan29JNQKL7+5+f1BH t7aPSnOHHU3KaB/p+gLey0NynAkxQeHFOs+v3aVDYYwHn8wGxqTD6xFbUWtWfvTcjKbD 0EIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958144; x=1709562944; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GlT8Oy8jepWXAfHbDfNjdqLeUT8NKyYg0Ors9fKJfGM=; b=fir/mv314JTadVJ/ujR+w+zZocOlMh1mdpYu1cm41Cicz2PNKAU+MZM5su5Ei8Nvxg 9SSoRGHCYOJnyF8mTLyZufwfVFm+WYtijwP3fgC6PxaEPIuEQ6QObGjgVGybTdSTAzbX WYbGEcYmIwhx5KcSMn0YusEQCMXWRKfK8uFtfCFn7YxUZtRGeYcrDGzBoa2T7WLEaVjt EtoZi0ts99PTgPLBJoIOJ4N8Rjikr+wdWeWZa5ircbWH1XGMtjWW90RaV+mk6nUesLww uxwgzvFc49fmZB0WmWg+2VDXxe/a6rsmaK/XoEXv0KYEnnR83mv4UnxDnkxm3GXOnate MzIw== X-Forwarded-Encrypted: i=1; AJvYcCUKK7d/Q3O43c8lUaomVzOAnLDg/YW5duVgyya6rnzNM/xuNw9RZmwz72dTR/hS70+RllHLKJTMzMsbBFQiCvUhg/Xd X-Gm-Message-State: AOJu0YzHAgEJD6hakadqOjKJdiYqMbgfH3u3sVrEzn3NYtgYIX+PEzvB D2U/v0XX60cIrMs82giUlFfbdYRppu2iNy3CK+vCzMMlrfnBDScJFTE6u7gz X-Google-Smtp-Source: AGHT+IF8TJ6xNK89quEEVBVGExFNiUnrqDL3Or/nJo10zBYwCLB2W+1oUA5ZsXrst2Z2JzN340jXxg== X-Received: by 2002:aa7:8449:0:b0:6e4:74ba:d907 with SMTP id r9-20020aa78449000000b006e474bad907mr5396962pfn.27.1708958143772; Mon, 26 Feb 2024 06:35:43 -0800 (PST) Received: from localhost ([47.89.225.180]) by smtp.gmail.com with ESMTPSA id k4-20020aa79d04000000b006e0651ec05csm4066820pfp.43.2024.02.26.06.35.42 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:43 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 20/73] KVM: x86/PVM: Implement vcpu_load()/vcpu_put() related callbacks Date: Mon, 26 Feb 2024 22:35:37 +0800 Message-Id: <20240226143630.33643-21-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan When preparing to switch to the guest, some guest states that only matter to userspace can be loaded ahead before VM enter. In PVM, guest segment registers and user return MSRs are loaded into hardware at that time. Since LDT and IO bitmap are not supported in PVM guests, they are cleared as well. When preparing to switch to the host in vcpu_put(), host states are restored. 
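As a quick illustration of the user-return MSR pairing relied on above: slots
are established once at setup time and written per switch, and KVM's
user-return notifier restores the host values when the vCPU thread returns to
userspace. This is a sketch, not the patch itself; the real code lives in
pvm_setup_user_return_msrs() and pvm_prepare_switch_to_guest() below, and the
function names here are illustrative.

/* Registration at hardware_setup() time establishes the slot order ... */
static void example_register_user_return_msrs(void)
{
	kvm_add_user_return_msr(MSR_LSTAR);	/* slot 0: 64-bit syscall entry */
	kvm_add_user_return_msr(MSR_TSC_AUX);	/* slot 1: guest TSC_AUX        */
}

/*
 * ... and the per-switch writes load guest values before VM-enter; the
 * host values come back automatically on return to userspace, so the
 * host never runs with guest MSR contents.
 */
static void example_load_guest_user_return_msrs(u64 switcher_entry, u64 guest_tsc_aux)
{
	kvm_set_user_return_msr(0, switcher_entry, -1ull);
	kvm_set_user_return_msr(1, guest_tsc_aux, -1ull);
}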
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 235 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 5 + 2 files changed, 240 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index d4cc52bf6b3f..52b3b47ffe42 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -13,6 +13,8 @@ #include +#include +#include #include #include "cpuid.h" @@ -26,6 +28,211 @@ static bool __read_mostly is_intel; static unsigned long host_idt_base; +static inline void __save_gs_base(struct vcpu_pvm *pvm) +{ + // switcher will do a real hw swapgs, so use hw MSR_KERNEL_GS_BASE + rdmsrl(MSR_KERNEL_GS_BASE, pvm->segments[VCPU_SREG_GS].base); +} + +static inline void __load_gs_base(struct vcpu_pvm *pvm) +{ + // switcher will do a real hw swapgs, so use hw MSR_KERNEL_GS_BASE + wrmsrl(MSR_KERNEL_GS_BASE, pvm->segments[VCPU_SREG_GS].base); +} + +static inline void __save_fs_base(struct vcpu_pvm *pvm) +{ + rdmsrl(MSR_FS_BASE, pvm->segments[VCPU_SREG_FS].base); +} + +static inline void __load_fs_base(struct vcpu_pvm *pvm) +{ + wrmsrl(MSR_FS_BASE, pvm->segments[VCPU_SREG_FS].base); +} + +/* + * Test whether DS, ES, FS and GS need to be reloaded. + * + * Reading them only returns the selectors, but writing them (if + * nonzero) loads the full descriptor from the GDT or LDT. + * + * We therefore need to write new values to the segment registers + * on every host-guest state switch unless both the new and old + * values are zero. + */ +static inline bool need_reload_sel(u16 sel1, u16 sel2) +{ + return unlikely(sel1 | sel2); +} + +/* + * Save host DS/ES/FS/GS selector, FS base, and inactive GS base. + * And load guest DS/ES/FS/GS selector, FS base, and GS base. + * + * Note, when the guest state is loaded and it is in hypervisor, the guest + * GS base is loaded in the hardware MSR_KERNEL_GS_BASE which is loaded + * with host inactive GS base when the guest state is NOT loaded. + */ +static void segments_save_host_and_switch_to_guest(struct vcpu_pvm *pvm) +{ + u16 pvm_ds_sel, pvm_es_sel, pvm_fs_sel, pvm_gs_sel; + + /* Save host segments */ + savesegment(ds, pvm->host_ds_sel); + savesegment(es, pvm->host_es_sel); + current_save_fsgs(); + + /* Load guest segments */ + pvm_ds_sel = pvm->segments[VCPU_SREG_DS].selector; + pvm_es_sel = pvm->segments[VCPU_SREG_ES].selector; + pvm_fs_sel = pvm->segments[VCPU_SREG_FS].selector; + pvm_gs_sel = pvm->segments[VCPU_SREG_GS].selector; + + if (need_reload_sel(pvm_ds_sel, pvm->host_ds_sel)) + loadsegment(ds, pvm_ds_sel); + if (need_reload_sel(pvm_es_sel, pvm->host_es_sel)) + loadsegment(es, pvm_es_sel); + if (need_reload_sel(pvm_fs_sel, current->thread.fsindex)) + loadsegment(fs, pvm_fs_sel); + if (need_reload_sel(pvm_gs_sel, current->thread.gsindex)) + load_gs_index(pvm_gs_sel); + + __load_gs_base(pvm); + __load_fs_base(pvm); +} + +/* + * Save guest DS/ES/FS/GS selector, FS base, and GS base. + * And load host DS/ES/FS/GS selector, FS base, and inactive GS base. 
+ */ +static void segments_save_guest_and_switch_to_host(struct vcpu_pvm *pvm) +{ + u16 pvm_ds_sel, pvm_es_sel, pvm_fs_sel, pvm_gs_sel; + + /* Save guest segments */ + savesegment(ds, pvm_ds_sel); + savesegment(es, pvm_es_sel); + savesegment(fs, pvm_fs_sel); + savesegment(gs, pvm_gs_sel); + pvm->segments[VCPU_SREG_DS].selector = pvm_ds_sel; + pvm->segments[VCPU_SREG_ES].selector = pvm_es_sel; + pvm->segments[VCPU_SREG_FS].selector = pvm_fs_sel; + pvm->segments[VCPU_SREG_GS].selector = pvm_gs_sel; + + __save_fs_base(pvm); + __save_gs_base(pvm); + + /* Load host segments */ + if (need_reload_sel(pvm_ds_sel, pvm->host_ds_sel)) + loadsegment(ds, pvm->host_ds_sel); + if (need_reload_sel(pvm_es_sel, pvm->host_es_sel)) + loadsegment(es, pvm->host_es_sel); + if (need_reload_sel(pvm_fs_sel, current->thread.fsindex)) + loadsegment(fs, current->thread.fsindex); + if (need_reload_sel(pvm_gs_sel, current->thread.gsindex)) + load_gs_index(current->thread.gsindex); + + wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase); + wrmsrl(MSR_FS_BASE, current->thread.fsbase); +} + +static void pvm_prepare_switch_to_guest(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (pvm->loaded_cpu_state) + return; + + pvm->loaded_cpu_state = 1; + +#ifdef CONFIG_X86_IOPL_IOPERM + /* + * PVM doesn't load guest I/O bitmap into hardware. Invalidate I/O + * bitmap if the current task is using it. This prevents any possible + * leakage of an active I/O bitmap to the guest and forces I/O + * instructions in guest to be trapped and emulated. + * + * The I/O bitmap will be restored when the current task exits to + * user mode in arch_exit_to_user_mode_prepare(). + */ + if (test_thread_flag(TIF_IO_BITMAP)) + native_tss_invalidate_io_bitmap(); +#endif + +#ifdef CONFIG_MODIFY_LDT_SYSCALL + /* PVM doesn't support LDT. */ + if (unlikely(current->mm->context.ldt)) + clear_LDT(); +#endif + + segments_save_host_and_switch_to_guest(pvm); + + kvm_set_user_return_msr(0, (u64)entry_SYSCALL_64_switcher, -1ull); + kvm_set_user_return_msr(1, pvm->msr_tsc_aux, -1ull); + if (ia32_enabled()) { + if (is_intel) + kvm_set_user_return_msr(2, GDT_ENTRY_INVALID_SEG, -1ull); + else + kvm_set_user_return_msr(2, (u64)entry_SYSCALL32_ignore, -1ull); + } +} + +static void pvm_prepare_switch_to_host(struct vcpu_pvm *pvm) +{ + if (!pvm->loaded_cpu_state) + return; + + ++pvm->vcpu.stat.host_state_reload; + +#ifdef CONFIG_MODIFY_LDT_SYSCALL + if (unlikely(current->mm->context.ldt)) + kvm_load_ldt(GDT_ENTRY_LDT*8); +#endif + + segments_save_guest_and_switch_to_host(pvm); + pvm->loaded_cpu_state = 0; +} + +/* + * Set all hardware states back to host. + * Except user return MSRs. + */ +static void pvm_switch_to_host(struct vcpu_pvm *pvm) +{ + preempt_disable(); + pvm_prepare_switch_to_host(pvm); + preempt_enable(); +} + +DEFINE_PER_CPU(struct vcpu_pvm *, active_pvm_vcpu); + +/* + * Switches to specified vcpu, until a matching vcpu_put(), but assumes + * vcpu mutex is already taken. 
+ */ +static void pvm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (__this_cpu_read(active_pvm_vcpu) == pvm && vcpu->cpu == cpu) + return; + + __this_cpu_write(active_pvm_vcpu, pvm); + + indirect_branch_prediction_barrier(); +} + +static void pvm_vcpu_put(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + pvm_prepare_switch_to_host(pvm); +} + +static void pvm_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ +} + static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } @@ -100,6 +307,8 @@ static void pvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) struct vcpu_pvm *pvm = to_pvm(vcpu); int i; + pvm_switch_to_host(pvm); + kvm_gpc_deactivate(&pvm->pvcs_gpc); if (!init_event) @@ -183,6 +392,24 @@ static int pvm_check_processor_compat(void) return 0; } +/* + * When in PVM mode, the hardware MSR_LSTAR is set to the entry point + * provided by the host entry code (switcher), and the + * hypervisor can also change the hardware MSR_TSC_AUX to emulate + * the guest MSR_TSC_AUX. + */ +static __init void pvm_setup_user_return_msrs(void) +{ + kvm_add_user_return_msr(MSR_LSTAR); + kvm_add_user_return_msr(MSR_TSC_AUX); + if (ia32_enabled()) { + if (is_intel) + kvm_add_user_return_msr(MSR_IA32_SYSENTER_CS); + else + kvm_add_user_return_msr(MSR_CSTAR); + } +} + static __init void pvm_set_cpu_caps(void) { if (boot_cpu_has(X86_FEATURE_NX)) @@ -253,6 +480,8 @@ static __init int hardware_setup(void) store_idt(&dt); host_idt_base = dt.address; + pvm_setup_user_return_msrs(); + pvm_set_cpu_caps(); kvm_configure_mmu(false, 0, 0, 0); @@ -287,8 +516,14 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_free = pvm_vcpu_free, .vcpu_reset = pvm_vcpu_reset, + .prepare_switch_to_guest = pvm_prepare_switch_to_guest, + .vcpu_load = pvm_vcpu_load, + .vcpu_put = pvm_vcpu_put, + .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, + .sched_in = pvm_sched_in, + .nested_ops = &pvm_nested_ops, .setup_mce = pvm_setup_mce, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 599bbbb284dc..6584314487bc 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -30,13 +30,18 @@ struct vcpu_pvm { unsigned long switch_flags; + u16 host_ds_sel, host_es_sel; + u32 hw_cs, hw_ss; + int loaded_cpu_state; int int_shadow; bool nmi_mask; struct gfn_to_pfn_cache pvcs_gpc; + // emulated x86 msrs + u64 msr_tsc_aux; /* * Only bits masked by msr_ia32_feature_control_valid_bits can be set in * msr_ia32_feature_control. 
FEAT_CTL_LOCKED is always included From patchwork Mon Feb 26 14:35:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572305 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EADFA132C04; Mon, 26 Feb 2024 14:35:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958149; cv=none; b=IwO2Pr2iLi7KiUD4WLOZJ5Vs4p8rhOuVldAnMKdQrSyMdNu6ZJ0vTyFsrabe1ubpoO/eRRgXtXBdkbwmQdIjCaP595QDy+HoBFq/TYmSeW0MbcMX1S/2ItgSoRE/5uxYfOeGDlSdOch+9vklx48lGLUpjJ4GV7JFg5MCNybCDmM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958149; c=relaxed/simple; bh=/iStsMPT4BDfLBz8g+O9phpKKibaGHh4l64BGMMKED0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=AYxxhOnmrG1LfM/eQPmpp6M/NjPjPEzxpq+Q4r5v5OYGsbF5K1s48BbnSMM+jVTTsqBmjWxAsXn9iHP112FUa7KO+XewQEHBfNihjtMSOa5ODBqsRVzE7eh71QsT6gQCPYCF/Et/RgAClE7j5wG5e4XDAC59sx0L26XRt6R6Z4I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VYel0VYC; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VYel0VYC" Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-1dcab44747bso5025345ad.1; Mon, 26 Feb 2024 06:35:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958147; x=1709562947; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=wSD5Tv+fh0uCAiCRBZiaWK9rSl9cDRTDnFCz1andQQ0=; b=VYel0VYCTnyZqsmF/Dq6MJ2p4BQNRST4gusUrpfILuC3zjMwge3YMMlSpEyToJiS8J LzIZPq9qo6e+xz+43RV0APkoAsa1+dDDwYznYV4/s+WBktF6y/qgX++1iyTrrzzf92mO pbQz5+xhxOB7nePFKyhQtj6v5q31DKJfd7BZ3qck3nMcw39+CcudajIU4DbKKyWsLe2g KHQmiMq70F6uxFk1U+13PEYdhkmj7myu/4ogEgX1vzGBU79TcaEAHA32SNtTa7rG7NpC dOQ3KreDKYQDxdaJ0uX5DgfHYBD7WKwLzDRQurvlUb7GmvLvv2K38ogBn5/78ytj6GRn zQ2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958147; x=1709562947; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wSD5Tv+fh0uCAiCRBZiaWK9rSl9cDRTDnFCz1andQQ0=; b=TrrzPi/enwXI7w9FV6S/2ldXZO5etqPidhXmKKwCb+uDw3ZSuB396T8TpN7IjRjXOk d5nk79e/MXi2tFrSyiBsD4efKkZfEbSa1lmaVLnODXVhTY3CVntZJQ3bXwJZTJqrOAyM rSsofA0M2SmSA5GzxlduGFgl5jXGakZjH1WWmZmAOqKrabwUMKjf5kxUvMvun9TGlXp6 WdTkWy8CmtiVgUOdIyh+5Skr3rBZpRgQccXvxGcYjvHAK4wceiKL56APwNoLrB0iqMiM Jn6UPPXjL4cKUrv6Uup5d4I6XfNfbLTj+qMxhuA5B2zMg4gMXWMe7BHve/LPayX3knrY MsmA== X-Forwarded-Encrypted: i=1; AJvYcCWKYWUTLIpBY849Huw8ka2ks85Fze3YAmRqylC2X50yleXpVCxa8k+BBlkJZbyA6Ce8FIxNE9Zlqh7fZhRMPyoLgf2R X-Gm-Message-State: 
AOJu0YxXyt+m6eXs/lW2LCkyba0KSWVnVrK7PtrKxde8bRDq7cqpYsz8 BOq73oZHcmonZcxiCSMuzrIFhj6FTFDb3QoI/YqgdWszVAa6rRqiB2d3rl+d X-Google-Smtp-Source: AGHT+IFnewZt5yMH7itPjcpNv8Fc95fBWDAtqKcOadkb4c2kyLSm/1eqDZ70yzfd19RUKJP1Pr9OYA== X-Received: by 2002:a17:902:6504:b0:1dc:4a8b:2e21 with SMTP id b4-20020a170902650400b001dc4a8b2e21mr6190269plk.19.1708958147051; Mon, 26 Feb 2024 06:35:47 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id v9-20020a1709029a0900b001db594c9d17sm3981266plp.254.2024.02.26.06.35.46 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:46 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 21/73] KVM: x86/PVM: Implement vcpu_run() callbacks Date: Mon, 26 Feb 2024 22:35:38 +0800 Message-Id: <20240226143630.33643-22-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In the vcpu_run() callback, the hypervisor needs to prepare for VM enter in the switcher and record exit reasons after VM exit. The guest registers are prepared on the host SP0 stack, and the guest/host hardware CR3 is saved in the TSS for the switcher before VM enter. Additionally, the guest xsave state is loaded into hardware before VM enter. After VM exit, the guest registers are saved from the entry stack, and host xsave states are restored. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 163 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 5 ++ 2 files changed, 168 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 52b3b47ffe42..00a50ed0c118 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -16,8 +16,11 @@ #include #include #include +#include #include "cpuid.h" +#include "lapic.h" +#include "trace.h" #include "x86.h" #include "pvm.h" @@ -204,6 +207,31 @@ static void pvm_switch_to_host(struct vcpu_pvm *pvm) preempt_enable(); } +static void pvm_set_host_cr3_for_hypervisor(struct vcpu_pvm *pvm) +{ + unsigned long cr3; + + if (static_cpu_has(X86_FEATURE_PCID)) + cr3 = __get_current_cr3_fast() | X86_CR3_PCID_NOFLUSH; + else + cr3 = __get_current_cr3_fast(); + this_cpu_write(cpu_tss_rw.tss_ex.host_cr3, cr3); +} + +// Set tss_ex.host_cr3 for VMExit. +// Set tss_ex.enter_cr3 for VMEnter. +static void pvm_set_host_cr3(struct vcpu_pvm *pvm) +{ + pvm_set_host_cr3_for_hypervisor(pvm); + this_cpu_write(cpu_tss_rw.tss_ex.enter_cr3, pvm->vcpu.arch.mmu->root.hpa); +} + +static void pvm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, + int root_level) +{ + /* Nothing to do. Guest cr3 will be prepared in pvm_set_host_cr3(). */ +} + DEFINE_PER_CPU(struct vcpu_pvm *, active_pvm_vcpu); /* @@ -262,6 +290,136 @@ static bool cpu_has_pvm_wbinvd_exit(void) return true; } +static int pvm_vcpu_pre_run(struct kvm_vcpu *vcpu) +{ + return 1; +} + +// Save guest registers from host sp0 or IST stack. 
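+// The switcher exchanges the guest context as a plain 'struct pt_regs'
+// frame: load_regs() below fills it in at the top of the host sp0 stack
+// before VM enter, and save_regs() copies it back from wherever the exit
+// event delivered it (sp0 or an IST stack); see pvm_vcpu_run_noinstr().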
+static __always_inline void save_regs(struct kvm_vcpu *vcpu, struct pt_regs *guest) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + vcpu->arch.regs[VCPU_REGS_RAX] = guest->ax; + vcpu->arch.regs[VCPU_REGS_RCX] = guest->cx; + vcpu->arch.regs[VCPU_REGS_RDX] = guest->dx; + vcpu->arch.regs[VCPU_REGS_RBX] = guest->bx; + vcpu->arch.regs[VCPU_REGS_RSP] = guest->sp; + vcpu->arch.regs[VCPU_REGS_RBP] = guest->bp; + vcpu->arch.regs[VCPU_REGS_RSI] = guest->si; + vcpu->arch.regs[VCPU_REGS_RDI] = guest->di; + vcpu->arch.regs[VCPU_REGS_R8] = guest->r8; + vcpu->arch.regs[VCPU_REGS_R9] = guest->r9; + vcpu->arch.regs[VCPU_REGS_R10] = guest->r10; + vcpu->arch.regs[VCPU_REGS_R11] = guest->r11; + vcpu->arch.regs[VCPU_REGS_R12] = guest->r12; + vcpu->arch.regs[VCPU_REGS_R13] = guest->r13; + vcpu->arch.regs[VCPU_REGS_R14] = guest->r14; + vcpu->arch.regs[VCPU_REGS_R15] = guest->r15; + vcpu->arch.regs[VCPU_REGS_RIP] = guest->ip; + pvm->rflags = guest->flags; + pvm->hw_cs = guest->cs; + pvm->hw_ss = guest->ss; +} + +// load guest registers to host sp0 stack. +static __always_inline void load_regs(struct kvm_vcpu *vcpu, struct pt_regs *guest) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + guest->ss = pvm->hw_ss; + guest->sp = vcpu->arch.regs[VCPU_REGS_RSP]; + guest->flags = (pvm->rflags & SWITCH_ENTER_EFLAGS_ALLOWED) | SWITCH_ENTER_EFLAGS_FIXED; + guest->cs = pvm->hw_cs; + guest->ip = vcpu->arch.regs[VCPU_REGS_RIP]; + guest->orig_ax = -1; + guest->di = vcpu->arch.regs[VCPU_REGS_RDI]; + guest->si = vcpu->arch.regs[VCPU_REGS_RSI]; + guest->dx = vcpu->arch.regs[VCPU_REGS_RDX]; + guest->cx = vcpu->arch.regs[VCPU_REGS_RCX]; + guest->ax = vcpu->arch.regs[VCPU_REGS_RAX]; + guest->r8 = vcpu->arch.regs[VCPU_REGS_R8]; + guest->r9 = vcpu->arch.regs[VCPU_REGS_R9]; + guest->r10 = vcpu->arch.regs[VCPU_REGS_R10]; + guest->r11 = vcpu->arch.regs[VCPU_REGS_R11]; + guest->bx = vcpu->arch.regs[VCPU_REGS_RBX]; + guest->bp = vcpu->arch.regs[VCPU_REGS_RBP]; + guest->r12 = vcpu->arch.regs[VCPU_REGS_R12]; + guest->r13 = vcpu->arch.regs[VCPU_REGS_R13]; + guest->r14 = vcpu->arch.regs[VCPU_REGS_R14]; + guest->r15 = vcpu->arch.regs[VCPU_REGS_R15]; +} + +static noinstr void pvm_vcpu_run_noinstr(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + struct pt_regs *sp0_regs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1; + struct pt_regs *ret_regs; + + guest_state_enter_irqoff(); + + // Load guest registers into the host sp0 stack for switcher. + load_regs(vcpu, sp0_regs); + + // Call into switcher and enter guest. + ret_regs = switcher_enter_guest(); + + // Get the guest registers from the host sp0 stack. + save_regs(vcpu, ret_regs); + pvm->exit_vector = (ret_regs->orig_ax >> 32); + pvm->exit_error_code = (u32)ret_regs->orig_ax; + + guest_state_exit_irqoff(); +} + +/* + * PVM wrappers for kvm_load_{guest|host}_xsave_state(). + * + * Currently PKU is disabled for shadowpaging and to avoid overhead, + * host CR4.PKE is unchanged for entering/exiting guest even when + * host CR4.PKE is enabled. + * + * These wrappers fix pkru when host CR4.PKE is enabled. 
+ */ +static inline void pvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) +{ + kvm_load_guest_xsave_state(vcpu); + + if (cpu_feature_enabled(X86_FEATURE_PKU)) { + if (vcpu->arch.host_pkru) + write_pkru(0); + } +} + +static inline void pvm_load_host_xsave_state(struct kvm_vcpu *vcpu) +{ + kvm_load_host_xsave_state(vcpu); + + if (cpu_feature_enabled(X86_FEATURE_PKU)) { + if (rdpkru() != vcpu->arch.host_pkru) + write_pkru(vcpu->arch.host_pkru); + } +} + +static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + trace_kvm_entry(vcpu); + + pvm_load_guest_xsave_state(vcpu); + + kvm_wait_lapic_expire(vcpu); + + pvm_set_host_cr3(pvm); + + pvm_vcpu_run_noinstr(vcpu); + + pvm_load_host_xsave_state(vcpu); + + return EXIT_FASTPATH_NONE; +} + static void reset_segment(struct kvm_segment *var, int seg) { memset(var, 0, sizeof(*var)); @@ -520,6 +678,11 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_load = pvm_vcpu_load, .vcpu_put = pvm_vcpu_put, + .load_mmu_pgd = pvm_load_mmu_pgd, + + .vcpu_pre_run = pvm_vcpu_pre_run, + .vcpu_run = pvm_vcpu_run, + .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, .sched_in = pvm_sched_in, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 6584314487bc..349f4eac98ec 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -28,10 +28,15 @@ int host_mmu_init(void); struct vcpu_pvm { struct kvm_vcpu vcpu; + // guest rflags, turned into hw rflags when in switcher + unsigned long rflags; + unsigned long switch_flags; u16 host_ds_sel, host_es_sel; + u32 exit_vector; + u32 exit_error_code; u32 hw_cs, hw_ss; int loaded_cpu_state; From patchwork Mon Feb 26 14:35:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572306 Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29F27132C2C; Mon, 26 Feb 2024 14:35:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958152; cv=none; b=DwVlYYMeVTac20peuS1RYMITc9EDCMxnvmxIb5ZA+oifKX8O2uOKd8qnk8HnzSiGkLmcrc2wCX6tk3xalBZBbUjp09W2y7bIqvuSnL2eL0uHlW0LEBfY58VAd2FNcKdbpMA5SUAUIC+X3K6cz6PBxKyP6U0dJ3PBJWrKCjd2+h8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958152; c=relaxed/simple; bh=1vEfezG9QmRjQ8J3U1Aysh33N+lTsittc0ac1Dd6Rxg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=eJirbmJJ+75WzxUb3E+cUNliqaelPGwDI9GA5tSCc97cbrQTH6Db1JQm2KuwcYbM7tD0d+rbr7q80eD7xL/qIQ6fwY28X3t/HiM+6XTFLCFj2hLKCV7y3dVuTAyuvp605OnEfSw3Rh19urkzq3JXEWnZY7OKCOzVaratRFQx0R8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=EZuFFqiG; arc=none smtp.client-ip=209.85.215.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="EZuFFqiG" Received: by mail-pg1-f171.google.com with SMTP id 
41be03b00d2f7-5e42b4bbfa4so2299073a12.1; Mon, 26 Feb 2024 06:35:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958150; x=1709562950; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GEYgEPxnIogvnB7Zjz1HomLlX+OknwZSPNHHST0cuPA=; b=EZuFFqiGBF1LnvyJWBASWUZ/gpOEn8ZODCE2XB4VmTWRIb/rRBCoyC+QXhHKZe7rwx +8nSur37gYI+VZlMfNB3bAnJCPx9F7q9NhEmM9bT0GBj/czYJD8yin8B9Ov+yHEM4HUX hAdI6MqR10D55hKHPI/7aVwhXBCrD6jTLvYS5zCXD12DZuPLIX0BkBpiu1/tQkHMI9ot BD7uPBHaCzTYwU5u6s4819x0eYSTfUJclHXk/pRVtKr0EBljSfh1pkJwrCiPVQ0SZcv6 GC9ET7wLThsTb+z2/IkJ8dgdF0OnPyjdGuFWXlJPP1FHfBp3E/+19kst9/NC+Bzo6VVk BAWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958150; x=1709562950; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GEYgEPxnIogvnB7Zjz1HomLlX+OknwZSPNHHST0cuPA=; b=VbbShAuxKhYXUA7Wphj49zd7JMp79f1oQlWJHNMtiHDJgQl82iVUvGrMfM59KXi5LE UA6VnvKNIqBqyrwawA4mm9A5f+c7ps7xSu1aAPwrHUlwdUkfBfA+QoInP57sL7yi/WvE vWMhXXR3BSxUipocZrZMrIwKEEofzz8fvWMdZxqLxtlVb/L2/ROw3rCxDfapzOLhs7jy QVa0ySqAjI1nENGu1u2iFpv/WM3dPXihTNxy4cZZXAbdrBkV/gKNAgQAD9QP0zMvM+h8 tf8aacDGkfjZpPwMRsBZ2WceK9b1nVizj5LEcWVkYsJOMImi22nZdu68DbdO+JTxGKss Z8VQ== X-Forwarded-Encrypted: i=1; AJvYcCXv/lQGFw7iFTMBmQGUHz4hGfDCAZsFhkp3Zk21m31iN5iB6b088bHq6aX0CH9dFPy3VLsFGFAMhArp9vR4S/YOTG/G X-Gm-Message-State: AOJu0YyECQf1EtaBFtGNolqlcN3mlfSmI+7+UeYbGPbpiK3baTM6Zy8z TESDaqlmG16ZwewVdjFjicOBibkclk1RitDvlGkEUU2OtgMGE6kB0bxbn3yR X-Google-Smtp-Source: AGHT+IGXuqJi3oC1P9HsXiXL3F0CcvclMb8/tyAsW4SrKL+SdfZxtrL7dAPtTXQRkM5ULNQY0enwQw== X-Received: by 2002:a17:90a:68c8:b0:299:42d1:91df with SMTP id q8-20020a17090a68c800b0029942d191dfmr5769463pjj.14.1708958150219; Mon, 26 Feb 2024 06:35:50 -0800 (PST) Received: from localhost ([198.11.178.15]) by smtp.gmail.com with ESMTPSA id t20-20020a17090ad15400b00299101c1341sm4541312pjw.18.2024.02.26.06.35.49 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:49 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 22/73] KVM: x86/PVM: Handle some VM exits before enable interrupts Date: Mon, 26 Feb 2024 22:35:39 +0800 Message-Id: <20240226143630.33643-23-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Similar to VMX, NMI should be handled in non-instrumented code early after VM exit. Additionally, #PF, #VE, #VC, and #DB need early handling in non-instrumented code as well. Host interrupts and #MC need to be handled before enabling interrupts. 
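For clarity, the ordering on the exit path relative to the common x86 code in
vcpu_enter_guest() is roughly the following (a simplified sketch for
illustration only, not literal code from this patch; the callbacks are the
ones added by this and the previous patch, and the wiring through kvm_x86_ops
is done by existing common code):

    exit_fastpath = pvm_vcpu_run(vcpu);    /* noinstr tail: NMI, #PF CR2, #VE/#VC info, DR6 */
    pvm_handle_exit_irqoff(vcpu);          /* IRQs still off: host interrupts and #MC */
    local_irq_enable();
    pvm_handle_exit(vcpu, exit_fastpath);  /* instrumented; may return to userspace */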
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 89 ++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 8 ++++ 2 files changed, 97 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 00a50ed0c118..29c6d8da7c19 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -265,6 +265,58 @@ static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } +static int handle_exit_external_interrupt(struct kvm_vcpu *vcpu) +{ + ++vcpu->stat.irq_exits; + return 1; +} + +static int handle_exit_failed_vmentry(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + u32 error_code = pvm->exit_error_code; + + kvm_queue_exception_e(vcpu, GP_VECTOR, error_code); + return 1; +} + +/* + * The guest has exited. See if we can fix it or if we need userspace + * assistance. + */ +static int pvm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + u32 exit_reason = pvm->exit_vector; + + if (exit_reason >= FIRST_EXTERNAL_VECTOR && exit_reason < NR_VECTORS) + return handle_exit_external_interrupt(vcpu); + else if (exit_reason == PVM_FAILED_VMENTRY_VECTOR) + return handle_exit_failed_vmentry(vcpu); + + vcpu_unimpl(vcpu, "pvm: unexpected exit reason 0x%x\n", exit_reason); + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = + KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; + vcpu->run->internal.ndata = 2; + vcpu->run->internal.data[0] = exit_reason; + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; + return 0; +} + +static void pvm_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + u32 vector = pvm->exit_vector; + gate_desc *desc = (gate_desc *)host_idt_base + vector; + + if (vector >= FIRST_EXTERNAL_VECTOR && vector < NR_VECTORS && + vector != IA32_SYSCALL_VECTOR) + kvm_do_interrupt_irqoff(vcpu, gate_offset(desc)); + else if (vector == MC_VECTOR) + kvm_machine_check(); +} + static bool pvm_has_emulated_msr(struct kvm *kvm, u32 index) { switch (index) { @@ -369,6 +421,40 @@ static noinstr void pvm_vcpu_run_noinstr(struct kvm_vcpu *vcpu) pvm->exit_vector = (ret_regs->orig_ax >> 32); pvm->exit_error_code = (u32)ret_regs->orig_ax; + // handle noinstr vmexits reasons. + switch (pvm->exit_vector) { + case PF_VECTOR: + // if the exit due to #PF, check for async #PF. + pvm->exit_cr2 = read_cr2(); + vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); + break; + case NMI_VECTOR: + kvm_do_nmi_irqoff(vcpu); + break; + case VE_VECTOR: + // TODO: pvm host is TDX guest. + // tdx_get_ve_info(&pvm->host_ve); + break; + case X86_TRAP_VC: + /* + * TODO: pvm host is SEV guest. 
+ * if (!vc_is_db(error_code)) { + * collect info and handle the first part for #VC + * break; + * } else { + * get_debugreg(pvm->exit_dr6, 6); + * set_debugreg(DR6_RESERVED, 6); + * } + */ + break; + case DB_VECTOR: + get_debugreg(pvm->exit_dr6, 6); + set_debugreg(DR6_RESERVED, 6); + break; + default: + break; + } + guest_state_exit_irqoff(); } @@ -682,9 +768,12 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_pre_run = pvm_vcpu_pre_run, .vcpu_run = pvm_vcpu_run, + .handle_exit = pvm_handle_exit, .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, + .handle_exit_irqoff = pvm_handle_exit_irqoff, + .sched_in = pvm_sched_in, .nested_ops = &pvm_nested_ops, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 349f4eac98ec..123cfe1c3c6a 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -7,6 +7,8 @@ #define SWITCH_FLAGS_INIT (SWITCH_FLAGS_SMOD) +#define PVM_FAILED_VMENTRY_VECTOR SWITCH_EXIT_REASONS_FAILED_VMETNRY + #define PT_L4_SHIFT 39 #define PT_L4_SIZE (1UL << PT_L4_SHIFT) #define DEFAULT_RANGE_L4_SIZE (32 * PT_L4_SIZE) @@ -35,6 +37,12 @@ struct vcpu_pvm { u16 host_ds_sel, host_es_sel; + union { + unsigned long exit_extra; + unsigned long exit_cr2; + unsigned long exit_dr6; + struct ve_info exit_ve; + }; u32 exit_vector; u32 exit_error_code; u32 hw_cs, hw_ss; From patchwork Mon Feb 26 14:35:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572307 Received: from mail-pf1-f170.google.com (mail-pf1-f170.google.com [209.85.210.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D2C21332A7; Mon, 26 Feb 2024 14:35:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958156; cv=none; b=ZD0LEtp3qWJ3LSJdEb54Rbf1SyZgxF1omT/NkGHHwQ/RkAA2Z5QdVSdElEyvHcrktOyyPofpsaGj8LnovblkETH6leCTlfS/ebyviOGcZBMvhkokX4khAWOH3//aPkbSyKaTVMp+uNTZKGhsjlkIfeVSf7E0NujnmlJ+sHI7N+E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958156; c=relaxed/simple; bh=9aG0oEAda+9fcYOpzWdcAOx4WxIAEu+MaKCQRMdSxNo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=JdMdsSk06omHL61p6oAF+j/Bv0o6NTCHSoFROLAPuE+GS922oFv0UnmGxLvRoaxOGIjOn2ARlkdcvXq2UuhLm1hJTkcXsK+++Vwe3xLhbEkMlmfTkl/vYuT8/DRlt2J0GZkYe9FJL5a4RyVDCgI89UN7YDXKpIUiudUhMc/cmNs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=X4xF3pKa; arc=none smtp.client-ip=209.85.210.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="X4xF3pKa" Received: by mail-pf1-f170.google.com with SMTP id d2e1a72fcca58-6e4c359e48aso1842629b3a.1; Mon, 26 Feb 2024 06:35:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958153; x=1709562953; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date 
:message-id:reply-to; bh=MSMVlWy0KU4egESPyBq9idmlDCEcQCFrMSnFvIksvNM=; b=X4xF3pKalRSFaOWUZ5cXitBnfkz+uFG5SC4675fR2dA/bXf4wymCHW90a1WjY9MwEO pbXR8ZY54J+NR+ij3Hm4ctgzf1X7LDVtKgH103O+Z6yXG0NKPBh/dB9UD+ABxg6FsiVQ JK6CKRZdSDA6N2jPB0N6XVs3by3z4vWc4jrafYhETthY09tdhmevtjHwNkc1gVGrSG8H 55+y4UA6m8jdSJONb/B3BQCJWXn83yorp3BGnW7sS1gUpTXXo4utpLe1SSgI0bx0pnVf JBCVPoyI82sLtQMlrrovILPTUacy1Am8jBFbwnnRoahsPzO7bp5HDJpvAe4UqnmI3EjL ZJDA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958153; x=1709562953; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MSMVlWy0KU4egESPyBq9idmlDCEcQCFrMSnFvIksvNM=; b=VnKgojeCcKkrmpPZYb4TyvZv0UaGnxfBySnNDwRuOxSNg/MgGgC3piHk5ohR03OY1N GI5yXLPaspVJOYyGyAt8Ysxz2ctG9QAb86dJk92yEagj+b2//Yy/h8kCHY4o/5+dalxT 2fqikUwwOdqmqIZdJi1u4V85gdsN9OnYLJnUAkcXc0yc5V8y+qVNMEznIuDjzGNZTiT8 d8k2uyNL+EALQWPAa2kAZxm+hM687n4sj8tTXy0mWdQz8dn2ry5TI1rTvWlbQtfpYOxV XgTgAxTUEQqoyDB/TUJA1VJZGfUTEeo+rTAdBgLhmKeIX3BJnPQOwpSq8kf3620RV6Zo uAGw== X-Forwarded-Encrypted: i=1; AJvYcCWDn1Ae3IVqv9vNT/oCyehBakOiQgKnTzx4XngajNPuRpdWFXXREyh853Kx/1BFOtu7PQ/1kaZgnSdnx8aEJrx0OhtX X-Gm-Message-State: AOJu0YyRdepaFN/LOrBv+bhJ0ryph6183IeojQytb61rGoWIBLL3Uuhz 1EfuJ3BvVvBdqSCJijgDK4gbK1/qdGc/KZcZ9uU3OSLtINeGi/T6xpKU0K88 X-Google-Smtp-Source: AGHT+IGfErTmpaFA2eYvcZFrMIpLzD4mTN8y9z8Ngw++N10rufjVVfzsKy/Kc0ONU+UwZK+9HN/8yg== X-Received: by 2002:a05:6a21:3a81:b0:1a0:f5e6:1115 with SMTP id zv1-20020a056a213a8100b001a0f5e61115mr7515293pzb.2.1708958153458; Mon, 26 Feb 2024 06:35:53 -0800 (PST) Received: from localhost ([47.89.225.180]) by smtp.gmail.com with ESMTPSA id e2-20020a170902f1c200b001d9a40f50c4sm4046718plc.301.2024.02.26.06.35.52 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:53 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 23/73] KVM: x86/PVM: Handle event handling related MSR read/write operation Date: Mon, 26 Feb 2024 22:35:40 +0800 Message-Id: <20240226143630.33643-24-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In the PVM event handling specification, the guest needs to register the event entry into the associated MSRs before delivering the event. Therefore, handling them in the get_msr()/set_msr() callbacks is necessary to prepare for event delivery later. Additionally, the user mode syscall event still uses the original syscall event entry, but only MSR_LSTAR is used; other MSRs are ignored. 
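For illustration, a guest kernel would be expected to register its entry
points during early boot roughly as follows (hypothetical sketch; the pvcs
page and the entry/return symbols are placeholders, not symbols from this
series):

    wrmsrl(MSR_PVM_VCPU_STRUCT, __pa(pvcs_page));        /* must be page-aligned */
    wrmsrl(MSR_PVM_EVENT_ENTRY, (u64)pvm_event_entry);   /* entry, entry+256 and entry+512 must be canonical */
    wrmsrl(MSR_PVM_RETU_RIP, (u64)pvm_return_to_user);
    wrmsrl(MSR_PVM_RETS_RIP, (u64)pvm_return_to_supervisor);
    wrmsrl(MSR_LSTAR, (u64)pvm_user_syscall_entry);      /* user mode syscall entry; the other syscall MSRs are ignored */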
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 188 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 7 ++ 2 files changed, 195 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 29c6d8da7c19..69f8fbbb6176 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -31,6 +31,33 @@ static bool __read_mostly is_intel; static unsigned long host_idt_base; +static inline u16 kernel_cs_by_msr(u64 msr_star) +{ + // [47..32] + // and force rpl=0 + return ((msr_star >> 32) & ~0x3); +} + +static inline u16 kernel_ds_by_msr(u64 msr_star) +{ + // [47..32] + 8 + // and force rpl=0 + return ((msr_star >> 32) & ~0x3) + 8; +} + +static inline u16 user_cs32_by_msr(u64 msr_star) +{ + // [63..48] is user_cs32 and force rpl=3 + return ((msr_star >> 48) | 0x3); +} + +static inline u16 user_cs_by_msr(u64 msr_star) +{ + // [63..48] is user_cs32, and [63..48] + 16 is user_cs + // and force rpl=3 + return ((msr_star >> 48) | 0x3) + 16; +} + static inline void __save_gs_base(struct vcpu_pvm *pvm) { // switcher will do a real hw swapgs, so use hw MSR_KERNEL_GS_BASE @@ -261,6 +288,161 @@ static void pvm_sched_in(struct kvm_vcpu *vcpu, int cpu) { } +static int pvm_get_msr_feature(struct kvm_msr_entry *msr) +{ + return 1; +} + +static void pvm_msr_filter_changed(struct kvm_vcpu *vcpu) +{ + /* Accesses to MSRs are emulated in hypervisor, nothing to do here. */ +} + +/* + * Reads an msr value (of 'msr_index') into 'msr_info'. + * Returns 0 on success, non-0 otherwise. + * Assumes vcpu_load() was already called. + */ +static int pvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int ret = 0; + + switch (msr_info->index) { + case MSR_STAR: + msr_info->data = pvm->msr_star; + break; + case MSR_LSTAR: + msr_info->data = pvm->msr_lstar; + break; + case MSR_SYSCALL_MASK: + msr_info->data = pvm->msr_syscall_mask; + break; + case MSR_CSTAR: + msr_info->data = pvm->unused_MSR_CSTAR; + break; + /* + * Since SYSENTER is not supported for the guest, we return a bad + * segment to the emulator when emulating the instruction for #GP. + */ + case MSR_IA32_SYSENTER_CS: + msr_info->data = GDT_ENTRY_INVALID_SEG; + break; + case MSR_IA32_SYSENTER_EIP: + msr_info->data = pvm->unused_MSR_IA32_SYSENTER_EIP; + break; + case MSR_IA32_SYSENTER_ESP: + msr_info->data = pvm->unused_MSR_IA32_SYSENTER_ESP; + break; + case MSR_PVM_VCPU_STRUCT: + msr_info->data = pvm->msr_vcpu_struct; + break; + case MSR_PVM_SUPERVISOR_RSP: + msr_info->data = pvm->msr_supervisor_rsp; + break; + case MSR_PVM_SUPERVISOR_REDZONE: + msr_info->data = pvm->msr_supervisor_redzone; + break; + case MSR_PVM_EVENT_ENTRY: + msr_info->data = pvm->msr_event_entry; + break; + case MSR_PVM_RETU_RIP: + msr_info->data = pvm->msr_retu_rip_plus2 - 2; + break; + case MSR_PVM_RETS_RIP: + msr_info->data = pvm->msr_rets_rip_plus2 - 2; + break; + default: + ret = kvm_get_msr_common(vcpu, msr_info); + } + + return ret; +} + +/* + * Writes msr value into the appropriate "register". + * Returns 0 on success, non-0 otherwise. + * Assumes vcpu_load() was already called. + */ +static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int ret = 0; + u32 msr_index = msr_info->index; + u64 data = msr_info->data; + + switch (msr_index) { + case MSR_STAR: + /* + * Guest KERNEL_CS/DS shouldn't be NULL and guest USER_CS/DS + * must be the same as the host USER_CS/DS. 
+ */ + if (!msr_info->host_initiated) { + if (!kernel_cs_by_msr(data)) + return 1; + if (user_cs_by_msr(data) != __USER_CS) + return 1; + } + pvm->msr_star = data; + break; + case MSR_LSTAR: + if (is_noncanonical_address(msr_info->data, vcpu)) + return 1; + pvm->msr_lstar = data; + break; + case MSR_SYSCALL_MASK: + pvm->msr_syscall_mask = data; + break; + case MSR_CSTAR: + pvm->unused_MSR_CSTAR = data; + break; + case MSR_IA32_SYSENTER_CS: + pvm->unused_MSR_IA32_SYSENTER_CS = data; + break; + case MSR_IA32_SYSENTER_EIP: + pvm->unused_MSR_IA32_SYSENTER_EIP = data; + break; + case MSR_IA32_SYSENTER_ESP: + pvm->unused_MSR_IA32_SYSENTER_ESP = data; + break; + case MSR_PVM_VCPU_STRUCT: + if (!PAGE_ALIGNED(data)) + return 1; + if (!data) + kvm_gpc_deactivate(&pvm->pvcs_gpc); + else if (kvm_gpc_activate(&pvm->pvcs_gpc, data, PAGE_SIZE)) + return 1; + + pvm->msr_vcpu_struct = data; + break; + case MSR_PVM_SUPERVISOR_RSP: + pvm->msr_supervisor_rsp = msr_info->data; + break; + case MSR_PVM_SUPERVISOR_REDZONE: + pvm->msr_supervisor_redzone = msr_info->data; + break; + case MSR_PVM_EVENT_ENTRY: + if (is_noncanonical_address(data, vcpu) || + is_noncanonical_address(data + 256, vcpu) || + is_noncanonical_address(data + 512, vcpu)) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + pvm->msr_event_entry = msr_info->data; + break; + case MSR_PVM_RETU_RIP: + pvm->msr_retu_rip_plus2 = msr_info->data + 2; + break; + case MSR_PVM_RETS_RIP: + pvm->msr_rets_rip_plus2 = msr_info->data + 2; + break; + default: + ret = kvm_set_msr_common(vcpu, msr_info); + } + + return ret; +} + static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } @@ -764,6 +946,9 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_load = pvm_vcpu_load, .vcpu_put = pvm_vcpu_put, + .get_msr_feature = pvm_get_msr_feature, + .get_msr = pvm_get_msr, + .set_msr = pvm_set_msr, .load_mmu_pgd = pvm_load_mmu_pgd, .vcpu_pre_run = pvm_vcpu_pre_run, @@ -779,6 +964,9 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .nested_ops = &pvm_nested_ops, .setup_mce = pvm_setup_mce, + + .msr_filter_changed = pvm_msr_filter_changed, + .complete_emulated_msr = kvm_complete_insn_gp, }; static struct kvm_x86_init_ops pvm_init_ops __initdata = { diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 123cfe1c3c6a..57ca2e901e0d 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -54,6 +54,13 @@ struct vcpu_pvm { struct gfn_to_pfn_cache pvcs_gpc; // emulated x86 msrs + u64 msr_lstar; + u64 msr_syscall_mask; + u64 msr_star; + u64 unused_MSR_CSTAR; + u64 unused_MSR_IA32_SYSENTER_CS; + u64 unused_MSR_IA32_SYSENTER_EIP; + u64 unused_MSR_IA32_SYSENTER_ESP; u64 msr_tsc_aux; /* * Only bits masked by msr_ia32_feature_control_valid_bits can be set in From patchwork Mon Feb 26 14:35:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572308 Received: from mail-pg1-f170.google.com (mail-pg1-f170.google.com [209.85.215.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 84D9013342D; Mon, 26 Feb 2024 14:35:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958159; cv=none; 
b=S3mKSmwfjWwR5aYVmhQcOda+Jo7mdonH0rmbEOqSGBX9vox7V3Yl+fGgIu+opeySSxOWEilCQkRIuUteutPUbfRNzDC5ByL4TdMKRfofyHxX2d2VrqXy4ixPRvpDXGdmoLr6wpOaoPBzQxObZ6eFsDnjqsP21wBLzWDaBfq1Fk8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958159; c=relaxed/simple; bh=5FZhE8CGaf4oOsLW99jFiL2V8DpwR94pz9xSUE+tHww=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Dp1m2g2MlmTalvjyKf+t3seOlBCfO6TPpz9vC+bTKrTJiPhv4pzfAxwg/03QL4JDOxxdQEAqxlNB2Ydx054H5gjnnBwO6ZTC376NW9OBDTs/6tXxO/7HlNBzMWetm/7zbEPYXldYvAlkACfFFEhXQ/DddDx0jLSSJUgQh3ragHA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jvkIeBEr; arc=none smtp.client-ip=209.85.215.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jvkIeBEr" Received: by mail-pg1-f170.google.com with SMTP id 41be03b00d2f7-5dbf7b74402so2528975a12.0; Mon, 26 Feb 2024 06:35:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958156; x=1709562956; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=avwD90nSI2LniFVSuZE7YLTIgz6R6zy0cQM3FsoXclk=; b=jvkIeBErSOe1/pmH3+2vtnYLAjE3iCOVvj5jkG2iz15gNUevOPhvY2wazm6HSAo2X0 RumEkss1W1WDN1eII/HDnqdhJH9QzsiqUGuok/yrNsOpLOq9IqJcH69OzXE6AigobeAg dmH+Soy1nsK4OxGQmOaE6f8W/lDQti+fu3qeeZIYMpPlES9TkDcLxfBhoWhfjFRpMvGI 1wCGRjeqAaDMzM+U7JapvTXCcauhI2WeaYiFlAQdb/uLpjHCtUQm5j5eQdvPwIsV61XY E+QKZhXPXqa7RwhQoR0EYmtcLbK2IqEUHYwl9zYFtuWtT63fE3XV90JYXBSRt72ltA2F xJPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958156; x=1709562956; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=avwD90nSI2LniFVSuZE7YLTIgz6R6zy0cQM3FsoXclk=; b=sefkZJFUlurtGfPKe8rSNiHRjZJjNMQ/pUeTFK++ulU8SCWq8w4cjdzhf7NOmW9wAL xba3yO5s6c1heaMsYBbeLEb9qXvQHBRVxySMhS7vsP6AK21AvwVyJHq/0uSafun1v/y3 J+Qn8qrHjsGhzcu6QmVVXhCYZRllKK9Mujq06e8wg8zv+G7ytht9bmXPmyQZtwg/7qWf u1oCwV1LHznwGjega0NT2bBr/2ZlvGIjXjF+aaXwTogE8Hko/xpHJdvIjplo6ZVargfL pTx7sl8+N91AgYouPso/sXWs9ikQz6WRyjxNmjQoneMX7zmh7KGKEZF3Tt7PkSC7x3d3 rbLg== X-Forwarded-Encrypted: i=1; AJvYcCVXSekHDZssgjO5dIvGj0LvOaofW9tvWcmyzDcjXtlQ85lmnftxdBZxXmxv54QSEMJJOmVvhQAF1xJBbCXO6Ag/Q07e X-Gm-Message-State: AOJu0Ywr08SFxqkc/Vlllmcc0Q9Wit6joxD9Ox3MJbWYgGPmQYvhP8ZC w/Tf8AQ4gatOSZh3uAN9h+Lc7NFyrj7MgtIapyzNlGUCqny3EVFsKn0SHqh+ X-Google-Smtp-Source: AGHT+IF9t94J1P3gFjC3w94X7z6mLYPJMKdgH0qJsWfw2n4K1ZzGIEBhrooZGx76fRCQIDy0RbezwA== X-Received: by 2002:a17:90a:8c0b:b0:29a:be15:9c90 with SMTP id a11-20020a17090a8c0b00b0029abe159c90mr2457111pjo.34.1708958156679; Mon, 26 Feb 2024 06:35:56 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id x19-20020a17090ab01300b002990d91d31dsm6443295pjq.15.2024.02.26.06.35.55 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:56 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , 
Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 24/73] KVM: x86/PVM: Introduce PVM mode switching Date: Mon, 26 Feb 2024 22:35:41 +0800 Message-Id: <20240226143630.33643-25-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In PVM ABI, CPL is not used directly. Instead, supervisor mode and user mode are used to represent the original CPL0/CPL3 concept. It is assumed that the kernel runs in supervisor mode and userspace runs in user mode. From the x86 operating modes perspective, the PVM supervisor mode is a modified 64-bit long mode. Therefore, 32-bit compatibility mode is not allowed for the supervisor mode, and its hardware CS must be __USER_CS. When switching to user mode, the stack and GS base of supervisor mode are saved into the associated MSRs. When switching back from user mode, the stack and GS base of supervisor mode are automatically restored from the MSRs. Therefore, in PVM ABI, the value of MSR_KERNEL_GS_BASE in supervisor mode is the same as the value of MSR_GS_BASE in supervisor mode, which does not follow the x86 ABI. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 129 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 1 + 2 files changed, 130 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 69f8fbbb6176..3735baee1d5f 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -31,6 +31,22 @@ static bool __read_mostly is_intel; static unsigned long host_idt_base; +static inline bool is_smod(struct vcpu_pvm *pvm) +{ + unsigned long switch_flags = pvm->switch_flags; + + if ((switch_flags & SWITCH_FLAGS_MOD_TOGGLE) == SWITCH_FLAGS_SMOD) + return true; + + WARN_ON_ONCE((switch_flags & SWITCH_FLAGS_MOD_TOGGLE) != SWITCH_FLAGS_UMOD); + return false; +} + +static inline void pvm_switch_flags_toggle_mod(struct vcpu_pvm *pvm) +{ + pvm->switch_flags ^= SWITCH_FLAGS_MOD_TOGGLE; +} + static inline u16 kernel_cs_by_msr(u64 msr_star) { // [47..32] @@ -80,6 +96,82 @@ static inline void __load_fs_base(struct vcpu_pvm *pvm) wrmsrl(MSR_FS_BASE, pvm->segments[VCPU_SREG_FS].base); } +static u64 pvm_read_guest_gs_base(struct vcpu_pvm *pvm) +{ + preempt_disable(); + if (pvm->loaded_cpu_state) + __save_gs_base(pvm); + preempt_enable(); + + return pvm->segments[VCPU_SREG_GS].base; +} + +static u64 pvm_read_guest_fs_base(struct vcpu_pvm *pvm) +{ + preempt_disable(); + if (pvm->loaded_cpu_state) + __save_fs_base(pvm); + preempt_enable(); + + return pvm->segments[VCPU_SREG_FS].base; +} + +static u64 pvm_read_guest_kernel_gs_base(struct vcpu_pvm *pvm) +{ + return pvm->msr_kernel_gs_base; +} + +static void pvm_write_guest_gs_base(struct vcpu_pvm *pvm, u64 data) +{ + preempt_disable(); + pvm->segments[VCPU_SREG_GS].base = data; + if (pvm->loaded_cpu_state) + __load_gs_base(pvm); + preempt_enable(); +} + +static void pvm_write_guest_fs_base(struct vcpu_pvm *pvm, u64 data) +{ + preempt_disable(); + pvm->segments[VCPU_SREG_FS].base = data; + if (pvm->loaded_cpu_state) + __load_fs_base(pvm); + preempt_enable(); +} + +static void pvm_write_guest_kernel_gs_base(struct vcpu_pvm 
*pvm, u64 data) +{ + pvm->msr_kernel_gs_base = data; +} + +// switch_to_smod() and switch_to_umod() switch the mode (smod/umod) and +// the CR3. No vTLB flushing when switching the CR3 per PVM Spec. +static inline void switch_to_smod(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + pvm_switch_flags_toggle_mod(pvm); + kvm_mmu_new_pgd(vcpu, pvm->msr_switch_cr3); + swap(pvm->msr_switch_cr3, vcpu->arch.cr3); + + pvm_write_guest_gs_base(pvm, pvm->msr_kernel_gs_base); + kvm_rsp_write(vcpu, pvm->msr_supervisor_rsp); + + pvm->hw_cs = __USER_CS; + pvm->hw_ss = __USER_DS; +} + +static inline void switch_to_umod(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + pvm->msr_supervisor_rsp = kvm_rsp_read(vcpu); + + pvm_switch_flags_toggle_mod(pvm); + kvm_mmu_new_pgd(vcpu, pvm->msr_switch_cr3); + swap(pvm->msr_switch_cr3, vcpu->arch.cr3); +} + /* * Test whether DS, ES, FS and GS need to be reloaded. * @@ -309,6 +401,15 @@ static int pvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) int ret = 0; switch (msr_info->index) { + case MSR_FS_BASE: + msr_info->data = pvm_read_guest_fs_base(pvm); + break; + case MSR_GS_BASE: + msr_info->data = pvm_read_guest_gs_base(pvm); + break; + case MSR_KERNEL_GS_BASE: + msr_info->data = pvm_read_guest_kernel_gs_base(pvm); + break; case MSR_STAR: msr_info->data = pvm->msr_star; break; @@ -352,6 +453,9 @@ static int pvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_PVM_RETS_RIP: msr_info->data = pvm->msr_rets_rip_plus2 - 2; break; + case MSR_PVM_SWITCH_CR3: + msr_info->data = pvm->msr_switch_cr3; + break; default: ret = kvm_get_msr_common(vcpu, msr_info); } @@ -372,6 +476,15 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) u64 data = msr_info->data; switch (msr_index) { + case MSR_FS_BASE: + pvm_write_guest_fs_base(pvm, data); + break; + case MSR_GS_BASE: + pvm_write_guest_gs_base(pvm, data); + break; + case MSR_KERNEL_GS_BASE: + pvm_write_guest_kernel_gs_base(pvm, data); + break; case MSR_STAR: /* * Guest KERNEL_CS/DS shouldn't be NULL and guest USER_CS/DS @@ -436,6 +549,9 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_PVM_RETS_RIP: pvm->msr_rets_rip_plus2 = msr_info->data + 2; break; + case MSR_PVM_SWITCH_CR3: + pvm->msr_switch_cr3 = msr_info->data; + break; default: ret = kvm_set_msr_common(vcpu, msr_info); } @@ -443,6 +559,13 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return ret; } +static int pvm_get_cpl(struct kvm_vcpu *vcpu) +{ + if (is_smod(to_pvm(vcpu))) + return 0; + return 3; +} + static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } @@ -683,6 +806,11 @@ static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) pvm_vcpu_run_noinstr(vcpu); + if (is_smod(pvm)) { + if (pvm->hw_cs != __USER_CS || pvm->hw_ss != __USER_DS) + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + } + pvm_load_host_xsave_state(vcpu); return EXIT_FASTPATH_NONE; @@ -949,6 +1077,7 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .get_msr_feature = pvm_get_msr_feature, .get_msr = pvm_get_msr, .set_msr = pvm_set_msr, + .get_cpl = pvm_get_cpl, .load_mmu_pgd = pvm_load_mmu_pgd, .vcpu_pre_run = pvm_vcpu_pre_run, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 57ca2e901e0d..b0c633ce2987 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -61,6 +61,7 @@ struct vcpu_pvm { u64 unused_MSR_IA32_SYSENTER_CS; u64 unused_MSR_IA32_SYSENTER_EIP; u64 unused_MSR_IA32_SYSENTER_ESP; + u64 msr_kernel_gs_base; u64 
msr_tsc_aux; /* * Only bits masked by msr_ia32_feature_control_valid_bits can be set in From patchwork Mon Feb 26 14:35:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572309 Received: from mail-pg1-f169.google.com (mail-pg1-f169.google.com [209.85.215.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8AE4A13399F; Mon, 26 Feb 2024 14:36:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958162; cv=none; b=EMUWJqLhN62GUxv3++vPOL/j61Ub923cDG2tRAFOONJz3UhtYxdNTErtZhVkL+9IaCbJJxNhJMePAaOAbY9tcwKwq/qmbOIm7wWm3n8m4NciaSir0j1vVXDWHpo/v14GCwGUzfGUJmFueqLzfCoMzg5Vv4icW7nrdCJwveEnylI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958162; c=relaxed/simple; bh=pkXKcagVoUrfls8IZw2n6mmAXzIWVwR2aUX+9PNq4Ls=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=eodqSTfeuuP3TuUdDvUtmyNGCs4c09SWZYcfOSgZw4bFI4YIVkTQkIUhu6clYIPi+Ihaft1QlAhr2GZUiO8jNkuZDsqm5LTVm3mXmHXbG1X236XfEVVdYcnRQ4npX+y9tFR5Qrcme5CZiLA6E3GowPeR9o0/m1tRaG4oKIZjOk4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UXBfCvkY; arc=none smtp.client-ip=209.85.215.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UXBfCvkY" Received: by mail-pg1-f169.google.com with SMTP id 41be03b00d2f7-5d42e7ab8a9so2021669a12.3; Mon, 26 Feb 2024 06:36:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958160; x=1709562960; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0jozFDje0mLN9coijYdqOvVfSSMDTpV9h2OeBFaDVOU=; b=UXBfCvkYIvmIvIACUAcHD7+X/wXG9DEXmEpNshxYTOvRnKa+OsjmwvwEEgo8A91pJb b2fU7/N6YFUbrDcFynuHLJSoPQjSeRvsW1zkhIPfMKU5tNg8KMg52Th5c0s8jYtCpBeW NwkS3TRNdeI0MhUT5e0E/BtYeIMmOHeH9Ia4uJY5Hy5NY4QmFE8YsaHvLekc3Y09mNIu qwy2lJXgCNwEh6bxJzvrgNJtRyMlVooCjBEUgsshBzM3JzOklkhe8R1rK+9A0HaTAjUH /gPZ/Trn4cDtR4ZeP/yDKeNQV8QwqN/+92WCwm8lItYHdaBNZfFTXuYgeom2+pOiDdq8 8mqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958160; x=1709562960; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0jozFDje0mLN9coijYdqOvVfSSMDTpV9h2OeBFaDVOU=; b=TjnvCL9zJFCYTAqYTgbNT9OfCCWUtKCesrY/g1lbDZeyAKspdK1svCsOoZW39PttX+ spe4jEVH7m2Q8Uye+sZU1SdNw6LPI7UXsn4Gsi4aaX3EE/Y5FaMvBi96lw2cDZNbZ+c7 rOgNlQE1hF47I0LHOe6johOWRuEIYQkIDbtyW4AAHUU2uoWQ848JrJ8U4jt+0rk7STKz a43LX5CXnOEwqdnrftcpi33moON9z498vk2ql9xFy5weAXAdUYDuMcS/HWNSG5P9eROa u3d9/Ia0Js5t7auJkQrI8k1K0UondgR6DMroh4yc+RLHKdFugppkrEX8RayD7xQee3uF N/Tw== X-Forwarded-Encrypted: i=1; AJvYcCXvi39gVkVMzgOziQuOjdc32ps17mAZ7pYFZw9mwMMUI8hqgcOoBDAyHG1GbTrfs27PLL/SNgH4X5E6jnspARQKUL6A 
X-Gm-Message-State: AOJu0YxsUs+3dmGACIU4wh0XtRcU/UogleGxtX42P9VciWgnTQtUNbSn MppvpGHktw9wy9k+IOeMw8uUWy5zpf19s2F28uvJYk8GkRn5YSIuyoPE5ZTI X-Google-Smtp-Source: AGHT+IFYLWbfH3g6LFVQ3GaEXvFm9BatfltE1NholegVyczA8+K3NeZcIQavpkYWVcIHl7qwAcIEcQ== X-Received: by 2002:a17:90a:db86:b0:29a:2860:28b9 with SMTP id h6-20020a17090adb8600b0029a286028b9mr4563991pjv.48.1708958159681; Mon, 26 Feb 2024 06:35:59 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id sw8-20020a17090b2c8800b0029abf47ec7fsm2333867pjb.0.2024.02.26.06.35.59 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:35:59 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 25/73] KVM: x86/PVM: Implement APIC emulation related callbacks Date: Mon, 26 Feb 2024 22:35:42 +0800 Message-Id: <20240226143630.33643-26-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan For PVM, APIC virtualization for the guest is supported by reusing APIC emulation. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 3735baee1d5f..ce047d211657 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -566,6 +566,25 @@ static int pvm_get_cpl(struct kvm_vcpu *vcpu) return 3; } +static void pvm_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + struct kvm_vcpu *vcpu = apic->vcpu; + + kvm_lapic_set_irr(vector, apic); + kvm_make_request(KVM_REQ_EVENT, vcpu); + kvm_vcpu_kick(vcpu); +} + +static void pvm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) +{ +} + +static bool pvm_apic_init_signal_blocked(struct kvm_vcpu *vcpu) +{ + return false; +} + static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } @@ -1083,19 +1102,25 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_pre_run = pvm_vcpu_pre_run, .vcpu_run = pvm_vcpu_run, .handle_exit = pvm_handle_exit, + .refresh_apicv_exec_ctrl = pvm_refresh_apicv_exec_ctrl, + .deliver_interrupt = pvm_deliver_interrupt, .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, .handle_exit_irqoff = pvm_handle_exit_irqoff, + .request_immediate_exit = __kvm_request_immediate_exit, + .sched_in = pvm_sched_in, .nested_ops = &pvm_nested_ops, .setup_mce = pvm_setup_mce, + .apic_init_signal_blocked = pvm_apic_init_signal_blocked, .msr_filter_changed = pvm_msr_filter_changed, .complete_emulated_msr = kvm_complete_insn_gp, + .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, }; static struct kvm_x86_init_ops pvm_init_ops __initdata = { From patchwork Mon Feb 26 14:35:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572310 Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) 
by smtp.subspace.kernel.org (Postfix) with ESMTPS id E0B0813473A; Mon, 26 Feb 2024 14:36:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958165; cv=none; b=lzn4umsIadNO7aKnPcRd0+cQP+DF0vBVBwDS/OEBCo8gLfE01FFWbdxR6h3vjv3XOKj0XKzJOVqcJUVO7s0rrZLGnW6tKjNbSxFrDMZQ49HUtJU/VV4wAPfEj1LgG84zXm70mOv0e2WFE6N1KEHmfFsoD2vkybOamJ+nLq383no= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958165; c=relaxed/simple; bh=GvKuyzIXR8Z+/z01d0kpMrMpoQCTeikvWweEMnbCC3c=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hEuzCz+7djtYFkCaATzr4v8dOYQXhTfXARburo09nIhyYFLwZ17z2zpuNM4R6hJpnuY1unVSaUnkYoKTA+1wDkg318DqxlxZk1dH/96Bo4WdtI+y4REZmK/NZVSDAKzBsOjkQT5U2keMczRsCn5kno2G0rDbKzlhTxnZlrkH8bc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=C6p4Q+mZ; arc=none smtp.client-ip=209.85.210.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="C6p4Q+mZ" Received: by mail-pf1-f172.google.com with SMTP id d2e1a72fcca58-6e46dcd8feaso827300b3a.2; Mon, 26 Feb 2024 06:36:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958163; x=1709562963; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vJusf0G+J7FIR2mDyw6helWrSCNLVt7nNkO4Npqtj9I=; b=C6p4Q+mZKJux803GwexgBYojTakuXgdvK1fHukRcBxJO6fjpphXLZhMk5IZ8d1HMGz dzlyw4oQq9i6cxHbiRhwFgR8Z2tbtU8KD1SJL1TqB2NCQcgjx2wxnvnIyLEje+/RdOTx ysZ0yJmA8HIhb8G/jVKgHRzTc0gkMOJ8ZRjEvSNmfqM5O9OUr9bitkZDgJ27AhfGgHHA mBJBBHEDdt6LkjcD7kezA/Cb+tfDSQ2cr+LOimiUQFzl+Or36O+X4ne9POnyL+ROsY8i oV/81zirOqSfxrqU54XRhtr0Hp+EG4MBXjKWQ2uILvGHyoodvm0AWaTd84/Cv+0R0AA6 GDmw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958163; x=1709562963; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vJusf0G+J7FIR2mDyw6helWrSCNLVt7nNkO4Npqtj9I=; b=YGUavYt3oZ1b84KhZDG7h7BihDsFxKjqudgcTgCERqHXw6cSvLyO13Sg8Bdcg71gnF /r9lbrarcssPJwT/ahop/OzoHsBKgk137H6rMzIawKmDW+opFuZmaqNcXFp+7OSYhZ7u 1fY3t4NRw7JHRzn7qAVfvLFmUJ6wT/0g3Inb2vrwNvzkPfZxlcNQPmotxl92pB6uxEtp U20sl1rwiNAcANHRU7uwWon13EqAxxzx5cM5IYHyAsZUdIwC3T0m1JPrvdhWYdxN12m+ L0ZpF3ofiqw0fqR9Tp4f2vD0hufolXEA2euxi8zAfaj+jjtCEC5d0C6s3VtSI6scYBIg 1ucg== X-Forwarded-Encrypted: i=1; AJvYcCU3YHIQV5qyCGdz03jGU0N0yYzqWKZUR1LXckzRcKmMjKDm2nMULbY3EYlB5yYWnA/SKa36ljL/FWtvjD9KwfJ/th4d X-Gm-Message-State: AOJu0YxIZwL/BZTD3eBVKQUXOmQnXxTuAnsdYVWqGg/dpSe3CHQFAVyB TlMO7nzEU/fGLd0UwvadBXDAaQEoVQRbMFo4AUOJnV7spHUgWiHRbpmdav4o X-Google-Smtp-Source: AGHT+IFeyLPYxLRIy1QEiRGWhhp6uPa9NhvEDD1L7H4AyNsB2/niLfYoI2fl6kZLiyl21iMCZldL0Q== X-Received: by 2002:a17:902:f709:b0:1dc:a8aa:3c80 with SMTP id h9-20020a170902f70900b001dca8aa3c80mr2062639plo.43.1708958162915; Mon, 26 Feb 2024 06:36:02 -0800 (PST) Received: from localhost ([198.11.178.15]) by smtp.gmail.com with ESMTPSA id 
w5-20020a170902d3c500b001dc2d1bd4d6sm4055885plb.77.2024.02.26.06.36.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:36:02 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 26/73] KVM: x86/PVM: Implement event delivery flags related callbacks Date: Mon, 26 Feb 2024 22:35:43 +0800 Message-Id: <20240226143630.33643-27-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan To reduce the number of VM exits for modifying the X86_EFLAGS_IF bit in guest suprvisor mode, a shared structure is used between the guest and hypervisor in PVM. This structure is stored in the guest memory. In this way, the guest supervisor can change its X86_EFLAGS_IF bit without causing a VM exit, as long as there is no IRQ window request. After a VM exit occurs, the hypervisor updates the guest's X86_EFLAGS_IF bit from the shared structure. Since the SRET/URET synthetic instruction always induces a VM exit, there is nothing to do in the enable_nmi_window() callback. Additionally, SMM mode is not supported now. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 194 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index ce047d211657..3d2a3c472664 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -585,6 +585,143 @@ static bool pvm_apic_init_signal_blocked(struct kvm_vcpu *vcpu) return false; } +static struct pvm_vcpu_struct *pvm_get_vcpu_struct(struct vcpu_pvm *pvm) +{ + struct gfn_to_pfn_cache *gpc = &pvm->pvcs_gpc; + + read_lock_irq(&gpc->lock); + while (!kvm_gpc_check(gpc, PAGE_SIZE)) { + read_unlock_irq(&gpc->lock); + + if (kvm_gpc_refresh(gpc, PAGE_SIZE)) + return NULL; + + read_lock_irq(&gpc->lock); + } + + return (struct pvm_vcpu_struct *)(gpc->khva); +} + +static void pvm_put_vcpu_struct(struct vcpu_pvm *pvm, bool dirty) +{ + struct gfn_to_pfn_cache *gpc = &pvm->pvcs_gpc; + + read_unlock_irq(&gpc->lock); + if (dirty) + mark_page_dirty_in_slot(pvm->vcpu.kvm, gpc->memslot, + gpc->gpa >> PAGE_SHIFT); +} + +static void pvm_vcpu_gpc_refresh(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + struct gfn_to_pfn_cache *gpc = &pvm->pvcs_gpc; + + if (!gpc->active) + return; + + if (pvm_get_vcpu_struct(pvm)) + pvm_put_vcpu_struct(pvm, false); + else + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); +} + +static void pvm_event_flags_update(struct kvm_vcpu *vcpu, unsigned long set, + unsigned long clear) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + static struct pvm_vcpu_struct *pvcs; + unsigned long old_flags, new_flags; + + if (!pvm->msr_vcpu_struct) + return; + + pvcs = pvm_get_vcpu_struct(pvm); + if (!pvcs) + return; + + old_flags = pvcs->event_flags; + new_flags = (old_flags | set) & ~clear; + if (new_flags != old_flags) + pvcs->event_flags = new_flags; + + pvm_put_vcpu_struct(pvm, new_flags != old_flags); +} + +static unsigned long pvm_get_rflags(struct kvm_vcpu *vcpu) +{ + return 
to_pvm(vcpu)->rflags; +} + +static void pvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int need_update = !!((pvm->rflags ^ rflags) & X86_EFLAGS_IF); + + pvm->rflags = rflags; + + /* + * The IF bit of 'pvcs->event_flags' should not be changed in user + * mode. It is recommended for this bit to be cleared when switching to + * user mode, so that when the guest switches back to supervisor mode, + * the X86_EFLAGS_IF is already cleared. + */ + if (!need_update || !is_smod(pvm)) + return; + + if (rflags & X86_EFLAGS_IF) + pvm_event_flags_update(vcpu, X86_EFLAGS_IF, PVM_EVENT_FLAGS_IP); + else + pvm_event_flags_update(vcpu, 0, X86_EFLAGS_IF); +} + +static bool pvm_get_if_flag(struct kvm_vcpu *vcpu) +{ + return pvm_get_rflags(vcpu) & X86_EFLAGS_IF; +} + +static u32 pvm_get_interrupt_shadow(struct kvm_vcpu *vcpu) +{ + return to_pvm(vcpu)->int_shadow; +} + +static void pvm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) +{ + /* PVM spec: ignore interrupt shadow when in PVM mode. */ +} + +static void enable_irq_window(struct kvm_vcpu *vcpu) +{ + pvm_event_flags_update(vcpu, PVM_EVENT_FLAGS_IP, 0); +} + +static int pvm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + return (pvm_get_rflags(vcpu) & X86_EFLAGS_IF) && + !to_pvm(vcpu)->int_shadow; +} + +static bool pvm_get_nmi_mask(struct kvm_vcpu *vcpu) +{ + return to_pvm(vcpu)->nmi_mask; +} + +static void pvm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) +{ + to_pvm(vcpu)->nmi_mask = masked; +} + +static void enable_nmi_window(struct kvm_vcpu *vcpu) +{ +} + +static int pvm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + return !pvm->nmi_mask && !pvm->int_shadow; +} + static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } @@ -826,12 +963,29 @@ static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) pvm_vcpu_run_noinstr(vcpu); if (is_smod(pvm)) { + struct pvm_vcpu_struct *pvcs = pvm->pvcs_gpc.khva; + + /* + * Load the X86_EFLAGS_IF bit from PVCS. In user mode, the + * Interrupt Flag is considered to be set and cannot be + * changed. Since it is already set in 'pvm->rflags', so + * nothing to do. In supervisor mode, the Interrupt Flag is + * reflected in 'pvcs->event_flags' and can be changed + * directly without triggering a VM exit. 
+ */ + pvm->rflags &= ~X86_EFLAGS_IF; + if (likely(pvm->msr_vcpu_struct)) + pvm->rflags |= X86_EFLAGS_IF & pvcs->event_flags; + if (pvm->hw_cs != __USER_CS || pvm->hw_ss != __USER_DS) kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); } pvm_load_host_xsave_state(vcpu); + mark_page_dirty_in_slot(vcpu->kvm, pvm->pvcs_gpc.memslot, + pvm->pvcs_gpc.gpa >> PAGE_SHIFT); + return EXIT_FASTPATH_NONE; } @@ -965,6 +1119,27 @@ static int pvm_check_processor_compat(void) return 0; } +#ifdef CONFIG_KVM_SMM +static int pvm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + return 0; +} + +static int pvm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram) +{ + return 0; +} + +static int pvm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram) +{ + return 0; +} + +static void enable_smi_window(struct kvm_vcpu *vcpu) +{ +} +#endif + /* * When in PVM mode, the hardware MSR_LSTAR is set to the entry point * provided by the host entry code (switcher), and the @@ -1098,10 +1273,21 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .set_msr = pvm_set_msr, .get_cpl = pvm_get_cpl, .load_mmu_pgd = pvm_load_mmu_pgd, + .get_rflags = pvm_get_rflags, + .set_rflags = pvm_set_rflags, + .get_if_flag = pvm_get_if_flag, .vcpu_pre_run = pvm_vcpu_pre_run, .vcpu_run = pvm_vcpu_run, .handle_exit = pvm_handle_exit, + .set_interrupt_shadow = pvm_set_interrupt_shadow, + .get_interrupt_shadow = pvm_get_interrupt_shadow, + .interrupt_allowed = pvm_interrupt_allowed, + .nmi_allowed = pvm_nmi_allowed, + .get_nmi_mask = pvm_get_nmi_mask, + .set_nmi_mask = pvm_set_nmi_mask, + .enable_nmi_window = enable_nmi_window, + .enable_irq_window = enable_irq_window, .refresh_apicv_exec_ctrl = pvm_refresh_apicv_exec_ctrl, .deliver_interrupt = pvm_deliver_interrupt, @@ -1117,10 +1303,18 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .setup_mce = pvm_setup_mce, +#ifdef CONFIG_KVM_SMM + .smi_allowed = pvm_smi_allowed, + .enter_smm = pvm_enter_smm, + .leave_smm = pvm_leave_smm, + .enable_smi_window = enable_smi_window, +#endif + .apic_init_signal_blocked = pvm_apic_init_signal_blocked, .msr_filter_changed = pvm_msr_filter_changed, .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + .vcpu_gpc_refresh = pvm_vcpu_gpc_refresh, }; static struct kvm_x86_init_ops pvm_init_ops __initdata = { From patchwork Mon Feb 26 14:35:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572311 Received: from mail-pg1-f181.google.com (mail-pg1-f181.google.com [209.85.215.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 524D4134CE3; Mon, 26 Feb 2024 14:36:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958168; cv=none; b=tMAN3qyT4luAXi5eUKKBSZ2Mcocscb0R2vvUMtvs8Tkkck8gUet97YMNPSb/xvHMJfeyd5OLFgHL0uUvJzT6BqmMMW33+TlnMo2g+xK1hp8WqQ5IsmtBT/5UdCo815ZaF7InRJfTIBiDCOm0ad/LV7F687kwy4Ccq465WqASXds= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958168; c=relaxed/simple; bh=/mEAVcSbJaQJo/7ORL9ShODHv0ZFinR5lEggkdY6Xvs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H.
Peter Anvin" Subject: [RFC PATCH 27/73] KVM: x86/PVM: Implement event injection related callbacks Date: Mon, 26 Feb 2024 22:35:44 +0800 Message-Id: <20240226143630.33643-28-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In PVM, events are injected and consumed directly. The PVM hypervisor does not follow the IDT-based event delivery mechanism but instead utilizes a new PVM-specific event delivery ABI, which is similar to FRED event delivery. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 193 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 1 + 2 files changed, 194 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 3d2a3c472664..57d987903791 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -648,6 +648,150 @@ static void pvm_event_flags_update(struct kvm_vcpu *vcpu, unsigned long set, pvm_put_vcpu_struct(pvm, new_flags != old_flags); } +static void pvm_standard_event_entry(struct kvm_vcpu *vcpu, unsigned long entry) +{ + // Change rip, rflags, rcx and r11 per PVM event delivery specification, + // this allows to use sysret in VM enter. + kvm_rip_write(vcpu, entry); + kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); + kvm_rcx_write(vcpu, entry); + kvm_r11_write(vcpu, X86_EFLAGS_IF | X86_EFLAGS_FIXED); +} + +/* handle pvm user event per PVM Spec. */ +static int do_pvm_user_event(struct kvm_vcpu *vcpu, int vector, + bool has_err_code, u64 err_code) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long entry = vector == PVM_SYSCALL_VECTOR ? 
+ pvm->msr_lstar : pvm->msr_event_entry; + struct pvm_vcpu_struct *pvcs; + + pvcs = pvm_get_vcpu_struct(pvm); + if (!pvcs) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + + pvcs->user_cs = pvm->hw_cs; + pvcs->user_ss = pvm->hw_ss; + pvcs->eflags = kvm_get_rflags(vcpu); + pvcs->pkru = 0; + pvcs->user_gsbase = pvm_read_guest_gs_base(pvm); + pvcs->rip = kvm_rip_read(vcpu); + pvcs->rsp = kvm_rsp_read(vcpu); + pvcs->rcx = kvm_rcx_read(vcpu); + pvcs->r11 = kvm_r11_read(vcpu); + + if (has_err_code) + pvcs->event_errcode = err_code; + if (vector != PVM_SYSCALL_VECTOR) + pvcs->event_vector = vector; + + if (vector == PF_VECTOR) + pvcs->cr2 = vcpu->arch.cr2; + + pvm_put_vcpu_struct(pvm, true); + + switch_to_smod(vcpu); + + pvm_standard_event_entry(vcpu, entry); + + return 1; +} + +static int do_pvm_supervisor_exception(struct kvm_vcpu *vcpu, int vector, + bool has_error_code, u64 error_code) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long stack; + struct pvm_supervisor_event frame; + struct x86_exception e; + int ret; + + memset(&frame, 0, sizeof(frame)); + frame.cs = kernel_cs_by_msr(pvm->msr_star); + frame.ss = kernel_ds_by_msr(pvm->msr_star); + frame.rip = kvm_rip_read(vcpu); + frame.rflags = kvm_get_rflags(vcpu); + frame.rsp = kvm_rsp_read(vcpu); + frame.errcode = ((unsigned long)vector << 32) | error_code; + frame.r11 = kvm_r11_read(vcpu); + frame.rcx = kvm_rcx_read(vcpu); + + stack = ((frame.rsp - pvm->msr_supervisor_redzone) & ~15UL) - sizeof(frame); + + ret = kvm_write_guest_virt_system(vcpu, stack, &frame, sizeof(frame), &e); + if (ret) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + + if (vector == PF_VECTOR) { + struct pvm_vcpu_struct *pvcs; + + pvcs = pvm_get_vcpu_struct(pvm); + if (!pvcs) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + + pvcs->cr2 = vcpu->arch.cr2; + pvm_put_vcpu_struct(pvm, true); + } + + kvm_rsp_write(vcpu, stack); + + pvm_standard_event_entry(vcpu, pvm->msr_event_entry + 256); + + return 1; +} + +static int do_pvm_supervisor_interrupt(struct kvm_vcpu *vcpu, int vector, + bool has_error_code, u64 error_code) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long stack = kvm_rsp_read(vcpu); + struct pvm_vcpu_struct *pvcs; + + pvcs = pvm_get_vcpu_struct(pvm); + if (!pvcs) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + pvcs->eflags = kvm_get_rflags(vcpu); + pvcs->rip = kvm_rip_read(vcpu); + pvcs->rsp = stack; + pvcs->rcx = kvm_rcx_read(vcpu); + pvcs->r11 = kvm_r11_read(vcpu); + + pvcs->event_vector = vector; + if (has_error_code) + pvcs->event_errcode = error_code; + + pvm_put_vcpu_struct(pvm, true); + + stack = (stack - pvm->msr_supervisor_redzone) & ~15UL; + kvm_rsp_write(vcpu, stack); + + pvm_standard_event_entry(vcpu, pvm->msr_event_entry + 512); + + return 1; +} + +static int do_pvm_event(struct kvm_vcpu *vcpu, int vector, + bool has_error_code, u64 error_code) +{ + if (!is_smod(to_pvm(vcpu))) + return do_pvm_user_event(vcpu, vector, has_error_code, error_code); + + if (vector < 32) + return do_pvm_supervisor_exception(vcpu, vector, + has_error_code, error_code); + + return do_pvm_supervisor_interrupt(vcpu, vector, has_error_code, error_code); +} + static unsigned long pvm_get_rflags(struct kvm_vcpu *vcpu) { return to_pvm(vcpu)->rflags; @@ -722,6 +866,51 @@ static int pvm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) return !pvm->nmi_mask && !pvm->int_shadow; } +/* Always inject the exception directly and consume the event. 
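 * Unlike hardware-assisted injection there is no pending-event field to
 * program for the next VM entry: do_pvm_event() rewrites the guest
 * registers and stack per the PVM event-delivery ABI before the guest
 * resumes, so the event cannot be cancelled afterwards (see
 * pvm_cancel_injection()).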
*/ +static void pvm_inject_exception(struct kvm_vcpu *vcpu) +{ + unsigned int vector = vcpu->arch.exception.vector; + bool has_error_code = vcpu->arch.exception.has_error_code; + u32 error_code = vcpu->arch.exception.error_code; + + kvm_deliver_exception_payload(vcpu, &vcpu->arch.exception); + + if (do_pvm_event(vcpu, vector, has_error_code, error_code)) + kvm_clear_exception_queue(vcpu); +} + +/* Always inject the interrupt directly and consume the event. */ +static void pvm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) +{ + int irq = vcpu->arch.interrupt.nr; + + trace_kvm_inj_virq(irq, vcpu->arch.interrupt.soft, false); + + if (do_pvm_event(vcpu, irq, false, 0)) + kvm_clear_interrupt_queue(vcpu); + + ++vcpu->stat.irq_injections; +} + +/* Always inject the NMI directly and consume the event. */ +static void pvm_inject_nmi(struct kvm_vcpu *vcpu) +{ + if (do_pvm_event(vcpu, NMI_VECTOR, false, 0)) { + vcpu->arch.nmi_injected = false; + pvm_set_nmi_mask(vcpu, true); + } + + ++vcpu->stat.nmi_injections; +} + +static void pvm_cancel_injection(struct kvm_vcpu *vcpu) +{ + /* + * Nothing to do. Since exceptions/interrupts are delivered immediately + * during event injection, so they cannot be cancelled and reinjected. + */ +} + static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } @@ -1282,6 +1471,10 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .handle_exit = pvm_handle_exit, .set_interrupt_shadow = pvm_set_interrupt_shadow, .get_interrupt_shadow = pvm_get_interrupt_shadow, + .inject_irq = pvm_inject_irq, + .inject_nmi = pvm_inject_nmi, + .inject_exception = pvm_inject_exception, + .cancel_injection = pvm_cancel_injection, .interrupt_allowed = pvm_interrupt_allowed, .nmi_allowed = pvm_nmi_allowed, .get_nmi_mask = pvm_get_nmi_mask, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index b0c633ce2987..39506ddbe5c5 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -7,6 +7,7 @@ #define SWITCH_FLAGS_INIT (SWITCH_FLAGS_SMOD) +#define PVM_SYSCALL_VECTOR SWITCH_EXIT_REASONS_SYSCALL #define PVM_FAILED_VMENTRY_VECTOR SWITCH_EXIT_REASONS_FAILED_VMETNRY #define PT_L4_SHIFT 39 From patchwork Mon Feb 26 14:35:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572312 Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF5C91350F6; Mon, 26 Feb 2024 14:36:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958171; cv=none; b=LJzv5AYaJ3+PbAxwo3gfesIJKzgJDP01XD5bkyIYhQq3T4XhaxSizcyEm3K2HcmxYDePe4OtmOGDIMvNo6oa/iv7QLD+IWRN1uOZEUZaMU5Vodb6PiVQISCTdu2160p2Tv2M2K5tJPdEs+RQP9jP9sfFyMyqXuh4b8smtJYskn4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958171; c=relaxed/simple; bh=S9RepT6PoqM2S+OMOYJJ6Pjp/uzBUojYH/rcTDukdCQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=qWYNDgFSiQiMjWAIAY3VzgkNi1hLIxX6nuduNnlerujGwjlTZMZuWg/lTizTvRNCgqEUY4tzIIXaMi3k0r9/A/9OuEs/xuZxLj7fjYrycLBErDsuaoxJWWQZEYeVCvB5VG+eBocbWpUrCRscldZhdu9QSoC/FzoJvKU3M7K24YI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H.
Peter Anvin" Subject: [RFC PATCH 28/73] KVM: x86/PVM: Handle syscall from user mode Date: Mon, 26 Feb 2024 22:35:45 +0800 Message-Id: <20240226143630.33643-29-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Similar to the vector event from user mode, the syscall event from user mode follows the PVM event delivery ABI. Additionally, the 32-bit user mode can only use "INT 0x80" for syscall. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 57d987903791..92eef226df28 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -915,6 +915,15 @@ static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } +static int handle_exit_syscall(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (!is_smod(pvm)) + return do_pvm_user_event(vcpu, PVM_SYSCALL_VECTOR, false, 0); + return 1; +} + static int handle_exit_external_interrupt(struct kvm_vcpu *vcpu) { ++vcpu->stat.irq_exits; @@ -939,7 +948,11 @@ static int pvm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) struct vcpu_pvm *pvm = to_pvm(vcpu); u32 exit_reason = pvm->exit_vector; - if (exit_reason >= FIRST_EXTERNAL_VECTOR && exit_reason < NR_VECTORS) + if (exit_reason == PVM_SYSCALL_VECTOR) + return handle_exit_syscall(vcpu); + else if (exit_reason == IA32_SYSCALL_VECTOR) + return do_pvm_event(vcpu, IA32_SYSCALL_VECTOR, false, 0); + else if (exit_reason >= FIRST_EXTERNAL_VECTOR && exit_reason < NR_VECTORS) return handle_exit_external_interrupt(vcpu); else if (exit_reason == PVM_FAILED_VMENTRY_VECTOR) return handle_exit_failed_vmentry(vcpu); From patchwork Mon Feb 26 14:35:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572313 Received: from mail-io1-f47.google.com (mail-io1-f47.google.com [209.85.166.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C6499135A4E; Mon, 26 Feb 2024 14:36:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958175; cv=none; b=HvZxQQ4k+U2L3hlsQZC2E9YAwIzOh9hct6EFs26PuLCer4kvCHBaarLVeJlFLQWqqwScNdhkZMB5q+eW3N5PsynHKwbiEab9B8t1n3RU7/bLJI5eXlGtGjPUDXTAj7uGALlgH0V0zcXGJizXprbFVT/6kxNbY77K/O8p2JIlIvE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958175; c=relaxed/simple; bh=s8BFiuzUQau5p+yaQ0/fjHGiJ1v48d8tcf7X9leFdDk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=IGAnKKbC+wowGBCSQmM81Ju1K8ke+RsPW8twRijgZr4/HaHn6+raKKKtsAzEiJSp/P8Q/3NPXIGSZ+ji+3MRLMBZF6byy4Ck0M2QUJsaCqK1PbdTrTnl2e8VSE3hAc1SVexqxmMd1wfYMVc3FoMab7gNg6OK+J+VsgTMSdv4h2c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jVSJjN6R; arc=none smtp.client-ip=209.85.166.47 Authentication-Results: smtp.subspace.kernel.org; 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 29/73] KVM: x86/PVM: Implement allowed range checking for #PF Date: Mon, 26 Feb 2024 22:35:46 +0800 Message-Id: <20240226143630.33643-30-jiangshanlai@gmail.com> In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> MIME-Version: 1.0 From: Lai Jiangshan In PVM, the guest is only allowed to run in the reserved virtual address range provided by the hypervisor.
So guest needs to get the allowed range information from the MSR and the hypervisor needs to check the fault address and prevent install mapping in the #PF handler. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 74 ++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 5 +++ 2 files changed, 79 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 92eef226df28..26b2201f7dde 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -144,6 +144,28 @@ static void pvm_write_guest_kernel_gs_base(struct vcpu_pvm *pvm, u64 data) pvm->msr_kernel_gs_base = data; } +static __always_inline bool pvm_guest_allowed_va(struct kvm_vcpu *vcpu, u64 va) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if ((s64)va > 0) + return true; + if (pvm->l4_range_start <= va && va < pvm->l4_range_end) + return true; + if (pvm->l5_range_start <= va && va < pvm->l5_range_end) + return true; + + return false; +} + +static bool pvm_disallowed_va(struct kvm_vcpu *vcpu, u64 va) +{ + if (is_noncanonical_address(va, vcpu)) + return true; + + return !pvm_guest_allowed_va(vcpu, va); +} + // switch_to_smod() and switch_to_umod() switch the mode (smod/umod) and // the CR3. No vTLB flushing when switching the CR3 per PVM Spec. static inline void switch_to_smod(struct kvm_vcpu *vcpu) @@ -380,6 +402,48 @@ static void pvm_sched_in(struct kvm_vcpu *vcpu, int cpu) { } +static void pvm_set_msr_linear_address_range(struct vcpu_pvm *pvm, + u64 pml4_i_s, u64 pml4_i_e, + u64 pml5_i_s, u64 pml5_i_e) +{ + pvm->msr_linear_address_range = ((0xfe00 | pml4_i_s) << 0) | + ((0xfe00 | pml4_i_e) << 16) | + ((0xfe00 | pml5_i_s) << 32) | + ((0xfe00 | pml5_i_e) << 48); + + pvm->l4_range_start = (0x1fffe00 | pml4_i_s) * PT_L4_SIZE; + pvm->l4_range_end = (0x1fffe00 | pml4_i_e) * PT_L4_SIZE; + pvm->l5_range_start = (0xfe00 | pml5_i_s) * PT_L5_SIZE; + pvm->l5_range_end = (0xfe00 | pml5_i_e) * PT_L5_SIZE; +} + +static void pvm_set_default_msr_linear_address_range(struct vcpu_pvm *pvm) +{ + pvm_set_msr_linear_address_range(pvm, pml4_index_start, pml4_index_end, + pml5_index_start, pml5_index_end); +} + +static bool pvm_check_and_set_msr_linear_address_range(struct vcpu_pvm *pvm, u64 msr) +{ + u64 pml4_i_s = (msr >> 0) & 0x1ff; + u64 pml4_i_e = (msr >> 16) & 0x1ff; + u64 pml5_i_s = (msr >> 32) & 0x1ff; + u64 pml5_i_e = (msr >> 48) & 0x1ff; + + /* PVM specification requires those bits to be all set. */ + if ((msr & 0xff00ff00ff00ff00) != 0xff00ff00ff00ff00) + return false; + + /* Guest ranges should be inside what the hypervisor can provide. 
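 * That is, the requested PML4/PML5 index ranges must fall within
 * [pml4_index_start, pml4_index_end] and [pml5_index_start,
 * pml5_index_end], the bounds of the address space the hypervisor
 * provides to the guest kernel.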
*/ + if (pml4_i_s < pml4_index_start || pml4_i_e > pml4_index_end || + pml5_i_s < pml5_index_start || pml5_i_e > pml5_index_end) + return false; + + pvm_set_msr_linear_address_range(pvm, pml4_i_s, pml4_i_e, pml5_i_s, pml5_i_e); + + return true; +} + static int pvm_get_msr_feature(struct kvm_msr_entry *msr) { return 1; @@ -456,6 +520,9 @@ static int pvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_PVM_SWITCH_CR3: msr_info->data = pvm->msr_switch_cr3; break; + case MSR_PVM_LINEAR_ADDRESS_RANGE: + msr_info->data = pvm->msr_linear_address_range; + break; default: ret = kvm_get_msr_common(vcpu, msr_info); } @@ -552,6 +619,10 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_PVM_SWITCH_CR3: pvm->msr_switch_cr3 = msr_info->data; break; + case MSR_PVM_LINEAR_ADDRESS_RANGE: + if (!pvm_check_and_set_msr_linear_address_range(pvm, msr_info->data)) + return 1; + break; default: ret = kvm_set_msr_common(vcpu, msr_info); } @@ -1273,6 +1344,7 @@ static void pvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) pvm->msr_retu_rip_plus2 = 0; pvm->msr_rets_rip_plus2 = 0; pvm->msr_switch_cr3 = 0; + pvm_set_default_msr_linear_address_range(pvm); } static int pvm_vcpu_create(struct kvm_vcpu *vcpu) @@ -1520,6 +1592,8 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .msr_filter_changed = pvm_msr_filter_changed, .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + + .disallowed_va = pvm_disallowed_va, .vcpu_gpc_refresh = pvm_vcpu_gpc_refresh, }; diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 39506ddbe5c5..bf3a6a1837c0 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -82,6 +82,11 @@ struct vcpu_pvm { unsigned long msr_switch_cr3; unsigned long msr_linear_address_range; + u64 l4_range_start; + u64 l4_range_end; + u64 l5_range_start; + u64 l5_range_end; + struct kvm_segment segments[NR_VCPU_SREG]; struct desc_ptr idt_ptr; struct desc_ptr gdt_ptr; From patchwork Mon Feb 26 14:35:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572314 Received: from mail-il1-f177.google.com (mail-il1-f177.google.com [209.85.166.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B32113664A; Mon, 26 Feb 2024 14:36:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958178; cv=none; b=cLEp03nAn0kXbJ5HgGIyu46zOmVfiljzg9a4QoqNRRwb3pXcG707rsIsQvOdE+2x3OtrmW8iA18S/tEB2GroXzisnGh99hBnW8drvN3GBxOaqNv/BlnFh3G5fyxLJwzO+QbjTNmRMW3VBLjDA78/iFK6FSoRuhUIqBW4UDz1IR8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958178; c=relaxed/simple; bh=8gFP5rvdzZxmykD3GgHFtUmnYj+RZy2c2VA4Es1YSSs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=cT0TDclyEUJaN2VFlVaxH1zLZHmShxOoAuEI/98RUBOMpyQAzucjzV0E4q/tcA0uYWXfHdCQvq4CoAQS1QBSEysWmofAySgs36mhqGOAZic2Rze2spE+PkghE/yebhDz1dTtX37YFItD0di7HQQ9QTYJ7uoAlgQjxJ9CmPb4uNM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FHwQt2Gv; arc=none smtp.client-ip=209.85.166.177 Authentication-Results: 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 30/73] KVM: x86/PVM: Implement segment related callbacks Date: Mon, 26 Feb 2024 22:35:47 +0800 Message-Id: <20240226143630.33643-31-jiangshanlai@gmail.com> In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> MIME-Version: 1.0 From: Lai Jiangshan Segmentation in a PVM guest is generally disabled and is only available for instruction emulation. The segment descriptors of the segment registers are only cached and do not take effect in hardware.
Since the PVM guest is only allowed to run in x86 long mode, the value of guest CS/SS is fixed and depends on the current mode. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 128 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 128 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 26b2201f7dde..6f91dffb6c50 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -630,6 +630,52 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return ret; } +static void pvm_get_segment(struct kvm_vcpu *vcpu, + struct kvm_segment *var, int seg) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + // Update CS or SS to reflect the current mode. + if (seg == VCPU_SREG_CS) { + if (is_smod(pvm)) { + pvm->segments[seg].selector = kernel_cs_by_msr(pvm->msr_star); + pvm->segments[seg].dpl = 0; + pvm->segments[seg].l = 1; + pvm->segments[seg].db = 0; + } else { + pvm->segments[seg].selector = pvm->hw_cs >> 3; + pvm->segments[seg].dpl = 3; + if (pvm->hw_cs == __USER_CS) { + pvm->segments[seg].l = 1; + pvm->segments[seg].db = 0; + } else { // __USER32_CS + pvm->segments[seg].l = 0; + pvm->segments[seg].db = 1; + } + } + } else if (seg == VCPU_SREG_SS) { + if (is_smod(pvm)) { + pvm->segments[seg].dpl = 0; + pvm->segments[seg].selector = kernel_ds_by_msr(pvm->msr_star); + } else { + pvm->segments[seg].dpl = 3; + pvm->segments[seg].selector = pvm->hw_ss >> 3; + } + } + + // Update DS/ES/FS/GS states from the hardware when the states are loaded. + pvm_switch_to_host(pvm); + *var = pvm->segments[seg]; +} + +static u64 pvm_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + struct kvm_segment var; + + pvm_get_segment(vcpu, &var, seg); + return var.base; +} + static int pvm_get_cpl(struct kvm_vcpu *vcpu) { if (is_smod(to_pvm(vcpu))) @@ -637,6 +683,80 @@ static int pvm_get_cpl(struct kvm_vcpu *vcpu) return 3; } +static void pvm_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int cpl = pvm_get_cpl(vcpu); + + // Unload DS/ES/FS/GS states from hardware before changing them. + // It also has to unload the VCPU when leaving PVM mode. + pvm_switch_to_host(pvm); + pvm->segments[seg] = *var; + + switch (seg) { + case VCPU_SREG_CS: + if (var->dpl == 1 || var->dpl == 2) + goto invalid_change; + if (!kvm_vcpu_has_run(vcpu)) { + // CPL changing is only valid for the first changed + // after the vcpu is created (vm-migration). 
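+ // (That is, user space may restore a different CPL once, before the
+ // vCPU first runs, so a migrated vCPU that was in supervisor mode can
+ // be resumed correctly.)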
+ if (cpl != var->dpl) + pvm_switch_flags_toggle_mod(pvm); + } else { + if (cpl != var->dpl) + goto invalid_change; + if (cpl == 0 && !var->l) + goto invalid_change; + } + break; + case VCPU_SREG_LDTR: + // pvm doesn't support LDT + if (var->selector) + goto invalid_change; + break; + default: + break; + } + + return; + +invalid_change: + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); +} + +static void pvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (pvm->hw_cs == __USER_CS) { + *db = 0; + *l = 1; + } else { + *db = 1; + *l = 0; + } +} + +static void pvm_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + *dt = to_pvm(vcpu)->idt_ptr; +} + +static void pvm_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + to_pvm(vcpu)->idt_ptr = *dt; +} + +static void pvm_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + *dt = to_pvm(vcpu)->gdt_ptr; +} + +static void pvm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + to_pvm(vcpu)->gdt_ptr = *dt; +} + static void pvm_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) { @@ -1545,8 +1665,16 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .get_msr_feature = pvm_get_msr_feature, .get_msr = pvm_get_msr, .set_msr = pvm_set_msr, + .get_segment_base = pvm_get_segment_base, + .get_segment = pvm_get_segment, + .set_segment = pvm_set_segment, .get_cpl = pvm_get_cpl, + .get_cs_db_l_bits = pvm_get_cs_db_l_bits, .load_mmu_pgd = pvm_load_mmu_pgd, + .get_gdt = pvm_get_gdt, + .set_gdt = pvm_set_gdt, + .get_idt = pvm_get_idt, + .set_idt = pvm_set_idt, .get_rflags = pvm_get_rflags, .set_rflags = pvm_set_rflags, .get_if_flag = pvm_get_if_flag, From patchwork Mon Feb 26 14:35:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572315 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0722513666D; Mon, 26 Feb 2024 14:36:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958181; cv=none; b=F46dMhL42sUf9ofF4Ii3p508obrzbjFNHfTY4N/U1fM/uhmVTCDTqZVo3S5k/AtqZ76mmpvqZSa8rRbDdhf6VjSHnQ6oSxD8mk5AAmU1I0YooxL/Lc8Q/TfEG7JTuyx3F5umKSgUg7yVISpJ5PVqQQhhR/B9lABEeqZlb27uTeo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958181; c=relaxed/simple; bh=DrPI0FjABGzIoUG3fKSzVnoxo2ZVis3PQz/praGRAjI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=rDCWiRJnC65f7tYLAX0WLuM5Mw3RAkgbkKni+SkRl0tGrScgVUEm6ElyDVH9C3X4LJL074SbgOWgN+nmjdgt79TFQU9WV4wVPSo/HTfz69ys5U5FaHQF0awk79Olw4ybzbIO1+cbi5LRo8J8005WLsai3THB5kc0/r6XPE9J0r0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OchYuHc4; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 31/73] KVM: x86/PVM: Implement instruction emulation for #UD and #GP Date: Mon, 26 Feb 2024 22:35:48 +0800 Message-Id: <20240226143630.33643-32-jiangshanlai@gmail.com> In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> MIME-Version: 1.0 From: Lai Jiangshan A privileged instruction executed in supervisor mode triggers a #GP and induces a VM exit. Therefore, PVM reuses the existing x86 emulator to support privileged-instruction emulation in supervisor mode.
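To make the trap-and-emulate premise concrete, here is a small user-space sketch (not part of the patch; the file name gp_demo.c and the printed message are made up for illustration). Because the PVM guest kernel runs at hardware CPL 3, a privileged instruction such as HLT raises #GP, and that fault is the hook the emulator relies on; an ordinary Linux process sees the same user-mode #GP surfaced as SIGSEGV:

/* gp_demo.c - x86-64, build with "gcc -O2 gp_demo.c", run as a normal user. */
#include <signal.h>
#include <unistd.h>

static void on_gp(int sig)
{
	/* The demo just reports the trap; in PVM the hypervisor would
	 * emulate the faulting instruction here instead. */
	static const char msg[] = "privileged instruction trapped as #GP\n";

	write(STDOUT_FILENO, msg, sizeof(msg) - 1);
	_exit(0);
}

int main(void)
{
	signal(SIGSEGV, on_gp);	/* a user-mode #GP is delivered as SIGSEGV */
	asm volatile("hlt");	/* privileged: raises #GP(0) at CPL 3 */
	return 1;		/* never reached */
}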
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 6f91dffb6c50..4ec8c2c514ca 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -402,6 +402,40 @@ static void pvm_sched_in(struct kvm_vcpu *vcpu, int cpu) { } +static void pvm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) +{ + /* KVM_X86_QUIRK_FIX_HYPERCALL_INSN should not be enabled for pvm guest */ + + /* ud2; int3; */ + hypercall[0] = 0x0F; + hypercall[1] = 0x0B; + hypercall[2] = 0xCC; +} + +static int pvm_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, + void *insn, int insn_len) +{ + return X86EMUL_CONTINUE; +} + +static int skip_emulated_instruction(struct kvm_vcpu *vcpu) +{ + return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP); +} + +static int pvm_check_intercept(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage, + struct x86_exception *exception) +{ + /* + * HF_GUEST_MASK is not used even nested pvm is supported. L0 pvm + * might even be unaware the L1 pvm. + */ + WARN_ON_ONCE(1); + return X86EMUL_CONTINUE; +} + static void pvm_set_msr_linear_address_range(struct vcpu_pvm *pvm, u64 pml4_i_s, u64 pml4_i_e, u64 pml5_i_s, u64 pml5_i_e) @@ -1682,8 +1716,10 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_pre_run = pvm_vcpu_pre_run, .vcpu_run = pvm_vcpu_run, .handle_exit = pvm_handle_exit, + .skip_emulated_instruction = skip_emulated_instruction, .set_interrupt_shadow = pvm_set_interrupt_shadow, .get_interrupt_shadow = pvm_get_interrupt_shadow, + .patch_hypercall = pvm_patch_hypercall, .inject_irq = pvm_inject_irq, .inject_nmi = pvm_inject_nmi, .inject_exception = pvm_inject_exception, @@ -1699,6 +1735,7 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, + .check_intercept = pvm_check_intercept, .handle_exit_irqoff = pvm_handle_exit_irqoff, .request_immediate_exit = __kvm_request_immediate_exit, @@ -1721,6 +1758,7 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + .check_emulate_instruction = pvm_check_emulate_instruction, .disallowed_va = pvm_disallowed_va, .vcpu_gpc_refresh = pvm_vcpu_gpc_refresh, }; From patchwork Mon Feb 26 14:35:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572316 Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C709136679; Mon, 26 Feb 2024 14:36:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958185; cv=none; b=oZiREmpsrTYm5+0lJBNgx9N/vizmDntzhk4+YzuwHFqnl9DtyKWycgtW7gyYCgN60OG/YlgYeAu+X0DdddHlPgesRjX/93vKbZyN3WFbraunfuEOqUkJse6xTzVMVPyZS6r+7w8IzdkOCxeQxLQF/GEju8qhmAeRxlweKKa7NZ8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958185; c=relaxed/simple; bh=FSCeMybtjLkWOcvn1/oPwqYyDuFS9jfLHr/iI3/35J4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H.
Peter Anvin" Subject: [RFC PATCH 32/73] KVM: x86/PVM: Enable guest debugging functions Date: Mon, 26 Feb 2024 22:35:49 +0800 Message-Id: <20240226143630.33643-33-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan The guest DR7 is loaded before VM enter to enable debugging functions for the guest. If guest debugging is not enabled, the #DB and #BP exceptions are reinjected into the guest directly; otherwise, they are handled by the hypervisor. However, DR7_GD is cleared since debug register read/write is a privileged instruction, which always leads to a VM exit for #GP. The address of breakpoints is limited to the allowed address range, similar to the check in the #PF path. Guest DR7 is loaded before VM enter to enable debug function for guest. If guest debug is not enabled, the #DB and #BP are reinjected into guest directly, otherwise, they are handled by hypervisor similar to VMX. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 96 ++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 3 ++ 2 files changed, 99 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 4ec8c2c514ca..299305903005 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -383,6 +383,8 @@ static void pvm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); + pvm->host_debugctlmsr = get_debugctlmsr(); + if (__this_cpu_read(active_pvm_vcpu) == pvm && vcpu->cpu == cpu) return; @@ -533,6 +535,9 @@ static int pvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_SYSENTER_ESP: msr_info->data = pvm->unused_MSR_IA32_SYSENTER_ESP; break; + case MSR_IA32_DEBUGCTLMSR: + msr_info->data = 0; + break; case MSR_PVM_VCPU_STRUCT: msr_info->data = pvm->msr_vcpu_struct; break; @@ -619,6 +624,9 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_SYSENTER_ESP: pvm->unused_MSR_IA32_SYSENTER_ESP = data; break; + case MSR_IA32_DEBUGCTLMSR: + /* It is ignored now. */ + break; case MSR_PVM_VCPU_STRUCT: if (!PAGE_ALIGNED(data)) return 1; @@ -810,6 +818,10 @@ static bool pvm_apic_init_signal_blocked(struct kvm_vcpu *vcpu) return false; } +static void update_exception_bitmap(struct kvm_vcpu *vcpu) +{ +} + static struct pvm_vcpu_struct *pvm_get_vcpu_struct(struct vcpu_pvm *pvm) { struct gfn_to_pfn_cache *gpc = &pvm->pvcs_gpc; @@ -1235,6 +1247,72 @@ static int pvm_vcpu_pre_run(struct kvm_vcpu *vcpu) return 1; } +static void pvm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + WARN_ONCE(1, "pvm never sets KVM_DEBUGREG_WONT_EXIT\n"); +} + +static void pvm_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + to_pvm(vcpu)->guest_dr7 = val; +} + +static __always_inline unsigned long __dr7_enable_mask(int drnum) +{ + unsigned long bp_mask = 0; + + bp_mask |= (DR_LOCAL_ENABLE << (drnum * DR_ENABLE_SIZE)); + bp_mask |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)); + + return bp_mask; +} + +static __always_inline unsigned long __dr7_mask(int drnum) +{ + unsigned long bp_mask = 0xf; + + bp_mask <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE); + bp_mask |= __dr7_enable_mask(drnum); + + return bp_mask; +} + +/* + * Calculate the correct dr7 for the hardware to avoid the host + * being watched. 
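+ * ("Watched" here means a guest-programmed hardware breakpoint firing
+ * on host addresses, e.g. on the switcher entry code; see the address
+ * filtering below.)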
+ * + * It only needs to be calculated each time when vcpu->arch.eff_db or + * pvm->guest_dr7 is changed. But now it is calculated each time on + * VM-enter since there is no proper callback for vcpu->arch.eff_db and + * it is slow path. + */ +static __always_inline unsigned long pvm_eff_dr7(struct kvm_vcpu *vcpu) +{ + unsigned long eff_dr7 = to_pvm(vcpu)->guest_dr7; + int i; + + /* + * DR7_GD should not be set to hardware. And it doesn't need to be + * set to hardware since PVM guest is running on hardware ring3. + * All access to debug registers will be trapped and the emulation + * code can handle DR7_GD correctly for PVM. + */ + eff_dr7 &= ~DR7_GD; + + /* + * Disallow addresses that are not for the guest, especially addresses + * on the host entry code. + */ + for (i = 0; i < KVM_NR_DB_REGS; i++) { + if (!pvm_guest_allowed_va(vcpu, vcpu->arch.eff_db[i])) + eff_dr7 &= ~__dr7_mask(i); + if (!pvm_guest_allowed_va(vcpu, vcpu->arch.eff_db[i] + 7)) + eff_dr7 &= ~__dr7_mask(i); + } + + return eff_dr7; +} + // Save guest registers from host sp0 or IST stack. static __always_inline void save_regs(struct kvm_vcpu *vcpu, struct pt_regs *guest) { @@ -1301,6 +1379,9 @@ static noinstr void pvm_vcpu_run_noinstr(struct kvm_vcpu *vcpu) // Load guest registers into the host sp0 stack for switcher. load_regs(vcpu, sp0_regs); + if (unlikely(pvm->guest_dr7 & DR7_BP_EN_MASK)) + set_debugreg(pvm_eff_dr7(vcpu), 7); + // Call into switcher and enter guest. ret_regs = switcher_enter_guest(); @@ -1309,6 +1390,11 @@ static noinstr void pvm_vcpu_run_noinstr(struct kvm_vcpu *vcpu) pvm->exit_vector = (ret_regs->orig_ax >> 32); pvm->exit_error_code = (u32)ret_regs->orig_ax; + // dr7 requires to be zero when the controling of debug registers + // passes back to the host. + if (unlikely(pvm->guest_dr7 & DR7_BP_EN_MASK)) + set_debugreg(0, 7); + // handle noinstr vmexits reasons. switch (pvm->exit_vector) { case PF_VECTOR: @@ -1387,8 +1473,15 @@ static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) pvm_set_host_cr3(pvm); + if (pvm->host_debugctlmsr) + update_debugctlmsr(0); + pvm_vcpu_run_noinstr(vcpu); + /* MSR_IA32_DEBUGCTLMSR is zeroed before vmenter. 
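(Presumably so that host debug settings in DEBUGCTL, e.g. LBR/BTF, are not left active while the guest is running.)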
Restore it if needed */ + if (pvm->host_debugctlmsr) + update_debugctlmsr(pvm->host_debugctlmsr); + if (is_smod(pvm)) { struct pvm_vcpu_struct *pvcs = pvm->pvcs_gpc.khva; @@ -1696,6 +1789,7 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .vcpu_load = pvm_vcpu_load, .vcpu_put = pvm_vcpu_put, + .update_exception_bitmap = update_exception_bitmap, .get_msr_feature = pvm_get_msr_feature, .get_msr = pvm_get_msr, .set_msr = pvm_set_msr, @@ -1709,6 +1803,8 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .set_gdt = pvm_set_gdt, .get_idt = pvm_get_idt, .set_idt = pvm_set_idt, + .set_dr7 = pvm_set_dr7, + .sync_dirty_debug_regs = pvm_sync_dirty_debug_regs, .get_rflags = pvm_get_rflags, .set_rflags = pvm_set_rflags, .get_if_flag = pvm_get_if_flag, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index bf3a6a1837c0..4cdcbed1c813 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -37,6 +37,7 @@ struct vcpu_pvm { unsigned long switch_flags; u16 host_ds_sel, host_es_sel; + u64 host_debugctlmsr; union { unsigned long exit_extra; @@ -52,6 +53,8 @@ struct vcpu_pvm { int int_shadow; bool nmi_mask; + unsigned long guest_dr7; + struct gfn_to_pfn_cache pvcs_gpc; // emulated x86 msrs From patchwork Mon Feb 26 14:35:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572317 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 042AA1369B8; Mon, 26 Feb 2024 14:36:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958188; cv=none; b=S/G6GVlAyLN2+0JsAfwDQ5hFDuYBGHmERGAOL2Xvf6RHS/Tq8/bx4597zNgNiX6qN82AOkvYjheqPEfQifNILnbtfgGmWPiQftfvkDqTEFwPKNJnSXcuOUGvLo6Bd5wkDIac6C68XK6O/DuqFuBenFzTWJ88+xUO8EoeCCfAmNk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958188; c=relaxed/simple; bh=FqG+PShZWy3cSlJS3qEDXedk0RFcNd6z6U1V+6nl3B8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=n5+MwEZqRcLM4lxDZVboLkH4+VVz5KJ7PJBY1WE81iVxISNNZjMsOEwqvOCo5Vw6Lu5Lg2O4gEKL5N92fF6zrJJO8752O7gHNegmZVLQRUiihe57LczbPw1tKg86IvjmdQwCXCn8k9VAJ0cb6XXc5iEfgQr1ILx7bLfFQP9qRjw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=cRfO21ku; arc=none smtp.client-ip=209.85.214.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="cRfO21ku" Received: by mail-pl1-f178.google.com with SMTP id d9443c01a7336-1dc1ff3ba1aso25326715ad.3; Mon, 26 Feb 2024 06:36:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958186; x=1709562986; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=h/GP0KqLiQo98vwHDOjndhSbhcZ2rln4TVJt6Ie5g30=; 
From patchwork Mon Feb 26 14:35:50 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Subject: [RFC PATCH 33/73] KVM: x86/PVM: Handle VM-exit due to hardware exceptions Date: Mon, 26 Feb 2024 22:35:50 +0800 Message-Id: <20240226143630.33643-34-jiangshanlai@gmail.com> From: Lai Jiangshan When the exceptions are of interest to the hypervisor for emulation or debugging, they should be handled by the hypervisor first, for example, handling #PF for shadow page table. If the exceptions are pure guest exceptions, they should be reinjected into the guest directly. If the exceptions belong to the host, they should already have been handled in an atomic way before enabling interrupts.
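The routing described above can be summarized by the sketch below. This is an illustrative classifier only, not code from the series (the authoritative dispatch is handle_exit_exception() in the diff that follows); the vector numbers are the architectural x86 ones and the enum is made up for the illustration.

enum exc_route { ROUTE_HYPERVISOR_FIRST, ROUTE_REINJECT_GUEST, ROUTE_HOST_OWNED, ROUTE_UNKNOWN };

/* Sketch: classify an exception exit vector per the policy described above. */
static enum exc_route classify_exit_vector(unsigned int vector)
{
	switch (vector) {
	case 14: case 13: case 6: case 1: case 3:	/* #PF, #GP, #UD, #DB, #BP */
		return ROUTE_HYPERVISOR_FIRST;		/* emulate or debug, possibly reinject afterwards */
	case 0: case 4: case 5: case 7: case 16:
	case 19: case 10: case 11: case 12: case 17:	/* #DE, #OF, #BR, #NM, #MF, #XM, #TS, #NP, #SS, #AC */
		return ROUTE_REINJECT_GUEST;		/* pure guest exceptions */
	case 2: case 18: case 8:			/* NMI, #MC, #DF */
		return ROUTE_HOST_OWNED;		/* handled atomically before interrupts are enabled */
	default:					/* e.g. #VE, #VC */
		return ROUTE_UNKNOWN;
	}
}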
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 157 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 299305903005..c6fd01c19c3e 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -20,6 +20,7 @@ #include "cpuid.h" #include "lapic.h" +#include "mmu.h" #include "trace.h" #include "x86.h" #include "pvm.h" @@ -1161,6 +1162,160 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) return 1; } +static int handle_exit_debug(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + struct kvm_run *kvm_run = pvm->vcpu.run; + + if (pvm->vcpu.guest_debug & + (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) { + kvm_run->exit_reason = KVM_EXIT_DEBUG; + kvm_run->debug.arch.dr6 = pvm->exit_dr6 | DR6_FIXED_1 | DR6_RTM; + kvm_run->debug.arch.dr7 = vcpu->arch.guest_debug_dr7; + kvm_run->debug.arch.pc = kvm_rip_read(vcpu); + kvm_run->debug.arch.exception = DB_VECTOR; + return 0; + } + + kvm_queue_exception_p(vcpu, DB_VECTOR, pvm->exit_dr6); + return 1; +} + +/* check if the previous instruction is "int3" on receiving #BP */ +static bool is_bp_trap(struct kvm_vcpu *vcpu) +{ + u8 byte = 0; + unsigned long rip; + struct x86_exception exception; + int r; + + rip = kvm_rip_read(vcpu) - 1; + r = kvm_read_guest_virt(vcpu, rip, &byte, 1, &exception); + + /* Just assume it to be int3 when failed to fetch the instruction. */ + if (r) + return true; + + return byte == 0xcc; +} + +static int handle_exit_breakpoint(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + struct kvm_run *kvm_run = pvm->vcpu.run; + + /* + * Breakpoint exception can be caused by int3 or int 3. While "int3" + * participates in guest debug, but "int 3" should not. + */ + if ((vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) && is_bp_trap(vcpu)) { + kvm_rip_write(vcpu, kvm_rip_read(vcpu) - 1); + kvm_run->exit_reason = KVM_EXIT_DEBUG; + kvm_run->debug.arch.pc = kvm_rip_read(vcpu); + kvm_run->debug.arch.exception = BP_VECTOR; + return 0; + } + + kvm_queue_exception(vcpu, BP_VECTOR); + return 1; +} + +static int handle_exit_exception(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + struct kvm_run *kvm_run = vcpu->run; + u32 vector, error_code; + int err; + + vector = pvm->exit_vector; + error_code = pvm->exit_error_code; + + switch (vector) { + // #PF, #GP, #UD, #DB and #BP are guest exceptions or hypervisor + // interested exceptions for emulation or debugging. + case PF_VECTOR: + // Remove hardware generated PFERR_USER_MASK when in supervisor + // mode to reflect the real mode in PVM. + if (is_smod(pvm)) + error_code &= ~PFERR_USER_MASK; + + // If it is a PK fault, set pkru=0 and re-enter the guest silently. + // See the comment before pvm_load_guest_xsave_state(). + if (cpu_feature_enabled(X86_FEATURE_PKU) && (error_code & PFERR_PK_MASK)) + return 1; + + return kvm_handle_page_fault(vcpu, error_code, pvm->exit_cr2, + NULL, 0); + case GP_VECTOR: + err = kvm_emulate_instruction(vcpu, EMULTYPE_PVM_GP); + if (!err) + return 0; + + if (vcpu->arch.halt_request) { + vcpu->arch.halt_request = 0; + return kvm_emulate_halt_noskip(vcpu); + } + return 1; + case UD_VECTOR: + if (!is_smod(pvm)) { + kvm_queue_exception(vcpu, UD_VECTOR); + return 1; + } + return handle_ud(vcpu); + case DB_VECTOR: + return handle_exit_debug(vcpu); + case BP_VECTOR: + return handle_exit_breakpoint(vcpu); + + // #DE, #OF, #BR, #NM, #MF, #XM, #TS, #NP, #SS and #AC are pure guest + // exceptions. 
+ case DE_VECTOR: + case OF_VECTOR: + case BR_VECTOR: + case NM_VECTOR: + case MF_VECTOR: + case XM_VECTOR: + kvm_queue_exception(vcpu, vector); + return 1; + case AC_VECTOR: + case TS_VECTOR: + case NP_VECTOR: + case SS_VECTOR: + kvm_queue_exception_e(vcpu, vector, error_code); + return 1; + + // #NMI, #VE, #VC, #MC and #DF are exceptions that belong to the host. + // They should already have been handled in an atomic way at VM exit. + case NMI_VECTOR: + // NMI is handled by pvm_vcpu_run_noinstr(). + return 1; + case VE_VECTOR: + // TODO: tdx_handle_virt_exception(regs, &pvm->exit_ve); break; + goto unknown_exit_reason; + case X86_TRAP_VC: + // TODO: handle the second part for #VC. + goto unknown_exit_reason; + case MC_VECTOR: + // MC is handled by pvm_handle_exit_irqoff(). + // TODO: split kvm_machine_check() to avoid irq-enabled or + // schedule code (thread dead) in pvm_handle_exit_irqoff(). + return 1; + case DF_VECTOR: + // DF is handled when exiting and can't reach here. + pr_warn_once("host bug, can't reach here"); + break; + default: +unknown_exit_reason: + pr_warn_once("unknown exit_reason vector:%d, error_code:%x, rip:0x%lx\n", + vector, pvm->exit_error_code, kvm_rip_read(vcpu)); + kvm_run->exit_reason = KVM_EXIT_EXCEPTION; + kvm_run->ex.exception = vector; + kvm_run->ex.error_code = error_code; + break; + } + return 0; +} + static int handle_exit_external_interrupt(struct kvm_vcpu *vcpu) { ++vcpu->stat.irq_exits; @@ -1187,6 +1342,8 @@ static int pvm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) if (exit_reason == PVM_SYSCALL_VECTOR) return handle_exit_syscall(vcpu); + else if (exit_reason >= 0 && exit_reason < FIRST_EXTERNAL_VECTOR) + return handle_exit_exception(vcpu); else if (exit_reason == IA32_SYSCALL_VECTOR) return do_pvm_event(vcpu, IA32_SYSCALL_VECTOR, false, 0); else if (exit_reason >= FIRST_EXTERNAL_VECTOR && exit_reason < NR_VECTORS)
From patchwork Mon Feb 26 14:35:51 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Subject: [RFC PATCH 34/73] KVM: x86/PVM: Handle ERETU/ERETS synthetic instruction Date: Mon, 26 Feb 2024 22:35:51 +0800 Message-Id: <20240226143630.33643-35-jiangshanlai@gmail.com> From: Lai Jiangshan PVM uses the ERETU synthetic instruction to return to user mode and the ERETS instruction to return to supervisor mode. Similar to event injection, information passing is different.
For the ERETU, information is passed by the shared PVCS structure, and for the ERETS, information is passed by the current guest stack. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 74 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index c6fd01c19c3e..514f0573f70f 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1153,12 +1153,86 @@ static void pvm_setup_mce(struct kvm_vcpu *vcpu) { } +static int handle_synthetic_instruction_return_user(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + struct pvm_vcpu_struct *pvcs; + + // instruction to return user means nmi allowed. + pvm->nmi_mask = false; + + /* + * switch to user mode before kvm_set_rflags() to avoid PVM_EVENT_FLAGS_IF + * to be set. + */ + switch_to_umod(vcpu); + + pvcs = pvm_get_vcpu_struct(pvm); + if (!pvcs) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + + /* + * pvm_set_rflags() doesn't clear PVM_EVENT_FLAGS_IP + * for user mode, so clear it here. + */ + if (pvcs->event_flags & PVM_EVENT_FLAGS_IP) { + pvcs->event_flags &= ~PVM_EVENT_FLAGS_IP; + kvm_make_request(KVM_REQ_EVENT, vcpu); + } + + pvm->hw_cs = pvcs->user_cs | USER_RPL; + pvm->hw_ss = pvcs->user_ss | USER_RPL; + + pvm_write_guest_gs_base(pvm, pvcs->user_gsbase); + kvm_set_rflags(vcpu, pvcs->eflags | X86_EFLAGS_IF); + kvm_rip_write(vcpu, pvcs->rip); + kvm_rsp_write(vcpu, pvcs->rsp); + kvm_rcx_write(vcpu, pvcs->rcx); + kvm_r11_write(vcpu, pvcs->r11); + + pvm_put_vcpu_struct(pvm, false); + + return 1; +} + +static int handle_synthetic_instruction_return_supervisor(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long rsp = kvm_rsp_read(vcpu); + struct pvm_supervisor_event frame; + struct x86_exception e; + + if (kvm_read_guest_virt(vcpu, rsp, &frame, sizeof(frame), &e)) { + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + return 1; + } + + // instruction to return supervisor means nmi allowed. 
+ pvm->nmi_mask = false; + + kvm_set_rflags(vcpu, frame.rflags); + kvm_rip_write(vcpu, frame.rip); + kvm_rsp_write(vcpu, frame.rsp); + kvm_rcx_write(vcpu, frame.rcx); + kvm_r11_write(vcpu, frame.r11); + + return 1; +} + static int handle_exit_syscall(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long rip = kvm_rip_read(vcpu); if (!is_smod(pvm)) return do_pvm_user_event(vcpu, PVM_SYSCALL_VECTOR, false, 0); + + if (rip == pvm->msr_retu_rip_plus2) + return handle_synthetic_instruction_return_user(vcpu); + if (rip == pvm->msr_rets_rip_plus2) + return handle_synthetic_instruction_return_supervisor(vcpu); return 1; }
From patchwork Mon Feb 26 14:35:52 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Subject: [RFC PATCH 35/73] KVM: x86/PVM: Handle PVM_SYNTHETIC_CPUID synthetic instruction Date: Mon, 26 Feb 2024 22:35:52 +0800 Message-Id: <20240226143630.33643-36-jiangshanlai@gmail.com> From: Lai Jiangshan The PVM guest utilizes the CPUID instruction for detecting PVM hypervisor support. However, the CPUID instruction in the PVM guest is not directly trapped and emulated. Instead, the PVM guest employs the "invlpg 0xffffffffff4d5650; cpuid;" instructions to cause a #GP trap. The hypervisor must identify this trap and handle the emulation of the CPUID instruction within the #GP handling process.
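For reference, the guest side of this handshake might look like the sketch below. This is an illustration rather than code from the series: what actually matters is the exact 10-byte sequence (the 8-byte invlpg with absolute address 0xffffffffff4d5650 plus the 2-byte cpuid) that the hypervisor compares against PVM_SYNTHETIC_CPUID, so a real guest would emit those bytes from the shared macro rather than trust the assembler to pick the matching encoding, and the CPUID leaf carrying the PVM signature is defined elsewhere in the series, so it is a caller-supplied parameter here.

/*
 * Illustrative guest-side probe (assumptions noted above). Under PVM the
 * privileged invlpg faults with #GP from hardware ring 3 and the hypervisor
 * emulates the cpuid and skips both instructions; on bare metal at CPL0 the
 * invlpg is just a harmless single-address TLB flush and cpuid runs natively.
 */
static inline void pvm_synthetic_cpuid(unsigned int leaf, unsigned int subleaf,
				       unsigned int *eax, unsigned int *ebx,
				       unsigned int *ecx, unsigned int *edx)
{
	asm volatile("invlpg 0xffffffffff4d5650; cpuid"
		     : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
		     : "a" (leaf), "c" (subleaf)
		     : "memory");
}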
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 514f0573f70f..a2602d9828a5 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1294,6 +1294,36 @@ static int handle_exit_breakpoint(struct kvm_vcpu *vcpu) return 1; } +static bool handle_synthetic_instruction_pvm_cpuid(struct kvm_vcpu *vcpu) +{ + /* invlpg 0xffffffffff4d5650; cpuid; */ + static const char pvm_synthetic_cpuid_insns[] = { PVM_SYNTHETIC_CPUID }; + char insns[10]; + struct x86_exception e; + + if (kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu), + insns, sizeof(insns), &e) == 0 && + memcmp(insns, pvm_synthetic_cpuid_insns, sizeof(insns)) == 0) { + u32 eax, ebx, ecx, edx; + + if (unlikely(pvm_guest_allowed_va(vcpu, PVM_SYNTHETIC_CPUID_ADDRESS))) + kvm_mmu_invlpg(vcpu, PVM_SYNTHETIC_CPUID_ADDRESS); + + eax = kvm_rax_read(vcpu); + ecx = kvm_rcx_read(vcpu); + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false); + kvm_rax_write(vcpu, eax); + kvm_rbx_write(vcpu, ebx); + kvm_rcx_write(vcpu, ecx); + kvm_rdx_write(vcpu, edx); + + kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(insns)); + return true; + } + + return false; +} + static int handle_exit_exception(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); @@ -1321,6 +1351,9 @@ static int handle_exit_exception(struct kvm_vcpu *vcpu) return kvm_handle_page_fault(vcpu, error_code, pvm->exit_cr2, NULL, 0); case GP_VECTOR: + if (is_smod(pvm) && handle_synthetic_instruction_pvm_cpuid(vcpu)) + return 1; + err = kvm_emulate_instruction(vcpu, EMULTYPE_PVM_GP); if (!err) return 0;
From patchwork Mon Feb 26 14:35:53 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Subject: [RFC PATCH 36/73] KVM: x86/PVM: Handle KVM hypercall Date: Mon, 26 Feb 2024 22:35:53 +0800 Message-Id: <20240226143630.33643-37-jiangshanlai@gmail.com> From: Lai Jiangshan PVM uses the syscall instruction as the hypercall instruction, so r10 is used as a replacement for rcx since rcx is clobbered by the syscall. Additionally, the syscall is a trap and does not need to skip the hypercall instruction for PVM.
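For context, a guest-side wrapper that matches this convention could look like the sketch below (illustrative only, not code from the series): the hypercall number goes in RAX, the first two arguments in RBX and R10 (R10 standing in for RCX, which the SYSCALL instruction clobbers with the return RIP), and, for hypercalls that produce a result, the value comes back in RAX. Because the exit is a trap, the guest resumes at the instruction after SYSCALL without any RIP fixup in the wrapper.

/* Illustrative sketch of a two-argument PVM/KVM hypercall issued from the guest kernel. */
static inline long pvm_hypercall2(unsigned int nr, unsigned long a0, unsigned long a1)
{
	long ret;
	register unsigned long r10 asm("r10") = a1;	/* a1 travels in r10, not rcx */

	asm volatile("syscall"
		     : "=a" (ret)
		     : "a" (nr), "b" (a0), "r" (r10)
		     : "rcx", "r11", "memory");
	return ret;
}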
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index a2602d9828a5..242c355fda8f 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1221,6 +1221,18 @@ static int handle_synthetic_instruction_return_supervisor(struct kvm_vcpu *vcpu) return 1; } +static int handle_kvm_hypercall(struct kvm_vcpu *vcpu) +{ + int r; + + // In PVM, r10 is the replacement for rcx in hypercall + kvm_rcx_write(vcpu, kvm_r10_read(vcpu)); + r = kvm_emulate_hypercall_noskip(vcpu); + kvm_r10_write(vcpu, kvm_rcx_read(vcpu)); + + return r; +} + static int handle_exit_syscall(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); @@ -1233,7 +1245,8 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) return handle_synthetic_instruction_return_user(vcpu); if (rip == pvm->msr_rets_rip_plus2) return handle_synthetic_instruction_return_supervisor(vcpu); - return 1; + + return handle_kvm_hypercall(vcpu); } static int handle_exit_debug(struct kvm_vcpu *vcpu)
From patchwork Mon Feb 26 14:35:54 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Subject: [RFC PATCH 37/73] KVM: x86/PVM: Use host PCID to reduce guest TLB flushing Date: Mon, 26 Feb 2024 22:35:54 +0800 Message-Id: <20240226143630.33643-38-jiangshanlai@gmail.com> From: Lai Jiangshan Since the host doesn't use all PCIDs, PVM can utilize the host PCID to reduce guest TLB flushing. The PCID allocation algorithm in PVM is similar to that of the host. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 228 ++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/pvm/pvm.h | 5 + 2 files changed, 232 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 242c355fda8f..2d3785e7f2f3 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -349,6 +349,211 @@ static void pvm_switch_to_host(struct vcpu_pvm *pvm) preempt_enable(); } +struct host_pcid_one { + /* + * It is struct vcpu_pvm *pvm, but it is not allowed to be + * dereferenced since it might be freed.
+ */ + void *pvm; + u64 root_hpa; +}; + +struct host_pcid_state { + struct host_pcid_one pairs[NUM_HOST_PCID_FOR_GUEST]; + int evict_next_round_robin; +}; + +static DEFINE_PER_CPU(struct host_pcid_state, pvm_tlb_state); + +static void host_pcid_flush_all(struct vcpu_pvm *pvm) +{ + struct host_pcid_state *tlb_state = this_cpu_ptr(&pvm_tlb_state); + int i; + + for (i = 0; i < NUM_HOST_PCID_FOR_GUEST; i++) { + if (tlb_state->pairs[i].pvm == pvm) + tlb_state->pairs[i].pvm = NULL; + } +} + +static inline unsigned int host_pcid_to_index(unsigned int host_pcid) +{ + return host_pcid & ~HOST_PCID_TAG_FOR_GUEST; +} + +static inline int index_to_host_pcid(int index) +{ + return index | HOST_PCID_TAG_FOR_GUEST; +} + +/* + * Free the uncached guest pcid (not in mmu->root nor mmu->prev_root), so + * that the next allocation would not evict a clean one. + * + * It would be better if kvm.ko notifies us when a root_pgd is freed + * from the cache. + * + * Returns a freed index or -1 if nothing is freed. + */ +static int host_pcid_free_uncached(struct vcpu_pvm *pvm) +{ + /* It is allowed to do nothing. */ + return -1; +} + +/* + * Get a host pcid of the current pCPU for the specific guest pgd. + * PVM vTLB is guest pgd tagged. + */ +static int host_pcid_get(struct vcpu_pvm *pvm, u64 root_hpa, bool *flush) +{ + struct host_pcid_state *tlb_state = this_cpu_ptr(&pvm_tlb_state); + int i, j = -1; + + /* find if it is allocated. */ + for (i = 0; i < NUM_HOST_PCID_FOR_GUEST; i++) { + struct host_pcid_one *tlb = &tlb_state->pairs[i]; + + if (tlb->root_hpa == root_hpa && tlb->pvm == pvm) + return index_to_host_pcid(i); + + /* if it has no owner, allocate it if not found. */ + if (!tlb->pvm) + j = i; + } + + /* + * Fallback to: + * use the fallback recorded in the above loop. + * use a freed uncached. + * evict one (which might be still usable) by round-robin policy. + */ + if (j < 0) + j = host_pcid_free_uncached(pvm); + if (j < 0) { + j = tlb_state->evict_next_round_robin; + if (++tlb_state->evict_next_round_robin == NUM_HOST_PCID_FOR_GUEST) + tlb_state->evict_next_round_robin = 0; + } + + /* associate the host pcid to the guest */ + tlb_state->pairs[j].pvm = pvm; + tlb_state->pairs[j].root_hpa = root_hpa; + + *flush = true; + return index_to_host_pcid(j); +} + +static void host_pcid_free(struct vcpu_pvm *pvm, u64 root_hpa) +{ + struct host_pcid_state *tlb_state = this_cpu_ptr(&pvm_tlb_state); + int i; + + for (i = 0; i < NUM_HOST_PCID_FOR_GUEST; i++) { + struct host_pcid_one *tlb = &tlb_state->pairs[i]; + + if (tlb->root_hpa == root_hpa && tlb->pvm == pvm) { + tlb->pvm = NULL; + return; + } + } +} + +static inline void *host_pcid_owner(int host_pcid) +{ + return this_cpu_read(pvm_tlb_state.pairs[host_pcid_to_index(host_pcid)].pvm); +} + +static inline u64 host_pcid_root(int host_pcid) +{ + return this_cpu_read(pvm_tlb_state.pairs[host_pcid_to_index(host_pcid)].root_hpa); +} + +static void __pvm_hwtlb_flush_all(struct vcpu_pvm *pvm) +{ + if (static_cpu_has(X86_FEATURE_PCID)) + host_pcid_flush_all(pvm); +} + +static void pvm_flush_hwtlb(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + get_cpu(); + __pvm_hwtlb_flush_all(pvm); + put_cpu(); +} + +static void pvm_flush_hwtlb_guest(struct kvm_vcpu *vcpu) +{ + /* + * flushing hwtlb for guest only when: + * change to the shadow page table. + * reused an used (guest) pcid. + * change to the shadow page table always results flushing hwtlb + * and PVM uses pgd tagged tlb. + * + * So no hwtlb needs to be flushed here. 
+ */ +} + +static void pvm_flush_hwtlb_current(struct kvm_vcpu *vcpu) +{ + /* No flush required if the current context is invalid. */ + if (!VALID_PAGE(vcpu->arch.mmu->root.hpa)) + return; + + if (static_cpu_has(X86_FEATURE_PCID)) { + get_cpu(); + host_pcid_free(to_pvm(vcpu), vcpu->arch.mmu->root.hpa); + put_cpu(); + } +} + +static void pvm_flush_hwtlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int max = MIN_HOST_PCID_FOR_GUEST + NUM_HOST_PCID_FOR_GUEST; + int i; + + if (!static_cpu_has(X86_FEATURE_PCID)) + return; + + get_cpu(); + if (!this_cpu_has(X86_FEATURE_INVPCID)) { + host_pcid_flush_all(pvm); + put_cpu(); + return; + } + + host_pcid_free_uncached(pvm); + for (i = MIN_HOST_PCID_FOR_GUEST; i < max; i++) { + if (host_pcid_owner(i) == pvm) + invpcid_flush_one(i, addr); + } + + put_cpu(); +} + +static void pvm_set_host_cr3_for_guest_with_host_pcid(struct vcpu_pvm *pvm) +{ + u64 root_hpa = pvm->vcpu.arch.mmu->root.hpa; + bool flush = false; + u32 host_pcid = host_pcid_get(pvm, root_hpa, &flush); + u64 hw_cr3 = root_hpa | host_pcid; + + if (!flush) + hw_cr3 |= CR3_NOFLUSH; + this_cpu_write(cpu_tss_rw.tss_ex.enter_cr3, hw_cr3); +} + +static void pvm_set_host_cr3_for_guest_without_host_pcid(struct vcpu_pvm *pvm) +{ + u64 root_hpa = pvm->vcpu.arch.mmu->root.hpa; + + this_cpu_write(cpu_tss_rw.tss_ex.enter_cr3, root_hpa); +} + static void pvm_set_host_cr3_for_hypervisor(struct vcpu_pvm *pvm) { unsigned long cr3; @@ -365,7 +570,11 @@ static void pvm_set_host_cr3_for_hypervisor(struct vcpu_pvm *pvm) static void pvm_set_host_cr3(struct vcpu_pvm *pvm) { pvm_set_host_cr3_for_hypervisor(pvm); - this_cpu_write(cpu_tss_rw.tss_ex.enter_cr3, pvm->vcpu.arch.mmu->root.hpa); + + if (static_cpu_has(X86_FEATURE_PCID)) + pvm_set_host_cr3_for_guest_with_host_pcid(pvm); + else + pvm_set_host_cr3_for_guest_without_host_pcid(pvm); } static void pvm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, @@ -391,6 +600,9 @@ static void pvm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) __this_cpu_write(active_pvm_vcpu, pvm); + if (vcpu->cpu != cpu) + __pvm_hwtlb_flush_all(pvm); + indirect_branch_prediction_barrier(); } @@ -398,6 +610,7 @@ static void pvm_vcpu_put(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); + host_pcid_free_uncached(pvm); pvm_prepare_switch_to_host(pvm); } @@ -2086,6 +2299,11 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .set_rflags = pvm_set_rflags, .get_if_flag = pvm_get_if_flag, + .flush_tlb_all = pvm_flush_hwtlb, + .flush_tlb_current = pvm_flush_hwtlb_current, + .flush_tlb_gva = pvm_flush_hwtlb_gva, + .flush_tlb_guest = pvm_flush_hwtlb_guest, + .vcpu_pre_run = pvm_vcpu_pre_run, .vcpu_run = pvm_vcpu_run, .handle_exit = pvm_handle_exit, @@ -2152,8 +2370,16 @@ static void pvm_exit(void) } module_exit(pvm_exit); +#define TLB_NR_DYN_ASIDS 6 + static int __init hardware_cap_check(void) { + BUILD_BUG_ON(MIN_HOST_PCID_FOR_GUEST <= TLB_NR_DYN_ASIDS); +#ifdef CONFIG_PAGE_TABLE_ISOLATION + BUILD_BUG_ON((MIN_HOST_PCID_FOR_GUEST + NUM_HOST_PCID_FOR_GUEST) >= + (1 << X86_CR3_PTI_PCID_USER_BIT)); +#endif + /* * switcher can't be used when KPTI. 
See the comments above * SWITCHER_SAVE_AND_SWITCH_TO_HOST_CR3 diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 4cdcbed1c813..31060831e009 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -28,6 +28,11 @@ extern u64 *host_mmu_root_pgd; void host_mmu_destroy(void); int host_mmu_init(void); +#define HOST_PCID_TAG_FOR_GUEST (32) + +#define MIN_HOST_PCID_FOR_GUEST HOST_PCID_TAG_FOR_GUEST +#define NUM_HOST_PCID_FOR_GUEST HOST_PCID_TAG_FOR_GUEST + struct vcpu_pvm { struct kvm_vcpu vcpu;
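To make the constants above concrete (an editorial note, not part of the patch): guest slots 0..31 are tagged with HOST_PCID_TAG_FOR_GUEST, so the guest-owned host PCIDs are 32..63. That range lies above the host's own dynamic ASIDs (TLB_NR_DYN_ASIDS is 6, as defined before hardware_cap_check() above) and, together with NUM_HOST_PCID_FOR_GUEST, stays below the PTI user-PCID bit (bit 11, i.e. 2048, in current kernels), which is what the BUILD_BUG_ON() checks in hardware_cap_check() enforce. A minimal sketch of the mapping used by index_to_host_pcid()/host_pcid_to_index():

/* Sketch only: slot 0..31 <-> host PCID 32..63. */
static inline unsigned int guest_slot_to_host_pcid(unsigned int slot)
{
	return slot | 32;		/* e.g. slot 0 -> PCID 32, slot 31 -> PCID 63 */
}

static inline unsigned int host_pcid_to_guest_slot(unsigned int host_pcid)
{
	return host_pcid & ~32u;	/* e.g. PCID 63 -> slot 31 */
}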
From patchwork Mon Feb 26 14:35:55 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Subject: [RFC PATCH 38/73] KVM: x86/PVM: Handle hypercalls for privilege instruction emulation Date: Mon, 26 Feb 2024 22:35:55 +0800 Message-Id: <20240226143630.33643-39-jiangshanlai@gmail.com> From: Lai Jiangshan The privileged instructions in the PVM guest will be trapped and emulated. To reduce the emulation overhead, some privileged instructions in the hot path, such as RDMSR/WRMSR and TLB flushing related instructions, will be replaced by hypercalls to improve performance. The handling of those hypercalls is the same as the associated privileged instruction emulation. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 114 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 113 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 2d3785e7f2f3..8d8c783c72b5 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1434,6 +1434,96 @@ static int handle_synthetic_instruction_return_supervisor(struct kvm_vcpu *vcpu) return 1; } +static int handle_hc_interrupt_window(struct kvm_vcpu *vcpu) +{ + kvm_make_request(KVM_REQ_EVENT, vcpu); + pvm_event_flags_update(vcpu, 0, PVM_EVENT_FLAGS_IP); + + ++vcpu->stat.irq_window_exits; + return 1; +} + +static int handle_hc_irq_halt(struct kvm_vcpu *vcpu) +{ + kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) | X86_EFLAGS_IF); + + return kvm_emulate_halt_noskip(vcpu); +} + +static void pvm_flush_tlb_guest_current_kernel_user(struct kvm_vcpu *vcpu) +{ + /* + * Sync the current pgd and user_pgd (pvm->msr_switch_cr3), + * which is a subset of the work done for KVM_REQ_TLB_FLUSH_GUEST. + */ + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); +} + +/* + * Hypercall: PVM_HC_TLB_FLUSH + * Flush all TLBs.
+ */ +static int handle_hc_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); + + return 1; +} + +/* + * Hypercall: PVM_HC_TLB_FLUSH_CURRENT + * Flush all TLBs tagged with the current CR3 and MSR_PVM_SWITCH_CR3. + */ +static int handle_hc_flush_tlb_current_kernel_user(struct kvm_vcpu *vcpu) +{ + pvm_flush_tlb_guest_current_kernel_user(vcpu); + + return 1; +} + +/* + * Hypercall: PVM_HC_TLB_INVLPG + * Flush TLBs associated with a single address for all tags. + */ +static int handle_hc_invlpg(struct kvm_vcpu *vcpu, unsigned long addr) +{ + kvm_mmu_invlpg(vcpu, addr); + + return 1; +} + +/* + * Hypercall: PVM_HC_RDMSR + * Read MSR. + * Return with RAX = the MSR value if it succeeded. + * Return with RAX = 0 if it failed. + */ +static int handle_hc_rdmsr(struct kvm_vcpu *vcpu, u32 index) +{ + u64 value = 0; + + kvm_get_msr(vcpu, index, &value); + kvm_rax_write(vcpu, value); + + return 1; +} + +/* + * Hypercall: PVM_HC_WRMSR + * Write MSR. + * Return with RAX = 0 if it succeeded. + * Return with RAX = -EIO if it failed. + */ +static int handle_hc_wrmsr(struct kvm_vcpu *vcpu, u32 index, u64 value) +{ + if (kvm_set_msr(vcpu, index, value)) + kvm_rax_write(vcpu, -EIO); + else + kvm_rax_write(vcpu, 0); + + return 1; +} + static int handle_kvm_hypercall(struct kvm_vcpu *vcpu) { int r; @@ -1450,6 +1540,7 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); unsigned long rip = kvm_rip_read(vcpu); + unsigned long a0, a1; if (!is_smod(pvm)) return do_pvm_user_event(vcpu, PVM_SYSCALL_VECTOR, false, 0); @@ -1459,7 +1550,28 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) if (rip == pvm->msr_rets_rip_plus2) return handle_synthetic_instruction_return_supervisor(vcpu); - return handle_kvm_hypercall(vcpu); + a0 = kvm_rbx_read(vcpu); + a1 = kvm_r10_read(vcpu); + + // handle hypercall, check it for pvm hypercall and then kvm hypercall + switch (kvm_rax_read(vcpu)) { + case PVM_HC_IRQ_WIN: + return handle_hc_interrupt_window(vcpu); + case PVM_HC_IRQ_HALT: + return handle_hc_irq_halt(vcpu); + case PVM_HC_TLB_FLUSH: + return handle_hc_flush_tlb_all(vcpu); + case PVM_HC_TLB_FLUSH_CURRENT: + return handle_hc_flush_tlb_current_kernel_user(vcpu); + case PVM_HC_TLB_INVLPG: + return handle_hc_invlpg(vcpu, a0); + case PVM_HC_RDMSR: + return handle_hc_rdmsr(vcpu, a0); + case PVM_HC_WRMSR: + return handle_hc_wrmsr(vcpu, a0, a1); + default: + return handle_kvm_hypercall(vcpu); + } } static int handle_exit_debug(struct kvm_vcpu *vcpu)
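As a usage illustration of the MSR hypercalls above (an editorial sketch, not from the series): a paravirtualized MSR access in the guest could route through PVM_HC_RDMSR/PVM_HC_WRMSR instead of trapping on the native instruction. pvm_hypercall2() refers to the illustrative wrapper sketched after patch 36; the helper names here are hypothetical.

/* PVM_HC_WRMSR: a0 = MSR index, a1 = value; RAX is 0 on success, -EIO on failure. */
static inline int pvm_paravirt_wrmsr(unsigned int msr, unsigned long long value)
{
	return (int)pvm_hypercall2(PVM_HC_WRMSR, msr, value);
}

/* PVM_HC_RDMSR: a0 = MSR index; RAX carries the MSR value, or 0 if the read failed. */
static inline unsigned long long pvm_paravirt_rdmsr(unsigned int msr)
{
	return (unsigned long long)pvm_hypercall2(PVM_HC_RDMSR, msr, 0);
}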
From patchwork Mon Feb 26 14:35:56 2024 From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H.
Peter Anvin" Subject: [RFC PATCH 39/73] KVM: x86/PVM: Handle hypercall for CR3 switching Date: Mon, 26 Feb 2024 22:35:56 +0800 Message-Id: <20240226143630.33643-40-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan If the guest uses the same page table for supervisor mode and user mode, then the user mode can access the supervisor mode address space. Therefore, for safety, the guest needs to provide two different page tables for one process, which is similar to KPTI. When switching CR3 during the process switching, the guest uses the hypercall to provide the two page tables for the hypervisor, and then the hypervisor can switch CR3 during the mode switch automatically. Additionally, an extra flag is introduced to perform TLB flushing at the same time. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 8d8c783c72b5..ad08643c098a 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1459,6 +1459,42 @@ static void pvm_flush_tlb_guest_current_kernel_user(struct kvm_vcpu *vcpu) kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); } +/* + * Hypercall: PVM_HC_LOAD_PGTBL + * Load two PGDs into the current CR3 and MSR_PVM_SWITCH_CR3. + * + * Arguments: + * flags: bit0: flush the TLBs tagged with @pgd and @user_pgd. + * bit1: 4 (bit1=0) or 5 (bit1=1 && cpuid_has(LA57)) level paging. + * pgd: to be loaded into CR3. + * user_pgd: to be loaded into MSR_PVM_SWITCH_CR3. + */ +static int handle_hc_load_pagetables(struct kvm_vcpu *vcpu, unsigned long flags, + unsigned long pgd, unsigned long user_pgd) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long cr4 = vcpu->arch.cr4; + + if (!(flags & 2)) + cr4 &= ~X86_CR4_LA57; + else if (guest_cpuid_has(vcpu, X86_FEATURE_LA57)) + cr4 |= X86_CR4_LA57; + + if (cr4 != vcpu->arch.cr4) { + vcpu->arch.cr4 = cr4; + kvm_mmu_reset_context(vcpu); + } + + kvm_mmu_new_pgd(vcpu, pgd); + vcpu->arch.cr3 = pgd; + pvm->msr_switch_cr3 = user_pgd; + + if (flags & 1) + pvm_flush_tlb_guest_current_kernel_user(vcpu); + + return 1; +} + /* * Hypercall: PVM_HC_TLB_FLUSH * Flush all TLBs. 
@@ -1540,7 +1576,7 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); unsigned long rip = kvm_rip_read(vcpu); - unsigned long a0, a1; + unsigned long a0, a1, a2; if (!is_smod(pvm)) return do_pvm_user_event(vcpu, PVM_SYSCALL_VECTOR, false, 0); @@ -1552,6 +1588,7 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) a0 = kvm_rbx_read(vcpu); a1 = kvm_r10_read(vcpu); + a2 = kvm_rdx_read(vcpu); // handle hypercall, check it for pvm hypercall and then kvm hypercall switch (kvm_rax_read(vcpu)) { @@ -1559,6 +1596,8 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) return handle_hc_interrupt_window(vcpu); case PVM_HC_IRQ_HALT: return handle_hc_irq_halt(vcpu); + case PVM_HC_LOAD_PGTBL: + return handle_hc_load_pagetables(vcpu, a0, a1, a2); case PVM_HC_TLB_FLUSH: return handle_hc_flush_tlb_all(vcpu); case PVM_HC_TLB_FLUSH_CURRENT: From patchwork Mon Feb 26 14:35:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572324 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D366513A24B; Mon, 26 Feb 2024 14:36:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958210; cv=none; b=mAic7tQJamEvk2mUX4yXWLu8evAynvgsBjjpNLqcyqxnvbZy6L4x8b3kctTewg5oAc4CyILsH7CoNlZqm4oU7t3dw+VgE8cqyXIqL+qtfZnmoduFmJBD1L1iciQ0uEc4VNb8jmIQrDwdXqKQvPvoTE8uokePNQLKD9mxxJCnK9c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958210; c=relaxed/simple; bh=Ck1F5Gy12w7sgnoP3E6l6/tEBdI4swyAdKim2OPHJkU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=q01tUHmz1uz9lemeYdA1o33qHOx3YTQ+nVIq0Vh+OKs9oKPwmJ5wn+9ReGIawmt2tA4m7wK/8zZJ3FIcuXkb8qDBrIKW5GEu4mNBCVXsCuhNkrBa14caoBPsbWOl4Op5wC+7Z3mkxORjOtPUOLgXZqxDM9yWyBYTLqP4Y9pj6w4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=BcY5fNCO; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BcY5fNCO" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-1dc96f64c10so9032825ad.1; Mon, 26 Feb 2024 06:36:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958208; x=1709563008; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ruju1XhRDDaW6b1WJ3BMmV9hOYmSG+m9SE28NxmRmbc=; b=BcY5fNCO7iH1NAuaKAnSOMKpaH94O9SyXnF1k6O5byIUZ/TY1tiGwDQROgXk0OTRp8 CHdXOLqY9uJgYm6JQYAkgYlZ7XBd/hcsbB5RMM0pP4JgmkZDb6kWekx9fFezbzB1h1Lt bnRASG2/nu9dVfe32vdrX0Wxq/KbywZRMTKiafgylBoWp/kTYxh0NtSIWpTO93vBvuux 2WtRd6INLd7QIcXWof+az/WSD33M61MWFYBPVmc8GJd8fMGA1Wlg/E/Pk5ZDwslCNYo/ Z1BNxPpv0uMLksNXdH1p2Vfna98S8PcpAia7WLVSv6nWjiD9PKk/HzPcbgUYZCw/jUPp +18g== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958208; x=1709563008; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ruju1XhRDDaW6b1WJ3BMmV9hOYmSG+m9SE28NxmRmbc=; b=WMj0LQRLx8Qg2wB/m79h6rgin7iJ6rZeChf072Bxq4Xiijdk/FWxGqC1HM7/k+SnIm CMDUgtMKLyluFbs5TbJmp2TPqgd2VVdMpTgxQ/gW199tqVWF/blTwG/BxRR6fhe0XU3E Kb81Fq58J0+iSC3qI2ZOtmQ5EGl+Pnrrs79uvLFt/skZhuqWX46rxMkRJzEzFE7ny/7g 3aH2wlMZnw9zvv+7Ug+ptLagBASl6RrLNTn6SND8AWU7pMB/HePRi4t1f5I4xLosv7hj 4Ub3uM1+KJWd6EW6Q5FFJ3gvP4uzp7ySkB8Zz+4NhrlIsmlJT8CvPnDbs64R0jUSUkTP ALGw== X-Forwarded-Encrypted: i=1; AJvYcCVIMGBO1ACj2Jbh/Gpgsy+JsELC5OFTriEx9fYmdvvYPHXiC4ojitChfwC2ZDpKCUSD556ylW3RG8p5lkSk0uHomCF7 X-Gm-Message-State: AOJu0YzMXJmVh9/SwSSTuyxzVKrivN+iD4Rj7hmJO6vg7lFVNXUFLNh7 5Gl+F/JoJW0ShDru9mdW7p/WfK/5zybdW1m9+Wg//0rPSzbNwikiltI8xiMt X-Google-Smtp-Source: AGHT+IE7smdj7NwrDT3NNyZiEHYf8vaUn/EqAmZtnBzXvryKXBZoXfIAi/pXWEgFE4lZlYJ9YXmLmw== X-Received: by 2002:a17:902:d4c7:b0:1dc:94f6:3326 with SMTP id o7-20020a170902d4c700b001dc94f63326mr4099780plg.18.1708958207963; Mon, 26 Feb 2024 06:36:47 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id d5-20020a170902b70500b001dcabe7a182sm1219740pls.161.2024.02.26.06.36.47 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:36:47 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 40/73] KVM: x86/PVM: Handle hypercall for loading GS selector Date: Mon, 26 Feb 2024 22:35:57 +0800 Message-Id: <20240226143630.33643-41-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan SWAPGS is not supported in PVM, so the native load_gs_index() cannot be used in the guest. Therefore, a hypercall is introduced to load the GS selector into the GS segment register, and the resulting GS base is returned to the guest. This is prepared for supporting 32-bit processes. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 71 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index ad08643c098a..ee55e99fb204 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1528,6 +1528,75 @@ static int handle_hc_invlpg(struct kvm_vcpu *vcpu, unsigned long addr) return 1; } +/* + * Hypercall: PVM_HC_LOAD_GS + * Load %gs with the selector %rdi and load the resulted base address + * into RAX. + * + * If %rdi is an invalid selector (including RPL != 3), NULL selector + * will be used instead. + * + * Return the resulted GS BASE in vCPU's RAX. + */ +static int handle_hc_load_gs(struct kvm_vcpu *vcpu, unsigned short sel) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long guest_kernel_gs_base; + + /* Use NULL selector if RPL != 3. */ + if (sel != 0 && (sel & 3) != 3) + sel = 0; + + /* Protect the guest state on the hardware. 
*/ + preempt_disable(); + + /* + * Switch to the guest state because the CPU is going to set the %gs to + * the guest value. Save the original guest MSR_GS_BASE if it is + * already the guest state. + */ + if (!pvm->loaded_cpu_state) + pvm_prepare_switch_to_guest(vcpu); + else + __save_gs_base(pvm); + + /* + * Load sel into %gs, which also changes the hardware MSR_KERNEL_GS_BASE. + * + * Before load_gs_index(sel): + * hardware %gs: old gs index + * hardware MSR_KERNEL_GS_BASE: guest MSR_GS_BASE + * + * After load_gs_index(sel); + * hardware %gs: resulted %gs, @sel or NULL + * hardware MSR_KERNEL_GS_BASE: resulted GS BASE + * + * The resulted %gs is the new guest %gs and will be saved into + * pvm->segments[VCPU_SREG_GS].selector later when the CPU is + * switching to host or the guest %gs is read (pvm_get_segment()). + * + * The resulted hardware MSR_KERNEL_GS_BASE will be returned via RAX + * to the guest and the hardware MSR_KERNEL_GS_BASE, which represents + * the guest MSR_GS_BASE when in VM-Exit state, is restored back to + * the guest MSR_GS_BASE. + */ + load_gs_index(sel); + + /* Get the resulted guest MSR_KERNEL_GS_BASE. */ + rdmsrl(MSR_KERNEL_GS_BASE, guest_kernel_gs_base); + + /* Restore the guest MSR_GS_BASE into the hardware MSR_KERNEL_GS_BASE. */ + __load_gs_base(pvm); + + /* Finished access to the guest state on the hardware. */ + preempt_enable(); + + /* Return RAX with the resulted GS BASE. */ + kvm_rax_write(vcpu, guest_kernel_gs_base); + + return 1; +} + /* * Hypercall: PVM_HC_RDMSR * Write MSR. @@ -1604,6 +1673,8 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) return handle_hc_flush_tlb_current_kernel_user(vcpu); case PVM_HC_TLB_INVLPG: return handle_hc_invlpg(vcpu, a0); + case PVM_HC_LOAD_GS: + return handle_hc_load_gs(vcpu, a0); case PVM_HC_RDMSR: return handle_hc_rdmsr(vcpu, a0); case PVM_HC_WRMSR: From patchwork Mon Feb 26 14:35:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572325 Received: from mail-oo1-f51.google.com (mail-oo1-f51.google.com [209.85.161.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C61FE13A27E; Mon, 26 Feb 2024 14:36:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958214; cv=none; b=J98nMspZxKi6boeC+2NPygnH/VaRbUA5mtfoXwhB2ZfrgYpxogWHCL61tB9p60Ny6W95CULavXL0sJ4pQgtNTdlJOX+0FL4ViJiS91fm9v4P+cKFU6Gfe8Z5roHPVpXnlpbJBcDTDHGLy4yY9JxVY3UJwYVQSIDjw0+vEJIl/zI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958214; c=relaxed/simple; bh=Hf0BNKBIZ56qS48eJNOQyF5J6qUwZwEo0QFLiddsBX8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fQxcEKLXwc1bZX8cYIcjI5C/sbwi5iRupGDf/OyNwd6UoIgbCUlaFcu7M2rZ20K6oIrkBmxRpbk9uEqI8lUpUGvNwXzka0v0F6CWjWiA9Hg2TrsMx2E6WnJoykEbkWuQM2gtd7qzoTkHtpkpRWGx9lcZmRK8ydV/wth/QF10CdA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MF2Te05o; arc=none smtp.client-ip=209.85.161.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MF2Te05o" Received: by mail-oo1-f51.google.com with SMTP id 006d021491bc7-5a09c79bb2dso256390eaf.2; Mon, 26 Feb 2024 06:36:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958211; x=1709563011; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uOInb/lRonlNK+MHppezKyZznP49HjtDhF1ZnprcgPA=; b=MF2Te05oosZ4bIwtKWK6nj6T3FNe3rt5FV42bJUBSD4On9DYu6rwycd+4VFBeU6R6F 57xBS+NpMalgqp17wrHyRKpuclEaAIa4VxmGwNOKpZXGV/kAihiQkfrgVsrM7yPl1dsL k/4TFYvRLncs1ioqYql0KzW98ynNHgmgzN//dynbeljM8Wh+xSSOpDGG/oYY1Ai8Vq+U u+DkEwEneiNSGnUawctbud9cNBZBXW0J32sus2WGkQw7M0KdXrT2ABn59HVBhVyYannx mkzThQpERed4lWAQ5GKuIyp3z8k5dFjFZyJxwzhCp8a+JNi40etubw/Z5LxLJ6m1ZJ/M RGFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958211; x=1709563011; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uOInb/lRonlNK+MHppezKyZznP49HjtDhF1ZnprcgPA=; b=fLoGXyZSjw7xBTTYmrjNMK83kzmpUQw6xsjqjdpWBhZF+JCDIPr3lsbF9ss0twbuxP mVRhfOknguM3XUnQGwof3LrGS7mMo2wQFTuqKgDD7vLxkgS2G0NnRe1ynoNbuAFar0dv Nayz8ayGO51fe1zZm3+DhjnLAV8nP63WWfNdfVcXEvO/SS8L9Y4cROab8L3NhfkgfJ7C jDGzDs/3RBOF0pcW7rH+oBs43dmVEP2ka9ekIIuUK8IrkDLr55HeblK5ozkGu+KKCYJJ Z+3e8w3/08tcfH8rYHhhPaw42byJ99LCKPr3RVjuES4pPDInkehBdEnrW+uk31Un0ncm MoiQ== X-Forwarded-Encrypted: i=1; AJvYcCU/vZM9uiiE8hCugzWkHrofVVTHCXB7viM6EFVkkFxXIIcoK07cqqA9tISOW1DOB8VsMMsmp1A5hzj4a0myDClHmwfW X-Gm-Message-State: AOJu0YxLRt8RBVyfzyA6wClxPyIj2Ba3z5wxJVxBr3wUaSpyMyx9OcsF wUh7Zaro+IYRTfyB9Fjx8/thpEh+08gVobn6zkIitnMWz4xCGNObrDSC+9Kx X-Google-Smtp-Source: AGHT+IFLnhttqMOFxckoM1wrT4XTJOBuizOceWWP2mvuPI+Y3DasCYzrbMYy8iyVtW5bJc17zqiXTg== X-Received: by 2002:a05:6358:9226:b0:179:ff:2486 with SMTP id d38-20020a056358922600b0017900ff2486mr9648425rwb.29.1708958211198; Mon, 26 Feb 2024 06:36:51 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id e6-20020a63ee06000000b005dc491ccdcesm4051500pgi.14.2024.02.26.06.36.50 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:36:50 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 41/73] KVM: x86/PVM: Allow to load guest TLS in host GDT Date: Mon, 26 Feb 2024 22:35:58 +0800 Message-Id: <20240226143630.33643-42-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan The 32-bit process needs to use TLS in libc, so a hypercall is introduced to load the guest TLS into the host GDT. The checking of the guest TLS is the same as tls_desc_okay() in the arch/x86/kernel/tls.c file. 
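As an illustration only (not part of this patch): a guest kernel could forward its three TLS descriptors to the hypervisor roughly as sketched below, assuming the PVM hypercall calling convention used by handle_exit_syscall() in this series (RAX = hypercall number, arguments in RBX/R10/RDX, issued via SYSCALL from supervisor mode). PVM_HC_LOAD_TLS comes from the series; the helper names are made up for the sketch.

/* Sketch only: pass the task's three 8-byte TLS descriptors to the host. */
static inline long pvm_hypercall3(unsigned long nr, unsigned long a0,
                                  unsigned long a1, unsigned long a2)
{
        register unsigned long r10 asm("r10") = a1;
        long ret;

        /* PVM hypercalls share the SYSCALL path; RCX/R11 are clobbered. */
        asm volatile("syscall"
                     : "=a" (ret)
                     : "a" (nr), "b" (a0), "r" (r10), "d" (a2)
                     : "rcx", "r11", "memory");
        return ret;
}

static void pvm_load_guest_tls(const u64 tls_desc[3])
{
        pvm_hypercall3(PVM_HC_LOAD_TLS, tls_desc[0], tls_desc[1], tls_desc[2]);
}

The hypervisor side below sanitizes whatever it receives, so the guest only needs to build the descriptors as it normally would.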
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 81 ++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 1 + 2 files changed, 82 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index ee55e99fb204..e68052f33186 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -281,6 +281,26 @@ static void segments_save_guest_and_switch_to_host(struct vcpu_pvm *pvm) wrmsrl(MSR_FS_BASE, current->thread.fsbase); } +/* + * Load guest TLS entries into the GDT. + */ +static inline void host_gdt_set_tls(struct vcpu_pvm *pvm) +{ + struct desc_struct *gdt = get_current_gdt_rw(); + unsigned int i; + + for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) + gdt[GDT_ENTRY_TLS_MIN + i] = pvm->tls_array[i]; +} + +/* + * Load current task's TLS into the GDT. + */ +static inline void host_gdt_restore_tls(void) +{ + native_load_tls(&current->thread, smp_processor_id()); +} + static void pvm_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); @@ -304,6 +324,8 @@ static void pvm_prepare_switch_to_guest(struct kvm_vcpu *vcpu) native_tss_invalidate_io_bitmap(); #endif + host_gdt_set_tls(pvm); + #ifdef CONFIG_MODIFY_LDT_SYSCALL /* PVM doesn't support LDT. */ if (unlikely(current->mm->context.ldt)) @@ -334,6 +356,8 @@ static void pvm_prepare_switch_to_host(struct vcpu_pvm *pvm) kvm_load_ldt(GDT_ENTRY_LDT*8); #endif + host_gdt_restore_tls(); + segments_save_guest_and_switch_to_host(pvm); pvm->loaded_cpu_state = 0; } @@ -1629,6 +1653,60 @@ static int handle_hc_wrmsr(struct kvm_vcpu *vcpu, u32 index, u64 value) return 1; } +// Check if the tls desc is allowed on the host GDT. +// The same logic as tls_desc_okay() in arch/x86/kernel/tls.c. +static bool tls_desc_okay(struct desc_struct *desc) +{ + // Only allow present segments. + if (!desc->p) + return false; + + // Only allow data segments. + if (desc->type & (1 << 3)) + return false; + + // Only allow 32-bit data segments. + if (!desc->d) + return false; + + return true; +} + +/* + * Hypercall: PVM_HC_LOAD_TLS + * Load guest TLS desc into host GDT. + */ +static int handle_hc_load_tls(struct kvm_vcpu *vcpu, unsigned long tls_desc_0, + unsigned long tls_desc_1, unsigned long tls_desc_2) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long *tls_array = (unsigned long *)&pvm->tls_array[0]; + int i; + + tls_array[0] = tls_desc_0; + tls_array[1] = tls_desc_1; + tls_array[2] = tls_desc_2; + + for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) { + if (!tls_desc_okay(&pvm->tls_array[i])) { + pvm->tls_array[i] = (struct desc_struct){0}; + continue; + } + /* Standardize TLS descs, same as fill_ldt().
*/ + pvm->tls_array[i].type |= 1; + pvm->tls_array[i].s = 1; + pvm->tls_array[i].dpl = 0x3; + pvm->tls_array[i].l = 0; + } + + preempt_disable(); + if (pvm->loaded_cpu_state) + host_gdt_set_tls(pvm); + preempt_enable(); + + return 1; +} + static int handle_kvm_hypercall(struct kvm_vcpu *vcpu) { int r; @@ -1679,6 +1757,8 @@ static int handle_exit_syscall(struct kvm_vcpu *vcpu) return handle_hc_rdmsr(vcpu, a0); case PVM_HC_WRMSR: return handle_hc_wrmsr(vcpu, a0, a1); + case PVM_HC_LOAD_TLS: + return handle_hc_load_tls(vcpu, a0, a1, a2); default: return handle_kvm_hypercall(vcpu); } @@ -2296,6 +2376,7 @@ static void pvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) pvm->hw_ss = __USER_DS; pvm->int_shadow = 0; pvm->nmi_mask = false; + memset(&pvm->tls_array[0], 0, sizeof(pvm->tls_array)); pvm->msr_vcpu_struct = 0; pvm->msr_supervisor_rsp = 0; diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 31060831e009..f28ab0b48f40 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -98,6 +98,7 @@ struct vcpu_pvm { struct kvm_segment segments[NR_VCPU_SREG]; struct desc_ptr idt_ptr; struct desc_ptr gdt_ptr; + struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; }; struct kvm_pvm { From patchwork Mon Feb 26 14:35:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572326 Received: from mail-ot1-f49.google.com (mail-ot1-f49.google.com [209.85.210.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A449E13A879; Mon, 26 Feb 2024 14:36:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958217; cv=none; b=KKtPaSOE8hm5oU0gYUGXf7KIvZpYVe+alI5sKA7xlmpmMjp9GkBGEKADinXKPSz8/3K5fqx8GwkWRP2B2906UZO/gK8uWQG0wWwBWpHXAI2gFB5ZGJp95IhFojynXhaGR2R4NTXg/Wz3Ta1d7APLpV+9HDcIf8EKBn+4F+ts+74= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958217; c=relaxed/simple; bh=QvIfkotJQxMfa11Ye+5Mx3wJUOol1gW3+VkFSNADlgU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=M29P2vdem1OjHlksJjqjywDbtnQZrUDmhm1qKcvm1LEyfDFt7hOKJ4IvNWFUp96CGXh7LPyd3XzLqaMR61U9keQRh7Yg/u4S3BQ6T8F1ZdHGyZpXLDBfGiXHX2J6KHKKos2zjzDjA1wQ1Cnrz4s0O4ywyyMXYt39PBF0ZsuCRkI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PDNLVYDu; arc=none smtp.client-ip=209.85.210.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PDNLVYDu" Received: by mail-ot1-f49.google.com with SMTP id 46e09a7af769-6e0f43074edso2168511a34.1; Mon, 26 Feb 2024 06:36:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958214; x=1709563014; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=HdqH8jVNXmEu4zs1zan2Z9bnYeRZlguwUihIbfO84eA=; 
b=PDNLVYDuhvgcfAQ5VsTHEfTR5kK26G412sx+wCWD6+hkKZ8441GE6au7P4C5lozW+Y ffjIA8e18VL9SnNmE8IitGQFu4bBGiK3W6+9Zgvclo39zcmpprr2zny9Om/viMQqmB/G roGDmCk2bXF94wDcawunQECGhZeKbjO0zinG3wb3BBwMJ+OYFHNzbRXt2LVsrK9A8i9L 7V/ZdlGhKAXRIPdh7jWXIr5rht3FNOXvTl3l9ddHJmbuxTxHmP5IMsEpxBpAgmGjAUJ0 nxiYr6M4/qsutgzgBXumth6iJSSZ2Z3CUM7HKDplkzfKbxCX6Iu+WFbZBL0EhuexUI/r MpqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958214; x=1709563014; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HdqH8jVNXmEu4zs1zan2Z9bnYeRZlguwUihIbfO84eA=; b=YvdHQHsVs9fAZpfSJqOJk0c4Mn+HRlH8Gu8S829qVHgjze/Bw7ghU0TZ0qOHYztZQk 1TcjwWH7WKLFITdADMSTGBUWHMECpRD4xnQZsSOKsu1+C59wjbYc64AA6HRCEiXai37i 5rYbxbuP75CNiC9aRkWeT5bItQ/bFzIfBNKh6ZIoqkKriLkxmvrv2W5GJBDP9rLtYRGB UXv92AAWWR3QYbN5Qct/bxbteX9RLZdEGwLDT/wvwcjxexowUt3SKHaTi5oR2X5pOmwH kl6m1vRjjZ+oh5tTX98F3yg8TPV8Otol51iDRUfGl7gRSRniSIgNEp5ixoP9GAXiI0CF TRKQ== X-Forwarded-Encrypted: i=1; AJvYcCWlmXSvpFPrNAWQsMOk668SDXDCHAm9uCEWYMfDLU8JR83KDdJEH/FFtyXlbTrZqcYfE1obsgSN5c3ipWLT90pOht+M X-Gm-Message-State: AOJu0YxS8PhHJ3IF327hVzN5iQmAwP8Ufmy76/XL2R7kimau2aWYbp6g ubYtREn2G7TAT3LEweqqvchAYj8Hix9XZ9Cf4xtnMRaMn1kR0Q7FyMaLmmFw X-Google-Smtp-Source: AGHT+IGzhilLnOqDnW8DtlnwMh+tC9Pj86iMFec5r5I9e0Ve/nJKQ5bcBi6m3xQOLVYQTSM7JwtmLA== X-Received: by 2002:a05:6358:4428:b0:17a:def8:5687 with SMTP id z40-20020a056358442800b0017adef85687mr5692172rwc.27.1708958214434; Mon, 26 Feb 2024 06:36:54 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id d4-20020a634f04000000b005d8b2f04eb7sm3922538pgb.62.2024.02.26.06.36.53 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:36:54 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 42/73] KVM: x86/PVM: Support for kvm_exit() tracepoint Date: Mon, 26 Feb 2024 22:35:59 +0800 Message-Id: <20240226143630.33643-43-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong Similar to VMX/SVM, add necessary information to support kvm_exit() tracepoint. 
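One note on the encoding: hardware exception vectors are reported in the low 16 bits of the exit reason, while the synthetic PVM reasons (SYSCALL, HYPERCALL, ERETU, ...) sit above PVM_EXIT_REASONS_SHIFT, so the two namespaces cannot collide. A minimal consumer-side sketch of that split, using only the PVM_EXIT_REASONS_* definitions added below (the helper itself is illustrative):

/* Sketch only: classify an exit reason reported by the kvm_exit tracepoint. */
static const char *pvm_exit_reason_class(u32 exit_reason)
{
        if (exit_reason >= (1u << PVM_EXIT_REASONS_SHIFT))
                return "synthetic";        /* SYSCALL/HYPERCALL/ERETU/ERETS/... */
        return "exception vector";         /* DE, DB, NMI, ... */
}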
Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/kvm/pvm/pvm.c | 41 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/pvm/pvm.h | 35 +++++++++++++++++++++++++++++++++++ arch/x86/kvm/trace.h | 7 ++++++- 3 files changed, 82 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index e68052f33186..6ac599587567 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1996,6 +1996,43 @@ static int pvm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) return 0; } +static u32 pvm_get_syscall_exit_reason(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long rip = kvm_rip_read(vcpu); + + if (is_smod(pvm)) { + if (rip == pvm->msr_retu_rip_plus2) + return PVM_EXIT_REASONS_ERETU; + else if (rip == pvm->msr_rets_rip_plus2) + return PVM_EXIT_REASONS_ERETS; + else + return PVM_EXIT_REASONS_HYPERCALL; + } + + return PVM_EXIT_REASONS_SYSCALL; +} + +static void pvm_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, + u32 *intr_info, u32 *error_code) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (pvm->exit_vector == PVM_SYSCALL_VECTOR) + *reason = pvm_get_syscall_exit_reason(vcpu); + else if (pvm->exit_vector == IA32_SYSCALL_VECTOR) + *reason = PVM_EXIT_REASONS_INT80; + else if (pvm->exit_vector >= FIRST_EXTERNAL_VECTOR && + pvm->exit_vector < NR_VECTORS) + *reason = PVM_EXIT_REASONS_INTERRUPT; + else + *reason = pvm->exit_vector; + *info1 = pvm->exit_vector; + *info2 = pvm->exit_error_code; + *intr_info = pvm->exit_vector; + *error_code = pvm->exit_error_code; +} + static void pvm_handle_exit_irqoff(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); @@ -2298,6 +2335,8 @@ static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) mark_page_dirty_in_slot(vcpu->kvm, pvm->pvcs_gpc.memslot, pvm->pvcs_gpc.gpa >> PAGE_SHIFT); + trace_kvm_exit(vcpu, KVM_ISA_PVM); + return EXIT_FASTPATH_NONE; } @@ -2627,6 +2666,8 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .refresh_apicv_exec_ctrl = pvm_refresh_apicv_exec_ctrl, .deliver_interrupt = pvm_deliver_interrupt, + .get_exit_info = pvm_get_exit_info, + .vcpu_after_set_cpuid = pvm_vcpu_after_set_cpuid, .check_intercept = pvm_check_intercept, diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index f28ab0b48f40..2f8fdb0ae3df 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -10,6 +10,41 @@ #define PVM_SYSCALL_VECTOR SWITCH_EXIT_REASONS_SYSCALL #define PVM_FAILED_VMENTRY_VECTOR SWITCH_EXIT_REASONS_FAILED_VMETNRY +#define PVM_EXIT_REASONS_SHIFT 16 +#define PVM_EXIT_REASONS_SYSCALL (1UL << PVM_EXIT_REASONS_SHIFT) +#define PVM_EXIT_REASONS_HYPERCALL (2UL << PVM_EXIT_REASONS_SHIFT) +#define PVM_EXIT_REASONS_ERETU (3UL << PVM_EXIT_REASONS_SHIFT) +#define PVM_EXIT_REASONS_ERETS (4UL << PVM_EXIT_REASONS_SHIFT) +#define PVM_EXIT_REASONS_INTERRUPT (5UL << PVM_EXIT_REASONS_SHIFT) +#define PVM_EXIT_REASONS_INT80 (6UL << PVM_EXIT_REASONS_SHIFT) + +#define PVM_EXIT_REASONS \ + { DE_VECTOR, "DE excp" }, \ + { DB_VECTOR, "DB excp" }, \ + { NMI_VECTOR, "NMI excp" }, \ + { BP_VECTOR, "BP excp" }, \ + { OF_VECTOR, "OF excp" }, \ + { BR_VECTOR, "BR excp" }, \ + { UD_VECTOR, "UD excp" }, \ + { NM_VECTOR, "NM excp" }, \ + { DF_VECTOR, "DF excp" }, \ + { TS_VECTOR, "TS excp" }, \ + { SS_VECTOR, "SS excp" }, \ + { GP_VECTOR, "GP excp" }, \ + { PF_VECTOR, "PF excp" }, \ + { MF_VECTOR, "MF excp" }, \ + { AC_VECTOR, "AC excp" }, \ + { MC_VECTOR, "MC excp" }, \ + { XM_VECTOR, "XM excp" }, \ + { 
VE_VECTOR, "VE excp" }, \ + { PVM_EXIT_REASONS_SYSCALL, "SYSCALL" }, \ + { PVM_EXIT_REASONS_HYPERCALL, "HYPERCALL" }, \ + { PVM_EXIT_REASONS_ERETU, "ERETU" }, \ + { PVM_EXIT_REASONS_ERETS, "ERETS" }, \ + { PVM_EXIT_REASONS_INTERRUPT, "INTERRUPT" }, \ + { PVM_EXIT_REASONS_INT80, "INT80" }, \ + { PVM_FAILED_VMENTRY_VECTOR, "FAILED_VMENTRY" } + #define PT_L4_SHIFT 39 #define PT_L4_SIZE (1UL << PT_L4_SHIFT) #define DEFAULT_RANGE_L4_SIZE (32 * PT_L4_SIZE) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 83843379813e..3d6549679e98 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -8,6 +8,8 @@ #include #include +#include "pvm/pvm.h" + #undef TRACE_SYSTEM #define TRACE_SYSTEM kvm @@ -282,11 +284,14 @@ TRACE_EVENT(kvm_apic, #define KVM_ISA_VMX 1 #define KVM_ISA_SVM 2 +#define KVM_ISA_PVM 3 #define kvm_print_exit_reason(exit_reason, isa) \ (isa == KVM_ISA_VMX) ? \ __print_symbolic(exit_reason & 0xffff, VMX_EXIT_REASONS) : \ - __print_symbolic(exit_reason, SVM_EXIT_REASONS), \ + ((isa == KVM_ISA_SVM) ? \ + __print_symbolic(exit_reason, SVM_EXIT_REASONS) : \ + __print_symbolic(exit_reason, PVM_EXIT_REASONS)), \ (isa == KVM_ISA_VMX && exit_reason & ~0xffff) ? " " : "", \ (isa == KVM_ISA_VMX) ? \ __print_flags(exit_reason & ~0xffff, " ", VMX_EXIT_REASON_FLAGS) : "" From patchwork Mon Feb 26 14:36:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572327 Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 04F8D13AA33; Mon, 26 Feb 2024 14:36:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958220; cv=none; b=CUrTDHyQPZ+LR01kJwDyblFPR2RvlSTScdKzS38wfXSv07nObiy26GMwAXgy2N/lnuoYJjKcBJlTxrfDuvXkAREDO5LXWnwbJoMHhF2A4hagDYrFtyB1ak12RwnWPaPjY7iZZ0xXlxyTQqPK5dja8AXchoNNUABcGdFm8Y6WGWA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958220; c=relaxed/simple; bh=VFpQkf0DwaMmnSB2prKBrCNuNiYKSSA7i3bHq/5WTHg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Kcbo/votl7IOUP3ReU0V4I1af8f2yCHnXeNP0qLXGHAhJApi9Fi1iAa6k6gLzKuQbG0IYk8CRIrd9O1BaIPssxW941RknyPLWfHo8TaGG2U8Fxi2Npqr7kZiws1iwJLlzctof4iSpEoBFlLKAHP1zDNNufvDv+KA3vcvhOhzw98= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=bt25SjoV; arc=none smtp.client-ip=209.85.210.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bt25SjoV" Received: by mail-pf1-f178.google.com with SMTP id d2e1a72fcca58-6e53f3f1f82so216233b3a.2; Mon, 26 Feb 2024 06:36:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958218; x=1709563018; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=BcAOo2JYDa9YbLmrgou0VrWmz7Zj4ViKVXNN4Wof/Rw=; b=bt25SjoViGK+IvJiU9EifeZog6vpvDggZnoCDOoSBhkGIXkCVmpGgSU9m27XNqaRX0 461rotuEdCbU6x1VGm20mpGo7++FeEa20w5Scq24ajj4rBGtfyLmLi21wu6nk7yfbjl9 LS9dk7bzFVD+3kd3xrWOkBiK2F5shpe9ZhtZkHasS04G8yzJ6y01an9UCuDWGItq05sR 2i/AtgiLZrATgZc0LPfkQs8lpNSQ4HK1ellZLVz7PfCrtq2XGisi2LmyuR6lgUHN4ibX DdbLoHU/pgDdBspiJP5U9lhGbOV4st+OBcbZ4wJLdDtxZQurA0VCp5G00u9ZPjwI3stD 6oeQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958218; x=1709563018; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BcAOo2JYDa9YbLmrgou0VrWmz7Zj4ViKVXNN4Wof/Rw=; b=IeliWwrJp9dtiHTCQ+Cc4bQQUXsXdyrfr6Dp39ZProTdV+F+0l24Cj9KVvjtCHFIEp KmzHcZnAa988hV11TYbJ8gx8usiUgq3W4BK6nkbcnZ2aP1Rw4jU8nmSwR92Wt8AKkG41 90bY70GQenmrluljmwIIi4sANIwm48MDRGxp8pkYtsWBtsC04ANZlTMHXhJz+AHA4Ku2 9rqAu3DlDhn5Ix8Zn48qbW0FeseizmKVu1FVkx0S1bMP5LYjZc8qxs4NC+K15Va/33ez q8WYqkKCOUUCxpIFXeP2uZIOCa/4JXg6mxK5ylCv5eZyPaGX83UPNF6KoVYDkVDh9fME XL0g== X-Forwarded-Encrypted: i=1; AJvYcCW/MIJQL2d+PnQ0vYBuuKx++FTkxMP0zXH/SyvFdV2h3yf7PcYMMshlfmNdfdHp+6Q3fEeBZZHvzOc+9ZxPPFeeAE8P X-Gm-Message-State: AOJu0YwuJxaByjhWUzM+UNHkqLUhctCL+OW5Ps55NtUOLMIN8Db4+9oA Q9+6NBP5zok7wMKCtHRkpr1kMdqMjgeF6IOlNsQJZUBu2WqsKZpKmm+pbGuM X-Google-Smtp-Source: AGHT+IGqZOMLFmNX0VMVhFOqOC3rTr2ifATiXER7sN1I6LWQ9jctmC5a0wu07Wfb4xHqxl9VB/CzYQ== X-Received: by 2002:a05:6a20:2c92:b0:1a0:817d:80d with SMTP id g18-20020a056a202c9200b001a0817d080dmr4230103pzj.45.1708958218030; Mon, 26 Feb 2024 06:36:58 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id r23-20020a17090a941700b0029942a73eaesm4505516pjo.9.2024.02.26.06.36.57 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:36:57 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 43/73] KVM: x86/PVM: Enable direct switching Date: Mon, 26 Feb 2024 22:36:00 +0800 Message-Id: <20240226143630.33643-44-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan To enable direct switching, certain necessary information needs to be prepared in TSS for the switcher. Since only syscall and RETU hypercalls are allowed for now, CPL switching-related information is needed before VM enters. Additionally, after VM exit, the states in the hypervisor should be updated if direct switching has occurred. 
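Conceptually, direct switching is a fast path taken entirely inside the switcher: a guest SYSCALL (user mode to supervisor mode) or ERETU (back to user mode) is handled without a VM exit whenever no blocking condition has been recorded in the switch flags. A rough sketch of that gate, using the SWITCH_FLAGS_* bits from this series (the helper name and the exact set of checks are illustrative only):

/*
 * Sketch only: the switcher may handle SYSCALL/ERETU itself only when
 * nothing forces a trip through the hypervisor -- no pending IRQ/NMI
 * window request, no single-step debugging, and a CR3 already prepared
 * for the target mode (SWITCH_FLAGS_NO_DS_CR3 clear).
 */
static inline bool pvm_direct_switch_allowed(unsigned long switch_flags)
{
        return !(switch_flags & (SWITCH_FLAGS_NO_DS_CR3 |
                                 SWITCH_FLAGS_IRQ_WIN |
                                 SWITCH_FLAGS_NMI_WIN |
                                 SWITCH_FLAGS_SINGLE_STEP));
}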
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 87 +++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/pvm/pvm.h | 15 ++++++++ 2 files changed, 100 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 6ac599587567..138d0c255cb8 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -559,23 +559,70 @@ static void pvm_flush_hwtlb_gva(struct kvm_vcpu *vcpu, gva_t addr) put_cpu(); } +static bool check_switch_cr3(struct vcpu_pvm *pvm, u64 switch_host_cr3) +{ + u64 root = pvm->vcpu.arch.mmu->prev_roots[0].hpa; + + if (pvm->vcpu.arch.mmu->prev_roots[0].pgd != pvm->msr_switch_cr3) + return false; + if (!VALID_PAGE(root)) + return false; + if (host_pcid_owner(switch_host_cr3 & X86_CR3_PCID_MASK) != pvm) + return false; + if (host_pcid_root(switch_host_cr3 & X86_CR3_PCID_MASK) != root) + return false; + if (root != (switch_host_cr3 & CR3_ADDR_MASK)) + return false; + + return true; +} + static void pvm_set_host_cr3_for_guest_with_host_pcid(struct vcpu_pvm *pvm) { u64 root_hpa = pvm->vcpu.arch.mmu->root.hpa; bool flush = false; u32 host_pcid = host_pcid_get(pvm, root_hpa, &flush); u64 hw_cr3 = root_hpa | host_pcid; + u64 switch_host_cr3; if (!flush) hw_cr3 |= CR3_NOFLUSH; this_cpu_write(cpu_tss_rw.tss_ex.enter_cr3, hw_cr3); + + if (is_smod(pvm)) { + this_cpu_write(cpu_tss_rw.tss_ex.smod_cr3, hw_cr3 | CR3_NOFLUSH); + switch_host_cr3 = this_cpu_read(cpu_tss_rw.tss_ex.umod_cr3); + } else { + this_cpu_write(cpu_tss_rw.tss_ex.umod_cr3, hw_cr3 | CR3_NOFLUSH); + switch_host_cr3 = this_cpu_read(cpu_tss_rw.tss_ex.smod_cr3); + } + + if (check_switch_cr3(pvm, switch_host_cr3)) + pvm->switch_flags &= ~SWITCH_FLAGS_NO_DS_CR3; + else + pvm->switch_flags |= SWITCH_FLAGS_NO_DS_CR3; } static void pvm_set_host_cr3_for_guest_without_host_pcid(struct vcpu_pvm *pvm) { u64 root_hpa = pvm->vcpu.arch.mmu->root.hpa; + u64 switch_root = 0; + + if (pvm->vcpu.arch.mmu->prev_roots[0].pgd == pvm->msr_switch_cr3) { + switch_root = pvm->vcpu.arch.mmu->prev_roots[0].hpa; + pvm->switch_flags &= ~SWITCH_FLAGS_NO_DS_CR3; + } else { + pvm->switch_flags |= SWITCH_FLAGS_NO_DS_CR3; + } this_cpu_write(cpu_tss_rw.tss_ex.enter_cr3, root_hpa); + if (is_smod(pvm)) { + this_cpu_write(cpu_tss_rw.tss_ex.smod_cr3, root_hpa); + this_cpu_write(cpu_tss_rw.tss_ex.umod_cr3, switch_root); + } else { + this_cpu_write(cpu_tss_rw.tss_ex.umod_cr3, root_hpa); + this_cpu_write(cpu_tss_rw.tss_ex.smod_cr3, switch_root); + } } static void pvm_set_host_cr3_for_hypervisor(struct vcpu_pvm *pvm) @@ -591,6 +638,8 @@ static void pvm_set_host_cr3_for_hypervisor(struct vcpu_pvm *pvm) // Set tss_ex.host_cr3 for VMExit. // Set tss_ex.enter_cr3 for VMEnter. +// Set tss_ex.smod_cr3 and tss_ex.umod_cr3 and set or clear +// SWITCH_FLAGS_NO_DS_CR3 for direct switching. 
static void pvm_set_host_cr3(struct vcpu_pvm *pvm) { pvm_set_host_cr3_for_hypervisor(pvm); @@ -1058,6 +1107,11 @@ static bool pvm_apic_init_signal_blocked(struct kvm_vcpu *vcpu) static void update_exception_bitmap(struct kvm_vcpu *vcpu) { + /* disable direct switch when single step debugging */ + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) + to_pvm(vcpu)->switch_flags |= SWITCH_FLAGS_SINGLE_STEP; + else + to_pvm(vcpu)->switch_flags &= ~SWITCH_FLAGS_SINGLE_STEP; } static struct pvm_vcpu_struct *pvm_get_vcpu_struct(struct vcpu_pvm *pvm) @@ -1288,10 +1342,12 @@ static void pvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) if (!need_update || !is_smod(pvm)) return; - if (rflags & X86_EFLAGS_IF) + if (rflags & X86_EFLAGS_IF) { + pvm->switch_flags &= ~SWITCH_FLAGS_IRQ_WIN; pvm_event_flags_update(vcpu, X86_EFLAGS_IF, PVM_EVENT_FLAGS_IP); - else + } else { pvm_event_flags_update(vcpu, 0, X86_EFLAGS_IF); + } } static bool pvm_get_if_flag(struct kvm_vcpu *vcpu) @@ -1311,6 +1367,7 @@ static void pvm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) static void enable_irq_window(struct kvm_vcpu *vcpu) { + to_pvm(vcpu)->switch_flags |= SWITCH_FLAGS_IRQ_WIN; pvm_event_flags_update(vcpu, PVM_EVENT_FLAGS_IP, 0); } @@ -1332,6 +1389,7 @@ static void pvm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) static void enable_nmi_window(struct kvm_vcpu *vcpu) { + to_pvm(vcpu)->switch_flags |= SWITCH_FLAGS_NMI_WIN; } static int pvm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) @@ -1361,6 +1419,8 @@ static void pvm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) trace_kvm_inj_virq(irq, vcpu->arch.interrupt.soft, false); + to_pvm(vcpu)->switch_flags &= ~SWITCH_FLAGS_IRQ_WIN; + if (do_pvm_event(vcpu, irq, false, 0)) kvm_clear_interrupt_queue(vcpu); @@ -1397,6 +1457,7 @@ static int handle_synthetic_instruction_return_user(struct kvm_vcpu *vcpu) // instruction to return user means nmi allowed. pvm->nmi_mask = false; + pvm->switch_flags &= ~(SWITCH_FLAGS_IRQ_WIN | SWITCH_FLAGS_NMI_WIN); /* * switch to user mode before kvm_set_rflags() to avoid PVM_EVENT_FLAGS_IF @@ -1448,6 +1509,7 @@ static int handle_synthetic_instruction_return_supervisor(struct kvm_vcpu *vcpu) // instruction to return supervisor means nmi allowed. pvm->nmi_mask = false; + pvm->switch_flags &= ~SWITCH_FLAGS_NMI_WIN; kvm_set_rflags(vcpu, frame.rflags); kvm_rip_write(vcpu, frame.rip); @@ -1461,6 +1523,7 @@ static int handle_synthetic_instruction_return_supervisor(struct kvm_vcpu *vcpu) static int handle_hc_interrupt_window(struct kvm_vcpu *vcpu) { kvm_make_request(KVM_REQ_EVENT, vcpu); + to_pvm(vcpu)->switch_flags &= ~SWITCH_FLAGS_IRQ_WIN; pvm_event_flags_update(vcpu, 0, PVM_EVENT_FLAGS_IP); ++vcpu->stat.irq_window_exits; @@ -2199,6 +2262,7 @@ static __always_inline void load_regs(struct kvm_vcpu *vcpu, struct pt_regs *gue static noinstr void pvm_vcpu_run_noinstr(struct kvm_vcpu *vcpu) { + struct tss_extra *tss_ex = this_cpu_ptr(&cpu_tss_rw.tss_ex); struct vcpu_pvm *pvm = to_pvm(vcpu); struct pt_regs *sp0_regs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1; struct pt_regs *ret_regs; @@ -2208,12 +2272,25 @@ static noinstr void pvm_vcpu_run_noinstr(struct kvm_vcpu *vcpu) // Load guest registers into the host sp0 stack for switcher. load_regs(vcpu, sp0_regs); + // Prepare context for direct switching. 
+ tss_ex->switch_flags = pvm->switch_flags; + tss_ex->pvcs = pvm->pvcs_gpc.khva; + tss_ex->retu_rip = pvm->msr_retu_rip_plus2; + tss_ex->smod_entry = pvm->msr_lstar; + tss_ex->smod_gsbase = pvm->msr_kernel_gs_base; + tss_ex->smod_rsp = pvm->msr_supervisor_rsp; + if (unlikely(pvm->guest_dr7 & DR7_BP_EN_MASK)) set_debugreg(pvm_eff_dr7(vcpu), 7); // Call into switcher and enter guest. ret_regs = switcher_enter_guest(); + // Get the resulting mode and the PVM MSRs that might have been changed + // by direct switching. + pvm->switch_flags = tss_ex->switch_flags; + pvm->msr_supervisor_rsp = tss_ex->smod_rsp; + // Get the guest registers from the host sp0 stack. save_regs(vcpu, ret_regs); pvm->exit_vector = (ret_regs->orig_ax >> 32); @@ -2293,6 +2370,7 @@ static inline void pvm_load_host_xsave_state(struct kvm_vcpu *vcpu) static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) { struct vcpu_pvm *pvm = to_pvm(vcpu); + bool is_smod_before_run = is_smod(pvm); trace_kvm_entry(vcpu); @@ -2307,6 +2385,11 @@ static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) pvm_vcpu_run_noinstr(vcpu); + if (is_smod_before_run != is_smod(pvm)) { + swap(pvm->vcpu.arch.mmu->root, pvm->vcpu.arch.mmu->prev_roots[0]); + swap(pvm->msr_switch_cr3, pvm->vcpu.arch.cr3); + } + /* MSR_IA32_DEBUGCTLMSR is zeroed before vmenter. Restore it if needed */ if (pvm->host_debugctlmsr) update_debugctlmsr(pvm->host_debugctlmsr); diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index 2f8fdb0ae3df..e49d9dc70a94 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -5,6 +5,21 @@ #include #include +/* + * Extra switch flags: + * + * IRQ_WIN: + * There is an irq window request, and the vcpu should not directly + * switch to context with IRQ enabled, e.g. user mode. + * NMI_WIN: + * There is an NMI window request. + * SINGLE_STEP: + * KVM_GUESTDBG_SINGLESTEP is set.
+ */ +#define SWITCH_FLAGS_IRQ_WIN _BITULL(8) +#define SWITCH_FLAGS_NMI_WIN _BITULL(9) +#define SWITCH_FLAGS_SINGLE_STEP _BITULL(10) + #define SWITCH_FLAGS_INIT (SWITCH_FLAGS_SMOD) #define PVM_SYSCALL_VECTOR SWITCH_EXIT_REASONS_SYSCALL From patchwork Mon Feb 26 14:36:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572328 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6ADBA13AA55; Mon, 26 Feb 2024 14:37:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958223; cv=none; b=o7vaJGNlJJ9myNoKvP5xdZ5NgLXnSqjHcXIzK3FBfPICpc1VE5jbshMNBVVJ85gGSy7gBHlR33V5YjQB7rHEYdSP0YWdsDTK9WeA8GT0TVD4QyXpFtKwt0g2y2CootFXrINvIaOhu3npxKAuHiWsCYx2FPhvt6j/j5RIICLJnmQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958223; c=relaxed/simple; bh=pm5GaYpbz831lvo3EsBoWH68tqICveycrA1gNbHXH1s=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=DO6e1iQf2iov/558qXi2Z9W6UswjkkeV+F4e7FUtmzC7tdr8EuFi+t2Mu1ty8Ji6DTA8XOpPpqIXcg6IW7UcHTppaBl3tRPaGQ3odGL9bdznr0HODJtccKmeR55HClh+xRWxU9zNckS2MmaKbPFOjXRgE0Nb/vHFHRwikNVztxY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=h8iXLUvI; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="h8iXLUvI" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-1dca3951ad9so8755425ad.3; Mon, 26 Feb 2024 06:37:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958221; x=1709563021; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=e8DBgCM0VR+4taDSV316Rvjx7ITFdXhliPm7c//LFBQ=; b=h8iXLUvIRO5li2pYiZFKEzAua2YKKYTkPPe+BRz2UfN9d5vkKW9WG93pGWCy0+iRnk 3651d4pQnJre8ilR753yrqsdxPClzD6CJH2K4SkAqDRusvjZm+QEIwwTcyU926CUBVP3 mRTnYfD+Hveovooy+bv0ddJNxtT+31W22c9JEa47BdTXUP7vx/Yuxu7S5gshNdXNzJHB kCO26oXnfz4x+3Y4G4naCq/aUDJxUFzr4T63I/bpybPT8oUKPc1papXfaFJQczCf9BQK tB7IfRbV7v1R0yAMvNkFAV5sOdk/jDKirS9eCHdN97OZojStH5jGtSPKLnw4OaU3kEu7 KQkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958221; x=1709563021; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e8DBgCM0VR+4taDSV316Rvjx7ITFdXhliPm7c//LFBQ=; b=QjEqtjjQyHgZQnX/KoZ9U4cSmuJX+zHw5+oXa+gRvNNFxSHqQ1MggisSy11BVrsjbU 2oYqgr5PYnk7J0W8gxC5aBAoG8tTIfZn7TJ5S1LfrPqLj8OJEJd/G4eQjaJJ8/zhvlAa qflIAPiCZTSGkpDhpGjwPNHbuhvHre+v0orhlo2JXlIj1SKvXRT1roBRhhCxCdL1pw8F xahyCZOQx8BQWTYwm/iWx0S3J5pusIRvlvoFtURiugHVHKss59hbYKoCppPut64iBUo+ VIjIERnMs0fWzE3RcjvwHrFv6Gy/QMRaNWg8v+o2+R5ANQPHTBGuCkSbl8ULBTN1XOmJ 
qMTg== X-Forwarded-Encrypted: i=1; AJvYcCXBLBlP3yibu1hw1uWm955eFhkHWyinzoA1sqvQsWc7G61eYb2KJGMdy/6ID38f2B3aiVCV1Z8BAfGr78JJYJx+WDZi X-Gm-Message-State: AOJu0YyIRCrRRQVNZd7u52UykNr70yQzWeW2gSzK1CptosjBA3ijCtKw GyT/Gfario8lVuSwRGx66ruWUR2Pe1p7qxEl4wpUFPcytc7X6bwJcO+uoBIS X-Google-Smtp-Source: AGHT+IGL0ddPeVRaBmwnCUkeiVCWzHNiZRsnUe1vOCNit47W2YeMz73kMlfrCnSL7E7+/jE0C1y/Rg== X-Received: by 2002:a17:902:e841:b0:1db:37b1:b1a3 with SMTP id t1-20020a170902e84100b001db37b1b1a3mr10296670plg.17.1708958221380; Mon, 26 Feb 2024 06:37:01 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id ko4-20020a17090307c400b001dbcf653024sm3994437plb.293.2024.02.26.06.37.00 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:37:00 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 44/73] KVM: x86/PVM: Implement TSC related callbacks Date: Mon, 26 Feb 2024 22:36:01 +0800 Message-Id: <20240226143630.33643-45-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Without hardware assistance, TSC offset and TSC multiplier are not supported in PVM. Therefore, the guest uses the host TSC directly, which means the TSC offset is 0. Although it currently works correctly, a proper ABI is needed to describe it. 
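In the common KVM model the guest TSC is derived from the host TSC through a per-vCPU scaling multiplier and an offset; this patch forces the offset to zero and applies no scaling, so the guest simply observes the host counter. Put as a sketch (not code from the patch):

/* Sketch only: with the offset forced to 0 and no scaling support,
 * the guest-visible TSC equals the host TSC. */
static inline u64 pvm_guest_tsc(u64 host_tsc)
{
        const u64 tsc_offset = 0;        /* pvm_write_tsc_offset() forces 0 */

        return host_tsc + tsc_offset;    /* no multiplier is applied */
}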
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index 138d0c255cb8..f2cd1a1c199d 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -725,6 +725,28 @@ static int pvm_check_intercept(struct kvm_vcpu *vcpu, return X86EMUL_CONTINUE; } +static u64 pvm_get_l2_tsc_offset(struct kvm_vcpu *vcpu) +{ + return 0; +} + +static u64 pvm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + return 0; +} + +static void pvm_write_tsc_offset(struct kvm_vcpu *vcpu) +{ + // TODO: add proper ABI and make guest use host TSC + vcpu->arch.tsc_offset = 0; + vcpu->arch.l1_tsc_offset = 0; +} + +static void pvm_write_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + // TODO: add proper ABI and make guest use host TSC +} + static void pvm_set_msr_linear_address_range(struct vcpu_pvm *pvm, u64 pml4_i_s, u64 pml4_i_e, u64 pml5_i_s, u64 pml5_i_e) @@ -2776,6 +2798,10 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + .get_l2_tsc_offset = pvm_get_l2_tsc_offset, + .get_l2_tsc_multiplier = pvm_get_l2_tsc_multiplier, + .write_tsc_offset = pvm_write_tsc_offset, + .write_tsc_multiplier = pvm_write_tsc_multiplier, .check_emulate_instruction = pvm_check_emulate_instruction, .disallowed_va = pvm_disallowed_va, .vcpu_gpc_refresh = pvm_vcpu_gpc_refresh, From patchwork Mon Feb 26 14:36:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572329 Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 806A213B29C; Mon, 26 Feb 2024 14:37:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958227; cv=none; b=WhW7qYrt152b87Uhm6bH9eq1ohdSXmiQpv4/m8Umw4XvBidlSj+LMSKxnT3kofmEhmLcwEG5+nhDzp0uVj13aPhHwmSy52AzU3lrzZP7YYT+KciFSScoBcOi3OuHesQylMZbdmKL3Q4QqhFvr7GaL86oStuX7+Kt7sc9g5SLPVs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958227; c=relaxed/simple; bh=8YaAHnt0AiEYAzbERIaHQQ44YFwc8NdXJM54PxTApI0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=F724fmq7JWRFYvExTxOq6duP80Ue7zwK6VcKoF6CQcJzB8AD1949nEqV0bz/9R92p+TQu61KIXLAXi7V2GLCa5zRFI07hK/nBGUuxO0fRBb5mBECrXIWwiM7cKet9bAkc0c1KyRKK0hyxGNSSV6oxF9CkcBAVurkLYtg/jIxYRI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=aKsId2Gs; arc=none smtp.client-ip=209.85.210.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="aKsId2Gs" Received: by mail-pf1-f172.google.com with SMTP id d2e1a72fcca58-6e54451edc6so10405b3a.1; Mon, 26 Feb 2024 06:37:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; 
t=1708958224; x=1709563024; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=h+dxYsygiG51aGauEBa8dsdGiX2H9H8gIcL8GbhB21k=; b=aKsId2Gsr9e8pFkRotuG6bpVriIyAbkkoniIDKGk6P4tdW925U1noPGghJJPvezqC6 PzWw8c6PbMfx+hIcLeDnMUFt/LyV2VZJyYze3wf4cADsTIpQyCztQeGzNPC41yUBKgz3 N3cmOU0bynH6Ns6TGWmrfvcDug0kQXXKBHrqkFO+9XAwwRABo7p16zDfagk3+hDRayX9 23ICBkMU6ZWu7o/Eeah6v7o5rOYg6lqGnpEKcrXZIdC0z9+V0zkT5yOHcWX3Mma8S4nM TZU3Oc1mKMI+mpI2ZkbAPs+zTE2QriQSsorkRD6Xs7vuNysCXGFTW/D8rzdRHbNVEu5k mJ4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958224; x=1709563024; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=h+dxYsygiG51aGauEBa8dsdGiX2H9H8gIcL8GbhB21k=; b=ltoAzF2DVB4cq8GT/VPoTfIATdGKasPboATewXvrwsgtuDeyGajc7KH0z9/R/cZfpT LcT3Ck/G/fxRcXO0me/7mbzY8/vUHdrmQqrRrzf36C5PqtZeOFICE9vJ7C1qQK9aEOLI N02EVckT5li6dApDm+wxoFFvWapMOmeRvI8ICoEeeeQXOd+fUPgonamewK4IfBw8M9dV 6cWp/ICZl4i6ugUPgqu8Q6KTYJ8sbOBjteBi2W5s2jCPZZI5r2bVhFhF+KbNg5CxRt36 9rrX2iW/NeLLBcIP+BMa681F4A0P5g8J2kSQS0xd53aw1VssToh3zcSL5zlUbPgWss9u kHXg== X-Forwarded-Encrypted: i=1; AJvYcCUJZRkTkIT+/t3ADKXd1f73WGwAFQDw47gbMulUwP5KvU3BTUzwtIl0fHVHu4oa2cjd/uIswvv7wbLlBNT9LocszM0Z X-Gm-Message-State: AOJu0YyiNl4cCHDQxk8v3sx+QN0ELD5EWMYqpEw0xLw3xzfxwFw6rgrH aSFjFwDV7eMJVkZtpB5AJAm27Y32uoi1McrX83Onzs71Fa1KV//PCtNcZSHt X-Google-Smtp-Source: AGHT+IF4OCEZJFmMPMviijl3yGOrjNcy7E8v6oBWvvHaH9EDSCrEI/ACp3ji29GenWVxL6k1nUVRJg== X-Received: by 2002:a05:6a20:438f:b0:1a0:f096:5022 with SMTP id i15-20020a056a20438f00b001a0f0965022mr8595562pzl.46.1708958224431; Mon, 26 Feb 2024 06:37:04 -0800 (PST) Received: from localhost ([47.89.225.180]) by smtp.gmail.com with ESMTPSA id p11-20020a170902eacb00b001d706e373a9sm4001330pld.292.2024.02.26.06.37.03 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:37:04 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 45/73] KVM: x86/PVM: Add dummy PMU related callbacks Date: Mon, 26 Feb 2024 22:36:02 +0800 Message-Id: <20240226143630.33643-46-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Currently, PMU virtualization is not implemented, so dummy PMU related callbacks are added to make PVM work. In the future, the existing code in pmu_intel.c and pmu_amd.c will be reused to implement PMU virtualization for PVM. 
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 72 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index f2cd1a1c199d..e6464095d40b 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -21,6 +21,7 @@ #include "cpuid.h" #include "lapic.h" #include "mmu.h" +#include "pmu.h" #include "trace.h" #include "x86.h" #include "pvm.h" @@ -2701,6 +2702,76 @@ static void hardware_unsetup(void) { } +//====== start of dummy pmu =========== +//TODO: split kvm-pmu-intel.ko & kvm-pmu-amd.ko from kvm-intel.ko & kvm-amd.ko. +static bool dummy_pmu_hw_event_available(struct kvm_pmc *pmc) +{ + return true; +} + +static struct kvm_pmc *dummy_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx) +{ + return NULL; +} + +static struct kvm_pmc *dummy_pmu_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu, + unsigned int idx, u64 *mask) +{ + return NULL; +} + +static bool dummy_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx) +{ + return false; +} + +static struct kvm_pmc *dummy_pmu_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr) +{ + return NULL; +} + +static bool dummy_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) +{ + return 0; +} + +static int dummy_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + return 1; +} + +static int dummy_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + return 1; +} + +static void dummy_pmu_refresh(struct kvm_vcpu *vcpu) +{ +} + +static void dummy_pmu_init(struct kvm_vcpu *vcpu) +{ +} + +static void dummy_pmu_reset(struct kvm_vcpu *vcpu) +{ +} + +struct kvm_pmu_ops dummy_pmu_ops = { + .hw_event_available = dummy_pmu_hw_event_available, + .pmc_idx_to_pmc = dummy_pmc_idx_to_pmc, + .rdpmc_ecx_to_pmc = dummy_pmu_rdpmc_ecx_to_pmc, + .msr_idx_to_pmc = dummy_pmu_msr_idx_to_pmc, + .is_valid_rdpmc_ecx = dummy_pmu_is_valid_rdpmc_ecx, + .is_valid_msr = dummy_pmu_is_valid_msr, + .get_msr = dummy_pmu_get_msr, + .set_msr = dummy_pmu_set_msr, + .refresh = dummy_pmu_refresh, + .init = dummy_pmu_init, + .reset = dummy_pmu_reset, +}; +//========== end of dummy pmu ============= + struct kvm_x86_nested_ops pvm_nested_ops = {}; static struct kvm_x86_ops pvm_x86_ops __initdata = { @@ -2811,6 +2882,7 @@ static struct kvm_x86_init_ops pvm_init_ops __initdata = { .hardware_setup = hardware_setup, .runtime_ops = &pvm_x86_ops, + .pmu_ops = &dummy_pmu_ops, }; static void pvm_exit(void) From patchwork Mon Feb 26 14:36:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572330 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B51313B2BE; Mon, 26 Feb 2024 14:37:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958230; cv=none; b=GepthXsp7w27LghAs+v9/AErriKw+FhZsHuoZ1fAUEHciWF1R9jYmzgRhKDgCoHYij4R/2vaBgd7HXyXt/JS4JCOW/xmBiHNxk9y4WHBFIn0wg/fwlpgxJv3agccA7c+dIruK4wl6Pdtd7ELzhkaZYzVU9Bctr+LkyfJuYhDJM4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958230; c=relaxed/simple; bh=oFLpUu2MeS6BRZ/Jxq6k+lA8YlZJnhVOkKcRl5d5Fow=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: 
MIME-Version; b=GV+ylua7fheFl3uRfLZcrDXnNr0XnBg4iqiT/1iKtcLwlWSCX2swWux7PfmjYg2hQ4Lyz9cfAn9wItS79Zm+btr+RUjHndks7tUeGFUxrnyeMLlriBS+o65CqV7BOXxHoSZ8dNU04c0qzQ2+qg7r42sJtVWi0tlgqgNy65vuHyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=iLckvUP1; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="iLckvUP1" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-1db6e0996ceso23695275ad.2; Mon, 26 Feb 2024 06:37:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958227; x=1709563027; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=aUYgIc9I0TbzhgfqqzmtJ/r7wde5wI9PGH/CYLTDvPE=; b=iLckvUP15BVLyuUhvdIyi+ZB/uypMJv+NuNqi9LjlgpBsXZJB+vTCb/h2+7KLZ/8/S XJHCDX3O1UtfIuoyn6uPQCMluaVjLnL3SkiqEj+07ATztrV2JID6l6nphZ5fvzKP/jdk tGt8vrZeNjaAweKJjcOEjWUXIaEHhucykgiAi7Fs51uGU98jcCeNQ4/AWNOyAfyXgbex dwYBBsWsJcVXl8kKtKnWkiRR0xL8d1/0qBbt48KRMbLmDCBgvWC7FCaSRRMjlmfDlQ61 2upN6PqvRPUW4I93K6lIqvheZMzMbkFVqQSchM8oPWAsu5di6K99VfcOMLpMrTK3biSw JTJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958227; x=1709563027; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=aUYgIc9I0TbzhgfqqzmtJ/r7wde5wI9PGH/CYLTDvPE=; b=fPg2fPWROG4TKCsMV7ZYijn+2rYKvg4+Hzpx2555qL0tMHChqimWbhumwhZWumqrKB C0l1OJXKlPZjMwWwnC9rQZ4Fx5NVM2ErkFgWg0aPaiFW28Q7iwukB00EZOQFPTVqTgZ/ uIWU9zOShK/U/mXthXVuLRiCDps+ErnSlMcHwmJmiHjBOj7rEetg+yh81TIUe6Z0TAwU iXSkDeYto/SYKXxCq+Df1y/xzvTLFeyqCzTmY0vV33yJDwQDG72DXcvJodYqDNKWyAmC X1Hojv1qh/AwoEP0BJIXeNcQZZ4blVFGjGcDwpcoxq0WIyGHlBO07yHhE6jt5VjFDrO8 kP0A== X-Forwarded-Encrypted: i=1; AJvYcCXATpXRsbkHv5jwinxVDbyuXtU0JXvwuWrp2BYRny3NEd8OZ5IrNwmKw4FtDvuNVGMG7SZ7pdBPuNADrTPXyACx8j3F X-Gm-Message-State: AOJu0YzDSVgwMTJckG/haZamQ2hsWW+vXijmQJaU1fKIE0KeX+Qb1oH0 2F1mA724CFrW8WX7pDVxkuAZkVV3TFnugYmQSKC2joGcPkx6GN1G1D59jM0S X-Google-Smtp-Source: AGHT+IFJaJHhws5Iu1WFFipNsAdqIIPIxk4VjyF0aTOqhtII3ndXXVxgujlU2qF2jxijoIMCL/nFaw== X-Received: by 2002:a17:903:2343:b0:1db:d256:9327 with SMTP id c3-20020a170903234300b001dbd2569327mr8634686plh.19.1708958227515; Mon, 26 Feb 2024 06:37:07 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id li6-20020a170903294600b001dc94fde843sm2603712plb.177.2024.02.26.06.37.06 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:37:07 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. 
Peter Anvin" Subject: [RFC PATCH 46/73] KVM: x86/PVM: Support for CPUID faulting Date: Mon, 26 Feb 2024 22:36:03 +0800 Message-Id: <20240226143630.33643-47-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong For PVM, CPUID faulting relies on hardware, so the guest could access the host CPUID information if CPUID faulting is not enabled. To enable the guest to access its own CPUID information, introduce a module parameter to force enable CPUID faulting for the guest. Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/kvm/pvm/pvm.c | 69 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index e6464095d40b..fd3d6f7301af 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -29,6 +29,9 @@ MODULE_AUTHOR("AntGroup"); MODULE_LICENSE("GPL"); +static bool __read_mostly enable_cpuid_intercept = 0; +module_param_named(cpuid_intercept, enable_cpuid_intercept, bool, 0444); + static bool __read_mostly is_intel; static unsigned long host_idt_base; @@ -168,6 +171,53 @@ static bool pvm_disallowed_va(struct kvm_vcpu *vcpu, u64 va) return !pvm_guest_allowed_va(vcpu, va); } +static void __set_cpuid_faulting(bool on) +{ + u64 msrval; + + rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &msrval); + msrval &= ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT; + msrval |= (on << MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT); + wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval); +} + +static void reset_cpuid_intercept(struct kvm_vcpu *vcpu) +{ + if (test_thread_flag(TIF_NOCPUID)) + return; + + if (enable_cpuid_intercept || cpuid_fault_enabled(vcpu)) + __set_cpuid_faulting(false); +} + +static void set_cpuid_intercept(struct kvm_vcpu *vcpu) +{ + if (test_thread_flag(TIF_NOCPUID)) + return; + + if (enable_cpuid_intercept || cpuid_fault_enabled(vcpu)) + __set_cpuid_faulting(true); +} + +static void pvm_update_guest_cpuid_faulting(struct kvm_vcpu *vcpu, u64 data) +{ + bool guest_enabled = cpuid_fault_enabled(vcpu); + bool set_enabled = data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT; + struct vcpu_pvm *pvm = to_pvm(vcpu); + + if (!(guest_enabled ^ set_enabled)) + return; + if (enable_cpuid_intercept) + return; + if (test_thread_flag(TIF_NOCPUID)) + return; + + preempt_disable(); + if (pvm->loaded_cpu_state) + __set_cpuid_faulting(set_enabled); + preempt_enable(); +} + // switch_to_smod() and switch_to_umod() switch the mode (smod/umod) and // the CR3. No vTLB flushing when switching the CR3 per PVM Spec. 
static inline void switch_to_smod(struct kvm_vcpu *vcpu) @@ -335,6 +385,8 @@ static void pvm_prepare_switch_to_guest(struct kvm_vcpu *vcpu) segments_save_host_and_switch_to_guest(pvm); + set_cpuid_intercept(vcpu); + kvm_set_user_return_msr(0, (u64)entry_SYSCALL_64_switcher, -1ull); kvm_set_user_return_msr(1, pvm->msr_tsc_aux, -1ull); if (ia32_enabled()) { @@ -352,6 +404,8 @@ static void pvm_prepare_switch_to_host(struct vcpu_pvm *pvm) ++pvm->vcpu.stat.host_state_reload; + reset_cpuid_intercept(&pvm->vcpu); + #ifdef CONFIG_MODIFY_LDT_SYSCALL if (unlikely(current->mm->context.ldt)) kvm_load_ldt(GDT_ENTRY_LDT*8); @@ -937,6 +991,17 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_DEBUGCTLMSR: /* It is ignored now. */ break; + case MSR_MISC_FEATURES_ENABLES: + ret = kvm_set_msr_common(vcpu, msr_info); + if (!ret) + pvm_update_guest_cpuid_faulting(vcpu, data); + break; + case MSR_PLATFORM_INFO: + if ((data & MSR_PLATFORM_INFO_CPUID_FAULT) && + !boot_cpu_has(X86_FEATURE_CPUID_FAULT)) + return 1; + ret = kvm_set_msr_common(vcpu, msr_info); + break; case MSR_PVM_VCPU_STRUCT: if (!PAGE_ALIGNED(data)) return 1; @@ -2925,6 +2990,10 @@ static int __init hardware_cap_check(void) pr_warn("CMPXCHG16B is required for guest.\n"); return -EOPNOTSUPP; } + if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT) && enable_cpuid_intercept) { + pr_warn("Host doesn't support cpuid faulting.\n"); + return -EOPNOTSUPP; + } return 0; } From patchwork Mon Feb 26 14:36:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572331 Received: from mail-pl1-f175.google.com (mail-pl1-f175.google.com [209.85.214.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E24C13DB90; Mon, 26 Feb 2024 14:37:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958232; cv=none; b=YpOYci3PazI3ktCmaYD6Y01eRCZ/esQ5abmVZqX1ox/fLM785zIKlI6PIWWQO5Jj+L/R4rSeMjA0RHBNTtg0pAxavDFZcFttsLfiB2pZsR2cb29VBUpw/Eq+Jd9BaWXEThTIpOCUe/+v/yMgAQp+8hONHvdIVPNju6BB+plWUT8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958232; c=relaxed/simple; bh=dbue2J2vSwgwnN6frugUZa4aGyoh+CRYUv+AVhY+tPQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EGZBVyMSUb4LBTqpvxC+cwtmf2E4WNO3NVmY8v9pNJEyAINaX4euqgjZyHP70xBJ2fDQEk0l0K0XQ3WJMo3pI77IPvvw2WcCwm5cXAqHKrEJ7HckSLXwhC8NYg2WKhJy73REtk0GhNU6Fao16w3RTUuhvRCvpcIhuz+2axkxDdU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=iaUtwL3H; arc=none smtp.client-ip=209.85.214.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="iaUtwL3H" Received: by mail-pl1-f175.google.com with SMTP id d9443c01a7336-1dc91d2384cso10130245ad.1; Mon, 26 Feb 2024 06:37:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958230; x=1709563030; 
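[Illustration, not part of the patch above.] The TIF_NOCPUID checks in set_cpuid_intercept()/reset_cpuid_intercept() exist because the same MSR_MISC_FEATURES_ENABLES CPUID-faulting bit also backs the per-task arch_prctl(ARCH_SET_CPUID) interface, so the hooks must not clobber a host task that already runs with CPUID trapped. A minimal user-space sketch of that interface on a Linux x86-64 host or guest (error handling kept minimal):

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>          /* ARCH_GET_CPUID, ARCH_SET_CPUID */

int main(void)
{
        /* 1 means CPUID is currently allowed for this task. */
        long allowed = syscall(SYS_arch_prctl, ARCH_GET_CPUID, 0);

        printf("CPUID allowed: %ld\n", allowed);

        /* Ask for CPUID faulting; this fails when the CPU (or the PVM
         * hypervisor) does not expose CPUID faulting. */
        if (syscall(SYS_arch_prctl, ARCH_SET_CPUID, 0) != 0)
                perror("ARCH_SET_CPUID");
        else
                printf("CPUID now raises SIGSEGV in this task\n");
        return 0;
}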
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin"
Subject: [RFC PATCH 47/73] KVM: x86/PVM: Handle the left supported MSRs in msrs_to_save_base[]
Date: Mon, 26 Feb 2024 22:36:04 +0800
Message-Id: <20240226143630.33643-48-jiangshanlai@gmail.com>

From: Lai Jiangshan

MSR_TSC_AUX is allowed to be modified by the guest to support RDTSCP/RDPID in the guest. However, MSR_IA32_FEAT_CTL is not fully supported for the guest at this time; only the FEAT_CTL_LOCKED bit is valid for the guest.

Signed-off-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
---
arch/x86/kvm/pvm/pvm.c | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index fd3d6f7301af..a32d2728eb02 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -854,6 +854,32 @@ static void pvm_msr_filter_changed(struct kvm_vcpu *vcpu) /* Accesses to MSRs are emulated in hypervisor, nothing to do here.
*/ } +static inline bool is_pvm_feature_control_msr_valid(struct vcpu_pvm *pvm, + struct msr_data *msr_info) +{ + /* + * currently only FEAT_CTL_LOCKED bit is valid, maybe + * vmx, sgx and mce associated bits can be valid when those features + * are supported for guest. + */ + u64 valid_bits = pvm->msr_ia32_feature_control_valid_bits; + + if (!msr_info->host_initiated && + (pvm->msr_ia32_feature_control & FEAT_CTL_LOCKED)) + return false; + + return !(msr_info->data & ~valid_bits); +} + +static void pvm_update_uret_msr(struct vcpu_pvm *pvm, unsigned int slot, + u64 data, u64 mask) +{ + preempt_disable(); + if (pvm->loaded_cpu_state) + kvm_set_user_return_msr(slot, data, mask); + preempt_enable(); +} + /* * Reads an msr value (of 'msr_index') into 'msr_info'. * Returns 0 on success, non-0 otherwise. @@ -899,9 +925,15 @@ static int pvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_SYSENTER_ESP: msr_info->data = pvm->unused_MSR_IA32_SYSENTER_ESP; break; + case MSR_TSC_AUX: + msr_info->data = pvm->msr_tsc_aux; + break; case MSR_IA32_DEBUGCTLMSR: msr_info->data = 0; break; + case MSR_IA32_FEAT_CTL: + msr_info->data = pvm->msr_ia32_feature_control; + break; case MSR_PVM_VCPU_STRUCT: msr_info->data = pvm->msr_vcpu_struct; break; @@ -988,9 +1020,18 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_SYSENTER_ESP: pvm->unused_MSR_IA32_SYSENTER_ESP = data; break; + case MSR_TSC_AUX: + pvm->msr_tsc_aux = data; + pvm_update_uret_msr(pvm, 1, data, -1ull); + break; case MSR_IA32_DEBUGCTLMSR: /* It is ignored now. */ break; + case MSR_IA32_FEAT_CTL: + if (!is_intel || !is_pvm_feature_control_msr_valid(pvm, msr_info)) + return 1; + pvm->msr_ia32_feature_control = data; + break; case MSR_MISC_FEATURES_ENABLES: ret = kvm_set_msr_common(vcpu, msr_info); if (!ret) From patchwork Mon Feb 26 14:36:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572332 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63A86141988; Mon, 26 Feb 2024 14:37:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958235; cv=none; b=C28g9083lhhBAqOx7Q0XTfw4Ph7aUSNdVJQ/PeLAUw2nf5K+A5zFQXWls1NBZKVZWc5CftTrNarsD9vJn4hSFvLXSIOzrHgErEeyc2YckWNdHadbRjd732hRrXp+2xWu1mpU2/g3snZ4j+TtLthP93tT2OXwIs4uJoL/skPCMYg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958235; c=relaxed/simple; bh=yFTowSeo6zJ347eo3a63wiouXx/voA/dHD8YL405VJQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=L1CxWFP+xkIYr3PzU/+xgnKL5zZKJRia11A6Nz95vTgodxVfEUE9ZzfUOwJY0y/cZsSYKAOldAxiLtcM9+x0DzzYnrJysfiIY+4YRpaJybUMJU6LpGZCVHVWo7y0xaCcE0OlhTV5q0kVMhjvtgtfoJw0tFhojB3HUvN3ILkpAk4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FVCO6XNH; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com 
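For context on the MSR_TSC_AUX handling above (a sketch, not part of the patch): guests need IA32_TSC_AUX to be writable because RDTSCP/RDPID hand its value straight to user space, e.g. for a fast getcpu(). The (node << 12) | cpu layout below is the conventional Linux encoding and is an assumption here, not ABI:

#include <stdio.h>

static inline unsigned int rdtscp_aux(void)
{
        unsigned int lo, hi, aux;

        /* RDTSCP: TSC in EDX:EAX, IA32_TSC_AUX in ECX. */
        __asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
        (void)lo;
        (void)hi;
        return aux;
}

int main(void)
{
        unsigned int aux = rdtscp_aux();

        printf("IA32_TSC_AUX = %#x (cpu %u, node %u)\n",
               aux, aux & 0xfff, aux >> 12);
        return 0;
}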
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin"
Subject: [RFC PATCH 48/73] KVM: x86/PVM: Implement system registers setting callbacks
Date: Mon, 26 Feb 2024 22:36:05 +0800
Message-Id: <20240226143630.33643-49-jiangshanlai@gmail.com>

From: Lai Jiangshan

In PVM, the hardware CR0, CR3, and EFER are fixed, and the guest's values must match those fixed values; otherwise, the guest is not allowed to run on the CPU.
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 51 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index a32d2728eb02..b261309fc946 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -1088,6 +1088,51 @@ static int pvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return ret; } +static void pvm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + /* Nothing to do */ +} + +static int pvm_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + vcpu->arch.efer = efer; + + return 0; +} + +static bool pvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + return true; +} + +static void pvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (vcpu->arch.efer & EFER_LME) { + if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) + vcpu->arch.efer |= EFER_LMA; + + if (is_paging(vcpu) && !(cr0 & X86_CR0_PG)) + vcpu->arch.efer &= ~EFER_LMA; + } + + vcpu->arch.cr0 = cr0; +} + +static bool pvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + return true; +} + +static void pvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + unsigned long old_cr4 = vcpu->arch.cr4; + + vcpu->arch.cr4 = cr4; + + if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE)) + kvm_update_cpuid_runtime(vcpu); +} + static void pvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { @@ -2912,13 +2957,19 @@ static struct kvm_x86_ops pvm_x86_ops __initdata = { .set_segment = pvm_set_segment, .get_cpl = pvm_get_cpl, .get_cs_db_l_bits = pvm_get_cs_db_l_bits, + .is_valid_cr0 = pvm_is_valid_cr0, + .set_cr0 = pvm_set_cr0, .load_mmu_pgd = pvm_load_mmu_pgd, + .is_valid_cr4 = pvm_is_valid_cr4, + .set_cr4 = pvm_set_cr4, + .set_efer = pvm_set_efer, .get_gdt = pvm_get_gdt, .set_gdt = pvm_set_gdt, .get_idt = pvm_get_idt, .set_idt = pvm_set_idt, .set_dr7 = pvm_set_dr7, .sync_dirty_debug_regs = pvm_sync_dirty_debug_regs, + .cache_reg = pvm_cache_reg, .get_rflags = pvm_get_rflags, .set_rflags = pvm_set_rflags, .get_if_flag = pvm_get_if_flag, From patchwork Mon Feb 26 14:36:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572333 Received: from mail-oo1-f49.google.com (mail-oo1-f49.google.com [209.85.161.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 48D5F1420DA; Mon, 26 Feb 2024 14:37:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958239; cv=none; b=tF+sA2x1GzrBizQg9um2Pk37beIB6T/kpMJ9Do2m+tUPclCUn+s4/my1ImE1FYgkBiCnjvM5BLgsfFTjGmDbwL4ZEB4+zIzmo+wai2WcHo+eweCfdzCUYNnQ4wRw9P+wdFW3jWdOoRPhencXkskLuFlSyFi1F4FFxGxKjOXzwik= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958239; c=relaxed/simple; bh=llmkTWOAec6cDScmK8+FHAQ4llySQV//Sto5/x2wJt4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=M6oNpWCMhgPb/PYgSlpi7MMYJEhPj+Cs10F/7Xe7R5T3B0oGkCZKR7dfjxGyazdRn/I/L8w93FFYJgJSJEIrz4A2kOU3jItWg2jqLtTtpDxUJg/8gOkjLcMjCsjliHG5YJAWkWKiVnyPoOPWg0LnnV09ryC0byIsznWkELa9goE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com 
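The EFER_LMA updates in pvm_set_cr0() above simply maintain the architectural rule that long mode is active (EFER.LMA) only while both EFER.LME and CR0.PG are set; since the hardware CR0/EFER are fixed under PVM, this bookkeeping only affects the virtualized register state. A stand-alone restatement of the rule (a sketch with local constants, not kernel code):

#define SK_EFER_LME     (1ULL << 8)     /* long mode enable */
#define SK_EFER_LMA     (1ULL << 10)    /* long mode active */
#define SK_CR0_PG       (1UL << 31)     /* paging enabled   */

/* Sketch: recompute EFER.LMA after a CR0 write, as pvm_set_cr0() does. */
static unsigned long long recompute_lma(unsigned long long efer,
                                        unsigned long cr0)
{
        if ((efer & SK_EFER_LME) && (cr0 & SK_CR0_PG))
                efer |= SK_EFER_LMA;
        else
                efer &= ~SK_EFER_LMA;
        return efer;
}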
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin"
Subject: [RFC PATCH 49/73] KVM: x86/PVM: Implement emulation for non-PVM mode
Date: Mon, 26 Feb 2024 22:36:06 +0800
Message-Id: <20240226143630.33643-50-jiangshanlai@gmail.com>

From: Lai Jiangshan

The PVM hypervisor supports a modified long mode with PVM ABI, known as PVM mode.
PVM mode includes a 64-bit supervisor mode, 64-bit user mode, and a 32-bit compatible user mode. The 32-bit supervisor mode and other operating modes are considered non-PVM modes. In PVM mode, the states of system registers are standard, and the guest is allowed to run on the hardware. So far, non-PVM mode is required for booting the guest and bringing up vCPUs. Currently, there is only basic support for non-PVM mode through instruction emulation. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/kvm/pvm/pvm.c | 145 +++++++++++++++++++++++++++++++++++++++-- arch/x86/kvm/pvm/pvm.h | 1 + 2 files changed, 139 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/pvm/pvm.c b/arch/x86/kvm/pvm/pvm.c index b261309fc946..e4b8f0108c31 100644 --- a/arch/x86/kvm/pvm/pvm.c +++ b/arch/x86/kvm/pvm/pvm.c @@ -12,6 +12,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include #include @@ -218,6 +219,104 @@ static void pvm_update_guest_cpuid_faulting(struct kvm_vcpu *vcpu, u64 data) preempt_enable(); } +/* + * Non-PVM mode is not a part of PVM. Basic support for it via emulation. + * Non-PVM mode is required for booting the guest and bringing up vCPUs so far. + * + * In future, when VMM can directly boot the guest and bring vCPUs up from + * 64-bit mode without any help from non-64-bit mode, then the support non-PVM + * mode will be removed. + */ +#define CONVERT_TO_PVM_CR0_OFF (X86_CR0_NW | X86_CR0_CD) +#define CONVERT_TO_PVM_CR0_ON (X86_CR0_NE | X86_CR0_AM | X86_CR0_WP | \ + X86_CR0_PG | X86_CR0_PE) + +static bool try_to_convert_to_pvm_mode(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + unsigned long cr0 = vcpu->arch.cr0; + + if (!is_long_mode(vcpu)) + return false; + if (!pvm->segments[VCPU_SREG_CS].l) { + if (is_smod(pvm)) + return false; + if (!pvm->segments[VCPU_SREG_CS].db) + return false; + } + + /* Atomically set EFER_SCE converting to PVM mode. */ + if ((vcpu->arch.efer | EFER_SCE) != vcpu->arch.efer) + vcpu->arch.efer |= EFER_SCE; + + /* Change CR0 on converting to PVM mode. */ + cr0 &= ~CONVERT_TO_PVM_CR0_OFF; + cr0 |= CONVERT_TO_PVM_CR0_ON; + if (cr0 != vcpu->arch.cr0) + kvm_set_cr0(vcpu, cr0); + + /* Atomically set MSR_STAR on converting to PVM mode. */ + if (!kernel_cs_by_msr(pvm->msr_star)) + pvm->msr_star = ((u64)pvm->segments[VCPU_SREG_CS].selector << 32) | + ((u64)__USER32_CS << 48); + + pvm->non_pvm_mode = false; + + return true; +} + +static int handle_non_pvm_mode(struct kvm_vcpu *vcpu) +{ + struct vcpu_pvm *pvm = to_pvm(vcpu); + int ret = 1; + unsigned int count = 130; + + if (try_to_convert_to_pvm_mode(vcpu)) + return 1; + + while (pvm->non_pvm_mode && count-- != 0) { + if (kvm_test_request(KVM_REQ_EVENT, vcpu)) + return 1; + + if (try_to_convert_to_pvm_mode(vcpu)) + return 1; + + ret = kvm_emulate_instruction(vcpu, 0); + + if (!ret) + goto out; + + /* don't do mode switch in emulation */ + if (!is_smod(pvm)) + goto emulation_error; + + if (vcpu->arch.exception.pending) + goto emulation_error; + + if (vcpu->arch.halt_request) { + vcpu->arch.halt_request = 0; + ret = kvm_emulate_halt_noskip(vcpu); + goto out; + } + /* + * Note, return 1 and not 0, vcpu_run() will invoke + * xfer_to_guest_mode() which will create a proper return + * code. 
+ */ + if (__xfer_to_guest_mode_work_pending()) + return 1; + } + +out: + return ret; + +emulation_error: + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; + vcpu->run->internal.ndata = 0; + return 0; +} + // switch_to_smod() and switch_to_umod() switch the mode (smod/umod) and // the CR3. No vTLB flushing when switching the CR3 per PVM Spec. static inline void switch_to_smod(struct kvm_vcpu *vcpu) @@ -359,6 +458,10 @@ static void pvm_prepare_switch_to_guest(struct kvm_vcpu *vcpu) if (pvm->loaded_cpu_state) return; + // we can't load guest state to hardware when guest is not on long mode + if (unlikely(pvm->non_pvm_mode)) + return; + pvm->loaded_cpu_state = 1; #ifdef CONFIG_X86_IOPL_IOPERM @@ -1138,6 +1241,11 @@ static void pvm_get_segment(struct kvm_vcpu *vcpu, { struct vcpu_pvm *pvm = to_pvm(vcpu); + if (pvm->non_pvm_mode) { + *var = pvm->segments[seg]; + return; + } + // Update CS or SS to reflect the current mode. if (seg == VCPU_SREG_CS) { if (is_smod(pvm)) { @@ -1209,7 +1317,7 @@ static void pvm_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int if (cpl != var->dpl) goto invalid_change; if (cpl == 0 && !var->l) - goto invalid_change; + pvm->non_pvm_mode = true; } break; case VCPU_SREG_LDTR: @@ -1231,12 +1339,17 @@ static void pvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) { struct vcpu_pvm *pvm = to_pvm(vcpu); - if (pvm->hw_cs == __USER_CS) { - *db = 0; - *l = 1; + if (pvm->non_pvm_mode) { + *db = pvm->segments[VCPU_SREG_CS].db; + *l = pvm->segments[VCPU_SREG_CS].l; } else { - *db = 1; - *l = 0; + if (pvm->hw_cs == __USER_CS) { + *db = 0; + *l = 1; + } else { + *db = 1; + *l = 0; + } } } @@ -1513,7 +1626,7 @@ static void pvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) * user mode, so that when the guest switches back to supervisor mode, * the X86_EFLAGS_IF is already cleared. */ - if (!need_update || !is_smod(pvm)) + if (unlikely(pvm->non_pvm_mode) || !need_update || !is_smod(pvm)) return; if (rflags & X86_EFLAGS_IF) { @@ -1536,7 +1649,11 @@ static u32 pvm_get_interrupt_shadow(struct kvm_vcpu *vcpu) static void pvm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) { + struct vcpu_pvm *pvm = to_pvm(vcpu); + /* PVM spec: ignore interrupt shadow when in PVM mode. */ + if (pvm->non_pvm_mode) + pvm->int_shadow = mask; } static void enable_irq_window(struct kvm_vcpu *vcpu) @@ -2212,6 +2329,9 @@ static int pvm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) struct vcpu_pvm *pvm = to_pvm(vcpu); u32 exit_reason = pvm->exit_vector; + if (unlikely(pvm->non_pvm_mode)) + return handle_non_pvm_mode(vcpu); + if (exit_reason == PVM_SYSCALL_VECTOR) return handle_exit_syscall(vcpu); else if (exit_reason >= 0 && exit_reason < FIRST_EXTERNAL_VECTOR) @@ -2546,6 +2666,13 @@ static fastpath_t pvm_vcpu_run(struct kvm_vcpu *vcpu) struct vcpu_pvm *pvm = to_pvm(vcpu); bool is_smod_befor_run = is_smod(pvm); + /* + * Don't enter guest if guest state is invalid, let the exit handler + * start emulation until we arrive back to a valid state. 
+ */ + if (pvm->non_pvm_mode) + return EXIT_FASTPATH_NONE; + trace_kvm_entry(vcpu); pvm_load_guest_xsave_state(vcpu); @@ -2657,6 +2784,10 @@ static void pvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT)) vcpu->arch.msr_platform_info &= ~MSR_PLATFORM_INFO_CPUID_FAULT; + // Non-PVM mode resets + pvm->non_pvm_mode = true; + pvm->msr_star = 0; + // X86 resets for (i = 0; i < ARRAY_SIZE(pvm->segments); i++) reset_segment(&pvm->segments[i], i); diff --git a/arch/x86/kvm/pvm/pvm.h b/arch/x86/kvm/pvm/pvm.h index e49d9dc70a94..1a4feddb13b3 100644 --- a/arch/x86/kvm/pvm/pvm.h +++ b/arch/x86/kvm/pvm/pvm.h @@ -106,6 +106,7 @@ struct vcpu_pvm { int loaded_cpu_state; int int_shadow; + bool non_pvm_mode; bool nmi_mask; unsigned long guest_dr7; From patchwork Mon Feb 26 14:36:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572334 Received: from mail-oo1-f51.google.com (mail-oo1-f51.google.com [209.85.161.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BAD45145B0E; Mon, 26 Feb 2024 14:37:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958246; cv=none; b=hNGFW77DOcvTzUUgFTNqEBNxxEfp1zc8lA21X29Pp4/XS+cAhHsw/czkUp4M///29urYH4pxHa72yzwKjWJ5vRXf9wrCbSA0ccxjgnADydksaMxlrpa8e4bGYXNoYaoKt8PltSWv/1vOMVdB03Oc29niYDIMfiSfYUOIG8/OKM4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958246; c=relaxed/simple; bh=dqCBbEmV/ZV8A16xi0OdKKYnGJMxg43dgKZF/SkcABw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fK/6Vh0GNrKeCKY0vO/dKcO8dn1V8fFP+up1QLh4/kIQlzBqFigmoMPBYB4nkk5LJDs+rrNXhv0nT+1WeboqPEFBnZUQZW2DvJ5XHctL/8k5uDGbaazEhdR6SPqavUTDwqsbbJt5lVDCQxWcLGA18/J+Yth8vdvRe8Xsl/s6vb8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Ix21bWJv; arc=none smtp.client-ip=209.85.161.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Ix21bWJv" Received: by mail-oo1-f51.google.com with SMTP id 006d021491bc7-5a0932aa9ecso390233eaf.3; Mon, 26 Feb 2024 06:37:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958243; x=1709563043; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2ObDfmX3udGGUccUu3hcZNpHGnDogYqHkynq3vwZFUA=; b=Ix21bWJvGhsRoCwFDzngd6U456jeRk0imiJfCnYe7z1qfvJ1zLfUGWvCcRzU+SIMQ1 bKN7YawYlEu91Asq+FVHbDrrIZmV96ScKDyCyUVWd951ZymcFslTc32hJWEAQP8Uhyk1 4mwPfVK21wCB60anGBKtaNxdksWVWfD4ccBomByS7X5NdvvXIY0TiPZgPJ3/r87hOZQf xgVYKopcJ+L+MOsDJPV01Z6VlSpDjQCsP6oMxs6L+bOZAkEAmjpGyzTJFE1J4SQUoc3K 3gUCgbUPXm44yzsPvMQ/5ZpHeKsY724psROH41GWMaK5vFPMKub0IbwiNUlfK/zLKidA ckaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958243; x=1709563043; 
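As the commit message notes, non-PVM mode exists only because guests are still booted and vCPUs brought up through non-64-bit states. A VMM that builds 64-bit register state itself makes try_to_convert_to_pvm_mode() succeed on the very first exit, so nothing ever needs to be emulated; the converter then fixes up the remaining CR0 bits, EFER.SCE and MSR_STAR on its own. A minimal sketch using the generic KVM ioctls (page tables behind cr3, RIP/RSP and the data segments are assumed to be set up elsewhere; the CS selector value is illustrative):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enter_long_mode(int vcpu_fd, __u64 cr3)
{
        struct kvm_sregs sregs;

        if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
                return -1;

        sregs.cr3  = cr3;
        sregs.cr4  = 1u << 5;                            /* CR4.PAE         */
        sregs.cr0  = (1u << 0) | (1u << 31);             /* CR0.PE | CR0.PG */
        sregs.efer = (1u << 0) | (1u << 8) | (1u << 10); /* SCE | LME | LMA */

        memset(&sregs.cs, 0, sizeof(sregs.cs));
        sregs.cs.selector = 0x10;
        sregs.cs.limit = 0xffffffff;
        sregs.cs.type = 0xb;    /* execute/read code, accessed    */
        sregs.cs.present = 1;
        sregs.cs.s = 1;         /* code/data (non-system) segment */
        sregs.cs.l = 1;         /* 64-bit                         */
        sregs.cs.g = 1;

        return ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);
}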
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Andrew Morton , Alexey Dobriyan , Thomas Garnier
Subject: [RFC PATCH 50/73] x86/tools/relocs: Cleanup cmdline options
Date: Mon, 26 Feb 2024 22:36:07 +0800
Message-Id: <20240226143630.33643-51-jiangshanlai@gmail.com>

From: Hou Wenlong

Collect all cmdline options into a structure to make the code cleaner and to make it easier to add new cmdline options.

Signed-off-by: Hou Wenlong
Signed-off-by: Lai Jiangshan
---
arch/x86/tools/relocs.c | 30 ++++++++++++++---------------- arch/x86/tools/relocs.h | 19 +++++++++++++------ arch/x86/tools/relocs_common.c | 27 +++++++++------------------ 3 files changed, 36 insertions(+), 40 deletions(-) diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c index cf28a8f05375..743e5e44338b 100644 --- a/arch/x86/tools/relocs.c +++ b/arch/x86/tools/relocs.c @@ -125,13 +125,13 @@ static int is_reloc(enum symtype type, const char *sym_name) !regexec(&sym_regex_c[type], sym_name, 0, NULL, 0); } -static void regex_init(int use_real_mode) +static void regex_init(void) { char errbuf[128]; int err; int i; - if (use_real_mode) + if (opts.use_real_mode) sym_regex = sym_regex_realmode; else sym_regex = sym_regex_kernel; @@ -1164,7 +1164,7 @@ static int write32_as_text(uint32_t v, FILE *f) return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ?
0 : -1; } -static void emit_relocs(int as_text, int use_real_mode) +static void emit_relocs(void) { int i; int (*write_reloc)(uint32_t, FILE *) = write32; @@ -1172,12 +1172,12 @@ static void emit_relocs(int as_text, int use_real_mode) const char *symname); #if ELF_BITS == 64 - if (!use_real_mode) + if (!opts.use_real_mode) do_reloc = do_reloc64; else die("--realmode not valid for a 64-bit ELF file"); #else - if (!use_real_mode) + if (!opts.use_real_mode) do_reloc = do_reloc32; else do_reloc = do_reloc_real; @@ -1186,7 +1186,7 @@ static void emit_relocs(int as_text, int use_real_mode) /* Collect up the relocations */ walk_relocs(do_reloc); - if (relocs16.count && !use_real_mode) + if (relocs16.count && !opts.use_real_mode) die("Segment relocations found but --realmode not specified\n"); /* Order the relocations for more efficient processing */ @@ -1199,7 +1199,7 @@ static void emit_relocs(int as_text, int use_real_mode) #endif /* Print the relocations */ - if (as_text) { + if (opts.as_text) { /* Print the relocations in a form suitable that * gas will like. */ @@ -1208,7 +1208,7 @@ static void emit_relocs(int as_text, int use_real_mode) write_reloc = write32_as_text; } - if (use_real_mode) { + if (opts.use_real_mode) { write_reloc(relocs16.count, stdout); for (i = 0; i < relocs16.count; i++) write_reloc(relocs16.offset[i], stdout); @@ -1271,11 +1271,9 @@ static void print_reloc_info(void) # define process process_32 #endif -void process(FILE *fp, int use_real_mode, int as_text, - int show_absolute_syms, int show_absolute_relocs, - int show_reloc_info) +void process(FILE *fp) { - regex_init(use_real_mode); + regex_init(); read_ehdr(fp); read_shdrs(fp); read_strtabs(fp); @@ -1284,17 +1282,17 @@ void process(FILE *fp, int use_real_mode, int as_text, read_got(fp); if (ELF_BITS == 64) percpu_init(); - if (show_absolute_syms) { + if (opts.show_absolute_syms) { print_absolute_symbols(); return; } - if (show_absolute_relocs) { + if (opts.show_absolute_relocs) { print_absolute_relocs(); return; } - if (show_reloc_info) { + if (opts.show_reloc_info) { print_reloc_info(); return; } - emit_relocs(as_text, use_real_mode); + emit_relocs(); } diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h index 4c49c82446eb..1cb0e235ad73 100644 --- a/arch/x86/tools/relocs.h +++ b/arch/x86/tools/relocs.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -30,10 +31,16 @@ enum symtype { S_NSYMTYPES }; -void process_32(FILE *fp, int use_real_mode, int as_text, - int show_absolute_syms, int show_absolute_relocs, - int show_reloc_info); -void process_64(FILE *fp, int use_real_mode, int as_text, - int show_absolute_syms, int show_absolute_relocs, - int show_reloc_info); +struct opts { + bool use_real_mode; + bool as_text; + bool show_absolute_syms; + bool show_absolute_relocs; + bool show_reloc_info; +}; + +extern struct opts opts; + +void process_32(FILE *fp); +void process_64(FILE *fp); #endif /* RELOCS_H */ diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c index 6634352a20bc..17d69baee0c3 100644 --- a/arch/x86/tools/relocs_common.c +++ b/arch/x86/tools/relocs_common.c @@ -1,6 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 #include "relocs.h" +struct opts opts; + void die(char *fmt, ...) 
{ va_list ap; @@ -18,40 +20,33 @@ static void usage(void) int main(int argc, char **argv) { - int show_absolute_syms, show_absolute_relocs, show_reloc_info; - int as_text, use_real_mode; const char *fname; FILE *fp; int i; unsigned char e_ident[EI_NIDENT]; - show_absolute_syms = 0; - show_absolute_relocs = 0; - show_reloc_info = 0; - as_text = 0; - use_real_mode = 0; fname = NULL; for (i = 1; i < argc; i++) { char *arg = argv[i]; if (*arg == '-') { if (strcmp(arg, "--abs-syms") == 0) { - show_absolute_syms = 1; + opts.show_absolute_syms = true; continue; } if (strcmp(arg, "--abs-relocs") == 0) { - show_absolute_relocs = 1; + opts.show_absolute_relocs = true; continue; } if (strcmp(arg, "--reloc-info") == 0) { - show_reloc_info = 1; + opts.show_reloc_info = true; continue; } if (strcmp(arg, "--text") == 0) { - as_text = 1; + opts.as_text = true; continue; } if (strcmp(arg, "--realmode") == 0) { - use_real_mode = 1; + opts.use_real_mode = true; continue; } } @@ -73,13 +68,9 @@ int main(int argc, char **argv) } rewind(fp); if (e_ident[EI_CLASS] == ELFCLASS64) - process_64(fp, use_real_mode, as_text, - show_absolute_syms, show_absolute_relocs, - show_reloc_info); + process_64(fp); else - process_32(fp, use_real_mode, as_text, - show_absolute_syms, show_absolute_relocs, - show_reloc_info); + process_32(fp); fclose(fp); return 0; } From patchwork Mon Feb 26 14:36:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572335 Received: from mail-pf1-f170.google.com (mail-pf1-f170.google.com [209.85.210.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4459145B0E; Mon, 26 Feb 2024 14:37:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958252; cv=none; b=FT4R0EY/BbE32KDvjQsmrXMGFWNIbZvQOqPDrmQg9V9nZNk6xR2fH+vGm1EOIPvJueCqL9BW0z7s4+fRQPoiI4aTxVoHIqa7B4HG04+QK6BFWu5zNJMgRN2yrjTC8YRDKpeaODy9q+wKbDZuEjSWUaceX3s2SyBHcEZZKpy4nLs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958252; c=relaxed/simple; bh=DqsbfnlhZd45ReX+1hNDTDjtMxI2uW4PJFOcWdXla1Q=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=QPpynbd6B4hxAfRT6PgaqB56GunSPpvo8mwXDHpeQg7xsSCDmqA/mGhCAXeLl0OJbrux+Fd22WoUAdJBQR33gn8SFN0SW3xOBlhliqGGHoD3QNjcbssjTDR25TZHkqwxtzX6CNk9LUaD6kBuQOV7tDR0iLGjKUIuwqoIJq71EEM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=SSH0Ggjj; arc=none smtp.client-ip=209.85.210.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SSH0Ggjj" Received: by mail-pf1-f170.google.com with SMTP id d2e1a72fcca58-6e4560664b5so2527321b3a.1; Mon, 26 Feb 2024 06:37:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958250; x=1709563050; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to 
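A possible follow-up enabled by the new struct opts above (a sketch, not part of the patch): with all flags living in one structure, the argument loop in main() can be made table-driven, so adding a new option, such as the --keep switch added later in this series, becomes a one-line table entry. The sketch assumes "relocs.h" (which declares opts and pulls in stdbool) is included:

/* Sketch: table-driven flag parsing on top of struct opts (relocs.h). */
#include <string.h>

static const struct {
        const char *name;
        bool *flag;
} flag_opts[] = {
        { "--abs-syms",   &opts.show_absolute_syms },
        { "--abs-relocs", &opts.show_absolute_relocs },
        { "--reloc-info", &opts.show_reloc_info },
        { "--text",       &opts.as_text },
        { "--realmode",   &opts.use_real_mode },
};

static bool parse_flag(const char *arg)
{
        unsigned int i;

        for (i = 0; i < sizeof(flag_opts) / sizeof(flag_opts[0]); i++) {
                if (!strcmp(arg, flag_opts[i].name)) {
                        *flag_opts[i].flag = true;
                        return true;
                }
        }
        return false;
}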
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Andrew Morton , Alexey Dobriyan , Thomas Garnier
Subject: [RFC PATCH 51/73] x86/tools/relocs: Append relocations into input file
Date: Mon, 26 Feb 2024 22:36:08 +0800
Message-Id: <20240226143630.33643-52-jiangshanlai@gmail.com>

From: Hou Wenlong

Add a command line option to append the relocations into a reserved section named ".data.reloc" in the input file. This is the same as the implementation in MIPS.
Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/tools/relocs.c | 62 +++++++++++++++++++++++++++------- arch/x86/tools/relocs.h | 1 + arch/x86/tools/relocs_common.c | 11 ++++-- 3 files changed, 60 insertions(+), 14 deletions(-) diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c index 743e5e44338b..97e0243b9abb 100644 --- a/arch/x86/tools/relocs.c +++ b/arch/x86/tools/relocs.c @@ -912,6 +912,17 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname) strncmp(symname, "init_per_cpu_", 13); } +static struct section *sec_lookup(const char *name) +{ + int i; + + for (i = 0; i < shnum; i++) { + if (!strcmp(sec_name(i), name)) + return &secs[i]; + } + + return NULL; +} static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym, const char *symname) @@ -1164,12 +1175,13 @@ static int write32_as_text(uint32_t v, FILE *f) return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1; } -static void emit_relocs(void) +static void emit_relocs(FILE *f) { int i; int (*write_reloc)(uint32_t, FILE *) = write32; int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym, const char *symname); + FILE *outf = stdout; #if ELF_BITS == 64 if (!opts.use_real_mode) @@ -1208,37 +1220,63 @@ static void emit_relocs(void) write_reloc = write32_as_text; } +#if ELF_BITS == 64 + if (opts.keep_relocs) { + struct section *sec_reloc; + uint32_t size_needed; + unsigned long offset; + + sec_reloc = sec_lookup(".data.reloc"); + if (!sec_reloc) + die("Could not find relocation data section\n"); + + size_needed = (3 + relocs64.count + relocs32neg.count + + relocs32.count) * sizeof(uint32_t); + if (size_needed > sec_reloc->shdr.sh_size) + die("Relocations overflow available space!\n" \ + "Please adjust CONFIG_RELOCATION_TABLE_SIZE" \ + "to at least 0x%08x\n", (size_needed + 0x1000) & ~0xFFF); + + offset = sec_reloc->shdr.sh_offset + sec_reloc->shdr.sh_size - + size_needed; + if (fseek(f, offset, SEEK_SET) < 0) + die("Seek to %ld failed: %s\n", offset, strerror(errno)); + + outf = f; + } +#endif + if (opts.use_real_mode) { - write_reloc(relocs16.count, stdout); + write_reloc(relocs16.count, outf); for (i = 0; i < relocs16.count; i++) - write_reloc(relocs16.offset[i], stdout); + write_reloc(relocs16.offset[i], outf); - write_reloc(relocs32.count, stdout); + write_reloc(relocs32.count, outf); for (i = 0; i < relocs32.count; i++) - write_reloc(relocs32.offset[i], stdout); + write_reloc(relocs32.offset[i], outf); } else { #if ELF_BITS == 64 /* Print a stop */ - write_reloc(0, stdout); + write_reloc(0, outf); /* Now print each relocation */ for (i = 0; i < relocs64.count; i++) - write_reloc(relocs64.offset[i], stdout); + write_reloc(relocs64.offset[i], outf); /* Print a stop */ - write_reloc(0, stdout); + write_reloc(0, outf); /* Now print each inverse 32-bit relocation */ for (i = 0; i < relocs32neg.count; i++) - write_reloc(relocs32neg.offset[i], stdout); + write_reloc(relocs32neg.offset[i], outf); #endif /* Print a stop */ - write_reloc(0, stdout); + write_reloc(0, outf); /* Now print each relocation */ for (i = 0; i < relocs32.count; i++) - write_reloc(relocs32.offset[i], stdout); + write_reloc(relocs32.offset[i], outf); } } @@ -1294,5 +1332,5 @@ void process(FILE *fp) print_reloc_info(); return; } - emit_relocs(); + emit_relocs(fp); } diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h index 1cb0e235ad73..20f729e4579f 100644 --- a/arch/x86/tools/relocs.h +++ b/arch/x86/tools/relocs.h @@ -37,6 +37,7 @@ struct opts { bool show_absolute_syms; bool 
show_absolute_relocs; bool show_reloc_info; + bool keep_relocs; }; extern struct opts opts; diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c index 17d69baee0c3..87d94d9e4b97 100644 --- a/arch/x86/tools/relocs_common.c +++ b/arch/x86/tools/relocs_common.c @@ -14,7 +14,7 @@ void die(char *fmt, ...) static void usage(void) { - die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \ + die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|--keep]" \ " vmlinux\n"); } @@ -49,6 +49,10 @@ int main(int argc, char **argv) opts.use_real_mode = true; continue; } + if (strcmp(arg, "--keep") == 0) { + opts.keep_relocs = true; + continue; + } } else if (!fname) { fname = arg; @@ -59,7 +63,10 @@ int main(int argc, char **argv) if (!fname) { usage(); } - fp = fopen(fname, "r"); + if (opts.keep_relocs) + fp = fopen(fname, "r+"); + else + fp = fopen(fname, "r"); if (!fp) { die("Cannot open %s: %s\n", fname, strerror(errno)); } From patchwork Mon Feb 26 14:36:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572336 Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69A5F12E1CB; Mon, 26 Feb 2024 14:37:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958260; cv=none; b=iMMW/RvYNc0K0wLnKNx//zgmnbaOqMNQAgxq4xNWlCrpiFMk9e+VauPjyxAxmLMuQ9J8R4tQElVKVXvrh/MjD5f/BjV40ibRnAFUrnmi8CwKshvED/2gyCiDmiuZRgGGxR76qzDAm8h4HQl9gDKvnWr7yysMJi6GK6WLBC17jv4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958260; c=relaxed/simple; bh=vMW1eVYwDePlNHpf4YAsMTIArLcthLhAiwgx16DGjY4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=IjAcoc7sDy9hJPMNo6VJViTHe04x/EtMVZCd/XnZ4riAvY0sxLO5CvZTJDscEDkuYotoXiQkLje7z+ZvdEHvonfofIWvwNqJ4YhZmOso4YrXrceXUCl03hsRw9UR4vHNBhAV5ce+H1kYNVNUIjwmr5W5t4VJ6VVvE3BHijq8d1U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jiMaRYER; arc=none smtp.client-ip=209.85.210.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jiMaRYER" Received: by mail-pf1-f172.google.com with SMTP id d2e1a72fcca58-6e09143c7bdso1618276b3a.3; Mon, 26 Feb 2024 06:37:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958258; x=1709563058; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4klgCkETp7nNDyIHImj1EDNxCevpZbInRHKa35A/H0o=; b=jiMaRYERXkQv7gFmi+P94u5ubosnzFPodxog5GnlOt4chNVYwHHoq8+gCuyTdValDA a4nHT6DZmrU2mMrF4YxegrYju8hYBBSHZaeDdbgKjGrJ01V7fRFDL2A/zeDKDiC3SVj1 jbfS7I/EN1tM8uIXOj5RZf/9BQseWg91MNDGsQKMZiznaoOqS5jL5EjoPshrNzgr4FQm G/le8CBkkY1zudXr31lO21F22eLIjApy6G6iX4+k8/N+nV0hbSlDD1o0Zx87npE8Ig35 
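The die() message in emit_relocs() above already tells the user what to set; for reference, the space needed inside .data.reloc is just one 32-bit word per relocation plus the three zero terminators. A tiny worked sketch of the same arithmetic (the relocation counts are made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical counts from a large kernel build. */
        uint32_t n64 = 300000, n32neg = 60000, n32 = 100;
        uint32_t size_needed = (3 + n64 + n32neg + n32) * sizeof(uint32_t);

        /* Same page rounding as the error message above. */
        printf("need %u bytes; CONFIG_RELOCATION_TABLE_SIZE >= 0x%08x\n",
               size_needed, (size_needed + 0x1000) & ~0xFFFu);
        return 0;
}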
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Petr Pavlu , Josh Poimboeuf , Nick Desaulniers
Subject: [RFC PATCH 52/73] x86/boot: Allow to do relocation for uncompressed kernel
Date: Mon, 26 Feb 2024 22:36:09 +0800
Message-Id: <20240226143630.33643-53-jiangshanlai@gmail.com>

From: Hou Wenlong

Relocation is currently only performed during the decompression process. However, in some situations, such as with security containers, the uncompressed kernel can be booted directly. Therefore, it is useful to allow for relocation of the uncompressed kernel. Taking inspiration from the implementation in MIPS, a new section named ".data.reloc" is reserved for relocations. The relocs tool can then append the relocations into this section. Additionally, a helper function is introduced to perform relocations during booting, similar to the relocations done by the bootloader. For PVH entry, relocation of the pre-constructed page tables should not be performed; otherwise, booting will fail.
Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/Kconfig | 20 +++++++++ arch/x86/Makefile.postlink | 9 +++- arch/x86/kernel/head64_identity.c | 70 +++++++++++++++++++++++++++++++ arch/x86/kernel/vmlinux.lds.S | 18 ++++++++ 4 files changed, 116 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a53b65499951..d02ef3bdb171 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2183,6 +2183,26 @@ config RELOCATABLE it has been loaded at and the compile time physical address (CONFIG_PHYSICAL_START) is used as the minimum location. +config RELOCATABLE_UNCOMPRESSED_KERNEL + bool + depends on RELOCATABLE + help + A table of relocation data will be appended to the uncompressed + kernel binary and parsed at boot to fix up the relocated kernel. + +config RELOCATION_TABLE_SIZE + hex "Relocation table size" + depends on RELOCATABLE_UNCOMPRESSED_KERNEL + range 0x0 0x01000000 + default "0x00200000" + help + This option allows the amount of space reserved for the table to be + adjusted, although the default of 1Mb should be ok in most cases. + + The build will fail and a valid size suggested if this is too small. + + If unsure, leave at the default value. + config X86_PIE bool "Build a PIE kernel" default n diff --git a/arch/x86/Makefile.postlink b/arch/x86/Makefile.postlink index fef2e977cc7d..c115692b67b2 100644 --- a/arch/x86/Makefile.postlink +++ b/arch/x86/Makefile.postlink @@ -4,7 +4,8 @@ # =========================================================================== # # 1. Separate relocations from vmlinux into vmlinux.relocs. -# 2. Strip relocations from vmlinux. +# 2. Insert relocations table into vmlinux +# 3. Strip relocations from vmlinux. PHONY := __archpost __archpost: @@ -20,6 +21,9 @@ quiet_cmd_relocs = RELOCS $(OUT_RELOCS)/$@.relocs $(CMD_RELOCS) $@ > $(OUT_RELOCS)/$@.relocs; \ $(CMD_RELOCS) --abs-relocs $@ +quiet_cmd_insert_relocs = RELOCS $@ + cmd_insert_relocs = $(CMD_RELOCS) --keep $@ + quiet_cmd_strip_relocs = RSTRIP $@ cmd_strip_relocs = \ $(OBJCOPY) --remove-section='.rel.*' --remove-section='.rel__*' \ @@ -29,6 +33,9 @@ quiet_cmd_strip_relocs = RSTRIP $@ vmlinux: FORCE @true +ifeq ($(CONFIG_RELOCATABLE_UNCOMPRESSED_KERNEL),y) + $(call cmd,insert_relocs) +endif ifeq ($(CONFIG_X86_NEED_RELOCS),y) $(call cmd,relocs) $(call cmd,strip_relocs) diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c index ecac6e704868..4548ad615ecf 100644 --- a/arch/x86/kernel/head64_identity.c +++ b/arch/x86/kernel/head64_identity.c @@ -315,3 +315,73 @@ void __head startup_64_setup_env(void) startup_64_load_idt(); } + +#ifdef CONFIG_RELOCATABLE_UNCOMPRESSED_KERNEL +extern u8 __relocation_end[]; + +static bool __head is_in_pvh_pgtable(unsigned long ptr) +{ +#ifdef CONFIG_PVH + if (ptr >= (unsigned long)init_top_pgt && + ptr < (unsigned long)init_top_pgt + PAGE_SIZE) + return true; + if (ptr >= (unsigned long)level3_ident_pgt && + ptr < (unsigned long)level3_ident_pgt + PAGE_SIZE) + return true; +#endif + return false; +} + +void __head __relocate_kernel(unsigned long physbase, unsigned long virtbase) +{ + int *reloc = (int *)__relocation_end; + unsigned long ptr; + unsigned long delta = virtbase - __START_KERNEL_map; + unsigned long map = physbase - __START_KERNEL; + long extended; + + /* + * Relocation had happended in bootloader, + * don't do it again. + */ + if (SYM_ABS_VA(_text) != __START_KERNEL) + return; + + if (!delta) + return; + + /* + * Format is: + * + * kernel bits... 
+ * 0 - zero terminator for 64 bit relocations + * 64 bit relocation repeated + * 0 - zero terminator for inverse 32 bit relocations + * 32 bit inverse relocation repeated + * 0 - zero terminator for 32 bit relocations + * 32 bit relocation repeated + * + * So we work backwards from the end of .data.relocs section, see + * handle_relocations() in arch/x86/boot/compressed/misc.c. + */ + while (*--reloc) { + extended = *reloc; + ptr = (unsigned long)(extended + map); + *(uint32_t *)ptr += delta; + } + + while (*--reloc) { + extended = *reloc; + ptr = (unsigned long)(extended + map); + *(int32_t *)ptr -= delta; + } + + while (*--reloc) { + extended = *reloc; + ptr = (unsigned long)(extended + map); + if (is_in_pvh_pgtable(ptr)) + continue; + *(uint64_t *)ptr += delta; + } +} +#endif diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 834c68b45f15..3b05807fe1dc 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -339,6 +339,24 @@ SECTIONS } #endif +#ifdef CONFIG_RELOCATABLE_UNCOMPRESSED_KERNEL + . = ALIGN(4); + .data.reloc : AT(ADDR(.data.reloc) - LOAD_OFFSET) { + __relocation_start = .; + /* + * Space for relocation table + * This needs to be filled so that the + * relocs tool can overwrite the content. + * Put a dummy data item at the start to + * avoid to generate NOBITS section. + */ + LONG(0); + FILL(0); + . += CONFIG_RELOCATION_TABLE_SIZE - 4; + __relocation_end = .; + } +#endif + /* * struct alt_inst entries. From the header (alternative.h): * "Alternative instructions for different CPU types or capabilities" From patchwork Mon Feb 26 14:36:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572337 Received: from mail-pj1-f44.google.com (mail-pj1-f44.google.com [209.85.216.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA5CD12EBC1; Mon, 26 Feb 2024 14:37:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958266; cv=none; b=WhauTEGqLGX7shDiOVRTkjYInniCyqrzaUXxgzkr0m3ahsOqggRHdcfb5BidBz8iWvFVsvR8Hi5HD9p2VtFhQO8BdhnDnhOUDDVruESYhA2tTsAJeNhRVqfD15qamDpZoNa8G0WL/rra3WVcVyY7j20fIVHp3dqNmp4baCAqowA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958266; c=relaxed/simple; bh=5UQ/75p105DFTiVsQkZHczkwiUZN2Bo2WzyYu3VvV3g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=HFPZ+jtlH/nsGc0lf/ZcaGIXA25Q3YpvBYNyhtID0x6ZhgsqNpZCW8l4mRH42VopFiOjSfNNX+NYFPKZApevLgU37HZc5G2fcXja+PCE/mBPk4+8vGPtGfdkEpop0BdBZMZmrOGTpJ3JQJ9lH1uUlw0Pb51dl6lZPAGGZ6zqVJA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=YZkumKZX; arc=none smtp.client-ip=209.85.216.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="YZkumKZX" Received: by mail-pj1-f44.google.com with SMTP id 98e67ed59e1d1-29acc73df4eso498097a91.1; Mon, 26 Feb 2024 06:37:44 -0800 
(PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958264; x=1709563064; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rqdg4X/YjHymybWX0IILhA6ds4v8UP51Kri2hBr6/kU=; b=YZkumKZXdluRoyBAJ1/ufj6J4qIwtUH0q2OhLrHG8hQ9InPE47FIih0YpD9yQ8Fj7T j09MU19ZDtUlzIRqMPPX1m6HfNaNkXuKiAVjAtlBA+UkjW3eGsa+gBDv9qyOj8W0eMDa SZTlvlF8/Ln+E4D5f4yzEdAOV0iU010tD7N2ZrQTMm9oxVutEEWJmos6L/mgXdwtFdc/ QoBP8R6xrV+XMw1GWzUa0Pxc+MFJWNWOG8WLSSEJyEiFNxYXTyxgQo/sSBnrF2LVgv4v mKXR+CXcxOavuobVujoQ8N9MZKhYcZIU9nfSjC7ay3uHGZURsvf1vvA8ND3hie9k3dtB R+5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958264; x=1709563064; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rqdg4X/YjHymybWX0IILhA6ds4v8UP51Kri2hBr6/kU=; b=Laacl0xY4pEfjVivDnm2uKezC9VT0KPQngNAnqXW08kesOh1iuESDc/camyxg3RB8y RLtLUc2GisfzeKJxHZDthmxoHvrIUCEZoxGrKjtpyQIralKkWtw2OCy69aRpyal5MhX3 BS7xuCklbcJLRol3CHzdWkKMje7rgIrg+0yvxfwRWqHPw/q1uM+rn0ElBEkSI0ZUiT6k PntW70WRJfNQEqqzZNOSlU03pRYS/yqvFCWKr7UYbraRkNkdhgScn5IMcSMKFQLySv0E cBqmB4HVm8PeszAq+nfpCL3EJIhTskKC3HFijnS0MYbX6x1aVOQsvlO9gHqS7lk/UZ9t +8Qg== X-Forwarded-Encrypted: i=1; AJvYcCUd4SLz0PGFIsFU173nev97fJeq34jklf1IOK7TFx5Bsm8aBLny8+jA2n7XqVM1XQviuc4euTW+7cYAO7t/msVE51qo X-Gm-Message-State: AOJu0YyV2BJTv4Ft3ae4KL7hqxwDBq7IbzQr2Nnp0jDv2bFdG0EnkQn9 +m3XGMrUQO0kmpGN6YraTbuZZKpeiEbockjfdxwugQCd922XbxUU4oiZNDTg X-Google-Smtp-Source: AGHT+IECS7xluNhmsd8ZjIWA8oFryLoaOHejAuTxIJmqPfeBjlZqxafvW0TcCsWSOMUU7MiBSKKWzg== X-Received: by 2002:a17:90a:4a89:b0:29a:83da:ed62 with SMTP id f9-20020a17090a4a8900b0029a83daed62mr5947851pjh.4.1708958263592; Mon, 26 Feb 2024 06:37:43 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id q13-20020a17090a178d00b0029ac5848d5bsm1523294pja.1.2024.02.26.06.37.42 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:37:43 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , "Mike Rapoport (IBM)" , Daniel Sneddon , Rick Edgecombe , Alexey Kardashevskiy , Yu-cheng Yu , "Kirill A. Shutemov" Subject: [RFC PATCH 53/73] x86/pvm: Add Kconfig option and the CPU feature bit for PVM guest Date: Mon, 26 Feb 2024 22:36:10 +0800 Message-Id: <20240226143630.33643-54-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Add the configuration option CONFIG_PVM_GUEST to enable the building of a PVM guest. Introduce a new CPU feature bit to control the behavior of the PVM guest. 
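The interplay between the Kconfig option and the new feature bit follows the usual disabled-features pattern: when CONFIG_PVM_GUEST is off, the bit is folded into the compile-time disabled mask, so the feature test is constant false and PVM-only paths can be discarded by the compiler. The stand-alone sketch below only models that pattern; DEMO_PVM_GUEST, the bit number and feature_enabled() are invented names, not the kernel implementation.

/* disabled_feature_demo.c - toy model; build with: cc [-DDEMO_PVM_GUEST] disabled_feature_demo.c */
#include <stdbool.h>
#include <stdio.h>

#define FEATURE_PVM_GUEST_BIT	23	/* invented bit number for the demo */

#ifdef DEMO_PVM_GUEST			/* stands in for CONFIG_PVM_GUEST   */
#define DISABLED_MASK		0u
#else
#define DISABLED_MASK		(1u << FEATURE_PVM_GUEST_BIT)
#endif

static unsigned int boot_features;	/* bits set by "hypervisor detection" */

static bool feature_enabled(unsigned int bit)
{
	/* Compile-time mask first: a disabled feature is always false. */
	if (DISABLED_MASK & (1u << bit))
		return false;
	return (boot_features & (1u << bit)) != 0;
}

int main(void)
{
	boot_features |= 1u << FEATURE_PVM_GUEST_BIT;	/* pretend detection ran */

	if (feature_enabled(FEATURE_PVM_GUEST_BIT))
		puts("PVM-guest-only path taken");
	else
		puts("PVM guest support compiled out");
	return 0;
}

Building with -DDEMO_PVM_GUEST takes the first branch; without it the check folds to false no matter what detection reported, which is the effect DISABLE_KVM_PVM_GUEST in DISABLED_MASK8 is after in the diff below.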
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/Kconfig | 8 ++++++++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d02ef3bdb171..2ccc8a27e081 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -851,6 +851,14 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVM_GUEST + bool "PVM Guest support" + depends on X86_64 && KVM_GUEST + default n + help + This option enables the kernel to run as a PVM guest under the PVM + hypervisor. + config ARCH_CPUIDLE_HALTPOLL def_bool n prompt "Disable host haltpoll when loading haltpoll driver" diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 4af140cf5719..e17e72f13423 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -237,6 +237,7 @@ #define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */ #define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */ #define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Intel Trust Domain Extensions Guest */ +#define X86_FEATURE_KVM_PVM_GUEST ( 8*32+23) /* KVM Pagetable-based Virtual Machine guest */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 702d93fdd10e..5d56e804ab18 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -105,6 +105,12 @@ # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) #endif +#ifdef CONFIG_PVM_GUEST +#define DISABLE_KVM_PVM_GUEST 0 +#else +#define DISABLE_KVM_PVM_GUEST (1 << (X86_FEATURE_KVM_PVM_GUEST & 31)) +#endif + #ifdef CONFIG_X86_USER_SHADOW_STACK #define DISABLE_USER_SHSTK 0 #else @@ -128,7 +134,7 @@ #define DISABLED_MASK5 0 #define DISABLED_MASK6 0 #define DISABLED_MASK7 (DISABLE_PTI) -#define DISABLED_MASK8 (DISABLE_XENPV|DISABLE_TDX_GUEST) +#define DISABLED_MASK8 (DISABLE_XENPV|DISABLE_TDX_GUEST|DISABLE_KVM_PVM_GUEST) #define DISABLED_MASK9 (DISABLE_SGX) #define DISABLED_MASK10 0 #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \ From patchwork Mon Feb 26 14:36:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572338 Received: from mail-oo1-f50.google.com (mail-oo1-f50.google.com [209.85.161.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4CA351474AE; Mon, 26 Feb 2024 14:37:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958275; cv=none; b=H59LIbYReXOOxZa6CFfa06vs1eTCeykqPBnGjd3MqVJIxxtnzkY96gUuDAukmAQE44EKlIPTnM/np5zthZrD3CFObrSjrMPWEuZ8uqfEizI5yiggziB64iTuDcwbU4Qltn8KVfxy5jVTDKqQVHIHZhOVZqWeKXITqn3eYokwKVw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958275; c=relaxed/simple; bh=iF+u/pfN6Z+LC5rl3PmX0bCf9rpWWKCNx9V96EJqjYQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=RQ39hGtv5Zmz+kpM3jUuwRGUFmRnbyPNBMr/kc2DVcobHFIwnamzWfaArshQkWzMqsmOB788WVa2Z6KrBHNJuhOCXuN//gdgYtZbJmXHMaKSwWXP96aSsOzeT7YdWd308B58ef8i19kfQpfppBSjqokgYq9oXmc3Fq4PU100VJ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gv8zeCts; arc=none smtp.client-ip=209.85.161.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gv8zeCts" Received: by mail-oo1-f50.google.com with SMTP id 006d021491bc7-5a04fb5e689so1598809eaf.1; Mon, 26 Feb 2024 06:37:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958273; x=1709563073; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=j3F50O3ylG3ZskDRT0xmw8UdB0U5PMTg1O+LfmHJzgw=; b=gv8zeCtsq2M6LddJ3hVu/JOGKJ76Kd5zhChWtb4fjuvoBJhaCx1wk4RBLhnrA4oTQE ub1DP4v3QfGcDpBRxkLjCalAsEqB+mKaKKHCfvCZVdgnSH4oPkdB46LwKM9gHFBgblqL CO5xQ0BlyRyqyYR5jKnU8a9ZsahFPUMASPgt6Rx9KkK2jJTOTJ5TXkWyzVwCKoOP8172 ocxbVBPn6YqboT0femcO67wZw8m9szL8ppw9oq7Sf4NH8pMk2LAXKL4yE//nS95Dvxsp j4MqOO62Rw+wNNRm6Ksr1a/ZF9xBpf3AZhUtrGeDO8f3eWCFNZs1TjUjj281ubX0RkY/ Yu2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958273; x=1709563073; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=j3F50O3ylG3ZskDRT0xmw8UdB0U5PMTg1O+LfmHJzgw=; b=KNn7mBOGnn5nFQ7smH1HBQqotAJy1b4kxLrwPzlA/WhEmpWqJM1hr88cFgEs97xaCY YHxBxSMppMjuRdDwOcGk8uMumKdEeB4oJD0Zi7W21xiUifZ5o++u4Ztni35sFg3zw4lm jJCc3YIuiN5f9WQSE0NCKximHNereuGvGPmHUmg/V4ZvA76tyeEBC/ApcSAyNMchV4Cw c4i/fdIwZbTcFtCEANkSW58a5/lfg14itKJa+qgbogzlYvpysjKodGHtzD1R7uMQFzyC Qu/71ymSqBQ18y/TViJXivnufOitfmxwPxdrjEJSRA+LUc06km+wCFJ+hntD61Gm7Vme gKag== X-Forwarded-Encrypted: i=1; AJvYcCWAx0aVb+lDrGMiWVV0SkjGrDzHBxXpXtxMwrgBOrjmpPgCTTaPbemBe+xD4ha5RI+aBEPGMcP9+O6sMZRR8NtXyWG2 X-Gm-Message-State: AOJu0YxdSR73sgIU2uKzi72ZoEbnyFpwLXEXujMkmoimUh2rBkGMbLFj BZ5QX43iXMxHBj05hWERQi9RmJTYpCKEFEEqrNeM2JDZG6XgUje12AO9VgQC X-Google-Smtp-Source: AGHT+IFPVhu8MaUL4U9TiSoKfpjFwi5MzWPuMosqmec6SO4g8BFH6Kp+bCxObLQ7ckCBnTySm25wpA== X-Received: by 2002:a05:6358:286:b0:17a:f91c:825b with SMTP id w6-20020a056358028600b0017af91c825bmr8812523rwj.5.1708958273041; Mon, 26 Feb 2024 06:37:53 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id v4-20020aa78504000000b006e4e557346esm4114190pfn.28.2024.02.26.06.37.52 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:37:52 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. 
Peter Anvin" , "Mike Rapoport (IBM)" , Rick Edgecombe , Pengfei Xu , Ze Gao , Josh Poimboeuf Subject: [RFC PATCH 54/73] x86/pvm: Detect PVM hypervisor support Date: Mon, 26 Feb 2024 22:36:11 +0800 Message-Id: <20240226143630.33643-55-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Detect PVM hypervisor support through the use of the PVM synthetic instruction 'PVM_SYNTHETIC_CPUID'. This is a necessary step in preparing to initialize the PVM guest during booting. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/include/asm/pvm_para.h | 69 +++++++++++++++++++++++++++++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/pvm.c | 22 +++++++++++ 3 files changed, 92 insertions(+) create mode 100644 arch/x86/include/asm/pvm_para.h create mode 100644 arch/x86/kernel/pvm.c diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h new file mode 100644 index 000000000000..efd7afdf9be9 --- /dev/null +++ b/arch/x86/include/asm/pvm_para.h @@ -0,0 +1,69 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PVM_PARA_H +#define _ASM_X86_PVM_PARA_H + +#include +#include + +#ifdef CONFIG_PVM_GUEST +#include +#include + +void __init pvm_early_setup(void); + +static inline void pvm_cpuid(unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + asm(__ASM_FORM(.byte PVM_SYNTHETIC_CPUID ;) + : "=a" (*eax), + "=b" (*ebx), + "=c" (*ecx), + "=d" (*edx) + : "0" (*eax), "2" (*ecx)); +} + +/* + * pvm_detect() is called before event handling is set up and it might be + * possibly called under any hypervisor other than PVM, so it should not + * trigger any trap in all possible scenarios. PVM_SYNTHETIC_CPUID is supposed + * to not trigger any trap in the real or virtual x86 kernel mode and is also + * guaranteed to trigger a trap in the underlying hardware user mode for the + * hypervisor emulating it. 
+ */ +static inline bool pvm_detect(void) +{ + unsigned long cs; + uint32_t eax, signature[3]; + + /* check underlying interrupt flags */ + if (arch_irqs_disabled_flags(native_save_fl())) + return false; + + /* check underlying CS */ + asm volatile("mov %%cs,%0\n\t" : "=r" (cs) : ); + if ((cs & 3) != 3) + return false; + + /* check KVM_SIGNATURE and KVM_CPUID_VENDOR_FEATURES */ + eax = KVM_CPUID_SIGNATURE; + pvm_cpuid(&eax, &signature[0], &signature[1], &signature[2]); + if (memcmp(KVM_SIGNATURE, signature, 12)) + return false; + if (eax < KVM_CPUID_VENDOR_FEATURES) + return false; + + /* check PVM_CPUID_SIGNATURE */ + eax = KVM_CPUID_VENDOR_FEATURES; + pvm_cpuid(&eax, &signature[0], &signature[1], &signature[2]); + if (signature[0] != PVM_CPUID_SIGNATURE) + return false; + + return true; +} +#else +static inline void pvm_early_setup(void) +{ +} +#endif /* CONFIG_PVM_GUEST */ + +#endif /* _ASM_X86_PVM_PARA_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index dc1f5a303e9b..67f11f7d5c88 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -129,6 +129,7 @@ obj-$(CONFIG_AMD_NB) += amd_nb.o obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o +obj-$(CONFIG_PVM_GUEST) += pvm.o obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c new file mode 100644 index 000000000000..2d27044eaf25 --- /dev/null +++ b/arch/x86/kernel/pvm.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * KVM PVM paravirt_ops implementation + * + * Copyright (C) 2020 Ant Group + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ +#define pr_fmt(fmt) "pvm-guest: " fmt + +#include +#include + +void __init pvm_early_setup(void) +{ + if (!pvm_detect()) + return; + + setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST); +} From patchwork Mon Feb 26 14:36:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572339 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D656712AAFF; Mon, 26 Feb 2024 14:38:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958285; cv=none; b=gpbry4X6FuzNjq2DM2VVBMaP4JRzVEGTNVo5fUOzuhGMC8cgk3x8fKroffXy/0r201Ov7yR4k5Xvhy4EK5DXO6+IiKpaADDpsNCV5kD66FoY7jEopjoNW7u0r+3ZW+EUOXlGMNdu9XemXfj8DWtWfy2G6cFX+pKf508ZytPo/3I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958285; c=relaxed/simple; bh=kcNZ0lhM9kk5ZqiWC7oiKK6STog5OGE3f4B2GekF4Ys=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=laVEKoVkM/M6HaFJd3rLs2Ch566I9c7JWWkAmSSkq4G0mbzDS798ps9dHnKpsLSqyupgAjNF2jU+1kF4UwGBKQsj+25B/DrgFncraUjRrh49/+kV5NiogCM4smxtg8ZSj2Ol3rXmst5RoE7esou/nXcKyXo9FkMdWBlCTs71id8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=WSaROuhD; arc=none smtp.client-ip=209.85.214.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="WSaROuhD" Received: by mail-pl1-f178.google.com with SMTP id d9443c01a7336-1dc3b4b9b62so26756195ad.1; Mon, 26 Feb 2024 06:38:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958283; x=1709563083; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=wF5I3QGcDe499zvzwDfAHdHqAxG40+9SQfq6UFOiBbY=; b=WSaROuhD9PpSiFeIhsaXzJ4U2q7+OoEyKLiw/VpecKjuf+6Hy2fPROGGc6nahG1Dc2 xjNZrkCd4wTSOoO7Ak9c94KR3sAkpwu6/avT8e5M+i1r4Ij7nRymqn6xdLgaaml/rfq7 BCohSMGUT2mG1XeEsIcnpL+RHxzU1nAc/q9jxKNT/2RaDqiz7FfGAxomRP7rlXrMzqcs sObS/4vX8gNzzwdD7q5cX3Otrsn4NxNKChWlt7dJ0IxudnSgndrHj3XRPndaygfllkXk y1JMxUdHPJk7K06rB69zvHc0oWygJ2aGjEhscwXaCKmso6QL3Bhx7lWbKAZ3jPHUxNfn VJ5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958283; x=1709563083; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wF5I3QGcDe499zvzwDfAHdHqAxG40+9SQfq6UFOiBbY=; b=HC5urZi5V2vmO1Y1igWOSdoQcqmcTeLIlgw2Tn94ZUCmLkuaEB2rho8fUllzhpE9FA 0c8NT0tvQ8jxP0IZCVmh/UOBUg1QZHkROHtqPfkHF0D1c+sDXesWcGKKqo7R0OcT86T9 ETNl/7DNtCbVXaby4qps/if8ADkPbCDby3SoUcwx98T8miKXXNkhQQeW1iHXxJk+lIZv Cc5hJLZYZTZBU5rxFw7lHB7McYnoHS6G3JRVx5YDY8KP+ASktkoAcolbOjPSH4n/trcq GIXMC5Hsl4xQKTXTjro3hcuAWodGmtOrfUuz+twBN6WXjP2Q0WWJvC8D7dnKAhJFc0il uONQ== X-Forwarded-Encrypted: i=1; 
AJvYcCVkkeLVEYBWI4ia/swGVVVR56dQ3Vqe2gYXfUXDe9tb+ZqaNulhh2VUYuuUVau+6aYM/dfhcB7Hsdh71bv4snlx3++X X-Gm-Message-State: AOJu0YwKFr+YczyNt3kDD3eQHeDOJ4FbC38JNy3AWtJ05kDPhwfSQRTA l9GFLA9MXn+xKJIta+WzQOdRrBB8sxk8gPzScHZv7wb8QlI4HV5oTKPJkoti X-Google-Smtp-Source: AGHT+IEBOatbi0zsLKeC/BI5oxkPGQKl9gN+gvqMHyn8obAn/tdzQdkUPLAn2vjrNgBgJiRxD1DwBw== X-Received: by 2002:a17:902:c404:b0:1d7:836d:7b3f with SMTP id k4-20020a170902c40400b001d7836d7b3fmr9974536plk.9.1708958283008; Mon, 26 Feb 2024 06:38:03 -0800 (PST) Received: from localhost ([47.88.5.130]) by smtp.gmail.com with ESMTPSA id ks14-20020a170903084e00b001dc30f13e6asm4018049plb.137.2024.02.26.06.38.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:02 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , David Woodhouse , Brian Gerst , Josh Poimboeuf , Thomas Garnier , Ard Biesheuvel , Tom Lendacky Subject: [RFC PATCH 55/73] x86/pvm: Relocate kernel image to specific virtual address range Date: Mon, 26 Feb 2024 22:36:12 +0800 Message-Id: <20240226143630.33643-56-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong For a PVM guest, it is only allowed to run in the specific virtual address range provided by the hypervisor. Therefore, the PVM guest needs to be a PIE kernel and perform relocation during the booting process. Additionally, for a compressed kernel image, kaslr needs to be disabled; otherwise, it will fail to boot. 
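The allowed range itself is read from MSR_PVM_LINEAR_ADDRESS_RANGE in the detect_pvm_range() hunk below: two PML4 indexes are packed into the low bits and bits 16..24, and each is turned into a canonical kernel-half address by ORing in the upper bits before scaling by the top-level entry size. The user-space sketch below only reproduces that arithmetic; the MSR value is invented so the result matches the example layout used later in the series (0xffffd90000000000 - 0xffffe90000000000).

/* pvm_range_demo.c - arithmetic sketch only; cc pvm_range_demo.c && ./a.out */
#include <stdint.h>
#include <stdio.h>

#define P4D_SIZE	(1ULL << 39)	/* one top-level entry, 4-level paging */

int main(void)
{
	/* Invented MSR contents: start index 0x1b2, end index 0x1d2. */
	uint64_t msr_val = (0x1d2ULL << 16) | 0x1b2ULL;

	uint64_t pml4_index_start = msr_val & 0x1ff;
	uint64_t pml4_index_end = (msr_val >> 16) & 0x1ff;

	/* 0x1fffe00 supplies the canonical upper bits of a kernel-half index. */
	uint64_t range_start = (0x1fffe00ULL | pml4_index_start) * P4D_SIZE;
	uint64_t range_end = (0x1fffe00ULL | pml4_index_end) * P4D_SIZE;

	/* The image itself goes into the top 2 GiB of the allowed range. */
	uint64_t image_base = range_end - (2ULL << 30);

	printf("allowed range: %#llx - %#llx\n",
	       (unsigned long long)range_start, (unsigned long long)range_end);
	printf("kernel image base: %#llx\n", (unsigned long long)image_base);
	return 0;
}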
Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/Kconfig | 3 ++- arch/x86/kernel/head64_identity.c | 27 +++++++++++++++++++++++++++ arch/x86/kernel/head_64.S | 13 +++++++++++++ arch/x86/kernel/pvm.c | 5 ++++- 4 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2ccc8a27e081..1b4bea3db53d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -853,7 +853,8 @@ config KVM_GUEST config PVM_GUEST bool "PVM Guest support" - depends on X86_64 && KVM_GUEST + depends on X86_64 && KVM_GUEST && X86_PIE + select RELOCATABLE_UNCOMPRESSED_KERNEL default n help This option enables the kernel to run as a PVM guest under the PVM diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c index 4548ad615ecf..4e6a073d9e6c 100644 --- a/arch/x86/kernel/head64_identity.c +++ b/arch/x86/kernel/head64_identity.c @@ -20,6 +20,7 @@ #include #include #include +#include extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD]; extern unsigned int next_early_pgt; @@ -385,3 +386,29 @@ void __head __relocate_kernel(unsigned long physbase, unsigned long virtbase) } } #endif + +#ifdef CONFIG_PVM_GUEST +extern unsigned long pvm_range_start; +extern unsigned long pvm_range_end; + +static void __head detect_pvm_range(void) +{ + unsigned long msr_val; + unsigned long pml4_index_start, pml4_index_end; + + msr_val = __rdmsr(MSR_PVM_LINEAR_ADDRESS_RANGE); + pml4_index_start = msr_val & 0x1ff; + pml4_index_end = (msr_val >> 16) & 0x1ff; + pvm_range_start = (0x1fffe00 | pml4_index_start) * P4D_SIZE; + pvm_range_end = (0x1fffe00 | pml4_index_end) * P4D_SIZE; +} + +void __head pvm_relocate_kernel(unsigned long physbase) +{ + if (!pvm_detect()) + return; + + detect_pvm_range(); + __relocate_kernel(physbase, pvm_range_end - (2UL << 30)); +} +#endif diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index b8278f05bbd0..1d931bab4393 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -91,6 +91,19 @@ SYM_CODE_START_NOALIGN(startup_64) movq %rdx, PER_CPU_VAR(this_cpu_off) #endif +#ifdef CONFIG_PVM_GUEST + leaq _text(%rip), %rdi + call pvm_relocate_kernel +#ifdef CONFIG_SMP + /* Fill __per_cpu_offset[0] again, because it got relocated. 
*/ + movabs $__per_cpu_load, %rdx + movabs $__per_cpu_start, %rax + subq %rax, %rdx + movq %rdx, __per_cpu_offset(%rip) + movq %rdx, PER_CPU_VAR(this_cpu_off) +#endif +#endif + call startup_64_setup_env /* Now switch to __KERNEL_CS so IRET works reliably */ diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index 2d27044eaf25..fc82c71b305b 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -13,9 +13,12 @@ #include #include +unsigned long pvm_range_start __initdata; +unsigned long pvm_range_end __initdata; + void __init pvm_early_setup(void) { - if (!pvm_detect()) + if (!pvm_range_end) return; setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST); From patchwork Mon Feb 26 14:36:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572340 Received: from mail-pg1-f179.google.com (mail-pg1-f179.google.com [209.85.215.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FC3512AAFF; Mon, 26 Feb 2024 14:38:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958294; cv=none; b=UPrNhFKlDxbEoATPjZ2HqgcyzBckcmTGCew0R1bZfKLzCssSzeKPNoOCi5VBpVGsAJLVZSx4+soXHWFeuqTrkRQ4Vmud0Ox4XzjfYFyICEMmH35R+/sLmYrOlOXG8ex7zUveHRZiZufudl+qT4hxWYc6g8/aY5Z7J9zZrGeBr84= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958294; c=relaxed/simple; bh=NCJKU3nUGxcKaNdfIK0z08Xjq9FRBvLE0kkArrKryjo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fpbyyj7xcz/D2nqv9ocsbIsobbmMKX5xvlJNUenldoIJq8hBCQIR4YA8fNOsP7nknzpGI7ryl6ozO5tCmEOSd7vuO2nIznZ7aETCKl/SHe/7eXNTGu4wVTE9+zHTrNXXd5eeNuUAlyPlmaiw2M2RwzOtHqobIVthqkP/J55MUFk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UdFfYKj2; arc=none smtp.client-ip=209.85.215.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UdFfYKj2" Received: by mail-pg1-f179.google.com with SMTP id 41be03b00d2f7-5ce6b5e3c4eso2357652a12.2; Mon, 26 Feb 2024 06:38:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958291; x=1709563091; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+gfvWhFbgTyzu48YJx+Ed8XWBq6XxX8rbNsTiG4trvc=; b=UdFfYKj2m0CkfU1m8MUlXMQRv/yzFe8J5gH4DZM4kOxfrlljZt9jMPUC9WJA5S47BO KvrrlSfmfJWWWxY7HGCJvusbmJAk9mA/Shvhh9OuGipYCyjR6ejUVQu3GcFVgOTFkCpD 7l0xzxiqmCO1kbSyoxXWhsHOg/7qQqBr6T4rypfVn+GCfA4p2quZZvgJXgq55yZjaz1d EBBxycpLnJgk7bNL0yunuF8vNmYi7Ei4LPKyXpANwqvkK+vi2KEc7EOsf5VDw9lDtqjb mT+NGg0T4iWJuVhYpAtzFy0eFhm1RofetySkObRKnS6+jFrudo2kG+Veh15noBMHVv1W awlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958291; x=1709563091; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+gfvWhFbgTyzu48YJx+Ed8XWBq6XxX8rbNsTiG4trvc=; b=liY+99YVASWTL5LqcJPMHUHfJhPHLe2Im3r1bZzHq6HkxZy7cVvS8OCgvNQ8CldAYB EGvdo8JotXwa10OO26v+0Y/V1C0QrTotZgvik9+ntA6s+RiNjZe4nvJHWV5/N1lXGXZs GROHJy+zzepW8wrrdSdL5kINWF6VWd8X5mzz3HYy6J0nCxBq+EVewFO9jkxL58VmX0uS jyxy8xCtoGdvQ49NYOr5Oo5Cs/H+f7B/xHA0X8TUWqugeRZsHJK6VZFJZ2GkMHbx8Gl/ X4En0RScFJUn/Aa5kxBkAJe0UUCQnZ/gAXfAY28cFRYHXfahdQwgJj+nYXOLJ91NkFGt bO6g== X-Forwarded-Encrypted: i=1; AJvYcCW0KlHWyTTS+tSZt/FNmWbiujAIpHZOzulkdBjiwLV6KtK+/yc2VMjfEC1exaXidM5fgIz63RnmGUxfMk7Cjd60EjSpHy8kcowZsjMKWiDCD16ht2DXBONn6pTBByPmmWOM81TFHhQ9Eg== X-Gm-Message-State: AOJu0YyUXYtezgc8w2/JtEnfyBngx0nSSRfREs0ZfWbLMX9iPFtZoKZh +SYqobDl8xafRXnCT+ZSrFwS2uLiC+IAzYExLQrKe1kTUt43Q7cSGOvD/zsR X-Google-Smtp-Source: AGHT+IEIutnYKn12v5tvNLy/G9rTRg1goq2sEiyOH6n7oCM/wCUKEcvtJKvKWrDR0plMg8cjveFEyw== X-Received: by 2002:a17:902:8349:b0:1dc:1fda:202e with SMTP id z9-20020a170902834900b001dc1fda202emr6685293pln.51.1708958291395; Mon, 26 Feb 2024 06:38:11 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id d3-20020a170903230300b001d9edac54b1sm4015055plh.171.2024.02.26.06.38.10 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:11 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Boris Ostrovsky , Darren Hart , Andy Shevchenko , xen-devel@lists.xenproject.org, platform-driver-x86@vger.kernel.org Subject: [RFC PATCH 56/73] x86/pvm: Relocate kernel image early in PVH entry Date: Mon, 26 Feb 2024 22:36:13 +0800 Message-Id: <20240226143630.33643-57-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong For a PIE kernel, it runs in a high virtual address in the PVH entry, so it needs to relocate the kernel image early in the PVH entry for the PVM guest. 
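The interesting part of the PVH path is pvm_update_pgtable() in the diff below: after relocating the image, the pre-constructed top-level page table gets its identity entry aliased into the slots that cover the new kernel text base and the direct-mapping base, so the early high-address accesses resolve through the same lower-level tables. The following is a toy, user-space model of that aliasing; the addresses and entry values are made up, and the extra PUD fixup done by the real code is omitted.

/* pgd_alias_demo.c - toy model of top-level entry aliasing; not kernel code */
#include <stdint.h>
#include <stdio.h>

#define PTRS_PER_PGD	512
#define PGDIR_SHIFT	39

static unsigned int pgd_index(uint64_t vaddr)
{
	return (vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

int main(void)
{
	uint64_t pgd[PTRS_PER_PGD] = { 0 };
	uint64_t text_base = 0xffffe8ff80000000ULL;		/* example relocated text */
	uint64_t page_offset_base = 0xffffd90000000000ULL;	/* example direct map     */

	/* Entry 0 is the pre-built identity mapping (pretend PUD page + flags). */
	pgd[0] = 0x2000 | 0x63;

	/* Alias it wherever the relocated kernel expects a translation. */
	pgd[pgd_index(text_base)] = pgd[0];
	pgd[pgd_index(page_offset_base)] = pgd[0];

	printf("pgd[0], pgd[%u] and pgd[%u] now share the same lower-level table\n",
	       pgd_index(text_base), pgd_index(page_offset_base));
	return 0;
}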
Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/init.h | 5 +++++ arch/x86/kernel/head64_identity.c | 5 ----- arch/x86/platform/pvh/enlighten.c | 22 ++++++++++++++++++++++ arch/x86/platform/pvh/head.S | 4 ++++ 4 files changed, 31 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h index cc9ccf61b6bd..f78edef60253 100644 --- a/arch/x86/include/asm/init.h +++ b/arch/x86/include/asm/init.h @@ -4,6 +4,11 @@ #define __head __section(".head.text") +#define SYM_ABS_VA(sym) ({ \ + unsigned long __v; \ + asm("movabsq $" __stringify(sym) ", %0":"=r"(__v)); \ + __v; }) + struct x86_mapping_info { void *(*alloc_pgt_page)(void *); /* allocate buf for page table */ void *context; /* context for alloc_pgt_page */ diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c index 4e6a073d9e6c..f69f9904003c 100644 --- a/arch/x86/kernel/head64_identity.c +++ b/arch/x86/kernel/head64_identity.c @@ -82,11 +82,6 @@ static void __head set_kernel_map_base(unsigned long text_base) } #endif -#define SYM_ABS_VA(sym) ({ \ - unsigned long __v; \ - asm("movabsq $" __stringify(sym) ", %0":"=r"(__v)); \ - __v; }) - static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd) { unsigned long vaddr, vaddr_end; diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 00a92cb2c814..8c64c31c971b 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -1,8 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include +#include #include #include #include @@ -113,6 +115,26 @@ static void __init hypervisor_specific_init(bool xen_guest) xen_pvh_init(&pvh_bootparams); } +#ifdef CONFIG_PVM_GUEST +void pvm_relocate_kernel(unsigned long physbase); + +void __init pvm_update_pgtable(unsigned long physbase) +{ + pgdval_t *pgd; + pudval_t *pud; + unsigned long base; + + pvm_relocate_kernel(physbase); + + pgd = (pgdval_t *)init_top_pgt; + base = SYM_ABS_VA(_text); + pgd[pgd_index(base)] = pgd[0]; + pgd[pgd_index(page_offset_base)] = pgd[0]; + pud = (pudval_t *)level3_ident_pgt; + pud[pud_index(base)] = (unsigned long)level2_ident_pgt + _KERNPG_TABLE_NOENC; +} +#endif + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. 
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S index baaa3fe34a00..127f297f7257 100644 --- a/arch/x86/platform/pvh/head.S +++ b/arch/x86/platform/pvh/head.S @@ -109,6 +109,10 @@ SYM_CODE_START_LOCAL(pvh_start_xen) wrmsr #ifdef CONFIG_X86_PIE +#ifdef CONFIG_PVM_GUEST + leaq _text(%rip), %rdi + call pvm_update_pgtable +#endif movabs $2f, %rax ANNOTATE_RETPOLINE_SAFE jmp *%rax From patchwork Mon Feb 26 14:36:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572341 Received: from mail-pg1-f175.google.com (mail-pg1-f175.google.com [209.85.215.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED9C712F36A; Mon, 26 Feb 2024 14:38:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958300; cv=none; b=LWfSxdPljejcs5mYH/tuH9aPUDbUM/Tyfoj0+C75QAWaDF7M1AG9SLff3Kukxd4ja/5dojonDP5Kvqk9XN3ciE8pEK7TGfU8MqdoVlRXHbXv4r2fstChYQcp2gyFVlbl0IZW9sntdsNQmQRxfUzeHT4X6Cu7ZJsBkKvmWD5UgMw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958300; c=relaxed/simple; bh=wQWJDRe13cC/lAPr09l+aehtvQPpP+T7S4UcqUaG0oM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=kN1KBNUUp58lHx0Xyc8I03vA0luST7lV6BzYhgywUAfhFJ/UeDffwiynTr4cyf8IyA2weCDOb3puqrnHNCXSNeUWHJlyp3T0PKgxUV9R+4naqHLervOw/W4jaYGZxEKz3qpJFvdolmcyKNmNWmv3lyt7UxJoXcX9FDqQV1CTWRk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=BVB9AbnQ; arc=none smtp.client-ip=209.85.215.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BVB9AbnQ" Received: by mail-pg1-f175.google.com with SMTP id 41be03b00d2f7-5c6bd3100fcso2387589a12.3; Mon, 26 Feb 2024 06:38:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958298; x=1709563098; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+RiTkMvyx33Aqiv75PkY1cAw5jxjHAa5YAVlUH/MwnE=; b=BVB9AbnQhzoWZYiW2pKJcB+Ah+Jgn1j1IluP9iWWRzUZH/UrL6wTMndJtEvJBU0j+w oUO9tN09s9x5MHAYrvgyGKkKa4rJrQVirm1jHhanH83reBlqljxN88DMCGT4CDhPK+WJ qPcdjF4+IFEp+SXhdA5SH6voduX3SAO6G8OYrkiyF2k4XsfbsPoD8hcmyw0M5sq8a0kM clqoYsOE9ZGlcOfRTwwof27AgkCAQ2T4bRWvvUc8XMKB4lCwZImxBfNlSkiT8TmvL3Gy yXRPUKEfhTXByDlSnsZ1MOS+aRnP0CZVulRfnHFp6qORSMFM7K9bwbBKKGVJvRO5VBZ9 RTJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958298; x=1709563098; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+RiTkMvyx33Aqiv75PkY1cAw5jxjHAa5YAVlUH/MwnE=; b=ElefZ8huafonUv/Bh9GE44aRKeTXva33cN9akYm68Kkq6C5Nf4GlIzo6EQSZQd81L/ NnbTrbwwrZavPYZ/hG5l6dSzBZOoN8+vqV2drG8rq9r1OqCAuE6eBZAKaLrEcq3As2mf 
yA5gvmOYciA+HcZ1etBIbL8kEPeSDOtgbJ5nCgCtAkgbK0ao0NdrABVqYN3roqcDy1qy DgYX0LXv1IIjz5p4DFxzA+QeZZp6tfLyERux5F9si2SEIN1L8+rCepSpFbUkxnR3xB5m TkCuz/L+IXMvT49sWEvqQ3wcm473Krs3yUgh3bOM43eWN4gGqKSHmSVsGy/504wvvR8c TdJw== X-Forwarded-Encrypted: i=1; AJvYcCXkNPHbNSvNdSgpicPZGVUxcw9mPnCR7zDfsYgf03soHroCLkb38gz1ZNAdpEAp2x3HrihyNEZuo0i+RiqpMNFr3qNe X-Gm-Message-State: AOJu0YzIfuRSolwoiRBk2AOOVf8O/ODjpPrfLRXjBlNu+fd03FjfAvju pPainCMWRg7Z4BZNW5fVYjDPEg8FxkDU9MkolkPEiHpJ5ipKnCl56vfCPuYf X-Google-Smtp-Source: AGHT+IGS7+CQrWPm6z5nP1RqkHXHHk4aPrrGAQAjzn0ILkLtPs+kVghGUmCwm0mOuOTFyjaUkqS3Ww== X-Received: by 2002:a05:6a20:12cb:b0:1a0:e089:e25e with SMTP id v11-20020a056a2012cb00b001a0e089e25emr6228306pzg.46.1708958297873; Mon, 26 Feb 2024 06:38:17 -0800 (PST) Received: from localhost ([198.11.178.15]) by smtp.gmail.com with ESMTPSA id g11-20020a63e60b000000b005dbd0facb4dsm3930417pgh.61.2024.02.26.06.38.17 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:17 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Jonathan Corbet , Josh Poimboeuf , Yuntao Wang , Wang Jinchao Subject: [RFC PATCH 57/73] x86/pvm: Make cpu entry area and vmalloc area variable Date: Mon, 26 Feb 2024 22:36:14 +0800 Message-Id: <20240226143630.33643-58-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong For the PVM guest, the entire kernel layout should be within the allowed virtual address range. Therefore, establish CPU_ENTRY_AREA_BASE and VMEMORY_END as a variable for the PVM guest, allowing it to be modified as necessary for the PVM guest. Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/page_64.h | 3 +++ arch/x86/include/asm/pgtable_64_types.h | 14 ++++++++++++-- arch/x86/kernel/head64.c | 7 +++++++ arch/x86/mm/dump_pagetables.c | 3 ++- arch/x86/mm/kaslr.c | 4 ++-- 5 files changed, 26 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index b8692e6cc939..4f64f049f3d0 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -18,6 +18,9 @@ extern unsigned long page_offset_base; extern unsigned long vmalloc_base; extern unsigned long vmemmap_base; +extern unsigned long cpu_entry_area_base; +extern unsigned long vmemory_end; + static __always_inline unsigned long __phys_addr_nodebug(unsigned long x) { unsigned long y = x - KERNEL_MAP_BASE; diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 6780f2e63717..66c8e7325d27 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -140,6 +140,7 @@ extern unsigned int ptrs_per_p4d; # define VMEMMAP_START __VMEMMAP_BASE_L4 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */ +#ifndef CONFIG_PVM_GUEST /* * End of the region for which vmalloc page tables are pre-allocated. * For non-KMSAN builds, this is the same as VMALLOC_END. 
@@ -147,6 +148,10 @@ extern unsigned int ptrs_per_p4d; * VMALLOC_START..VMALLOC_END (see below). */ #define VMEMORY_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1) +#else +#define RAW_VMEMORY_END (__VMALLOC_BASE_L4 + (VMALLOC_SIZE_TB_L4 << 40) - 1) +#define VMEMORY_END vmemory_end +#endif /* CONFIG_PVM_GUEST */ #ifndef CONFIG_KMSAN #define VMALLOC_END VMEMORY_END @@ -166,7 +171,7 @@ extern unsigned int ptrs_per_p4d; * KMSAN_MODULES_ORIGIN_START to * KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules. */ -#define VMALLOC_QUARTER_SIZE ((VMALLOC_SIZE_TB << 40) >> 2) +#define VMALLOC_QUARTER_SIZE ((VMEMORY_END + 1 - VMALLOC_START) >> 2) #define VMALLOC_END (VMALLOC_START + VMALLOC_QUARTER_SIZE - 1) /* @@ -202,7 +207,12 @@ extern unsigned int ptrs_per_p4d; #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT) #define CPU_ENTRY_AREA_PGD _AC(-4, UL) -#define CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT) +#define RAW_CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT) +#ifdef CONFIG_PVM_GUEST +#define CPU_ENTRY_AREA_BASE cpu_entry_area_base +#else +#define CPU_ENTRY_AREA_BASE RAW_CPU_ENTRY_AREA_BASE +#endif #define EFI_VA_START ( -4 * (_AC(1, UL) << 30)) #define EFI_VA_END (-68 * (_AC(1, UL) << 30)) diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 0b0e460609e5..d0e8d648bd38 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -72,6 +72,13 @@ unsigned long kernel_map_base __ro_after_init = __START_KERNEL_map; EXPORT_SYMBOL(kernel_map_base); #endif +#ifdef CONFIG_PVM_GUEST +unsigned long cpu_entry_area_base __ro_after_init = RAW_CPU_ENTRY_AREA_BASE; +EXPORT_SYMBOL(cpu_entry_area_base); +unsigned long vmemory_end __ro_after_init = RAW_VMEMORY_END; +EXPORT_SYMBOL(vmemory_end); +#endif + /* Wipe all early page tables except for the kernel symbol map */ static void __init reset_early_page_tables(void) { diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index d5c6f61242aa..166c7d36d8ff 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -95,7 +95,7 @@ static struct addr_marker address_markers[] = { #ifdef CONFIG_MODIFY_LDT_SYSCALL [LDT_NR] = { 0UL, "LDT remap" }, #endif - [CPU_ENTRY_AREA_NR] = { CPU_ENTRY_AREA_BASE,"CPU entry Area" }, + [CPU_ENTRY_AREA_NR] = { 0UL, "CPU entry Area" }, #ifdef CONFIG_X86_ESPFIX64 [ESPFIX_START_NR] = { ESPFIX_BASE_ADDR, "ESPfix Area", 16 }, #endif @@ -479,6 +479,7 @@ static int __init pt_dump_init(void) address_markers[MODULES_VADDR_NR].start_address = MODULES_VADDR; address_markers[MODULES_END_NR].start_address = MODULES_END; address_markers[FIXADDR_START_NR].start_address = FIXADDR_START; + address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE; #endif #ifdef CONFIG_X86_32 address_markers[VMALLOC_START_NR].start_address = VMALLOC_START; diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 37db264866b6..e3825c7542a3 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -38,7 +38,7 @@ * highest amount of space for randomization available, but that's too hard * to keep straight and caused issues already. */ -static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE; +static const unsigned long vaddr_end = RAW_CPU_ENTRY_AREA_BASE; /* * Memory regions randomized by KASLR (except modules that use a separate logic @@ -79,7 +79,7 @@ void __init kernel_randomize_memory(void) * limited.... 
*/ BUILD_BUG_ON(vaddr_start >= vaddr_end); - BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE); + BUILD_BUG_ON(vaddr_end != RAW_CPU_ENTRY_AREA_BASE); BUILD_BUG_ON(vaddr_end > __START_KERNEL_map); if (!kaslr_memory_enabled()) From patchwork Mon Feb 26 14:36:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572342 Received: from mail-pg1-f182.google.com (mail-pg1-f182.google.com [209.85.215.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 752C712AAC8; Mon, 26 Feb 2024 14:38:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958313; cv=none; b=GrRW3lwLVW2lk/lcCeqsZCFUS/g3TunBjlJCNp/70ZwNbZ4ukK23DndQua6HK86O5fOseFtO6+PcD0wDc6S1ECyCLQ+Dotc2AleDv8OL18PPHpXAsRJoRhFF+14vfFaA+82n8a4v15TxpxBINxEQrBzH0+X1LEsekZ19d1CNODo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958313; c=relaxed/simple; bh=RuI+mIXFCnHTy2O7ADohtxpMA+EmOljcf64cLsOTZSQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=We+evRHZIgnzuDUm0vuV6wLp8qjHW5tZJyOPJN8q1giKWYXQuonhhVsvc+fGVC3oxLDpjvZdnUyX61D+GkiIZXH4NeEgKhklhsRj/XPRfZQVufTvJlIBy50/gmF7qQM+jtxyxjse+QDBbZKNdhGzUwLiEStziwwXCkYi59cIdn8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=C2Jo9S4P; arc=none smtp.client-ip=209.85.215.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="C2Jo9S4P" Received: by mail-pg1-f182.google.com with SMTP id 41be03b00d2f7-5d4d15ec7c5so2457731a12.1; Mon, 26 Feb 2024 06:38:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958310; x=1709563110; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4nV4uIyJE1TrPKasp5oCdCMedKEyqDIAzttVGmNtNdU=; b=C2Jo9S4PWO0hcbh3EZ6O88rV5fPrwLWx5dMjRmwlmEbGg4YJrH4Ywr7O557B1nkg+v QRuZQMLQFKmIQ4Nnm0SzTjRSiBoPBd2YupctGLn8UXXZvSWZ9i4sdQwU3rcfVv3pRtU0 FFOVcUI0TWhLIX7ip57vxpGr4oo6aLG2oLYCDkQN6roz2YyEsePSq3Ov/AYfV0weZxLg t61wI55LieyTWd15boX8W18X/VCxLGA6ulFl76T4H+FyBUoC6zz5+KL8N5l/WIhHRgiK zhhRfXAfAJwsLrgGUNbqCRulJyc54oodRiVOUCT2cqr0dAYbVSV9RzpGXJxBTlGobubh t+CA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958310; x=1709563110; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4nV4uIyJE1TrPKasp5oCdCMedKEyqDIAzttVGmNtNdU=; b=rp2HFagYjPJDkP4dYfbsqGx1vMPqxhbHLjzlHoZOp1ovRW7Ej358CWrB4W5iaAkQcm kLTpBb9gdA1ZuUJhJ7sAqRszxWP8HRkIG04S6xpKvOgX1AkGqCzalwvCj/U+qHQVC4uc veK4LH/BJk59/GpbyK2hKAGMILGPlIkgpVa683lefq/0MTymLUmZ4nzuyECPdi2CiN1g GU/7ecKpGkd2d0/NYcLZqYOGde0lRuz1umxVA8pApUs9K6YdDWNuQAvX0GZbGqrm8M6a jiBZ+LZhoNH0i/YnIhD2JVuoWttx7SjBAu2uWEjw1PQKcXI0MyNaRawUzXMLLBqoVmMZ iDnA== 
X-Forwarded-Encrypted: i=1; AJvYcCXnV60xfyPX9Y1aJFp+qtib5CP1xgXxmvkfAGk8oJ01EcQfK0vCCC/w2wK5hOIg2+jBB6BYTxSpcVEEkoHGedaxSeBv X-Gm-Message-State: AOJu0YyLgIfk9K3jrQuDgqlNprny21sQ98xriW/aQ/hFgb5VmUWLmga9 iNAMISpJ5Lh9e5BZYbXH2DfnUExOE4zsZ8WJSNMb771uEZz3LKaQr4SvOg6/ X-Google-Smtp-Source: AGHT+IHFApyYJLTpQNsPa0O7HI/vb7qEUTPGVhU45NHrc+zPxscyqHazTshcu7JFv6FdR+LjDyPZrw== X-Received: by 2002:a05:6a20:d486:b0:1a0:ee99:5d07 with SMTP id im6-20020a056a20d48600b001a0ee995d07mr7764444pzb.62.1708958309874; Mon, 26 Feb 2024 06:38:29 -0800 (PST) Received: from localhost ([198.11.178.15]) by smtp.gmail.com with ESMTPSA id e6-20020a63ee06000000b005dc491ccdcesm4053747pgi.14.2024.02.26.06.38.29 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:29 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski Subject: [RFC PATCH 58/73] x86/pvm: Relocate kernel address space layout Date: Mon, 26 Feb 2024 22:36:15 +0800 Message-Id: <20240226143630.33643-59-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong Relocate the kernel address space layout to a specific range, which is similar to KASLR. Since there is not enough room for KASAN, KASAN is not supported for PVM guest. Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/Kconfig | 3 +- arch/x86/include/asm/pvm_para.h | 6 +++ arch/x86/kernel/head64_identity.c | 6 +++ arch/x86/kernel/pvm.c | 64 +++++++++++++++++++++++++++++++ arch/x86/mm/kaslr.c | 4 ++ 5 files changed, 82 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1b4bea3db53d..ded687cc23ad 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -853,7 +853,8 @@ config KVM_GUEST config PVM_GUEST bool "PVM Guest support" - depends on X86_64 && KVM_GUEST && X86_PIE + depends on X86_64 && KVM_GUEST && X86_PIE && !KASAN + select RANDOMIZE_MEMORY select RELOCATABLE_UNCOMPRESSED_KERNEL default n help diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h index efd7afdf9be9..ff0bf0fe7dc4 100644 --- a/arch/x86/include/asm/pvm_para.h +++ b/arch/x86/include/asm/pvm_para.h @@ -10,6 +10,7 @@ #include void __init pvm_early_setup(void); +bool __init pvm_kernel_layout_relocate(void); static inline void pvm_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) @@ -64,6 +65,11 @@ static inline bool pvm_detect(void) static inline void pvm_early_setup(void) { } + +static inline bool pvm_kernel_layout_relocate(void) +{ + return false; +} #endif /* CONFIG_PVM_GUEST */ #endif /* _ASM_X86_PVM_PARA_H */ diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c index f69f9904003c..467fe493c9ba 100644 --- a/arch/x86/kernel/head64_identity.c +++ b/arch/x86/kernel/head64_identity.c @@ -396,6 +396,12 @@ static void __head detect_pvm_range(void) pml4_index_end = (msr_val >> 16) & 0x1ff; pvm_range_start = (0x1fffe00 | pml4_index_start) * P4D_SIZE; pvm_range_end = (0x1fffe00 | pml4_index_end) * P4D_SIZE; + + /* + 
* early page faults would map pages into the direct mapping area, + * so we should modify 'page_offset_base' here early. + */ + page_offset_base = pvm_range_start; } void __head pvm_relocate_kernel(unsigned long physbase) diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index fc82c71b305b..9cdfbaa15dbb 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -10,7 +10,10 @@ */ #define pr_fmt(fmt) "pvm-guest: " fmt +#include + #include +#include #include unsigned long pvm_range_start __initdata; @@ -23,3 +26,64 @@ void __init pvm_early_setup(void) setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST); } + +#define TB_SHIFT 40 +#define HOLE_SIZE (1UL << 39) + +#define PVM_DIRECT_MAPPING_SIZE (8UL << TB_SHIFT) +#define PVM_VMALLOC_SIZE (5UL << TB_SHIFT) +#define PVM_VMEM_MAPPING_SIZE (1UL << TB_SHIFT) + +/* + * For a PVM guest, the hypervisor would provide one valid virtual address + * range for the guest kernel. The guest kernel needs to adjust its layout, + * including the direct mapping area, vmalloc area, vmemmap area, and CPU entry + * area, to be within this range. If the range start is 0xffffd90000000000, the + * PVM guest kernel with 4-level page tables could arrange its layout as + * follows: + * + * ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor + * ... host kernel used ... guest kernel range start + * ffffd90000000000 - ffffe0ffffffffff (=8 TB) direct mapping of all physical memory + * ffffe10000000000 - ffffe17fffffffff (=39 bit) hole + * ffffe18000000000 - ffffe67fffffffff (=5 TB) vmalloc/ioremap space + * ffffe68000000000 - ffffe6ffffffffff (=39 bit) hole + * ffffe70000000000 - ffffe7ffffffffff (=40 bit) virtual memory map (1TB) + * ffffe80000000000 - ffffe87fffffffff (=39 bit) cpu_entry_area mapping + * ffffe88000000000 - ffffe8ff7fffffff (=510 G) hole + * ffffe8ff80000000 - ffffe8ffffffffff (=2 G) kernel image + * ... host kernel used ...
guest kernel range end + * + */ +bool __init pvm_kernel_layout_relocate(void) +{ + unsigned long area_size; + + if (!boot_cpu_has(X86_FEATURE_KVM_PVM_GUEST)) { + vmemory_end = VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1; + return false; + } + + if (!IS_ALIGNED(pvm_range_start, PGDIR_SIZE)) + panic("The start of the allowed range is not aligned"); + + area_size = max_pfn << PAGE_SHIFT; + if (area_size > PVM_DIRECT_MAPPING_SIZE) + panic("The memory size is too large for directing mapping area"); + + vmalloc_base = page_offset_base + PVM_DIRECT_MAPPING_SIZE + HOLE_SIZE; + vmemory_end = vmalloc_base + PVM_VMALLOC_SIZE; + + vmemmap_base = vmemory_end + HOLE_SIZE; + area_size = max_pfn * sizeof(struct page); + if (area_size > PVM_VMEM_MAPPING_SIZE) + panic("The memory size is too large for virtual memory mapping area"); + + cpu_entry_area_base = vmemmap_base + PVM_VMEM_MAPPING_SIZE; + BUILD_BUG_ON(CPU_ENTRY_AREA_MAP_SIZE > (1UL << 39)); + + if (cpu_entry_area_base + (2UL << 39) > pvm_range_end) + panic("The size of the allowed range is too small"); + + return true; +} diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index e3825c7542a3..f6f332abf515 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -28,6 +28,7 @@ #include #include +#include #include "mm_internal.h" @@ -82,6 +83,9 @@ void __init kernel_randomize_memory(void) BUILD_BUG_ON(vaddr_end != RAW_CPU_ENTRY_AREA_BASE); BUILD_BUG_ON(vaddr_end > __START_KERNEL_map); + if (pvm_kernel_layout_relocate()) + return; + if (!kaslr_memory_enabled()) return; From patchwork Mon Feb 26 14:36:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572343 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D779E12F386; Mon, 26 Feb 2024 14:38:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958315; cv=none; b=A8xUPNp6Gw3RcQd2vcnxbneM2Sav0Qu8uQ9L4d2YSBukZVthBzIUWJfxt9xE+NE8EHZmeRGm5fZZ9TeO88XsUteTSMkGQhy1NnPTpQxQF4DZ/V5TMCxn7GPVpNQlQeNX2L51baBau6/sjfqjXF9ef1DxOEwEo0RqiLPjMZI8ZYw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958315; c=relaxed/simple; bh=f3SxkjdDBsEA/7OShaAvc2oYo2+utzf+LAhffkTk31g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FEmzPMARa9SLdBsqEQ34mS0rknNHU5XsCCH8ff0Bi5Dpejc0KnFDhSmv9kQTUlGMdKGSlqS8+JIU7kcosjv0Vlbe35TvRRHzdI4t1HxhJlMOfVXZjazfLzpfb+9di8MMVv/hsbrVI+xGglB/dKIZDO3E57pZDX7wYhQBgiumCYQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=fuyAH+HW; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="fuyAH+HW" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-1dc0d11d1b7so23381665ad.2; Mon, 26 Feb 2024 06:38:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958313; x=1709563113; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Eg6g7/q1WqY94yjHZb1HQwZnqsOWQYMcwxgHt4yt/Ww=; b=fuyAH+HWtp7wu2CQSInR70lp3dFidiDVa1X8iA26Z6d/cdgJE02sanUG8qDrS9E1Ox K3RQvpaCTAWrVJ7Rdc+ZwyBDos/ev4JAKzuwCiM5KMSByopqpt3Ei6NMQyIb8iFyvj8j plvEYeqP7UqKsEOJdUZ+1MDD9NWP3qyza8oQO07RchgBTEZMXWlV1lSS8WuRr35GDgeE Ao95+RACnQOHLDbnWEcux6JWyapzJ1UVhp+GhTEEtX8ZtzHFE+eZZuyUAUhDDZIdTTEc 6+BcCkhX72viu6+9PzkiuxPdfXjv3S2fr+nkVZPRSgbNzi0ydNS4mq/1gDC2VTuaQzdk 2E1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958313; x=1709563113; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Eg6g7/q1WqY94yjHZb1HQwZnqsOWQYMcwxgHt4yt/Ww=; b=mMSxFq7q3zuNKmD6gJZGwyn4eRR4ueh/hijoa787pPCVQ8T6iqp6U8fk/bsxFVSxa3 h6wxFdCsgzqFpLmCW6pIDby6JOzW6wv7kj9VCGMdVEfs93EDjG8RJrVM3JDD0p6bE9oa JsoInYqITugyFcp2MdPl9k/vxVkqRKcGqn8hx71ISds0TFHWtAN43t5gclUuDhbGXtH6 hZFX/DtRrIBpB2E7O4mDa8Lk3yTAiut63z8ERdTAnhwEfiYV8MQyJrfFYrfo8Nzhlj+d WqD+AT2RhBencl3/EVkOup9FIZI4wymxEpq9dWzWzwCx9xMzPT96kWtIq+MEDoxaLAcG n4MA== X-Forwarded-Encrypted: i=1; AJvYcCXUaIhGbUlzTBwyzqXibnuQyi9pGTwT9OjEDTYzbCRhD8AsRK+m7xDMfLBJK834zAtKFKrZxW58hXiQ4BxHwjUvWK1o X-Gm-Message-State: AOJu0YwchWc4ZpbeqkD6U1fBwY7XbN7ZDYPVFw6MGKi/5XqNZKousWP8 1ZtukFy1FQ0xJiORRTzphl3R0S835uVeQ46mkrTN3AXahopvXXHyzQZVJxF2 X-Google-Smtp-Source: AGHT+IESs4IqMOf+bMbVqFmjKod4bE6Q3FmhBANJYEYxQVumFhX4wlNGB2w/aW8KteFd5/EKzLullw== X-Received: by 2002:a17:902:ccc4:b0:1dc:b16c:63fa with SMTP id z4-20020a170902ccc400b001dcb16c63famr999109ple.4.1708958312979; Mon, 26 Feb 2024 06:38:32 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id t5-20020a170902dcc500b001dc6b99af70sm4013399pll.108.2024.02.26.06.38.32 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:32 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski Subject: [RFC PATCH 59/73] x86/pti: Force enabling KPTI for PVM guest Date: Mon, 26 Feb 2024 22:36:16 +0800 Message-Id: <20240226143630.33643-60-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan For PVM, it needs the guest to provides two different page tables directly to prevent usermode access to the kernel address space. So force enabling KPTI for PVM guest. 
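The requirement is easiest to see with a toy model of what KPTI gives the guest: each process ends up with two page-table roots, and only the kernel-mode root carries the kernel half of the address space, so user mode simply has no translation for kernel addresses. The structure, addresses and entry values below are invented for illustration; the real switching is done by the PTI machinery this patch force-enables.

/* kpti_two_roots_demo.c - invented illustration, not the kernel's PTI code */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTRS_PER_PGD	512
#define PGDIR_SHIFT	39

struct mm_roots {
	uint64_t kernel_pgd[PTRS_PER_PGD];	/* root used while in the kernel     */
	uint64_t user_pgd[PTRS_PER_PGD];	/* root installed on return to user  */
};

static bool translates(const uint64_t *pgd, uint64_t vaddr)
{
	return pgd[(vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)] != 0;
}

int main(void)
{
	struct mm_roots mm = { { 0 }, { 0 } };
	uint64_t user_va = 0x00007f0000000000ULL;
	uint64_t kernel_va = 0xffffe8ff80000000ULL;	/* example kernel text */

	/* Kernel root maps both halves; the user root only the user half. */
	mm.kernel_pgd[(user_va >> PGDIR_SHIFT) & 511] = 0x1003;
	mm.kernel_pgd[(kernel_va >> PGDIR_SHIFT) & 511] = 0x2003;
	mm.user_pgd[(user_va >> PGDIR_SHIFT) & 511] = 0x1003;

	printf("kernel root: user VA %d, kernel VA %d\n",
	       translates(mm.kernel_pgd, user_va), translates(mm.kernel_pgd, kernel_va));
	printf("user root:   user VA %d, kernel VA %d\n",
	       translates(mm.user_pgd, user_va), translates(mm.user_pgd, kernel_va));
	return 0;
}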
Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/Kconfig | 1 + arch/x86/mm/pti.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ded687cc23ad..32a2ab49752b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -854,6 +854,7 @@ config KVM_GUEST config PVM_GUEST bool "PVM Guest support" depends on X86_64 && KVM_GUEST && X86_PIE && !KASAN + select PAGE_TABLE_ISOLATION select RANDOMIZE_MEMORY select RELOCATABLE_UNCOMPRESSED_KERNEL default n diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 5dd733944629..3b06faeca569 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -84,6 +84,13 @@ void __init pti_check_boottime_disable(void) return; } + if (boot_cpu_has(X86_FEATURE_KVM_PVM_GUEST)) { + pti_mode = PTI_FORCE_ON; + pti_print_if_insecure("force enabled on kvm pvm guest."); + setup_force_cpu_cap(X86_FEATURE_PTI); + return; + } + if (cpu_mitigations_off()) pti_mode = PTI_FORCE_OFF; if (pti_mode == PTI_FORCE_OFF) { From patchwork Mon Feb 26 14:36:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572344 Received: from mail-pg1-f180.google.com (mail-pg1-f180.google.com [209.85.215.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACC0B1493BB; Mon, 26 Feb 2024 14:38:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958324; cv=none; b=g8E64MszClTn9Tnm0WqNSN7K9p5ENXXT1+btnGfbfwBk+R4s9D2mmsV+7rI6lTezUFb9fhWF9D6/lTvTjrEqxNrnEQGciflCJR3Rxn6Fe3qVHIKatm9dCEpZZBQYqinu3AH5UJyDGYiCecP4BWOtI5cZval2XFuQKdkfLqqJlDQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958324; c=relaxed/simple; bh=SLhgYKuPHi8csFeflHgabtrSknwIG69hI70qF43PcK4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=OQTI1XJz8m85lOf3/3g57stDplpq75kLcMePI2XLXYPZuz2GbKrGNDYKwOmCA8i9ZFXFdVHULkdWj3+THrZIvQ68pTqXcix8VxtZHg0Fnxeaq1/I5OVK6B61lfyMM0WWTfnYC6xl32zAD8bVhkheTYYN8mabn8hVr6KZa9eB/LM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=eARcu9Ue; arc=none smtp.client-ip=209.85.215.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="eARcu9Ue" Received: by mail-pg1-f180.google.com with SMTP id 41be03b00d2f7-5d42e7ab8a9so2024215a12.3; Mon, 26 Feb 2024 06:38:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958322; x=1709563122; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Ug0/Ozbtd//lDi/KPWyH65r/e2ls8s75iPqWftOikEI=; b=eARcu9UecBxp4ADxwPiN3N0XTXACqZZa1oSw3VyGzupHxOW/NDcdjKzH8H3J6WOPyL /rNcjgJjTxb7G0nZA3nC+3xAEWQA0tLVRuof85f625mNYJVB1STasd1ZcEundo21jT69 a93gM+z9K2MeRLPSZTnn+QcTMzcc2y1KgaW4rkpnphWeVJcA7rxcxTgcqhLdEnYpuQYp 
S+JFyfSHjdKFB8BdxCajBFwNaeiTF6KiRb6qjhiqw4RQFQp3M3PY66H3NU3yxDArj2mL IjLHSoRB/fr4XHPYQEaYdW5TRVVq/swqzcfb8PeaWp6m03Z6m2dG5+prXXrfXyOGXM7w Ajxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958322; x=1709563122; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ug0/Ozbtd//lDi/KPWyH65r/e2ls8s75iPqWftOikEI=; b=lgy8TFthO9KXk9Xt5zHL8earQWaU3tJ7hNLt85V5kMG+EF99bC8V4Ot4tRIwZKJQhy lYfllIn3cOYQIlVwxrbvyRkzMeNffVBwPMjmwH9ZTIG1xSmjseTN9x+vfIz8wsAkagbp 78rBl3frmJJefLATok5RkiTLJ3NmsXvu7lCVb+I+cr3HRdN0h9oF2nCo3xyIsp8HjZGG 2vojgVuQnPUhAS8bAqpzTmcoWf7Bxw4OWV3TFwUvm1TmTdYbDtFA2O0tuZlbhuId+Ix7 rzgBIACiPI/fjPlj7NbAZxCkEPJlpXy521vOw/2UaPlCW1XkC0oLbhHSszogagpHgzZZ nTVw== X-Forwarded-Encrypted: i=1; AJvYcCX07f/gSdbfq/Ua636k/cJNLLkqHZxWHrIy/vvpWwy7c0xxdeiQSldFs7/vHNVF0reZISBVWwWsbh9i5wfdCcmsZnfx X-Gm-Message-State: AOJu0YzMomc1YQW45v9KRQ2YAOPmj4AviZxDPnJnnsqZlPBDemBBR04F 26BJZPePoUSn5QeLxP+kKd+o6i/VkiHh6Ac+e4xzbZNe2d3B83rQNziaHpoA X-Google-Smtp-Source: AGHT+IHmFbJBB+6VdoPL7S61I+zWD/b66M8dOZUr2V5eQrVF+O410IIAKimyG9Bva56zt/zGd2prMg== X-Received: by 2002:a17:90a:bd0a:b0:299:489f:fd2d with SMTP id y10-20020a17090abd0a00b00299489ffd2dmr4509515pjr.20.1708958321604; Mon, 26 Feb 2024 06:38:41 -0800 (PST) Received: from localhost ([47.89.225.180]) by smtp.gmail.com with ESMTPSA id nr14-20020a17090b240e00b00299332505d7sm1426793pjb.26.2024.02.26.06.38.40 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:41 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Andy Lutomirski , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 60/73] x86/pvm: Add event entry/exit and dispatch code Date: Mon, 26 Feb 2024 22:36:17 +0800 Message-Id: <20240226143630.33643-61-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan In PVM, it does not use IDT-based event delivery and instead utilizes a specific event delivery method similar to FRED. For user mode events, stack switching and GSBASE switching are done directly by the hypervisor. The default stack in the entry is already the task stack, and user mode states are saved in the shared PVCS structure. In order to avoid modifying the "vector" in the PVCS for direct switching of the syscall event, the syscall event still uses MSR_LSTAR as the entry. For supervisor mode events with vector < 32, old states are saved in the current stack. And for events with vector >=32, old states are saved in the PVCS, since the entry is irq disabled and old states will be saved into stack before enabling irq. Additionally, there is no #DF for PVM guests, as the hypervisor will treat it as a triple fault directly. Finally, no IST is needed. 
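The rules above boil down to a small dispatch sketch (illustrative only, not part of the patch; pvm_event_entry_for() and the two macros are made-up names). The +256/+512 offsets and the error-code/vector packing in orig_ax are taken from the entry stubs and pvm_event() added below; syscall is the one event that bypasses this path and enters via MSR_LSTAR:

/* Illustrative only: entry selection and orig_ax layout for PVM events. */
#define PVM_EVENT_VECTOR(orig_ax)	((unsigned long)(orig_ax) >> 32)	/* high 32 bits */
#define PVM_EVENT_ERRCODE(orig_ax)	((unsigned int)(orig_ax))		/* low 32 bits  */

static unsigned long pvm_event_entry_for(unsigned long event_entry_msr,
					 bool from_user, unsigned int vector)
{
	if (from_user)
		return event_entry_msr;		/* pvm_user_event_entry       */
	if (vector < 32)
		return event_entry_msr + 256;	/* pvm_kernel_exception_entry */
	return event_entry_msr + 512;	/* pvm_kernel_interrupt_entry */
}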
Signed-off-by: Lai Jiangshan Co-developed-by: Hou Wenlong Signed-off-by: Hou Wenlong --- arch/x86/entry/Makefile | 1 + arch/x86/entry/entry_64_pvm.S | 152 +++++++++++++++++++++++++++ arch/x86/include/asm/pvm_para.h | 8 ++ arch/x86/kernel/pvm.c | 181 ++++++++++++++++++++++++++++++++ 4 files changed, 342 insertions(+) create mode 100644 arch/x86/entry/entry_64_pvm.S diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile index 55dd3f193d99..d9cb970dfe06 100644 --- a/arch/x86/entry/Makefile +++ b/arch/x86/entry/Makefile @@ -20,6 +20,7 @@ obj-y += vsyscall/ obj-$(CONFIG_PREEMPTION) += thunk_$(BITS).o obj-$(CONFIG_IA32_EMULATION) += entry_64_compat.o syscall_32.o obj-$(CONFIG_X86_X32_ABI) += syscall_x32.o +obj-$(CONFIG_PVM_GUEST) += entry_64_pvm.o ifeq ($(CONFIG_X86_64),y) obj-y += entry_64_switcher.o diff --git a/arch/x86/entry/entry_64_pvm.S b/arch/x86/entry/entry_64_pvm.S new file mode 100644 index 000000000000..256baf86a9f3 --- /dev/null +++ b/arch/x86/entry/entry_64_pvm.S @@ -0,0 +1,152 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include +#include +#include + +#include "calling.h" + +/* Construct struct pt_regs on stack */ +.macro PUSH_IRET_FRAME_FROM_PVCS has_cs_ss:req is_kernel:req + .if \has_cs_ss == 1 + movl PER_CPU_VAR(pvm_vcpu_struct + PVCS_user_ss), %ecx + andl $0xff, %ecx + pushq %rcx /* pt_regs->ss */ + .elseif \is_kernel == 1 + pushq $__KERNEL_DS + .else + pushq $__USER_DS + .endif + + pushq PER_CPU_VAR(pvm_vcpu_struct + PVCS_rsp) /* pt_regs->sp */ + movl PER_CPU_VAR(pvm_vcpu_struct + PVCS_eflags), %ecx + pushq %rcx /* pt_regs->flags */ + + .if \has_cs_ss == 1 + movl PER_CPU_VAR(pvm_vcpu_struct + PVCS_user_cs), %ecx + andl $0xff, %ecx + pushq %rcx /* pt_regs->cs */ + .elseif \is_kernel == 1 + pushq $__KERNEL_CS + .else + pushq $__USER_CS + .endif + + pushq PER_CPU_VAR(pvm_vcpu_struct + PVCS_rip) /* pt_regs->ip */ + + /* set %rcx, %r11 per PVM event handling specification */ + movq PER_CPU_VAR(pvm_vcpu_struct + PVCS_rcx), %rcx + movq PER_CPU_VAR(pvm_vcpu_struct + PVCS_r11), %r11 +.endm + +.code64 +.section .entry.text, "ax" + +SYM_CODE_START(entry_SYSCALL_64_pvm) + UNWIND_HINT_ENTRY + ENDBR + + PUSH_IRET_FRAME_FROM_PVCS has_cs_ss=0 is_kernel=0 + + jmp entry_SYSCALL_64_after_hwframe +SYM_CODE_END(entry_SYSCALL_64_pvm) + +/* + * The new RIP value that PVM event delivery establishes is + * MSR_PVM_EVENT_ENTRY for vector events that occur in user mode. + */ + .align 64 +SYM_CODE_START(pvm_user_event_entry) + UNWIND_HINT_ENTRY + ENDBR + + PUSH_IRET_FRAME_FROM_PVCS has_cs_ss=1 is_kernel=0 + /* pt_regs->orig_ax: errcode and vector */ + pushq PER_CPU_VAR(pvm_vcpu_struct + PVCS_event_errcode) + + PUSH_AND_CLEAR_REGS + movq %rsp, %rdi /* %rdi -> pt_regs */ + call pvm_event + +SYM_INNER_LABEL(pvm_restore_regs_and_return_to_usermode, SYM_L_GLOBAL) + POP_REGS + + /* Copy %rcx, %r11 to the PVM CPU structure. */ + movq %rcx, PER_CPU_VAR(pvm_vcpu_struct + PVCS_rcx) + movq %r11, PER_CPU_VAR(pvm_vcpu_struct + PVCS_r11) + + /* Copy the IRET frame to the PVM CPU structure. */ + movq 1*8(%rsp), %rcx /* RIP */ + movq %rcx, PER_CPU_VAR(pvm_vcpu_struct + PVCS_rip) + movq 2*8(%rsp), %rcx /* CS */ + movw %cx, PER_CPU_VAR(pvm_vcpu_struct + PVCS_user_cs) + movq 3*8(%rsp), %rcx /* RFLAGS */ + movl %ecx, PER_CPU_VAR(pvm_vcpu_struct + PVCS_eflags) + movq 4*8(%rsp), %rcx /* RSP */ + movq %rcx, PER_CPU_VAR(pvm_vcpu_struct + PVCS_rsp) + movq 5*8(%rsp), %rcx /* SS */ + movw %cx, PER_CPU_VAR(pvm_vcpu_struct + PVCS_user_ss) + /* + * We are on the trampoline stack. All regs are live. 
+ * We can do future final exit work right here. + */ + STACKLEAK_ERASE_NOCLOBBER + + addq $6*8, %rsp +SYM_INNER_LABEL(pvm_retu_rip, SYM_L_GLOBAL) + ANNOTATE_NOENDBR + syscall +SYM_CODE_END(pvm_user_event_entry) + +/* + * The new RIP value that PVM event delivery establishes is + * MSR_PVM_EVENT_ENTRY + 256 for events with vector < 32 + * that occur in supervisor mode. + */ + .org pvm_user_event_entry+256, 0xcc +SYM_CODE_START(pvm_kernel_exception_entry) + UNWIND_HINT_ENTRY + ENDBR + + /* set %rcx, %r11 per PVM event handling specification */ + movq 6*8(%rsp), %rcx + movq 7*8(%rsp), %r11 + + PUSH_AND_CLEAR_REGS + movq %rsp, %rdi /* %rdi -> pt_regs */ + call pvm_event + + jmp pvm_restore_regs_and_return_to_kernel +SYM_CODE_END(pvm_kernel_exception_entry) + +/* + * The new RIP value that PVM event delivery establishes is + * MSR_PVM_EVENT_ENTRY + 512 for events with vector >= 32 + * that occur in supervisor mode. + */ + .org pvm_user_event_entry+512, 0xcc +SYM_CODE_START(pvm_kernel_interrupt_entry) + UNWIND_HINT_ENTRY + ENDBR + + /* Reserve space for rcx/r11 */ + subq $16, %rsp + + PUSH_IRET_FRAME_FROM_PVCS has_cs_ss=0 is_kernel=1 + /* pt_regs->orig_ax: errcode and vector */ + pushq PER_CPU_VAR(pvm_vcpu_struct + PVCS_event_errcode) + + PUSH_AND_CLEAR_REGS + movq %rsp, %rdi /* %rdi -> pt_regs */ + call pvm_event + +SYM_INNER_LABEL(pvm_restore_regs_and_return_to_kernel, SYM_L_GLOBAL) + POP_REGS + + movq %rcx, 6*8(%rsp) + movq %r11, 7*8(%rsp) +SYM_INNER_LABEL(pvm_rets_rip, SYM_L_GLOBAL) + ANNOTATE_NOENDBR + syscall +SYM_CODE_END(pvm_kernel_interrupt_entry) diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h index ff0bf0fe7dc4..c344185a192c 100644 --- a/arch/x86/include/asm/pvm_para.h +++ b/arch/x86/include/asm/pvm_para.h @@ -5,6 +5,8 @@ #include #include +#ifndef __ASSEMBLY__ + #ifdef CONFIG_PVM_GUEST #include #include @@ -72,4 +74,10 @@ static inline bool pvm_kernel_layout_relocate(void) } #endif /* CONFIG_PVM_GUEST */ +void entry_SYSCALL_64_pvm(void); +void pvm_user_event_entry(void); +void pvm_retu_rip(void); +void pvm_rets_rip(void); +#endif /* !__ASSEMBLY__ */ + #endif /* _ASM_X86_PVM_PARA_H */ diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index 9cdfbaa15dbb..9399e45b3c13 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -11,14 +11,195 @@ #define pr_fmt(fmt) "pvm-guest: " fmt #include +#include #include #include +#include #include +#include + +DEFINE_PER_CPU_PAGE_ALIGNED(struct pvm_vcpu_struct, pvm_vcpu_struct); unsigned long pvm_range_start __initdata; unsigned long pvm_range_end __initdata; +static noinstr void pvm_bad_event(struct pt_regs *regs, unsigned long vector, + unsigned long error_code) +{ + irqentry_state_t irq_state = irqentry_nmi_enter(regs); + + instrumentation_begin(); + + /* Panic on events from a high stack level */ + if (!user_mode(regs)) { + pr_emerg("PANIC: invalid or fatal PVM event;" + "vector %lu error 0x%lx at %04lx:%016lx\n", + vector, error_code, regs->cs, regs->ip); + die("invalid or fatal PVM event", regs, error_code); + panic("invalid or fatal PVM event"); + } else { + unsigned long flags = oops_begin(); + int sig = SIGKILL; + + pr_alert("BUG: invalid or fatal FRED event;" + "vector %lu error 0x%lx at %04lx:%016lx\n", + vector, error_code, regs->cs, regs->ip); + + if (__die("Invalid or fatal FRED event", regs, error_code)) + sig = 0; + + oops_end(flags, regs, sig); + } + instrumentation_end(); + irqentry_nmi_exit(regs, irq_state); +} + +DEFINE_IDTENTRY_RAW(pvm_exc_debug) +{ + /* + * There's no IST on PVM. 
but we still need to dispatch + to the correct handler. + */ + if (user_mode(regs)) + noist_exc_debug(regs); + else + exc_debug(regs); +} + +#ifdef CONFIG_X86_MCE +DEFINE_IDTENTRY_RAW(pvm_exc_machine_check) +{ + /* + * There's no IST on PVM, but we still need to dispatch + * to the correct handler. + */ + if (user_mode(regs)) + noist_exc_machine_check(regs); + else + exc_machine_check(regs); +} +#endif + +static noinstr void pvm_exception(struct pt_regs *regs, unsigned long vector, + unsigned long error_code) +{ + /* Optimize for #PF. That's the only exception which matters performance wise */ + if (likely(vector == X86_TRAP_PF)) { + exc_page_fault(regs, error_code); + return; + } + + switch (vector) { + case X86_TRAP_DE: return exc_divide_error(regs); + case X86_TRAP_DB: return pvm_exc_debug(regs); + case X86_TRAP_NMI: return exc_nmi(regs); + case X86_TRAP_BP: return exc_int3(regs); + case X86_TRAP_OF: return exc_overflow(regs); + case X86_TRAP_BR: return exc_bounds(regs); + case X86_TRAP_UD: return exc_invalid_op(regs); + case X86_TRAP_NM: return exc_device_not_available(regs); + case X86_TRAP_DF: return exc_double_fault(regs, error_code); + case X86_TRAP_TS: return exc_invalid_tss(regs, error_code); + case X86_TRAP_NP: return exc_segment_not_present(regs, error_code); + case X86_TRAP_SS: return exc_stack_segment(regs, error_code); + case X86_TRAP_GP: return exc_general_protection(regs, error_code); + case X86_TRAP_MF: return exc_coprocessor_error(regs); + case X86_TRAP_AC: return exc_alignment_check(regs, error_code); + case X86_TRAP_XF: return exc_simd_coprocessor_error(regs); +#ifdef CONFIG_X86_MCE + case X86_TRAP_MC: return pvm_exc_machine_check(regs); +#endif +#ifdef CONFIG_X86_CET + case X86_TRAP_CP: return exc_control_protection(regs, error_code); +#endif + default: return pvm_bad_event(regs, vector, error_code); + } +} + +static noinstr void pvm_handle_INT80_compat(struct pt_regs *regs) +{ +#ifdef CONFIG_IA32_EMULATION + if (ia32_enabled()) { + int80_emulation(regs); + return; + } +#endif + exc_general_protection(regs, 0); +} + +typedef void (*idtentry_t)(struct pt_regs *regs); + +#define SYSVEC(_vector, _function) [_vector - FIRST_SYSTEM_VECTOR] = sysvec_##_function + +#define pvm_handle_spurious_interrupt ((idtentry_t)(void *)spurious_interrupt) + +static idtentry_t pvm_sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init = { + [0 ... NR_SYSTEM_VECTORS-1] = pvm_handle_spurious_interrupt, + + SYSVEC(ERROR_APIC_VECTOR, error_interrupt), + SYSVEC(SPURIOUS_APIC_VECTOR, spurious_apic_interrupt), + SYSVEC(LOCAL_TIMER_VECTOR, apic_timer_interrupt), + SYSVEC(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi), + +#ifdef CONFIG_SMP + SYSVEC(RESCHEDULE_VECTOR, reschedule_ipi), + SYSVEC(CALL_FUNCTION_SINGLE_VECTOR, call_function_single), + SYSVEC(CALL_FUNCTION_VECTOR, call_function), + SYSVEC(REBOOT_VECTOR, reboot), +#endif +#ifdef CONFIG_X86_MCE_THRESHOLD + SYSVEC(THRESHOLD_APIC_VECTOR, threshold), +#endif +#ifdef CONFIG_X86_MCE_AMD + SYSVEC(DEFERRED_ERROR_VECTOR, deferred_error), +#endif +#ifdef CONFIG_X86_THERMAL_VECTOR + SYSVEC(THERMAL_APIC_VECTOR, thermal), +#endif +#ifdef CONFIG_IRQ_WORK + SYSVEC(IRQ_WORK_VECTOR, irq_work), +#endif +#ifdef CONFIG_HAVE_KVM + SYSVEC(POSTED_INTR_VECTOR, kvm_posted_intr_ipi), + SYSVEC(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi), + SYSVEC(POSTED_INTR_NESTED_VECTOR, kvm_posted_intr_nested_ipi), +#endif +}; + +/* + * some pointers in pvm_sysvec_table are actual spurious_interrupt() who + * expects the second argument to be the vector.
+ */ +typedef void (*idtentry_x_t)(struct pt_regs *regs, int vector); + +static __always_inline void pvm_handle_sysvec(struct pt_regs *regs, unsigned long vector) +{ + unsigned int index = array_index_nospec(vector - FIRST_SYSTEM_VECTOR, + NR_SYSTEM_VECTORS); + idtentry_x_t func = (void *)pvm_sysvec_table[index]; + + func(regs, vector); +} + +__visible noinstr void pvm_event(struct pt_regs *regs) +{ + u32 error_code = regs->orig_ax; + u64 vector = regs->orig_ax >> 32; + + /* Invalidate orig_ax so that syscall_get_nr() works correctly */ + regs->orig_ax = -1; + + if (vector < NUM_EXCEPTION_VECTORS) + pvm_exception(regs, vector, error_code); + else if (vector >= FIRST_SYSTEM_VECTOR) + pvm_handle_sysvec(regs, vector); + else if (unlikely(vector == IA32_SYSCALL_VECTOR)) + pvm_handle_INT80_compat(regs); + else + common_interrupt(regs, vector); +} + void __init pvm_early_setup(void) { if (!pvm_range_end) From patchwork Mon Feb 26 14:36:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572345 Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E3E8A149DF2; Mon, 26 Feb 2024 14:38:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958331; cv=none; b=t8FahfVkj+fVSQqjiUHcEht7IwfzzuBCkCbofw99hfU8LQKDW93UkO6K9FEhn60ARsY6O0MUim8mUd3CpjS89oduU/PG/IsvGRACKAbit29bzPWw6TlMrbnrfGPTwzd8sj/ikn+Iadx/U+Ut3bORZ5RLjQjnfNvJptDoBYP9210= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958331; c=relaxed/simple; bh=Kpvu6OKJDmlESu/r7oOCgeLu3G/zXXrBQ7wFNVyEq9s=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=nJkFhxrUMGoybzDiuW7YbxR6v5a5OqItH4wDSepnE/2e97yOy6BB7e3fNsLDhRBoX8LUUbWhX64cAUSAlLwkyQZmPsSqKo2Qh4hfspFSpFTI45maLpZgjniydGkBfVXJpR84HHbmReY0Av6+2vqbIBViMF8CS96ZF9eOcQ3/j4M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=TcT2jD8W; arc=none smtp.client-ip=209.85.210.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TcT2jD8W" Received: by mail-pf1-f181.google.com with SMTP id d2e1a72fcca58-6e0f803d9dfso2070735b3a.0; Mon, 26 Feb 2024 06:38:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958329; x=1709563129; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=kzSWReBljDNFLKeUS+5Afa2lRsM7svYZxZXhjEgYHMY=; b=TcT2jD8W87TQtys8SDQbwE84PnMTjaK2EVmFOtPsMSclJr6LmkD++XZ7vTmikrYfqy k8wac7zAF1DVQXyTD/CCGSs3/+oO9RYSSgxZBx/CVJI0AHGTRv8CRpzNBb0D/dI4ards K8EkkBPp5//K/jL8/Qgz8woNT36Zn95wmyOur0c3lVcJDvrAW8VXEPHlk3sen9vkbkUH paioUs3IEVD5K7kOEhue6NnkSl4SojEPx/WrZs3cHSi/HqaeQTVNQ6GVYd3uJJQNKJrz OkKMzhA4fjSJf57Pw7a+F8rqgUTUSSzD0Cwb461nfPmvagvmbESC8gTYuMpzRIUkuKEh 
0oTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958329; x=1709563129; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kzSWReBljDNFLKeUS+5Afa2lRsM7svYZxZXhjEgYHMY=; b=NFgn5UEzpnSkq/8eU/2Ba+gHur7hqmdxFzXxiMG5vVwQydsPgzP09s3aLat9OW0m/5 LOyQRqubfteTV2P3D4XXD/rDuiVTjx3V1PkTYcBuasLPoaI+0SopvrAre/N8nJzSlipM BYDN3+tpjVhEGXoHIsukgSusp+gZGeLtZRiWtahuKPig0F8lcoZc5mPDJUi+aFGmgoDD y+67W4xEMQDf2Yc5jDT5OXxCbvWJYd6IYUXMSzp+nq/GABsp+wjO8iD0+WaxyCLACfMU ZZ2G6/HC1kYe6jbte3ourxRjKT3n14ZeyhahAks4avJREY2wWDAX5Ss6aSYvj3oL6OK5 YgYg== X-Forwarded-Encrypted: i=1; AJvYcCU6xzSS2G0YvwBY/ZiB1Gg2n8N8nrpcmoJp48Ip459vSYLek1KYhMM4tMu26Y1MUMwSvPhmpaGl3jyjF2Qs4tU1Mlfq X-Gm-Message-State: AOJu0YzRi1ekZdC6gRZbY18PoKmH6IgiI5paBaTATHCt0qGkVy6emPWh KLtNIbXrDC2Bf73sj7XQIi+jSTFFy0JWDs/Vdm8xKBRu2zMH+wXoipGzbEXl X-Google-Smtp-Source: AGHT+IH/x2k47qxnhOEYdzCjBRPg9dbKoudy8fclIz1n+Ihu7Hk4wZHA1MBpxEJ6fq53oUsC3z7yEw== X-Received: by 2002:a05:6a21:1394:b0:1a0:eea8:933d with SMTP id oa20-20020a056a21139400b001a0eea8933dmr7158469pzb.42.1708958329024; Mon, 26 Feb 2024 06:38:49 -0800 (PST) Received: from localhost ([47.89.225.180]) by smtp.gmail.com with ESMTPSA id m20-20020a63f614000000b005b7dd356f75sm4070253pgh.32.2024.02.26.06.38.48 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:48 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , Wanpeng Li , Vitaly Kuznetsov Subject: [RFC PATCH 61/73] x86/pvm: Allow to install a system interrupt handler Date: Mon, 26 Feb 2024 22:36:18 +0800 Message-Id: <20240226143630.33643-62-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong Add pvm_sysvec_install() to install a system interrupt handler into PVM system interrupt handler table. 
Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/pvm_para.h | 6 ++++++ arch/x86/kernel/kvm.c | 2 ++ arch/x86/kernel/pvm.c | 11 +++++++++-- 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h index c344185a192c..9216e539fea8 100644 --- a/arch/x86/include/asm/pvm_para.h +++ b/arch/x86/include/asm/pvm_para.h @@ -6,12 +6,14 @@ #include #ifndef __ASSEMBLY__ +typedef void (*idtentry_t)(struct pt_regs *regs); #ifdef CONFIG_PVM_GUEST #include #include void __init pvm_early_setup(void); +void __init pvm_install_sysvec(unsigned int sysvec, idtentry_t handler); bool __init pvm_kernel_layout_relocate(void); static inline void pvm_cpuid(unsigned int *eax, unsigned int *ebx, @@ -68,6 +70,10 @@ static inline void pvm_early_setup(void) { } +static inline void pvm_install_sysvec(unsigned int sysvec, idtentry_t handler) +{ +} + static inline bool pvm_kernel_layout_relocate(void) { return false; diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index de72a5a1f7ad..87b00c279aaf 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -43,6 +43,7 @@ #include #include #include +#include DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled); @@ -843,6 +844,7 @@ static void __init kvm_guest_init(void) if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) { static_branch_enable(&kvm_async_pf_enabled); alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_kvm_asyncpf_interrupt); + pvm_install_sysvec(HYPERVISOR_CALLBACK_VECTOR, sysvec_kvm_asyncpf_interrupt); } #ifdef CONFIG_SMP diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index 9399e45b3c13..88b013185ecd 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -128,8 +128,6 @@ static noinstr void pvm_handle_INT80_compat(struct pt_regs *regs) exc_general_protection(regs, 0); } -typedef void (*idtentry_t)(struct pt_regs *regs); - #define SYSVEC(_vector, _function) [_vector - FIRST_SYSTEM_VECTOR] = sysvec_##_function #define pvm_handle_spurious_interrupt ((idtentry_t)(void *)spurious_interrupt) @@ -167,6 +165,15 @@ static idtentry_t pvm_sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init = { #endif }; +void __init pvm_install_sysvec(unsigned int sysvec, idtentry_t handler) +{ + if (WARN_ON_ONCE(sysvec < FIRST_SYSTEM_VECTOR)) + return; + if (!WARN_ON_ONCE(pvm_sysvec_table[sysvec - FIRST_SYSTEM_VECTOR] != + pvm_handle_spurious_interrupt)) + pvm_sysvec_table[sysvec - FIRST_SYSTEM_VECTOR] = handler; +} + /* * some pointers in pvm_sysvec_table are actual spurious_interrupt() who * expects the second argument to be the vector. 
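For illustration, a component that registers a system interrupt mirrors its IDT registration with a call to pvm_install_sysvec(), exactly as the kvm_guest_init() hunk above does for HYPERVISOR_CALLBACK_VECTOR. A usage sketch with made-up names (EXAMPLE_VECTOR, sysvec_example, asm_sysvec_example); pvm_install_sysvec() only accepts vectors at or above FIRST_SYSTEM_VECTOR whose slot still holds the spurious handler:

/* Usage sketch only; the example vector and handlers are hypothetical. */
static void __init example_guest_init(void)
{
	/* Native/IDT delivery keeps using the asm stub as before. */
	alloc_intr_gate(EXAMPLE_VECTOR, asm_sysvec_example);
	/* PVM event dispatch needs the C handler in its table. */
	pvm_install_sysvec(EXAMPLE_VECTOR, sysvec_example);
}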
From patchwork Mon Feb 26 14:36:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572346 Received: from mail-pf1-f176.google.com (mail-pf1-f176.google.com [209.85.210.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8649512AADA; Mon, 26 Feb 2024 14:38:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958340; cv=none; b=mzU897vbxTasLG3KJvAdsDOHWs4N14GYdVNdMDyZ2uDw18RPCnDYKLIu5qk7NOD2TgifD71VLgWhVAVwDtPEjAGEG9t/8TLrkmxi38YbeJYRQpgnaS2XRSGCJA23H/+pOGL0L79bCYxJwpqXbKNg19HUhIdVg5WorwOnW/06sNc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958340; c=relaxed/simple; bh=PlFQWKQVgSVHs1FKXuyLOo//+7TJ3KioqGv56Yq15H0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=g+vwgR5HekCdbAekaQeZFOWZ9dFUELsr5Lu71Xk8AYk6HDIAyySWoEFEe7zXoru97Om2JCKqMZ1Wou97XDyafmPJdW7D3TWsXP1aEpnvgoMsoleV/xUFyiVI3EHmZ/12BxM+i/RNc6RvxbbN0XhFNMof2obWm5q2jFWFRPFWzBU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mcm2MFsR; arc=none smtp.client-ip=209.85.210.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mcm2MFsR" Received: by mail-pf1-f176.google.com with SMTP id d2e1a72fcca58-6e53f3f1f82so217878b3a.2; Mon, 26 Feb 2024 06:38:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958338; x=1709563138; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Np68LovMToSBu5RVhorHMFTejoXHFfwbO2NJYg4bpNo=; b=mcm2MFsRiSu6/4q73/xxJfmQoc7sq0IM8GRX1OG/7M+0ILTe6HEfjUW2IAY4SB2yeo UwFfWHe9KzrWhoH9yVf+w7kXe/TGmHDiLqkVW21OcbaMvSURyQle4Y8BepVksTgnVN9h U1FheYSwH++34r389jXi5EjhKEwXdQuAt2yTyva1c0GZSq64JdWGR59U0TLiNYp9LCNx uhUKi//oQ9Or3J8vKZ1WgBAFYyiIEWoX5SfdF1f6yipfqAfGoTxbs7vfg+39SwcRGBgv NX6nZ4l5dT5tURkPOATAL8GMZ91BGXJZS+Camx0X/9A6hn5kZpMyPlTQEPreFseulk1+ SYng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958338; x=1709563138; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Np68LovMToSBu5RVhorHMFTejoXHFfwbO2NJYg4bpNo=; b=FxdFGovYx6iFJFeShpsghXZ7+rbkt9SIJxIfk4AOWxi8K7R153N5g4158YOBV38JID m0rtM7eTGmMUWa6h5tGNX/BjNvwGZyV8bOOY27lOvPGPdekUEtnub3WKk4oKWE/iyfDw IGRc7Vg5D4JxSjPDJ4+qrAvPZkG1qf4J7oGwWvM7RyVJQWX+EO79eZhZLsTPmHIZWrEf dFWfNzsoUdwerksYJth5bM13CXKiVr7kBBfErzS7Xit0Rf78/9rhze7L0kD8kMHAfzD3 ipl5yqkv/DH9X1jb+gCVxQhvtVHjNMR8jU6qGepJEhxnDnnxrCSln7RLjfwQMDZQ82dJ HKDA== X-Forwarded-Encrypted: i=1; AJvYcCX4pI/OieRxS4JESpuXRU3MwkINbse5uqpt76u/4SM1O1C3O5srlOwkt8eqP09TJWjCvFmilyO8amSDTcT3EH6RCr2x X-Gm-Message-State: AOJu0Yw+P/g7Qy9aFXxjZrNJ1CRINZsnCZSfnmbjJpeIgJfj30FSlRpZ 
qdlC3Flak5U4p35PE/TwQ0D+/TL6gGCF/YzNedgG1SAuYJWOjfoZfP7XjIp0 X-Google-Smtp-Source: AGHT+IEUOW5x94cpzV4bcvaCVnBjAKoiMDPSirqAMkLuAwFFgvGAt9Nnyg7nIAZ6iu3cxOovBq6vXQ== X-Received: by 2002:a05:6a20:4386:b0:1a0:6c04:4bba with SMTP id i6-20020a056a20438600b001a06c044bbamr7886045pzl.11.1708958337590; Mon, 26 Feb 2024 06:38:57 -0800 (PST) Received: from localhost ([47.89.225.180]) by smtp.gmail.com with ESMTPSA id e1-20020a170902744100b001dc944299acsm2628990plt.217.2024.02.26.06.38.56 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:38:57 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , David Woodhouse , Brian Gerst , Josh Poimboeuf , Thomas Garnier , Ard Biesheuvel , Tom Lendacky Subject: [RFC PATCH 62/73] x86/pvm: Add early kernel event entry and dispatch code Date: Mon, 26 Feb 2024 22:36:19 +0800 Message-Id: <20240226143630.33643-63-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong Since PVM doesn't support IDT-based event delivery, it needs to handle early kernel events during the booting. Currently, there are two stages before the final IDT setup. Firstly, all exception handlers are set as do_early_exception() in idt_setup_early_handlers(). Later, #DB, #BP, and dispatch code. Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/pvm_para.h | 5 +++++ arch/x86/kernel/head_64.S | 21 +++++++++++++++++++++ arch/x86/kernel/pvm.c | 33 +++++++++++++++++++++++++++++++++ 3 files changed, 59 insertions(+) diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h index 9216e539fea8..bfb08f0ea293 100644 --- a/arch/x86/include/asm/pvm_para.h +++ b/arch/x86/include/asm/pvm_para.h @@ -13,6 +13,7 @@ typedef void (*idtentry_t)(struct pt_regs *regs); #include void __init pvm_early_setup(void); +void __init pvm_setup_early_traps(void); void __init pvm_install_sysvec(unsigned int sysvec, idtentry_t handler); bool __init pvm_kernel_layout_relocate(void); @@ -70,6 +71,10 @@ static inline void pvm_early_setup(void) { } +static inline void pvm_setup_early_traps(void) +{ +} + static inline void pvm_install_sysvec(unsigned int sysvec, idtentry_t handler) { } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 1d931bab4393..6ad3aedca7da 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -633,6 +633,27 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb) SYM_CODE_END(vc_no_ghcb) #endif +#ifdef CONFIG_PVM_GUEST + .align 256 +SYM_CODE_START_NOALIGN(pvm_early_kernel_event_entry) + UNWIND_HINT_ENTRY + ENDBR + + incl early_recursion_flag(%rip) + + /* set %rcx, %r11 per PVM event handling specification */ + movq 6*8(%rsp), %rcx + movq 7*8(%rsp), %r11 + + PUSH_AND_CLEAR_REGS + movq %rsp, %rdi /* %rdi -> pt_regs */ + call pvm_early_event + + decl early_recursion_flag(%rip) + jmp pvm_restore_regs_and_return_to_kernel +SYM_CODE_END(pvm_early_kernel_event_entry) +#endif + #define SYM_DATA_START_PAGE_ALIGNED(name) \ SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE) diff --git 
a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index 88b013185ecd..b3b4ff0bbc91 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -17,6 +17,7 @@ #include #include #include +#include #include DEFINE_PER_CPU_PAGE_ALIGNED(struct pvm_vcpu_struct, pvm_vcpu_struct); @@ -24,6 +25,38 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct pvm_vcpu_struct, pvm_vcpu_struct); unsigned long pvm_range_start __initdata; unsigned long pvm_range_end __initdata; +static bool early_traps_setup __initdata; + +void __init pvm_early_event(struct pt_regs *regs) +{ + int vector = regs->orig_ax >> 32; + + if (!early_traps_setup) { + do_early_exception(regs, vector); + return; + } + + switch (vector) { + case X86_TRAP_DB: + exc_debug(regs); + return; + case X86_TRAP_BP: + exc_int3(regs); + return; + case X86_TRAP_PF: + exc_page_fault(regs, regs->orig_ax); + return; + default: + do_early_exception(regs, vector); + return; + } +} + +void __init pvm_setup_early_traps(void) +{ + early_traps_setup = true; +} + static noinstr void pvm_bad_event(struct pt_regs *regs, unsigned long vector, unsigned long error_code) { From patchwork Mon Feb 26 14:36:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572347 Received: from mail-oo1-f46.google.com (mail-oo1-f46.google.com [209.85.161.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A89712F59E; Mon, 26 Feb 2024 14:39:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958347; cv=none; b=oOVSByAtxUtxBOyx2iHjq4l9TNI6pKMYoj4QdMaD6yl0B8wQfw3OREBCYAxpATnTzsyC7tSXC9IpbpWncvXBX/6XRoaGUcVZb2axzBdJSgb1RjtN9oXQFpuAg0J7aWkgSPVUTOp/zrrImG4UvZGkU3vGw1I0neO7UBI3Y9cNu0I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958347; c=relaxed/simple; bh=ag8Q2ckwAIVyhLdyymUbxED1UwHq/TzpNBbOj/uIlg0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=D44EfaVkAyko3gT5hB6NDktMH0da0/T6JDZXp96kZK7Fphq6uisnbfncKYTCASl6U93lsOxU9sVeIL+KIxByjAaiwTrvuRoEks8GdGs7LuXBCbqwHX9rJAdJBZ9iRTnmuYqB9PczZ3tvj3lzF1xkwLDZ5pPjLV+M0VLPf6LYKZE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=L0DHxCtV; arc=none smtp.client-ip=209.85.161.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="L0DHxCtV" Received: by mail-oo1-f46.google.com with SMTP id 006d021491bc7-5a034c12090so1301395eaf.2; Mon, 26 Feb 2024 06:39:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958345; x=1709563145; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=fwNizCD/fZ2aImktIeVQE2S+xAX0NlliquuDy32Zgd8=; b=L0DHxCtVnNhJxDvTs0P7h6tOdL0Ox+vhwp5bttqAiHEWnxChsLb6wnIE/qJUQq4TL6 vCS78FndR5hKD4nFREPWypqxjHzJ94/W1J+m43cjve7tLVNQ3kJOZJPnFsBZI/zf9a0a 
hwiNSoOIMhHxsarTppXIx8ksHVC5uSWf8LQIKBLFkIlBcOZL1XTFAN6q8mCgFWN6/J9K 0e5s+Lu+EMUAhSYWCiTAFgitEMt8+o1nVprCutLd04t5Nb/UjM9s6bvVYLqdU8GDau55 h5SIdjFMP+aSuG6UMSdEk/2NaAj5pESVeo2Z9zT7qbK3tUGAyOz0VccAiaiaDYEYIpDK BoqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958345; x=1709563145; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fwNizCD/fZ2aImktIeVQE2S+xAX0NlliquuDy32Zgd8=; b=snom6wXwP+SGSiPihiSbk9SwyK5Fm1+VsAxBS8zxQ8e5F+G9oHx1XfqWkTV2+YMsck vNTZ5dRTF6uQtPOYELTHHtKGVC1GAtE/fZltD4Bb8KcBZIyWZc4v7gwMsQ+7V23kNEfr NR9DscQvoEWlEHP0F8kben7WVnE8EfkY4/bw3sMorZdHCFgelP9nfuqELSL6Ckr94927 /f6u1QTCzgjMAcrxOBXqoFBpI/6J8UQB1Uyom8nO/F4JqqzydY2mHyVfqefmg9/Mt9Fc H+FvCJf0agScIc1FLMx0vItl9oRL1NrncnSZJbA7HMFKYyVU4yAH0zCe4ldO8hjXQ0jl bhSA== X-Forwarded-Encrypted: i=1; AJvYcCUQd3QsmdwIcExkhi9OTmjEmktDyKUfFZbu6TvgsIEIozsfgOucXReqmcT0GnORxGBKEEM4lJ+fCDIEWGhsv7lxT3NU X-Gm-Message-State: AOJu0Yz8J/8pb5WVTDYNEzWvvc4RqjcCvKDtDqI2szk0ZwFg5UZaXz81 TEGFQPjKdNwrTsHizPctryfHYfJkeT2dcl+d0WGrry2pn7rz59nCatKnvk7D X-Google-Smtp-Source: AGHT+IEv6+SMIA4XY6jCfQBvGsq2ofA2gzV1dxKYvTWh4nVs4/v+GPoNA3ko3QSN22pGFMrnguOfgg== X-Received: by 2002:a05:6358:190a:b0:17b:5b6a:de9d with SMTP id w10-20020a056358190a00b0017b5b6ade9dmr9800743rwm.23.1708958344977; Mon, 26 Feb 2024 06:39:04 -0800 (PST) Received: from localhost ([47.254.32.37]) by smtp.gmail.com with ESMTPSA id o7-20020a63f147000000b005dc9439c56bsm4053749pgk.13.2024.02.26.06.39.04 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:39:04 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Andy Lutomirski , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 63/73] x86/pvm: Add hypercall support Date: Mon, 26 Feb 2024 22:36:20 +0800 Message-Id: <20240226143630.33643-64-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan For the PVM guest, it will use the syscall instruction as the hypercall instruction and follow the KVM hypercall call convention. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/entry/entry_64_pvm.S | 15 +++++++++++ arch/x86/include/asm/pvm_para.h | 1 + arch/x86/kernel/pvm.c | 46 +++++++++++++++++++++++++++++++++ 3 files changed, 62 insertions(+) diff --git a/arch/x86/entry/entry_64_pvm.S b/arch/x86/entry/entry_64_pvm.S index 256baf86a9f3..abb57e251e73 100644 --- a/arch/x86/entry/entry_64_pvm.S +++ b/arch/x86/entry/entry_64_pvm.S @@ -52,6 +52,21 @@ SYM_CODE_START(entry_SYSCALL_64_pvm) jmp entry_SYSCALL_64_after_hwframe SYM_CODE_END(entry_SYSCALL_64_pvm) +.pushsection .noinstr.text, "ax" +SYM_FUNC_START(pvm_hypercall) + push %r11 + push %r10 + movq %rcx, %r10 + UNWIND_HINT_SAVE + syscall + UNWIND_HINT_RESTORE + movq %r10, %rcx + popq %r10 + popq %r11 + RET +SYM_FUNC_END(pvm_hypercall) +.popsection + /* * The new RIP value that PVM event delivery establishes is * MSR_PVM_EVENT_ENTRY for vector events that occur in user mode. 
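As a usage illustration of the convention: the pvm_hypercall trampoline above preserves RCX/R11 around SYSCALL (which clobbers them), so the caller-visible KVM layout (nr in RAX, arguments in RBX/RCX/RDX) is kept. The sketch below would sit next to the pvm_hypercallN() helpers added to pvm.c later in this patch; KVM_HC_KICK_CPU is only an example of an existing hypercall number and pvm_kick_cpu_example() is a made-up name:

/* Sketch only: issuing a hypercall through the wrappers added below. */
static void pvm_kick_cpu_example(unsigned long flags, unsigned int apicid)
{
	long ret = pvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);

	if (ret)
		pr_warn("PVM kick hypercall failed: %ld\n", ret);
}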
diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h index bfb08f0ea293..72c74545dba6 100644 --- a/arch/x86/include/asm/pvm_para.h +++ b/arch/x86/include/asm/pvm_para.h @@ -87,6 +87,7 @@ static inline bool pvm_kernel_layout_relocate(void) void entry_SYSCALL_64_pvm(void); void pvm_user_event_entry(void); +void pvm_hypercall(void); void pvm_retu_rip(void); void pvm_rets_rip(void); #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index b3b4ff0bbc91..352d74394c4a 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -27,6 +27,52 @@ unsigned long pvm_range_end __initdata; static bool early_traps_setup __initdata; +static __always_inline long pvm_hypercall0(unsigned int nr) +{ + long ret; + + asm volatile("call pvm_hypercall" + : "=a"(ret) + : "a"(nr) + : "memory"); + return ret; +} + +static __always_inline long pvm_hypercall1(unsigned int nr, unsigned long p1) +{ + long ret; + + asm volatile("call pvm_hypercall" + : "=a"(ret) + : "a"(nr), "b"(p1) + : "memory"); + return ret; +} + +static __always_inline long pvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + long ret; + + asm volatile("call pvm_hypercall" + : "=a"(ret) + : "a"(nr), "b"(p1), "c"(p2) + : "memory"); + return ret; +} + +static __always_inline long pvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + long ret; + + asm volatile("call pvm_hypercall" + : "=a"(ret) + : "a"(nr), "b"(p1), "c"(p2), "d"(p3) + : "memory"); + return ret; +} + void __init pvm_early_event(struct pt_regs *regs) { int vector = regs->orig_ax >> 32; From patchwork Mon Feb 26 14:36:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572352 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6E6C12A17C; Mon, 26 Feb 2024 14:39:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958358; cv=none; b=lJRk3wZ4xjnV905vtjEQDOWztHEHeK76eKf6PqtJ2d451/2ABhx6pkn2ctxRiLcwoXU8/p1ZRqccY79BHGoExLK2Hvu0eCDNmce7b4VjQf38VKopz4AzJ7uyrlLhEiEAJXNqSWp/G0oEv/6yTvsmp8N77k0c27PeEpOZJXaAsag= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958358; c=relaxed/simple; bh=kTY4wS3zVP5woxSl7wt9Hijoz3oVrwUyvBkwFVIbomc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FlzKZDYdg6M7g94+I1N3EkwZ1nTDUInpwSI0mh3RQT/IGeODYKPLCz2crI/58STb6Bsu4IkigShPlrXSufnRxIv8q48U7aq44YIz7xe1Xraw5ZHDAxnPxXNuc/U5nkywmoK9DtxekL6bpWnN3xPZFZdne26nGHtAL2LJthhGHv0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=XfkHOCHs; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="XfkHOCHs" Received: by mail-pl1-f169.google.com with SMTP id 
d9443c01a7336-1d93edfa76dso26738595ad.1; Mon, 26 Feb 2024 06:39:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958356; x=1709563156; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+7gAp+Q6U/3m+V7ykKpuxVMkNBKRu+hZ/ADpZ+xq1uU=; b=XfkHOCHs+x3Z4AU9u5ryAMXeLOr/Cm+xUARi6Up7yciL/uZppayl5rizPCgCAlYadH PrP0bsJCEaGpsQ6NYu1kmmEGcX4XjHViS+uq44uPQwdPjCdra4MVSaVOg+gS211kOzVd eqHSZcPJeuUY5KqsMdMQB6PoBQbDBIpir3mbCK7Vj2HeypAfuAYuX3eS5P7dF4+7Dhpb /2ugYapBz+v20WwrD32G46JccE5NI/kNDPcCg6oZm5FpFGf9Lr2HTU7qqMjfGlcuczUH ljtkoVXBSDmFQnc+s/u0GDkZGO6FAF8CaXfmFUSsUiMlU+lbY8aZqJwkVr7wJ7heT9Ny 6oEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958356; x=1709563156; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+7gAp+Q6U/3m+V7ykKpuxVMkNBKRu+hZ/ADpZ+xq1uU=; b=qmBM6LpGxSIiJuzBnZb/rR+06pBjJbdbjRh0Om3oqobBgryZUB0AE4ij4zb9KRx6Na k7MdkKwN6GHnY3ljOUV1HbHD5lD/Mecdns3s5KINTlkQHi1nUcgKDMBou0Fp0Fmlv0SV 3jQfea6vyx2nwES177vUlVZ1jznFpG8ck4qlpztK29gIZbopVBBUGmLO53O3kq+qedgz xDPo9Ft5GndPmdXsNh2+DdNiV8ysIvxGBAKvBgVtfAUosnDi1AjhOh8JsVYZEBi2BqMS /ZOwf8bXKcLdfmfZfXZkpwlN0622scRKwLKGv/DggkenJqGdEtNQcJNDLOdaZ87v+Juk dp8Q== X-Forwarded-Encrypted: i=1; AJvYcCUDaNbEEffeyf786PyhIAr8qo7i3VX/1yyYo9JMfIISGsO1MGlI/rESJ8yJzs54JQtLvCGsJ0hNclAJMMTWvb91Fmde X-Gm-Message-State: AOJu0YxUNEE1V0tEOUGEgTcw8Up8fH+rmD+hcKqQSaKr2rYo1bdqEN7i h93AayUrx+lvRw7pc0o4iwR8kTOTi5VICukEjP101usYM/mRj13eIxoxoIJg X-Google-Smtp-Source: AGHT+IEWneBeaY+Xy8kg73z80/nF6qwXfZ9Jwoz7WSuwfJtgWsOkxvKS+hM3vdKwbX01UpLW/z6NYQ== X-Received: by 2002:a17:902:f54f:b0:1dc:78b6:bbcf with SMTP id h15-20020a170902f54f00b001dc78b6bbcfmr9108923plf.63.1708958356107; Mon, 26 Feb 2024 06:39:16 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id ji22-20020a170903325600b001dc23e877bfsm4019205plb.268.2024.02.26.06.39.15 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:39:15 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Andy Lutomirski , Dave Hansen , "H. Peter Anvin" , Nikolay Borisov , Rick Edgecombe , Daniel Sneddon , Adam Dunlap , Yuntao Wang , Wang Jinchao , Josh Poimboeuf , "Mike Rapoport (IBM)" , Yu-cheng Yu Subject: [RFC PATCH 64/73] x86/pvm: Enable PVM event delivery Date: Mon, 26 Feb 2024 22:36:21 +0800 Message-Id: <20240226143630.33643-65-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong Invoke pvm_early_setup() after idt_setup_early_handler() to enable early kernel event delivery. Also, modify cpu_init_exception_handling() to call pvm_setup_event_handling() in order to enable event delivery for the current CPU. Additionally, for the syscall event, change MSR_LSTAR to PVM specific entry. 
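For reference, the per-CPU programming this amounts to looks roughly like the sketch below (pvm_cpu_event_msrs_sketch() is a made-up name; the real code is pvm_early_setup() for the boot CPU and pvm_setup_event_handling() for every CPU in the pvm.c hunk that follows). Each CPU must tell the hypervisor where its PVCS, event entry stubs, supervisor red zone and return trampolines live before it can take PVM events:

/* Summary sketch only; mirrors pvm_setup_event_handling() below. */
static void pvm_cpu_event_msrs_sketch(void)
{
	wrmsrl(MSR_PVM_VCPU_STRUCT,
	       slow_virt_to_phys(this_cpu_ptr(&pvm_vcpu_struct)));	/* per-CPU PVCS */
	wrmsrl(MSR_PVM_EVENT_ENTRY,
	       (unsigned long)(void *)pvm_user_event_entry);	/* base of the entry stubs */
	wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE);
	wrmsrl(MSR_PVM_RETU_RIP, (unsigned long)(void *)pvm_retu_rip);	/* return-to-user RIP */
	wrmsrl(MSR_PVM_RETS_RIP, (unsigned long)(void *)pvm_rets_rip);	/* return-to-supervisor RIP */
}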
Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/entry/entry_64.S | 9 ++++++-- arch/x86/include/asm/pvm_para.h | 5 +++++ arch/x86/kernel/cpu/common.c | 11 ++++++++++ arch/x86/kernel/head64.c | 3 +++ arch/x86/kernel/idt.c | 2 ++ arch/x86/kernel/pvm.c | 37 +++++++++++++++++++++++++++++++++ 6 files changed, 65 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 5b25ea4a16ae..fe12605b3c05 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -124,10 +124,12 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL) * a completely clean 64-bit userspace context. If we're not, * go to the slow exit path. * In the Xen PV case we must use iret anyway. + * In the PVM guest case we must use eretu synthetic instruction. */ - ALTERNATIVE "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \ - "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV + ALTERNATIVE_2 "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV, \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_KVM_PVM_GUEST /* * We win! This label is here just for ease of understanding @@ -597,6 +599,9 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL) #ifdef CONFIG_XEN_PV ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV #endif +#ifdef CONFIG_PVM_GUEST + ALTERNATIVE "", "jmp pvm_restore_regs_and_return_to_usermode", X86_FEATURE_KVM_PVM_GUEST +#endif POP_REGS pop_rdi=0 diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h index 72c74545dba6..f5d40a57c423 100644 --- a/arch/x86/include/asm/pvm_para.h +++ b/arch/x86/include/asm/pvm_para.h @@ -15,6 +15,7 @@ typedef void (*idtentry_t)(struct pt_regs *regs); void __init pvm_early_setup(void); void __init pvm_setup_early_traps(void); void __init pvm_install_sysvec(unsigned int sysvec, idtentry_t handler); +void pvm_setup_event_handling(void); bool __init pvm_kernel_layout_relocate(void); static inline void pvm_cpuid(unsigned int *eax, unsigned int *ebx, @@ -79,6 +80,10 @@ static inline void pvm_install_sysvec(unsigned int sysvec, idtentry_t handler) { } +static inline void pvm_setup_event_handling(void) +{ +} + static inline bool pvm_kernel_layout_relocate(void) { return false; diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 45f214e41a9a..89874559dbc2 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -66,6 +66,7 @@ #include #include #include +#include #include "cpu.h" @@ -2066,7 +2067,15 @@ static void wrmsrl_cstar(unsigned long val) void syscall_init(void) { wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS); + +#ifdef CONFIG_PVM_GUEST + if (boot_cpu_has(X86_FEATURE_KVM_PVM_GUEST)) + wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64_pvm); + else + wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); +#else wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); +#endif if (ia32_enabled()) { wrmsrl_cstar((unsigned long)entry_SYSCALL_compat); @@ -2217,6 +2226,8 @@ void cpu_init_exception_handling(void) /* Finally load the IDT */ load_current_idt(); + + pvm_setup_event_handling(); } /* diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index d0e8d648bd38..17cd11dd1f03 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -42,6 +42,7 @@ #include #include #include +#include /* * Manage page tables very early on. 
@@ -286,6 +287,8 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode idt_setup_early_handler(); + pvm_early_setup(); + /* Needed before cc_platform_has() can be used for TDX */ tdx_early_init(); diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index 660b601f1d6c..0dc3ded6da01 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -12,6 +12,7 @@ #include #include #include +#include #define DPL0 0x0 #define DPL3 0x3 @@ -259,6 +260,7 @@ void __init idt_setup_early_pf(void) { idt_setup_from_table(idt_table, early_pf_idts, ARRAY_SIZE(early_pf_idts), true); + pvm_setup_early_traps(); } #endif diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index 352d74394c4a..c38e46a96ad3 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -286,12 +286,49 @@ __visible noinstr void pvm_event(struct pt_regs *regs) common_interrupt(regs, vector); } +extern void pvm_early_kernel_event_entry(void); + +/* + * Reserve a fixed-size area in the current stack during an event from + * supervisor mode. This is for the int3 handler to emulate a call instruction. + */ +#define PVM_SUPERVISOR_REDZONE_SIZE (2*8UL) + void __init pvm_early_setup(void) { if (!pvm_range_end) return; setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST); + + wrmsrl(MSR_PVM_VCPU_STRUCT, __pa(this_cpu_ptr(&pvm_vcpu_struct))); + wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_early_kernel_event_entry - 256); + wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE); + wrmsrl(MSR_PVM_RETS_RIP, (unsigned long)(void *)pvm_rets_rip); +} + +void pvm_setup_event_handling(void) +{ + if (boot_cpu_has(X86_FEATURE_KVM_PVM_GUEST)) { + u64 xpa = slow_virt_to_phys(this_cpu_ptr(&pvm_vcpu_struct)); + + wrmsrl(MSR_PVM_VCPU_STRUCT, xpa); + wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_user_event_entry); + wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE); + wrmsrl(MSR_PVM_RETU_RIP, (unsigned long)(void *)pvm_retu_rip); + wrmsrl(MSR_PVM_RETS_RIP, (unsigned long)(void *)pvm_rets_rip); + + /* + * PVM spec requires the hypervisor-maintained + * MSR_KERNEL_GS_BASE to be the same as the kernel GSBASE for + * event delivery for user mode. wrmsrl(MSR_KERNEL_GS_BASE) + * accesses only the user GSBASE in the PVCS via + * pvm_write_msr() without hypervisor involved, so use + * PVM_HC_WRMSR instead. 
+ */ + pvm_hypercall2(PVM_HC_WRMSR, MSR_KERNEL_GS_BASE, + cpu_kernelmode_gs_base(smp_processor_id())); + } } #define TB_SHIFT 40 From patchwork Mon Feb 26 14:36:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lai Jiangshan X-Patchwork-Id: 13572353 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32A9914AD3B; Mon, 26 Feb 2024 14:39:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958361; cv=none; b=GTPbMTB6x6/pZ4bbBK3uWVBVz0zsO6F5N+sqZBndB2+qoe+On4Q4d03p1BD8nmq4J3SFmQg/JtGSR/9Q5WmoL2LgC3e014H37G+OKVAE48Xe1rFRN/upPTjA3SLH+krElgAAToFu8ruiF6rJPX/T8AzDFJPelSpW2vbGo8AhGJg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708958361; c=relaxed/simple; bh=XhSoym3qQlKNWcZaO1fmucbwOJAhaGU0ZDzfY5zhuSk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=AvrL2yEVwObNlWd5YgQQkzkilx2JzkYaRwfI8GYlGyuNTNeMvJT6U309O0xcB8Pz+TKPBPhLYXYavPJwuR6qjc4YvTGAAZ9pJDurSqUj7PuCdBxFMFO9y35RyGMkwfA8sS3Iga6lHlXpmak6XZ3qzIi+P6HdwsAk8VFpy6jMRMk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=km8qtcDh; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="km8qtcDh" Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-1dcafff3c50so3680195ad.0; Mon, 26 Feb 2024 06:39:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708958359; x=1709563159; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WkFyhfs2k+/ZiDzHPyMbRms0OafqFweJ9xnGHR0iLV8=; b=km8qtcDhONpx3iGTYwvzSZTeaCQUZLPRlKLAAnm+nDZXnmuiCpSvU3aDjI2f2+wcyj 9R9zPCS5dcnEGWyht4zpdJcf3y86FDNnQZsg6/7c+vakIEkp+lNCffDEyYMK1c9XIhmO XGCLdXyJFjUo6xNmrfM5lczAEuU1mbqDY+3lry/Zwdtl0dFleT//VJfl4wQ2AvjWQN1n i190qpCZ5ejZPEum/3rxE9+jhtGAIJXD4FJf71v2VtYyNsduJrK98TPHEOqxj+NyjvqP b220mHUMYfWQe8p03aJ+oZXPxUF8IlhqH8r0WHljj2xuwkYXzGu38KnP8uaQV4iBIJoA daWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958359; x=1709563159; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WkFyhfs2k+/ZiDzHPyMbRms0OafqFweJ9xnGHR0iLV8=; b=QLfjRmAE3+g0JOShuRWDVMLAGoGxb4dnlynOCMWzZHz6BbVx9tuZwQ/atYeYKpZn4g NLeL+l0dVAWs991UO3SNY0Ri37W3r+QZxhg2iKIQiFULgs3DsdoKu/F7p8GyjPQVyndB 2kHZBFPUJXr8Wzcf9KSCJ+77aFiASjBEIB9HmheCcBuEyYRB2N1vZlnLK4wP71UC5qVj WBf1OM2zcAEUjSigZu3aPCQ/uf8VBFsZbHb+MnRvM1MJNUVtIqfoVpp9YAhf9gMFIuXU OBXcRGEHdoSdW+t7udCwlSi3ABzWSxTMNyDwQ+WDZp6uEO3fU6Cgz7sS82rof94sE/Rk 3XnQ== X-Forwarded-Encrypted: i=1; 
AJvYcCW/92nAyb+4OnrYd5XiInYZguhoAtZTAIWrNNZqahzpMobFxK0m2eZHoYTP//9v7STJmmgqJyrkytt5mjzeepGolKVB X-Gm-Message-State: AOJu0YyfAh+9xksGH7f2UUxqF4usGqpPEBnBBTajRjcdRLAwt/rqKmun gZQNQn+YcsFSxKksRGKvMKgaxO3FbUaUWdHBj5SWVLCZsdH79fMRbYvyqkfm X-Google-Smtp-Source: AGHT+IF8tRUu66bSXe/aJmjWB0UBmQOoCiQpAK7DNUmyzrYAZTKYIv0rpCc73qQs0tztBkfqpn2XWQ== X-Received: by 2002:a17:902:d488:b0:1dc:8042:3b76 with SMTP id c8-20020a170902d48800b001dc80423b76mr10400301plg.2.1708958359247; Mon, 26 Feb 2024 06:39:19 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id b4-20020a170903228400b001db5ea6664asm3997276plh.21.2024.02.26.06.39.18 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:39:18 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Lai Jiangshan , Hou Wenlong , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Wanpeng Li , Vitaly Kuznetsov , Dave Hansen , "H. Peter Anvin" Subject: [RFC PATCH 65/73] x86/kvm: Patch KVM hypercall as PVM hypercall Date: Mon, 26 Feb 2024 22:36:22 +0800 Message-Id: <20240226143630.33643-66-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lai Jiangshan Modify the KVM_HYPERCALL macro to enable patching the KVM hypercall as a PVM hypercall. Note that this modification will increase the size by two bytes for each KVM hypercall instruction. Signed-off-by: Lai Jiangshan Signed-off-by: Hou Wenlong --- arch/x86/include/asm/kvm_para.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 57bc74e112f2..1a322b684146 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -2,6 +2,7 @@ #ifndef _ASM_X86_KVM_PARA_H #define _ASM_X86_KVM_PARA_H +#include #include #include #include @@ -18,8 +19,14 @@ static inline bool kvm_check_and_clear_guest_paused(void) } #endif /* CONFIG_KVM_GUEST */ +#ifdef CONFIG_PVM_GUEST +#define KVM_HYPERCALL \ + ALTERNATIVE_2("vmcall", "vmmcall", X86_FEATURE_VMMCALL, \ + "call pvm_hypercall", X86_FEATURE_KVM_PVM_GUEST) +#else #define KVM_HYPERCALL \ ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL) +#endif /* CONFIG_PVM_GUEST */ /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall * instruction. 
From patchwork Mon Feb 26 14:36:23 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572354
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong, Lai Jiangshan, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Andy Lutomirski, Dave Hansen, "H. Peter Anvin",
    Ajay Kaher, Alexey Makhalov, VMware PV-Drivers Reviewers,
    Boris Ostrovsky, "Mike Rapoport (IBM)", Daniel Sneddon, Rick Edgecombe,
    Alexey Kardashevskiy, virtualization@lists.linux.dev,
    xen-devel@lists.xenproject.org
Subject: [RFC PATCH 66/73] x86/pvm: Use new cpu feature to describe XENPV and PVM
Date: Mon, 26 Feb 2024 22:36:23 +0800
Message-Id: <20240226143630.33643-67-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Hou Wenlong

Some PVOPS are patched as the native version directly if the guest is
not a XENPV guest. However, this approach will not work after
introducing a PVM guest. To address this, use a new CPU feature to
describe XENPV and PVM, and ensure that those PVOPS are patched only
when it is not a paravirtual guest.

Signed-off-by: Hou Wenlong
Signed-off-by: Lai Jiangshan
---
 arch/x86/entry/entry_64.S          |  5 ++---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/paravirt.h    | 14 +++++++-------
 arch/x86/kernel/pvm.c              |  1 +
 arch/x86/xen/enlighten_pv.c        |  1 +
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index fe12605b3c05..6b41a1837698 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -127,9 +127,8 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
	 * In the PVM guest case we must use eretu synthetic instruction.
	 */
-	ALTERNATIVE_2 "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \
-		"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV, \
-		"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_KVM_PVM_GUEST
+	ALTERNATIVE "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \
+		"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_PV_GUEST

	/*
	 * We win! This label is here just for ease of understanding
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e17e72f13423..72ef58a2db19 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
 #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* "" PV vcpu_is_preempted function */
 #define X86_FEATURE_TDX_GUEST		( 8*32+22) /* Intel Trust Domain Extensions Guest */
 #define X86_FEATURE_KVM_PVM_GUEST	( 8*32+23) /* KVM Pagetable-based Virtual Machine guest */
+#define X86_FEATURE_PV_GUEST		( 8*32+24) /* "" Paravirtual guest */

 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index deaee9ec575e..a864ee481ca2 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -143,7 +143,7 @@ static __always_inline unsigned long read_cr2(void)
 {
	return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
				"mov %%cr2, %%rax;",
-				ALT_NOT(X86_FEATURE_XENPV));
+				ALT_NOT(X86_FEATURE_PV_GUEST));
 }

 static __always_inline void write_cr2(unsigned long x)
@@ -154,13 +154,13 @@ static __always_inline void write_cr2(unsigned long x)
 static inline unsigned long __read_cr3(void)
 {
	return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
-			      "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_XENPV));
+			      "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_PV_GUEST));
 }

 static inline void write_cr3(unsigned long x)
 {
	PVOP_ALT_VCALL1(mmu.write_cr3, x,
-			"mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_XENPV));
+			"mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_PV_GUEST));
 }

 static inline void __write_cr4(unsigned long x)
@@ -694,17 +694,17 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
 static __always_inline unsigned long arch_local_save_flags(void)
 {
	return PVOP_ALT_CALLEE0(unsigned long, irq.save_fl, "pushf; pop %%rax;",
-				ALT_NOT(X86_FEATURE_XENPV));
+				ALT_NOT(X86_FEATURE_PV_GUEST));
 }

 static __always_inline void arch_local_irq_disable(void)
 {
-	PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT(X86_FEATURE_XENPV));
+	PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT(X86_FEATURE_PV_GUEST));
 }

 static __always_inline void arch_local_irq_enable(void)
 {
-	PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT(X86_FEATURE_XENPV));
+	PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT(X86_FEATURE_PV_GUEST));
 }

 static __always_inline unsigned long arch_local_irq_save(void)
@@ -776,7 +776,7 @@ void native_pv_lock_init(void) __init;
 .endm

 #define SAVE_FLAGS ALTERNATIVE "PARA_IRQ_save_fl;", "pushf; pop %rax;", \
-			       ALT_NOT(X86_FEATURE_XENPV)
+			       ALT_NOT(X86_FEATURE_PV_GUEST)
 #endif
 #endif /* CONFIG_PARAVIRT_XXL */
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c
index c38e46a96ad3..d39550a8159f 100644
--- a/arch/x86/kernel/pvm.c
+++ b/arch/x86/kernel/pvm.c
@@ -300,6 +300,7 @@ void __init pvm_early_setup(void)
		return;

	setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST);
+	setup_force_cpu_cap(X86_FEATURE_PV_GUEST);

	wrmsrl(MSR_PVM_VCPU_STRUCT, __pa(this_cpu_ptr(&pvm_vcpu_struct)));
	wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_early_kernel_event_entry - 256);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index aeb33e0a3f76..c56483051528 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -335,6 +335,7 @@ static bool __init xen_check_xsave(void)
 static void __init xen_init_capabilities(void)
 {
	setup_force_cpu_cap(X86_FEATURE_XENPV);
+	setup_force_cpu_cap(X86_FEATURE_PV_GUEST);
	setup_clear_cpu_cap(X86_FEATURE_DCA);
	setup_clear_cpu_cap(X86_FEATURE_APERFMPERF);
	setup_clear_cpu_cap(X86_FEATURE_MTRR);
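Conceptually, a PVOP call site guarded by ALT_NOT(X86_FEATURE_PV_GUEST) now
behaves as sketched below for both Xen PV and PVM. This is illustrative
pseudo-logic only; the real PVOP_ALT_* macros emit boot-time alternatives
rather than a runtime branch:

static __always_inline unsigned long read_cr3_conceptual(void)
{
	if (cpu_feature_enabled(X86_FEATURE_PV_GUEST))
		return pv_ops.mmu.read_cr3();	/* Xen PV or PVM callback */

	return __native_read_cr3();		/* otherwise patched to "mov %cr3, %rax" */
}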
From patchwork Mon Feb 26 14:36:24 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572355
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, Hou Wenlong, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Dave Hansen, "H. Peter Anvin"
Subject: [RFC PATCH 67/73] x86/pvm: Implement cpu related PVOPS
Date: Mon, 26 Feb 2024 22:36:24 +0800
Message-Id: <20240226143630.33643-68-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

The MSR read/write operations are in the hot path, so use hypercalls in
their PVOPS to enhance performance. Additionally, it is important to
ensure that load_gs_index() and load_tls() notify the hypervisor in
their PVOPS.

Signed-off-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
---
 arch/x86/Kconfig      |  1 +
 arch/x86/kernel/pvm.c | 85 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 32a2ab49752b..60e28727580a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -855,6 +855,7 @@ config PVM_GUEST
	bool "PVM Guest support"
	depends on X86_64 && KVM_GUEST && X86_PIE && !KASAN
	select PAGE_TABLE_ISOLATION
+	select PARAVIRT_XXL
	select RANDOMIZE_MEMORY
	select RELOCATABLE_UNCOMPRESSED_KERNEL
	default n
diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c
index d39550a8159f..12a35bef9bb8 100644
--- a/arch/x86/kernel/pvm.c
+++ b/arch/x86/kernel/pvm.c
@@ -73,6 +73,81 @@ static __always_inline long pvm_hypercall3(unsigned int nr, unsigned long p1,
	return ret;
 }

+static void pvm_load_gs_index(unsigned int sel)
+{
+	if (sel & 4) {
+		pr_warn_once("pvm guest doesn't support LDT");
+		this_cpu_write(pvm_vcpu_struct.user_gsbase, 0);
+	} else {
+		unsigned long base;
+
+		preempt_disable();
+		base = pvm_hypercall1(PVM_HC_LOAD_GS, sel);
+		__this_cpu_write(pvm_vcpu_struct.user_gsbase, base);
+		preempt_enable();
+	}
+}
+
+static unsigned long long pvm_read_msr_safe(unsigned int msr, int *err)
+{
+	switch (msr) {
+	case MSR_FS_BASE:
+		*err = 0;
+		return rdfsbase();
+	case MSR_KERNEL_GS_BASE:
+		*err = 0;
+		return this_cpu_read(pvm_vcpu_struct.user_gsbase);
+	default:
+		return native_read_msr_safe(msr, err);
+	}
+}
+
+static unsigned long long pvm_read_msr(unsigned int msr)
+{
+	switch (msr) {
+	case MSR_FS_BASE:
+		return rdfsbase();
+	case MSR_KERNEL_GS_BASE:
+		return this_cpu_read(pvm_vcpu_struct.user_gsbase);
+	default:
+		return pvm_hypercall1(PVM_HC_RDMSR, msr);
+	}
+}
+
+static int notrace pvm_write_msr_safe(unsigned int msr, u32 low, u32 high)
+{
+	unsigned long base = ((u64)high << 32) | low;
+
+	switch (msr) {
+	case MSR_FS_BASE:
+		wrfsbase(base);
+		return 0;
+	case MSR_KERNEL_GS_BASE:
+		this_cpu_write(pvm_vcpu_struct.user_gsbase, base);
+		return 0;
+	default:
+		return pvm_hypercall2(PVM_HC_WRMSR, msr, base);
+	}
+}
+
+static void notrace pvm_write_msr(unsigned int msr, u32 low, u32 high)
+{
+	pvm_write_msr_safe(msr, low, high);
+}
+
+static void pvm_load_tls(struct thread_struct *t, unsigned int cpu)
+{
+	struct desc_struct *gdt = get_cpu_gdt_rw(cpu);
+	unsigned long *tls_array = (unsigned long *)gdt;
+
+	if (memcmp(&gdt[GDT_ENTRY_TLS_MIN], &t->tls_array[0], sizeof(t->tls_array))) {
+		native_load_tls(t, cpu);
+		pvm_hypercall3(PVM_HC_LOAD_TLS, tls_array[GDT_ENTRY_TLS_MIN],
+			       tls_array[GDT_ENTRY_TLS_MIN + 1],
+			       tls_array[GDT_ENTRY_TLS_MIN + 2]);
+	}
+}
+
 void __init pvm_early_event(struct pt_regs *regs)
 {
	int vector = regs->orig_ax >> 32;
@@ -302,6 +377,16 @@ void __init pvm_early_setup(void)
	setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST);
	setup_force_cpu_cap(X86_FEATURE_PV_GUEST);

+	/* PVM takes care of %gs when switching to usermode for us */
+	pv_ops.cpu.load_gs_index = pvm_load_gs_index;
+	pv_ops.cpu.cpuid = pvm_cpuid;
+
+	pv_ops.cpu.read_msr = pvm_read_msr;
+	pv_ops.cpu.write_msr = pvm_write_msr;
+	pv_ops.cpu.read_msr_safe = pvm_read_msr_safe;
+	pv_ops.cpu.write_msr_safe = pvm_write_msr_safe;
+	pv_ops.cpu.load_tls = pvm_load_tls;
+
	wrmsrl(MSR_PVM_VCPU_STRUCT, __pa(this_cpu_ptr(&pvm_vcpu_struct)));
	wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_early_kernel_event_entry - 256);
	wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE);
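Since the write PVOP above takes the MSR value as a low/high pair, a 64-bit
convenience wrapper would look like the sketch below (the helper name is
hypothetical and not part of the patch):

static inline int pvm_wrmsrl_safe(unsigned int msr, u64 val)
{
	/* FS/KERNEL_GS base writes stay local; everything else becomes PVM_HC_WRMSR. */
	return pvm_write_msr_safe(msr, (u32)val, (u32)(val >> 32));
}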
From patchwork Mon Feb 26 14:36:25 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572356
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, Hou Wenlong, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Andy Lutomirski, Dave Hansen, "H. Peter Anvin"
Subject: [RFC PATCH 68/73] x86/pvm: Implement irq related PVOPS
Date: Mon, 26 Feb 2024 22:36:25 +0800
Message-Id: <20240226143630.33643-69-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

The save_fl(), irq_enable(), and irq_disable() functions are in the hot
path, so the hypervisor shares the X86_EFLAGS_IF status in the PVCS
structure for the guest kernel. This allows it to be read and modified
directly without a VM exit if there is no IRQ window request.
Additionally, the irq_halt() function remains the same, and a hypercall
is used in its PVOPS to enhance performance.
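For readability, the virtual-IF protocol implemented by the assembly added
below can be rendered in C roughly as follows. This is a sketch only: the
real code is the btrq/orq-based asm in entry_64_pvm.S, which keeps the flag
update a single instruction; names follow the patch.

static __always_inline void pvm_irq_enable_sketch(void)
{
	struct pvm_vcpu_struct *pvcs = this_cpu_ptr(&pvm_vcpu_struct);

	pvcs->event_flags |= X86_EFLAGS_IF;			/* virtual IF on, no VM exit */
	if (pvcs->event_flags & BIT(PVM_EVENT_FLAGS_IP_BIT))
		pvm_hypercall0(PVM_HC_IRQ_WIN);			/* event pending: ask for injection */
}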
Signed-off-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
---
 arch/x86/entry/entry_64_pvm.S   | 22 ++++++++++++++++++++++
 arch/x86/include/asm/pvm_para.h |  3 +++
 arch/x86/kernel/pvm.c           | 10 ++++++++++
 3 files changed, 35 insertions(+)

diff --git a/arch/x86/entry/entry_64_pvm.S b/arch/x86/entry/entry_64_pvm.S
index abb57e251e73..1d17bac2909a 100644
--- a/arch/x86/entry/entry_64_pvm.S
+++ b/arch/x86/entry/entry_64_pvm.S
@@ -65,6 +65,28 @@ SYM_FUNC_START(pvm_hypercall)
	popq %r11
	RET
 SYM_FUNC_END(pvm_hypercall)
+
+SYM_FUNC_START(pvm_save_fl)
+	movq PER_CPU_VAR(pvm_vcpu_struct + PVCS_event_flags), %rax
+	RET
+SYM_FUNC_END(pvm_save_fl)
+
+SYM_FUNC_START(pvm_irq_disable)
+	btrq $X86_EFLAGS_IF_BIT, PER_CPU_VAR(pvm_vcpu_struct + PVCS_event_flags)
+	RET
+SYM_FUNC_END(pvm_irq_disable)
+
+SYM_FUNC_START(pvm_irq_enable)
+	/* set X86_EFLAGS_IF */
+	orq $X86_EFLAGS_IF, PER_CPU_VAR(pvm_vcpu_struct + PVCS_event_flags)
+	btq $PVM_EVENT_FLAGS_IP_BIT, PER_CPU_VAR(pvm_vcpu_struct + PVCS_event_flags)
+	jc .L_maybe_interrupt_pending
+	RET
+.L_maybe_interrupt_pending:
+	/* handle pending IRQ */
+	movq $PVM_HC_IRQ_WIN, %rax
+	jmp pvm_hypercall
+SYM_FUNC_END(pvm_irq_enable)
 .popsection

 /*
diff --git a/arch/x86/include/asm/pvm_para.h b/arch/x86/include/asm/pvm_para.h
index f5d40a57c423..9484a1a23568 100644
--- a/arch/x86/include/asm/pvm_para.h
+++ b/arch/x86/include/asm/pvm_para.h
@@ -95,6 +95,9 @@ void pvm_user_event_entry(void);
 void pvm_hypercall(void);
 void pvm_retu_rip(void);
 void pvm_rets_rip(void);
+void pvm_save_fl(void);
+void pvm_irq_disable(void);
+void pvm_irq_enable(void);

 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_PVM_PARA_H */
diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c
index 12a35bef9bb8..b4522947374d 100644
--- a/arch/x86/kernel/pvm.c
+++ b/arch/x86/kernel/pvm.c
@@ -148,6 +148,11 @@ static void pvm_load_tls(struct thread_struct *t, unsigned int cpu)
	}
 }

+static noinstr void pvm_safe_halt(void)
+{
+	pvm_hypercall0(PVM_HC_IRQ_HALT);
+}
+
 void __init pvm_early_event(struct pt_regs *regs)
 {
	int vector = regs->orig_ax >> 32;
@@ -387,6 +392,11 @@ void __init pvm_early_setup(void)
	pv_ops.cpu.write_msr_safe = pvm_write_msr_safe;
	pv_ops.cpu.load_tls = pvm_load_tls;

+	pv_ops.irq.save_fl = __PV_IS_CALLEE_SAVE(pvm_save_fl);
+	pv_ops.irq.irq_disable = __PV_IS_CALLEE_SAVE(pvm_irq_disable);
+	pv_ops.irq.irq_enable = __PV_IS_CALLEE_SAVE(pvm_irq_enable);
+	pv_ops.irq.safe_halt = pvm_safe_halt;
+
	wrmsrl(MSR_PVM_VCPU_STRUCT, __pa(this_cpu_ptr(&pvm_vcpu_struct)));
	wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_early_kernel_event_entry - 256);
	wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE);

From patchwork Mon Feb 26 14:36:26 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572357
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Lai Jiangshan, Hou Wenlong, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Dave Hansen, "H. Peter Anvin"
Subject: [RFC PATCH 69/73] x86/pvm: Implement mmu related PVOPS
Date: Mon, 26 Feb 2024 22:36:26 +0800
Message-Id: <20240226143630.33643-70-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Lai Jiangshan

CR2 is passed directly in the event entry, allowing it to be read
directly in PVOPS. Additionally, write_cr3() for context switch needs
to notify the hypervisor in its PVOPS. For performance reasons,
TLB-related PVOPS utilize hypercalls.

Signed-off-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
---
 arch/x86/kernel/pvm.c | 56 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c
index b4522947374d..1dc2c0fb7daa 100644
--- a/arch/x86/kernel/pvm.c
+++ b/arch/x86/kernel/pvm.c
@@ -21,6 +21,7 @@
 #include

 DEFINE_PER_CPU_PAGE_ALIGNED(struct pvm_vcpu_struct, pvm_vcpu_struct);
+static DEFINE_PER_CPU(unsigned long, pvm_guest_cr3);

 unsigned long pvm_range_start __initdata;
 unsigned long pvm_range_end __initdata;
@@ -153,6 +154,52 @@ static noinstr void pvm_safe_halt(void)
	pvm_hypercall0(PVM_HC_IRQ_HALT);
 }

+static noinstr unsigned long pvm_read_cr2(void)
+{
+	return this_cpu_read(pvm_vcpu_struct.cr2);
+}
+
+static noinstr void pvm_write_cr2(unsigned long cr2)
+{
+	native_write_cr2(cr2);
+	this_cpu_write(pvm_vcpu_struct.cr2, cr2);
+}
+
+static unsigned long pvm_read_cr3(void)
+{
+	return this_cpu_read(pvm_guest_cr3);
+}
+
+static unsigned long pvm_user_pgd(unsigned long pgd)
+{
+	return pgd | BIT(PTI_PGTABLE_SWITCH_BIT) | BIT(X86_CR3_PTI_PCID_USER_BIT);
+}
+
+static void pvm_write_cr3(unsigned long val)
+{
+	/* Convert CR3_NO_FLUSH bit to hypercall flags. */
+	unsigned long flags = ~val >> 63;
+	unsigned long pgd = val & ~X86_CR3_PCID_NOFLUSH;
+
+	this_cpu_write(pvm_guest_cr3, pgd);
+	pvm_hypercall3(PVM_HC_LOAD_PGTBL, flags, pgd, pvm_user_pgd(pgd));
+}
+
+static void pvm_flush_tlb_user(void)
+{
+	pvm_hypercall0(PVM_HC_TLB_FLUSH_CURRENT);
+}
+
+static void pvm_flush_tlb_kernel(void)
+{
+	pvm_hypercall0(PVM_HC_TLB_FLUSH);
+}
+
+static void pvm_flush_tlb_one_user(unsigned long addr)
+{
+	pvm_hypercall1(PVM_HC_TLB_INVLPG, addr);
+}
+
 void __init pvm_early_event(struct pt_regs *regs)
 {
	int vector = regs->orig_ax >> 32;
@@ -397,6 +444,15 @@ void __init pvm_early_setup(void)
	pv_ops.irq.irq_enable = __PV_IS_CALLEE_SAVE(pvm_irq_enable);
	pv_ops.irq.safe_halt = pvm_safe_halt;

+	this_cpu_write(pvm_guest_cr3, __native_read_cr3());
+	pv_ops.mmu.read_cr2 = __PV_IS_CALLEE_SAVE(pvm_read_cr2);
+	pv_ops.mmu.write_cr2 = pvm_write_cr2;
+	pv_ops.mmu.read_cr3 = pvm_read_cr3;
+	pv_ops.mmu.write_cr3 = pvm_write_cr3;
+	pv_ops.mmu.flush_tlb_user = pvm_flush_tlb_user;
+	pv_ops.mmu.flush_tlb_kernel = pvm_flush_tlb_kernel;
+	pv_ops.mmu.flush_tlb_one_user = pvm_flush_tlb_one_user;
+
	wrmsrl(MSR_PVM_VCPU_STRUCT, __pa(this_cpu_ptr(&pvm_vcpu_struct)));
	wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_early_kernel_event_entry - 256);
	wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE);
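The "~val >> 63" conversion in pvm_write_cr3() above is terse; written out
only for illustration (the helper name is made up), it is:

/* X86_CR3_PCID_NOFLUSH is bit 63 of the CR3 value, so inverting and shifting
 * yields the hypercall flag: 1 = flush the TLB, 0 = keep it. */
static inline unsigned long pvm_cr3_flush_flag(unsigned long cr3)
{
	return (~cr3) >> 63;
}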
From patchwork Mon Feb 26 14:36:27 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572358
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9QI/Y1kNMrQvMG7JUt6H0OqHjkXiCz72P215jqTcUcU=; b=AW4B7y+rghpeQAyzXcXbndgLtRl7Xk0kB94Rrs2BoPDJssOJFonAJtAF2sGkbkrI8B Y7LMbnnGWOdUMtDOQQXnV3lNGHf9EJtTJIBWXaGMWr3Ao+gVAsPlk0xqKkKCmn2S71gX dJlPKwaPXre+Ac9lBn+Zk3p6D4jWzSqoS8CtUolqR+1zRKqQ+p7okihu4PbbsMR4sGXT qDifUEWK53SwSwxIhPbt5iSrC2pw7ymVPWJjlik+Ef89MzW5v9I5vyl6MNqEromADrV4 Y7mcYyjqdiOkm9LxvugUidtd/lCUXE1J+bgl6vOzzpjmW6fyWJknJELc9BTkET3IjItD xvtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708958386; x=1709563186; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9QI/Y1kNMrQvMG7JUt6H0OqHjkXiCz72P215jqTcUcU=; b=JzRpcs8o90som9KMzub/eUKa+APJsLvrbq2WYjBHYuYnAz4XWoZ8yxk6hVaGu7WjCt L/3E00LlgqFDpaG1OcNTbAJIikMtuQ2vj8JkGFEdyHYOfqNv1PH7qL3/gVXbvB55SzK6 UZB3BDhE90dgRU5A4zZ9kas8pBfqoyUQQfpGgOmhEnS3Na8F0LW55rIf5PIrq7J/X3tL FbnHEkkzcQ+DTIYHFMJjEOx31mZAv90AzrFDe8DOq6L71vOhyQEiqdt0Ny9dJpzjbQlo xmbpIFuo6VHfD6utL3EMkE00l6TIpZNK+rdQYiDTUpp3BSiyesZaycLkrFY01B8hLQAL ontQ== X-Forwarded-Encrypted: i=1; AJvYcCU4VZCyjBJkJJZJmxn6potz7Y4rvMe/78tgwhtJNCTHnjD1vty7UcKSlem25E7Drl6YunslDCxFQo+lejv990+wR5qy X-Gm-Message-State: AOJu0YxnTF1cfeTA1KEuOSAMTjmLxCt/oVBxX2PsM7A8UQcc5KH5h6V9 ikKFXVdDZLsDX8u1+EoMHm3tp2T15ZszClW8HV95qVvjDeBZQfhAasBPbnSf X-Google-Smtp-Source: AGHT+IHy8ONH+IR/CrprMSlLNr66EQ8UbtExowfbbOUGEyBdvChuDg4mhNFNKU725s1aIh6O+0Jq+w== X-Received: by 2002:a17:902:e806:b0:1dc:6e06:7685 with SMTP id u6-20020a170902e80600b001dc6e067685mr9416736plg.29.1708958386631; Mon, 26 Feb 2024 06:39:46 -0800 (PST) Received: from localhost ([198.11.176.14]) by smtp.gmail.com with ESMTPSA id g12-20020a170902fe0c00b001dc91b4081dsm2841692plj.271.2024.02.26.06.39.45 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2024 06:39:46 -0800 (PST) From: Lai Jiangshan To: linux-kernel@vger.kernel.org Cc: Hou Wenlong , Lai Jiangshan , Linus Torvalds , Peter Zijlstra , Sean Christopherson , Thomas Gleixner , Borislav Petkov , Ingo Molnar , kvm@vger.kernel.org, Paolo Bonzini , x86@kernel.org, Kees Cook , Juergen Gross , Dave Hansen , "H. Peter Anvin" , "Kirill A. Shutemov" , Mike Rapoport , Rick Edgecombe Subject: [RFC PATCH 70/73] x86/pvm: Don't use SWAPGS for gsbase read/write Date: Mon, 26 Feb 2024 22:36:27 +0800 Message-Id: <20240226143630.33643-71-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong On PVM guest, SWAPGS doesn't work. So let __rdgsbase_inactive() and __wrgsbase_inactive() to use rdmsrl()/wrmsrl() on PVM guest. Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/kernel/process_64.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 33b268747bb7..9a56bcef515e 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -157,7 +157,7 @@ enum which_selector { * traced or probed than any access to a per CPU variable happens with * the wrong GS. * - * It is not used on Xen paravirt. When paravirt support is needed, it + * It is not used on Xen/PVM paravirt. 
From patchwork Mon Feb 26 14:36:28 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572359
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong, Lai Jiangshan, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Dave Hansen, "H. Peter Anvin"
Subject: [RFC PATCH 71/73] x86/pvm: Adapt pushf/popf in this_cpu_cmpxchg16b_emu()
Date: Mon, 26 Feb 2024 22:36:28 +0800
Message-Id: <20240226143630.33643-72-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Hou Wenlong

The pushf/popf instructions in this_cpu_cmpxchg16b_emu() are
non-privileged instructions, so they cannot be trapped and emulated,
which could cause a boot failure. However, since the cmpxchg16b
instruction is supported for the PVM guest, we can patch
this_cpu_cmpxchg16b_emu() and use cmpxchg16b directly.
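For illustration only, the contract that the patched stub provides is an
ordinary 16-byte compare-and-exchange. Expressed with a compiler builtin
rather than the kernel's asm stub (the helper name is made up; whether this
compiles to cmpxchg16b or a libatomic call depends on compiler flags):

#include <stdbool.h>

static inline bool cmpxchg16b_sketch(__int128 *ptr, __int128 *expected, __int128 desired)
{
	/* Compare *ptr with *expected; store desired on match, else update *expected. */
	return __atomic_compare_exchange_n(ptr, expected, desired, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}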
Suggested-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
Signed-off-by: Lai Jiangshan
---
 arch/x86/kernel/pvm.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c
index 1dc2c0fb7daa..567ea19d569c 100644
--- a/arch/x86/kernel/pvm.c
+++ b/arch/x86/kernel/pvm.c
@@ -413,6 +413,34 @@ __visible noinstr void pvm_event(struct pt_regs *regs)
		common_interrupt(regs, vector);
 }

+asm (
+	".pushsection .rodata			\n"
+	".global pvm_cmpxchg16b_emu_template	\n"
+	"pvm_cmpxchg16b_emu_template:		\n"
+	"	cmpxchg16b %gs:(%rsi)		\n"
+	"	ret				\n"
+	".global pvm_cmpxchg16b_emu_tail	\n"
+	"pvm_cmpxchg16b_emu_tail:		\n"
+	".popsection				\n"
+);
+
+extern u8 this_cpu_cmpxchg16b_emu[];
+extern u8 pvm_cmpxchg16b_emu_template[];
+extern u8 pvm_cmpxchg16b_emu_tail[];
+
+static void __init pvm_early_patch(void)
+{
+	/*
+	 * The pushf/popf instructions in this_cpu_cmpxchg16b_emu() are
+	 * non-privilege instructions, so they cannot be trapped and emulated,
+	 * which could cause a boot failure. However, since the cmpxchg16b
+	 * instruction is supported for PVM guest. we can patch
+	 * this_cpu_cmpxchg16b_emu() and use cmpxchg16b directly.
+	 */
+	memcpy(this_cpu_cmpxchg16b_emu, pvm_cmpxchg16b_emu_template,
+	       (unsigned int)(pvm_cmpxchg16b_emu_tail - pvm_cmpxchg16b_emu_template));
+}
+
 extern void pvm_early_kernel_event_entry(void);

 /*
@@ -457,6 +485,8 @@ void __init pvm_early_setup(void)
	wrmsrl(MSR_PVM_EVENT_ENTRY, (unsigned long)(void *)pvm_early_kernel_event_entry - 256);
	wrmsrl(MSR_PVM_SUPERVISOR_REDZONE, PVM_SUPERVISOR_REDZONE_SIZE);
	wrmsrl(MSR_PVM_RETS_RIP, (unsigned long)(void *)pvm_rets_rip);
+
+	pvm_early_patch();
 }

 void pvm_setup_event_handling(void)

From patchwork Mon Feb 26 14:36:29 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572360
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong, Lai Jiangshan, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Dave Hansen, "H. Peter Anvin", Sami Tolvanen,
    Fangrui Song, Willy Tarreau, Thomas Garnier, Josh Poimboeuf, Xin Li
Subject: [RFC PATCH 72/73] x86/pvm: Use RDTSCP as default in vdso_read_cpunode()
Date: Mon, 26 Feb 2024 22:36:29 +0800
Message-Id: <20240226143630.33643-73-jiangshanlai@gmail.com>
In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com>

From: Hou Wenlong

The CPUNODE descriptor of the guest cannot be installed into the host's
GDT, as this index is also used for the host to retrieve the current
CPU in paranoid entry. As a result, LSL in vdso_read_cpunode() does not
work correctly for the PVM guest. To address this issue, use RDTSCP as
the default in vdso_read_cpunode(), as it is supported by the
hypervisor.
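RDTSCP returns the IA32_TSC_AUX MSR in %ecx, and the kernel fills that MSR
with the same cpu/node encoding the LSL path reads from the segment limit.
A user-space flavoured sketch of the decoding, assuming the VDSO_CPUNODE_*
constants match asm/segment.h (the function name is made up):

#include <x86intrin.h>

#define VDSO_CPUNODE_BITS	12
#define VDSO_CPUNODE_MASK	0xfff

static inline void read_cpunode_sketch(unsigned int *cpu, unsigned int *node)
{
	unsigned int aux;

	__rdtscp(&aux);			/* aux = IA32_TSC_AUX, maintained by the kernel */
	if (cpu)
		*cpu = aux & VDSO_CPUNODE_MASK;
	if (node)
		*node = aux >> VDSO_CPUNODE_BITS;
}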
Suggested-by: Lai Jiangshan
Signed-off-by: Hou Wenlong
Signed-off-by: Lai Jiangshan
---
 arch/x86/include/asm/alternative.h | 14 ++++++++++++++
 arch/x86/include/asm/segment.h     | 14 ++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index cf4b236b47a3..caebb49c5d61 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -299,6 +299,20 @@ static inline int alternatives_text_reserved(void *start, void *end)
	asm_inline volatile (ALTERNATIVE(oldinstr, newinstr, ft_flags) \
		: output : "i" (0), ## input)

+/*
+ * This is similar to alternative_io. But it has two features and
+ * respective instructions.
+ *
+ * If CPU has feature2, newinstr2 is used.
+ * Otherwise, if CPU has feature1, newinstr1 is used.
+ * Otherwise, oldinstr is used.
+ */
+#define alternative_io_2(oldinstr, newinstr1, ft_flags1, newinstr2,	\
+			 ft_flags2, output, input...)			\
+	asm_inline volatile (ALTERNATIVE_2(oldinstr, newinstr1, ft_flags1, \
+					   newinstr2, ft_flags2)	\
+		: output : "i" (0), ## input)
+
 /* Like alternative_io, but for replacing a direct call with another one. */
 #define alternative_call(oldfunc, newfunc, ft_flags, output, input...)	\
	asm_inline volatile (ALTERNATIVE("call %P[old]", "call %P[new]", ft_flags) \
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 9d6411c65920..555966922e8f 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -253,11 +253,17 @@ static inline void vdso_read_cpunode(unsigned *cpu, unsigned *node)
	 * hoisting it out of the calling function.
	 *
	 * If RDPID is available, use it.
+	 *
+	 * If it is PVM guest and RDPID is not available, use RDTSCP.
	 */
-	alternative_io ("lsl %[seg],%[p]",
-			".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
-			X86_FEATURE_RDPID,
-			[p] "=a" (p), [seg] "r" (__CPUNODE_SEG));
+	alternative_io_2("lsl %[seg],%[p]",
+			 ".byte 0x0f,0x01,0xf9\n\t"  /* RDTSCP %eax:%edx, %ecx */
+			 "mov %%ecx,%%eax\n\t",
+			 X86_FEATURE_KVM_PVM_GUEST,
+			 ".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
+			 X86_FEATURE_RDPID,
+			 [p] "=a" (p), [seg] "r" (__CPUNODE_SEG)
+			 : "cx", "dx");

	if (cpu)
		*cpu = (p & VDSO_CPUNODE_MASK);

From patchwork Mon Feb 26 14:36:30 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572361
From patchwork Mon Feb 26 14:36:30 2024
X-Patchwork-Submitter: Lai Jiangshan
X-Patchwork-Id: 13572361
From: Lai Jiangshan
To: linux-kernel@vger.kernel.org
Cc: Hou Wenlong, Lai Jiangshan, Linus Torvalds, Peter Zijlstra,
    Sean Christopherson, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
    kvm@vger.kernel.org, Paolo Bonzini, x86@kernel.org, Kees Cook,
    Juergen Gross, Andy Lutomirski, Dave Hansen, "H. Peter Anvin",
Shutemov" , Andrew Morton , Hugh Dickins Subject: [RFC PATCH 73/73] x86/pvm: Disable some unsupported syscalls and features Date: Mon, 26 Feb 2024 22:36:30 +0800 Message-Id: <20240226143630.33643-74-jiangshanlai@gmail.com> X-Mailer: git-send-email 2.19.1.6.gb485710b In-Reply-To: <20240226143630.33643-1-jiangshanlai@gmail.com> References: <20240226143630.33643-1-jiangshanlai@gmail.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Hou Wenlong n the PVM guest, the LDT won't be loaded into hardware, rendering it ineffective. Consequently, the modify_ldt() syscall should be disabled. Additionally, the VSYSCALL address is not within the allowed address range, making full emulation of the vsyscall page unsupported in the PVM guest. It is recommended to use XONLY mode instead. Furthermore, SYSENTER (Intel) and SYSCALL32 (AMD) are not supported by the hypervisor, so they should not be used in VDSO. Suggested-by: Lai Jiangshan Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/entry/vsyscall/vsyscall_64.c | 4 ++++ arch/x86/kernel/ldt.c | 3 +++ arch/x86/kernel/pvm.c | 4 ++++ 3 files changed, 11 insertions(+) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index f469f8dc36d4..dc6bc7fb490e 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -378,6 +378,10 @@ void __init map_vsyscall(void) extern char __vsyscall_page; unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page); + /* Full emulation is not supported in PVM guest, use XONLY instead. */ + if (vsyscall_mode == EMULATE && boot_cpu_has(X86_FEATURE_KVM_PVM_GUEST)) + vsyscall_mode = XONLY; + /* * For full emulation, the page needs to exist for real. In * execute-only mode, there is no PTE at all backing the vsyscall diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index adc67f98819a..d75815491d7e 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -669,6 +669,9 @@ SYSCALL_DEFINE3(modify_ldt, int , func , void __user * , ptr , { int ret = -ENOSYS; + if (cpu_feature_enabled(X86_FEATURE_KVM_PVM_GUEST)) + return (unsigned int)ret; + switch (func) { case 0: ret = read_ldt(ptr, bytecount); diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index 567ea19d569c..b172bd026594 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -457,6 +457,10 @@ void __init pvm_early_setup(void) setup_force_cpu_cap(X86_FEATURE_KVM_PVM_GUEST); setup_force_cpu_cap(X86_FEATURE_PV_GUEST); + /* Don't use SYSENTER (Intel) and SYSCALL32 (AMD) in vdso. */ + setup_clear_cpu_cap(X86_FEATURE_SYSENTER32); + setup_clear_cpu_cap(X86_FEATURE_SYSCALL32); + /* PVM takes care of %gs when switching to usermode for us */ pv_ops.cpu.load_gs_index = pvm_load_gs_index; pv_ops.cpu.cpuid = pvm_cpuid;