[RFC,62/67] KVM: TDX: Load and init TDX-SEAM module during boot

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a hook into the early boot flow to load TDX-SEAM and do BSP-only
init of TDX-SEAM.

Perform the TDSYSINIT, TDSYSINITLP sequence to initialize TDX during
kernel boot.  Call TDSYSINIT on the BSP for platform-level
initialization, and call TDSYSINITLP on all CPUs for per-CPU
initialization.
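
The ordering can be modelled in a few lines of userspace C.  This is a
minimal sketch, not the patch's code: 'struct tdx_model' and the
model_*() helpers are hypothetical stand-ins for the real SEAMCALL
wrappers and only track which step has run on which CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_CPUS 64

/* Hypothetical model of TDX-SEAM init state; not the patch's structure. */
struct tdx_model {
	bool     sys_inited;     /* TDSYSINIT has run (platform level) */
	uint64_t lp_inited_mask; /* CPUs that have run TDSYSINITLP */
};

/* Platform-level init: allowed on the BSP only, and only once. */
static int model_tdsysinit(struct tdx_model *m, unsigned int cpu,
			   unsigned int bsp)
{
	if (cpu != bsp || m->sys_inited)
		return -1;
	m->sys_inited = true;
	return 0;
}

/* Per-CPU init: valid on any CPU, but only after TDSYSINIT. */
static int model_tdsysinitlp(struct tdx_model *m, unsigned int cpu)
{
	assert(cpu < MODEL_MAX_CPUS);
	if (!m->sys_inited)
		return -1;
	m->lp_inited_mask |= UINT64_C(1) << cpu;
	return 0;
}

/* BSP path: TDSYSINIT, then TDSYSINITLP on the BSP itself. */
static int model_bsp_init(struct tdx_model *m, unsigned int bsp)
{
	if (model_tdsysinit(m, bsp, bsp))
		return -1;
	return model_tdsysinitlp(m, bsp);
}

/* AP path: only the per-CPU step. */
static int model_ap_init(struct tdx_model *m, unsigned int cpu)
{
	return model_tdsysinitlp(m, cpu);
}
```

In this model an AP that runs before the BSP path has completed is
rejected, which mirrors the requirement that platform-level init comes
first.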

On the BSP, also call TDSYSINFO to get TDX info right after TDSYSINITLP.
TDX initialization on an AP is done in identify_cpu() when the AP is
brought up, but on the BSP it is done right after the SEAM module is
loaded rather than in identify_cpu().  The reason is that TDMRs must be
constructed before the kernel's normal page allocator is up: doing so
requires reserving a large chunk of memory for the PAMT (>4MB), which
the page allocator cannot allocate.  And sizing the PAMT reservation
requires the TDX info returned by TDSYSINFO, so TDSYSINFO must also be
called on the BSP right after TDSYSINITLP.
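
To make the size argument concrete, here is a rough sketch of the
arithmetic.  The helper names are hypothetical, and 'entry_bytes'
stands in for whatever per-page PAMT entry size TDSYSINFO reports on a
given platform; the 16 bytes used in the note below is only an
illustrative value, not a figure taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 4MB page allocator limit referred to in the text above. */
#define MAX_BUDDY_ALLOC_BYTES (UINT64_C(4) << 20)

/*
 * Bytes of PAMT needed to track 'tdmr_bytes' of memory at one page size.
 * 'entry_bytes' is the (platform-specific) size of one PAMT entry.
 */
static uint64_t pamt_bytes(uint64_t tdmr_bytes, uint64_t page_bytes,
			   uint64_t entry_bytes)
{
	uint64_t nr_pages;

	assert(page_bytes != 0);
	/* Round up so a partial trailing page still gets an entry. */
	nr_pages = (tdmr_bytes + page_bytes - 1) / page_bytes;
	return nr_pages * entry_bytes;
}

/* True if the PAMT is too large for a single page allocator request. */
static bool pamt_needs_early_reservation(uint64_t bytes)
{
	return bytes > MAX_BUDDY_ALLOC_BYTES;
}
```

With an assumed 16-byte entry per 4KB page, 1GB of memory already needs
4MB of PAMT and 64GB needs 256MB, so the reservation has to come from
the early boot allocator.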

Check kernel parameters and other variables that prevent/indicate that
not all logical CPUs can be onlined.  TDSYSINITLP must be called on all
logical CPUs as part of TDX-SEAM configuration, e.g. TDSYSCONFIG is
guaranteed to fail if not all CPUs are onlined.

Query the 'nr_cpus', 'possible_cpus' and 'maxcpus' kernel parameters, as
well as the 'disabled_cpus' counter that can be incremented during ACPI
parsing (CPUs marked as disabled cannot be brought up later).
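
The check amounts to "does anything cap the number of bootable CPUs
below what firmware enumerated?".  A simplified sketch follows;
'struct cpu_limits' and both helpers are hypothetical, and the real
parameters have more nuanced semantics than the "negative means not
specified" convention assumed here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical snapshot of the boot-time CPU limits named above.
 * A negative parameter means "not given on the command line".
 */
struct cpu_limits {
	int nr_cpus;                /* 'nr_cpus=' */
	int possible_cpus;          /* 'possible_cpus=' */
	int maxcpus;                /* 'maxcpus=' */
	unsigned int disabled_cpus; /* CPUs that ACPI marked as disabled */
	unsigned int total_cpus;    /* logical CPUs enumerated by firmware */
};

/* A limit is harmless if it is unset or at least as large as the total. */
static bool limit_allows_all(int param, unsigned int total)
{
	return param < 0 || (unsigned int)param >= total;
}

/* True only if nothing prevents every logical CPU from being onlined. */
static bool tdx_all_cpus_can_boot(const struct cpu_limits *l)
{
	assert(l->total_cpus > 0);
	if (l->disabled_cpus)
		return false;
	return limit_allows_all(l->nr_cpus, l->total_cpus) &&
	       limit_allows_all(l->possible_cpus, l->total_cpus) &&
	       limit_allows_all(l->maxcpus, l->total_cpus);
}
```

If the function returns false, TDX would be disabled up front instead
of failing later in TDSYSCONFIG.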

Note, the kernel ignores the "Online Capable" bit defined in the ACPI
specification v6.3, section 5.2.12.2 Processor Local APIC Structure:

  CPUs marked as disabled ("Enabled" bit cleared) but it can be
  brought up later by OS if "Online Capable" bit is set.

and simply treats ACPI hot-added CPUs as enabled, i.e. with ACPI CPU
hotplug, the aforementioned variables can change dynamically post-boot.
But CPU hotplug is unsupported on TDX-enabled systems, so the variables
are effectively constant post-boot when TDX is in use.

In the post-SMP boot phase (tdx_init()), verify that all present CPUs
were successfully booted.  Note that this also covers the SMT=off case,
i.e. verifies that to-be-disabled sibling threads are booted and run
through TDSYSINITLP.
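
Expressed with plain bitmasks, the post-SMP check compares the set of
present CPUs against the set that ran TDSYSINITLP, not against the set
that is currently online.  The helpers below are a hypothetical sketch
limited to 64 CPUs; the kernel would use its cpumask API instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every present CPU has run TDSYSINITLP. */
static bool all_present_cpus_initialized(uint64_t present_mask,
					 uint64_t lp_inited_mask)
{
	return (present_mask & ~lp_inited_mask) == 0;
}

/* Lowest-numbered present CPU that never ran TDSYSINITLP, or -1 if none. */
static int first_uninitialized_cpu(uint64_t present_mask,
				   uint64_t lp_inited_mask)
{
	uint64_t missing = present_mask & ~lp_inited_mask;

	for (int cpu = 0; cpu < 64; cpu++) {
		if (missing & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

With SMT=off, sibling threads may be offline by the time the check
runs, yet they still count as initialized as long as they ran
TDSYSINITLP before being taken down.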

Detect the TDX private keyID range by reading MSR_IA32_MKTME_KEYID_PART,
which is configured by BIOS and partitions the MKTME KeyID space into
regular KeyIDs and TDX-only KeyIDs.  Disable TDX if the partitioning is
not consistent across all CPUs, i.e. if BIOS screwed up.
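
A sketch of the decode and consistency check follows.  The field layout
is an assumption here rather than something stated in this patch: the
low 32 bits are taken as the number of MKTME KeyIDs and the high 32
bits as the number of TDX private KeyIDs, with KeyID 0 reserved so that
the TDX range starts right after the MKTME range.  All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the partitioning MSR (layout assumed, see lead-in). */
struct keyid_part {
	uint32_t nr_mktme_keyids; /* assumed: bits 31:0 */
	uint32_t nr_tdx_keyids;   /* assumed: bits 63:32 */
};

static struct keyid_part decode_keyid_part(uint64_t msr)
{
	struct keyid_part p = {
		.nr_mktme_keyids = (uint32_t)(msr & 0xffffffffu),
		.nr_tdx_keyids   = (uint32_t)(msr >> 32),
	};
	return p;
}

/* First TDX private KeyID, assuming MKTME KeyIDs occupy 1..nr_mktme. */
static uint32_t tdx_first_keyid(const struct keyid_part *p)
{
	return p->nr_mktme_keyids + 1;
}

/* True if every CPU reported the same raw MSR value as CPU 0. */
static bool keyid_part_consistent(const uint64_t *per_cpu_msr, size_t nr_cpus)
{
	assert(nr_cpus > 0);
	for (size_t i = 1; i < nr_cpus; i++) {
		if (per_cpu_msr[i] != per_cpu_msr[0])
			return false;
	}
	return true;
}
```

Comparing raw values is enough for the consistency check; decoding is
only needed once, on the value that all CPUs agree on.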

Construct Trust Domain Memory Regions (TDMRs) based on info reported by
TDSYSINFO.  For simplicity, all system memory is configured as TDMRs,
otherwise the page allocator would need to be modified to distinguish
normal and TD memory allocations.  The overhead of marking all memory as TDMRs
consists of the memory needed for TDX-SEAM's Physical Address Metadata
Tables (PAMTs) used to track TDMRs.

TDMRs are constructed (and the PAMTs associated with them are reserved)
on a per-NUMA-node basis for better performance -- when accessing TD
memory in a TDMR, the CPU doesn't have to access a PAMT on a remote node.

Sanity check that the CMRs reported by TDSYSINFO cover all memory
reported in e820, and disable TDX if there is a discrepancy.  If there
is memory available to the kernel (reported in e820) that is not covered
by a TDMR then it's possible the page allocator will allocate a page
that's not usable for a TD's memory, i.e. would break KVM.
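
The coverage check can be sketched as an interval walk.  This is an
illustration under stated assumptions, not the patch's implementation:
'struct mem_range' and both helpers are hypothetical, ranges are
half-open [start, end), and the CMR array is assumed to be sorted by
start address and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* True if 'r' lies entirely within the union of the sorted CMRs. */
static bool range_covered(const struct mem_range *r,
			  const struct mem_range *cmrs, size_t nr_cmrs)
{
	uint64_t cursor = r->start; /* lowest address not yet proven covered */

	assert(r->start <= r->end);
	for (size_t i = 0; i < nr_cmrs && cursor < r->end; i++) {
		if (cmrs[i].end <= cursor)
			continue;     /* CMR lies entirely below the cursor */
		if (cmrs[i].start > cursor)
			return false; /* hole between cursor and this CMR */
		cursor = cmrs[i].end;
	}
	return cursor >= r->end;
}

/* True if every RAM range (e.g. from e820) is covered by the CMRs. */
static bool all_ram_covered(const struct mem_range *ram, size_t nr_ram,
			    const struct mem_range *cmrs, size_t nr_cmrs)
{
	for (size_t i = 0; i < nr_ram; i++) {
		if (!range_covered(&ram[i], cmrs, nr_cmrs))
			return false;
	}
	return true;
}
```

A single uncovered byte is enough to fail the check, which matches the
"disable TDX on any discrepancy" policy described above.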

Once all enumeration and sanity checking is done, call TDSYSCONFIG,
TDSYSCONFIGKEY and TDSYSINITTDMR to configure and initialize TDMRs.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kbuild                     |    1 +
 arch/x86/include/asm/kvm_boot.h     |   43 +
 arch/x86/kernel/cpu/intel.c         |    4 +
 arch/x86/kernel/setup.c             |    3 +
 arch/x86/kvm/Kconfig                |    8 +
 arch/x86/kvm/boot/Makefile          |    5 +
 arch/x86/kvm/boot/seam/seamldr.S    |  188 +++++
 arch/x86/kvm/boot/seam/seamloader.c |  162 ++++
 arch/x86/kvm/boot/seam/tdx.c        | 1131 +++++++++++++++++++++++++++
 9 files changed, 1545 insertions(+)
 create mode 100644 arch/x86/include/asm/kvm_boot.h
 create mode 100644 arch/x86/kvm/boot/Makefile
 create mode 100644 arch/x86/kvm/boot/seam/seamldr.S
 create mode 100644 arch/x86/kvm/boot/seam/seamloader.c
 create mode 100644 arch/x86/kvm/boot/seam/tdx.c

Message ID	542b02522475c69143e3ac8bcf6014b7db03bd55.1605232743.git.isaku.yamahata@intel.com (mailing list archive)
State	New, archived
From: isaku.yamahata@intel.com
To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, "H . Peter Anvin" <hpa@zytor.com>, Paolo Bonzini <pbonzini@redhat.com>, Vitaly Kuznetsov <vkuznets@redhat.com>, Wanpeng Li <wanpengli@tencent.com>, Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>, x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Sean Christopherson <sean.j.christopherson@intel.com>, Kai Huang <kai.huang@linux.intel.com>, Xiaoyao Li <xiaoyao.li@intel.com>
Subject: [RFC PATCH 62/67] KVM: TDX: Load and init TDX-SEAM module during boot
Date: Mon, 16 Nov 2020 10:26:47 -0800
In-Reply-To: <cover.1605232743.git.isaku.yamahata@intel.com>
List-ID: <kvm.vger.kernel.org>
