Message ID | 20250201021718.699411-7-seanjc@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/tsc: Try to wrangle PV clocks vs. TSC | expand |
Sean Christopherson <seanjc@google.com> writes: > When running as a TDX guest, explicitly override the TSC frequency > calibration routine with CPUID-based calibration instead of potentially > relying on a hypervisor-controlled PV routine. For TDX guests, CPUID.0x15 > is always emulated by the TDX-Module, i.e. the information from CPUID is > more trustworthy than the information provided by the hypervisor. > > To maintain backwards compatibility with TDX guest kernels that use native > calibration, and because it's the least awful option, retain > native_calibrate_tsc()'s stuffing of the local APIC bus period using the > core crystal frequency. While it's entirely possible for the hypervisor > to emulate the APIC timer at a different frequency than the core crystal > frequency, the commonly accepted interpretation of Intel's SDM is that APIC > timer runs at the core crystal frequency when that latter is enumerated via > CPUID: > > The APIC timer frequency will be the processor’s bus clock or core > crystal clock frequency (when TSC/core crystal clock ratio is enumerated > in CPUID leaf 0x15). > > If the hypervisor is malicious and deliberately runs the APIC timer at the > wrong frequency, nothing would stop the hypervisor from modifying the > frequency at any time, i.e. attempting to manually calibrate the frequency > out of paranoia would be futile. > > Deliberately leave the CPU frequency calibration routine as is, since the > TDX-Module doesn't provide any guarantees with respect to CPUID.0x16. Does TDX use kvmclock? If yes, kvmclock would have registered the CPU frequency calibration routine: tsc_register_calibration_routines(kvm_get_tsc_khz, kvm_get_cpu_khz, tsc_properties); so TDX will use kvm_get_cpu_khz(), which will either use CPUID.0x16 or PV clock, is this on the expected line ? Regards Nikunj > + > +void __init tdx_tsc_init(void) > +{ > + /* TSC is the only reliable clock in TDX guest */ > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > + > + /* > + * Override the PV calibration routines (if set) with more trustworthy > + * CPUID-based calibration. The TDX module emulates CPUID, whereas any > + * PV information is provided by the hypervisor. > + */ > + tsc_register_calibration_routines(tdx_get_tsc_khz, NULL); > +}
On Tue, Feb 04, 2025, Nikunj A Dadhania wrote: > Sean Christopherson <seanjc@google.com> writes: > > > When running as a TDX guest, explicitly override the TSC frequency > > calibration routine with CPUID-based calibration instead of potentially > > relying on a hypervisor-controlled PV routine. For TDX guests, CPUID.0x15 > > is always emulated by the TDX-Module, i.e. the information from CPUID is > > more trustworthy than the information provided by the hypervisor. > > > > To maintain backwards compatibility with TDX guest kernels that use native > > calibration, and because it's the least awful option, retain > > native_calibrate_tsc()'s stuffing of the local APIC bus period using the > > core crystal frequency. While it's entirely possible for the hypervisor > > to emulate the APIC timer at a different frequency than the core crystal > > frequency, the commonly accepted interpretation of Intel's SDM is that APIC > > timer runs at the core crystal frequency when that latter is enumerated via > > CPUID: > > > > The APIC timer frequency will be the processor’s bus clock or core > > crystal clock frequency (when TSC/core crystal clock ratio is enumerated > > in CPUID leaf 0x15). > > > > If the hypervisor is malicious and deliberately runs the APIC timer at the > > wrong frequency, nothing would stop the hypervisor from modifying the > > frequency at any time, i.e. attempting to manually calibrate the frequency > > out of paranoia would be futile. > > > > Deliberately leave the CPU frequency calibration routine as is, since the > > TDX-Module doesn't provide any guarantees with respect to CPUID.0x16. > > Does TDX use kvmclock? A TDX guest can. That's up to the host (expose kvmclock) and the guest (enable kvmclock). > If yes, kvmclock would have registered the CPU frequency calibration routine: > > tsc_register_calibration_routines(kvm_get_tsc_khz, kvm_get_cpu_khz, > tsc_properties); > > so TDX will use kvm_get_cpu_khz(), which will either use CPUID.0x16 or > PV clock, is this on the expected line ? What do you mean by "is this on the expected line"? If you are asking "is this intended", then the answer is "yes, working as intended". As above, the TDX-Module doesn't emulate CPUID.0x16, so no matter what, the guest is relying on the untrusted hypervisor to get the CPU frequency. If someone thinks that TDX guests should assume the CPU runs as the same frequency as the TSC, a la SNP's Secure TSC, then they are welcome to propose such a change.
Sean Christopherson <seanjc@google.com> writes: > On Tue, Feb 04, 2025, Nikunj A Dadhania wrote: >> Sean Christopherson <seanjc@google.com> writes: >> >> > When running as a TDX guest, explicitly override the TSC frequency >> > calibration routine with CPUID-based calibration instead of potentially >> > relying on a hypervisor-controlled PV routine. For TDX guests, CPUID.0x15 >> > is always emulated by the TDX-Module, i.e. the information from CPUID is >> > more trustworthy than the information provided by the hypervisor. >> > >> > To maintain backwards compatibility with TDX guest kernels that use native >> > calibration, and because it's the least awful option, retain >> > native_calibrate_tsc()'s stuffing of the local APIC bus period using the >> > core crystal frequency. While it's entirely possible for the hypervisor >> > to emulate the APIC timer at a different frequency than the core crystal >> > frequency, the commonly accepted interpretation of Intel's SDM is that APIC >> > timer runs at the core crystal frequency when that latter is enumerated via >> > CPUID: >> > >> > The APIC timer frequency will be the processor’s bus clock or core >> > crystal clock frequency (when TSC/core crystal clock ratio is enumerated >> > in CPUID leaf 0x15). >> > >> > If the hypervisor is malicious and deliberately runs the APIC timer at the >> > wrong frequency, nothing would stop the hypervisor from modifying the >> > frequency at any time, i.e. attempting to manually calibrate the frequency >> > out of paranoia would be futile. >> > >> > Deliberately leave the CPU frequency calibration routine as is, since the >> > TDX-Module doesn't provide any guarantees with respect to CPUID.0x16. >> >> Does TDX use kvmclock? > > A TDX guest can. That's up to the host (expose kvmclock) and the guest (enable > kvmclock). > >> If yes, kvmclock would have registered the CPU frequency calibration routine: >> >> tsc_register_calibration_routines(kvm_get_tsc_khz, kvm_get_cpu_khz, >> tsc_properties); >> >> so TDX will use kvm_get_cpu_khz(), which will either use CPUID.0x16 or >> PV clock, is this on the expected line ? > > What do you mean by "is this on the expected line"? If you are asking "is this > intended", Yes, that is what I meant. > then the answer is "yes, working as intended". As above, the TDX-Module > doesn't emulate CPUID.0x16, so no matter what, the guest is relying on the untrusted > hypervisor to get the CPU frequency. If someone thinks that TDX guests should > assume the CPU runs as the same frequency as the TSC, a la SNP's Secure TSC, then > they are welcome to propose such a change. Ok, that makes sense. Regards Nikunj
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 32809a06dab4..9d95dc713331 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -8,6 +8,7 @@ #include <linux/export.h> #include <linux/io.h> #include <linux/kexec.h> +#include <asm/apic.h> #include <asm/coco.h> #include <asm/tdx.h> #include <asm/vmx.h> @@ -1063,9 +1064,6 @@ void __init tdx_early_init(void) setup_force_cpu_cap(X86_FEATURE_TDX_GUEST); - /* TSC is the only reliable clock in TDX guest */ - setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); - cc_vendor = CC_VENDOR_INTEL; /* Configure the TD */ @@ -1122,3 +1120,29 @@ void __init tdx_early_init(void) tdx_announce(); } + +static unsigned long tdx_get_tsc_khz(void) +{ + unsigned int __tsc_khz, crystal_khz; + + if (WARN_ON_ONCE(cpuid_get_tsc_freq(&__tsc_khz, &crystal_khz))) + return 0; + + lapic_timer_period = crystal_khz * 1000 / HZ; + + return __tsc_khz; +} + +void __init tdx_tsc_init(void) +{ + /* TSC is the only reliable clock in TDX guest */ + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); + + /* + * Override the PV calibration routines (if set) with more trustworthy + * CPUID-based calibration. The TDX module emulates CPUID, whereas any + * PV information is provided by the hypervisor. + */ + tsc_register_calibration_routines(tdx_get_tsc_khz, NULL); +} diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index b4b16dafd55e..621fbdd101e2 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -53,6 +53,7 @@ struct ve_info { #ifdef CONFIG_INTEL_TDX_GUEST void __init tdx_early_init(void); +void __init tdx_tsc_init(void); void tdx_get_ve_info(struct ve_info *ve); @@ -72,6 +73,7 @@ void __init tdx_dump_td_ctls(u64 td_ctls); #else static inline void tdx_early_init(void) { }; +static inline void tdx_tsc_init(void) { } static inline void tdx_safe_halt(void) { }; static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; } diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 09ca0cbd4f31..922003059101 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -32,6 +32,7 @@ #include <asm/topology.h> #include <asm/uv/uv.h> #include <asm/sev.h> +#include <asm/tdx.h> unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); @@ -1514,8 +1515,15 @@ void __init tsc_early_init(void) if (is_early_uv_system()) return; + /* + * Do CoCo specific "secure" TSC initialization *after* hypervisor + * platform initialization so that the secure variant can override the + * hypervisor's PV calibration routine with a more trusted method. + */ if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC)) snp_secure_tsc_init(); + else if (boot_cpu_has(X86_FEATURE_TDX_GUEST)) + tdx_tsc_init(); if (!determine_cpu_tsc_frequencies(true)) return;
When running as a TDX guest, explicitly override the TSC frequency calibration routine with CPUID-based calibration instead of potentially relying on a hypervisor-controlled PV routine. For TDX guests, CPUID.0x15 is always emulated by the TDX-Module, i.e. the information from CPUID is more trustworthy than the information provided by the hypervisor. To maintain backwards compatibility with TDX guest kernels that use native calibration, and because it's the least awful option, retain native_calibrate_tsc()'s stuffing of the local APIC bus period using the core crystal frequency. While it's entirely possible for the hypervisor to emulate the APIC timer at a different frequency than the core crystal frequency, the commonly accepted interpretation of Intel's SDM is that APIC timer runs at the core crystal frequency when that latter is enumerated via CPUID: The APIC timer frequency will be the processor’s bus clock or core crystal clock frequency (when TSC/core crystal clock ratio is enumerated in CPUID leaf 0x15). If the hypervisor is malicious and deliberately runs the APIC timer at the wrong frequency, nothing would stop the hypervisor from modifying the frequency at any time, i.e. attempting to manually calibrate the frequency out of paranoia would be futile. Deliberately leave the CPU frequency calibration routine as is, since the TDX-Module doesn't provide any guarantees with respect to CPUID.0x16. Opportunistically add a comment explaining that CoCo TSC initialization needs to come after hypervisor specific initialization. Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/coco/tdx/tdx.c | 30 +++++++++++++++++++++++++++--- arch/x86/include/asm/tdx.h | 2 ++ arch/x86/kernel/tsc.c | 8 ++++++++ 3 files changed, 37 insertions(+), 3 deletions(-)