Message ID | 20250106124633.1418972-13-nikunj@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Add Secure TSC support for SNP guests | expand |
On Mon, Jan 06, 2025 at 06:16:32PM +0530, Nikunj A Dadhania wrote: > Although the kernel switches over to stable TSC clocksource instead of > PV clocksource, the scheduler still keeps on using PV clocks as the > sched clock source. This is because KVM, Xen and VMWare, switch the > paravirt sched clock handler in their init routines. HyperV is the > only PV clock source that checks if the platform provides an invariant > TSC and does not switch to PV sched clock. So this below looks like some desperate hackery. Are you doing this because kvm_sched_clock_init() does paravirt_set_sched_clock(kvm_sched_clock_read); ? If so, why don't you simply prevent that on STSC guests?
On 1/8/2025 1:07 AM, Borislav Petkov wrote: > On Mon, Jan 06, 2025 at 06:16:32PM +0530, Nikunj A Dadhania wrote: >> Although the kernel switches over to stable TSC clocksource instead of >> PV clocksource, the scheduler still keeps on using PV clocks as the >> sched clock source. This is because KVM, Xen and VMWare, switch the >> paravirt sched clock handler in their init routines. HyperV is the >> only PV clock source that checks if the platform provides an invariant >> TSC and does not switch to PV sched clock. > > So this below looks like some desperate hackery. Are you doing this because > kvm_sched_clock_init() does > > paravirt_set_sched_clock(kvm_sched_clock_read); > > ? All guests(including STSC) running on KVM or other hypervisors like Xen/VMWare uses the regular TSC when the guest is booted with TscInvariant bit set. But it doesn't switch the sched clock to use TSC which it should have, pointed here[1] by Sean: No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC is stable, irrespective of SNP or TDX. This is effectively already done for the timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the kvm_sched_clock_init() code. > If so, why don't you simply prevent that on STSC guests? This is for all guests running on different hypervisor with stable TSC. And I did try to limit this to KVM here [2], here is Sean's comment to that implementation: > Although the kernel switches over to stable TSC clocksource instead of > kvmclock, the scheduler still keeps on using kvmclock as the sched clock. > This is due to kvm_sched_clock_init() updating the pv_sched_clock() > unconditionally. All PV clocks are affected by this, no? This seems like something that should be handled in common code, which is the point I was trying to make in v11. Regards Nikunj 1. https://lore.kernel.org/kvm/ZurCbP7MesWXQbqZ@google.com/ 2. https://lore.kernel.org/kvm/20241009092850.197575-17-nikunj@amd.com/
On Wed, Jan 08, 2025 at 10:50:13AM +0530, Nikunj A. Dadhania wrote: > All guests(including STSC) running on KVM or other hypervisors like Xen/VMWare > uses the regular TSC when the guest is booted with TscInvariant bit set. But it > doesn't switch the sched clock to use TSC which it should have, pointed here[1] > by Sean: Well, that paravirt_set_sched_clock() thing is there for a reason and all the different guest types which call it perhaps have their reasons. Now, if you think it would make sense to use the TSC in every guest type if the TSC is this and that, then I'd need to see a patch or even a patchset which explains why it is ok to do so on every guest type and then tested on every guest type and then each patch would convert a single guest type and properly explain why. And your patch doesn't have that at all. I know Sean thinks it is a good idea and perhaps it is but without proper justification and acks from the other guest type maintainers, I simply won't take such a patch(-set). And even if it is good, you need to solve it another way - not with this delayed-work hackery. IOW, this switching the paravirt clock crap to use TSC when the TSC is this and that and the HV tells you so, should be a wholly separate patchset with reviews and acks and the usual shebang. If you want to take care only of STSC now, I'd take a patch which is known good and tested properly. And that should happen very soon because the merge window is closing in. Otherwise, there's always another merge window and rushing things doesn't do any good anyway, as I keep preaching... Your call.
On 1/8/2025 1:52 PM, Borislav Petkov wrote: > On Wed, Jan 08, 2025 at 10:50:13AM +0530, Nikunj A. Dadhania wrote: > And your patch doesn't have that at all. I know Sean thinks it is a good idea > and perhaps it is but without proper justification and acks from the other > guest type maintainers, I simply won't take such a patch(-set). I haven't heard anything from other guest type maintainers yet, and they are on CC from v12 onward. > And even if it is good, you need to solve it another way - not with this > delayed-work hackery. > > IOW, this switching the paravirt clock crap to use TSC when the TSC is this > and that and the HV tells you so, should be a wholly separate patchset with > reviews and acks and the usual shebang. I agree, we should handle this as a separate patch series. > If you want to take care only of STSC now, I'd take a patch which is known > good and tested properly. And that should happen very soon because the merge > window is closing in. In that case, let me limit this only to STSC for now, i will send updated patch. Regards, Nikunj
On 1/8/2025 2:04 PM, Nikunj A. Dadhania wrote: > >> If you want to take care only of STSC now, I'd take a patch which is known >> good and tested properly. And that should happen very soon because the merge >> window is closing in. > > In that case, let me limit this only to STSC for now, i will send updated patch. From: Nikunj A Dadhania <nikunj@amd.com> Date: Wed, 8 Jan 2025 14:18:04 +0530 Subject: [PATCH] x86/kvmclock: Prefer TSC as the scheduler clock for Secure TSC guests Although the kernel switches over to a stable TSC clocksource instead of kvmclock, the scheduler still keeps on using kvmclock as the sched clock. This is due to kvm_sched_clock_init() updating the pv_sched_clock() unconditionally. Do not override the PV sched clock when Secure TSC is enabled. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> --- arch/x86/kernel/kvmclock.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index d8fef3a65a35..82c4743a5e7a 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -324,8 +324,10 @@ void __init kvmclock_init(void) if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); - flags = pvclock_read_flags(&hv_clock_boot[0].pvti); - kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT); + if (!cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC)) { + flags = pvclock_read_flags(&hv_clock_boot[0].pvti); + kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT); + } x86_platform.calibrate_tsc = kvm_get_tsc_khz; x86_platform.calibrate_cpu = kvm_get_tsc_khz;
On Wed, Jan 08, 2025, Nikunj A. Dadhania wrote: > > On 1/8/2025 2:04 PM, Nikunj A. Dadhania wrote: > > > >> If you want to take care only of STSC now, I'd take a patch which is known > >> good and tested properly. And that should happen very soon because the merge > >> window is closing in. > > > > In that case, let me limit this only to STSC for now, i will send updated patch. > > From: Nikunj A Dadhania <nikunj@amd.com> > Date: Wed, 8 Jan 2025 14:18:04 +0530 > Subject: [PATCH] x86/kvmclock: Prefer TSC as the scheduler clock for Secure > TSC guests > > Although the kernel switches over to a stable TSC clocksource instead of > kvmclock, the scheduler still keeps on using kvmclock as the sched clock. > This is due to kvm_sched_clock_init() updating the pv_sched_clock() > unconditionally. Do not override the PV sched clock when Secure TSC is > enabled. > > Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> > --- > arch/x86/kernel/kvmclock.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index d8fef3a65a35..82c4743a5e7a 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -324,8 +324,10 @@ void __init kvmclock_init(void) > if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) > pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); > > - flags = pvclock_read_flags(&hv_clock_boot[0].pvti); > - kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT); > + if (!cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC)) { > + flags = pvclock_read_flags(&hv_clock_boot[0].pvti); > + kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT); > + } This still misses my point. Ditto for the "x86/tsc: Switch Secure TSC guests away from kvm-clock". I object to singling out kvmclock. It's weird and misleading, because handling only kvmclock suggests that other PV clocks are somehow trusted/ok, when in reality the only reason kvmclock is getting singled out is (presumably) because it's what Nikunj and the other folks enabling KVM SNP test on. What I care most about is having a sane, consistent policy throughout the kernel. E.g. so that a user/reader walks away with an understanding PV clocks are a theoretical host attack vector and so should be avoided when Secure TSC is available. Ideally, if the TSC is the preferred clocksource, then the scheduler will use the TSC and not a PV clock irrespective of STSC. But I 100% agree with Boris that it needs buy-in from other maintainers (including Paolo), because it's entirely possible (likely, even) that there's an angle to scheduling I'm not considering.
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 88d8bfceea04..fe7a0b1b7cfd 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -291,12 +291,26 @@ static void __init upgrade_clock_rating(struct clocksource *tsc_early, tsc->rating = 450; } } + +static void enable_native_sc_work(struct work_struct *work) +{ + pr_info("Using native sched clock\n"); + paravirt_set_sched_clock(native_sched_clock); +} +static DECLARE_DELAYED_WORK(enable_native_sc, enable_native_sc_work); + +static void enable_native_sched_clock(void) +{ + if (!using_native_sched_clock()) + schedule_delayed_work(&enable_native_sc, 0); +} #else u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock"))); bool using_native_sched_clock(void) { return true; } static void __init upgrade_clock_rating(struct clocksource *tsc_early, struct clocksource *tsc) { } +static void enable_native_sched_clock(void) { } #endif notrace u64 sched_clock(void) @@ -1176,6 +1190,10 @@ static void tsc_cs_tick_stable(struct clocksource *cs) static int tsc_cs_enable(struct clocksource *cs) { vclocks_set_used(VDSO_CLOCKMODE_TSC); + + /* Restore native_sched_clock() when switching to TSC */ + enable_native_sched_clock(); + return 0; }
Although the kernel switches over to stable TSC clocksource instead of PV clocksource, the scheduler still keeps on using PV clocks as the sched clock source. This is because KVM, Xen and VMWare, switch the paravirt sched clock handler in their init routines. HyperV is the only PV clock source that checks if the platform provides an invariant TSC and does not switch to PV sched clock. When switching back to stable TSC, restore the scheduler clock to native_sched_clock(). As the clock selection happens in the stop machine context, schedule delayed work to update the static_call() Cc: Alexey Makhalov <alexey.makhalov@broadcom.com> Cc: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> --- arch/x86/kernel/tsc.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)