diff mbox series

[RFC,v3,53/59] KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space

Message ID 4ede5c987a4ae938a37ab7fe70d5e1d561ee97d4.1637799475.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: X86: TDX support | expand

Commit Message

Isaku Yamahata Nov. 25, 2021, 12:20 a.m. UTC
From: Chao Gao <chao.gao@intel.com>

The TDX module unconditionally reset 4 host MSRs (MSR_SYSCALL_MASK,
MSR_START, MSR_LSTAR, MSR_TSC_AUX) to architectural INIT state on exit from
TDX VM to KVM.  KVM needs to save their values before TD enter and restore
them on exit to userspace.

Reuse current kvm_user_return mechanism and introduce a function to update
cached values and register the user return notifier in this new function.

The later patch will use the helper function to save/restore 4 host MSRs.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 25 ++++++++++++++++++++-----
 2 files changed, 21 insertions(+), 5 deletions(-)

Comments

Thomas Gleixner Nov. 25, 2021, 8:34 p.m. UTC | #1
On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
> From: Chao Gao <chao.gao@intel.com>

> $Subject: KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space

Which user space are you talking about? This subject line is misleading
at best. The unconditional reset is happening when a TDX VM exits
because the SEAM firmware enforces this to prevent unformation leaks.

It also does not matter whether this are four or ten MSR. Fact is that
the SEAM firmware is buggy because it does not save/restore those MSRs.

So the proper subject line is:

   KVM: x86: Add infrastructure to handle MSR corruption by broken TDX firmware 

> The TDX module unconditionally reset 4 host MSRs (MSR_SYSCALL_MASK,
> MSR_START, MSR_LSTAR, MSR_TSC_AUX) to architectural INIT state on exit from
> TDX VM to KVM.  KVM needs to save their values before TD enter and restore
> them on exit to userspace.
>
> Reuse current kvm_user_return mechanism and introduce a function to update
> cached values and register the user return notifier in this new function.
>
> The later patch will use the helper function to save/restore 4 host
> MSRs.

'The later patch ...' is useless information. Of course there will be a
later patch to make use of this which is implied by 'Add infrastructure
...'. Can we please get rid of these useless phrases which have no value
at patch submission time and are even more confusing once the pile is
merged?

Thanks,

        tglx
Chao Gao Nov. 26, 2021, 9:19 a.m. UTC | #2
On Thu, Nov 25, 2021 at 09:34:59PM +0100, Thomas Gleixner wrote:
>On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
>> From: Chao Gao <chao.gao@intel.com>
>
>> $Subject: KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space
>
>Which user space are you talking about? This subject line is misleading

Host Ring3.

>at best. The unconditional reset is happening when a TDX VM exits
>because the SEAM firmware enforces this to prevent unformation leaks.

Yes.

>
>It also does not matter whether this are four or ten MSR.

Indeed, the number of MSRs doesn't matter.

>Fact is that
>the SEAM firmware is buggy because it does not save/restore those MSRs.

It is done deliberately. It gives host a chance to do "lazy" restoration.
"lazy" means don't save/restore them on each TD entry/exit but defer
restoration to when it is neccesary e.g., when vCPU is scheduled out or
when kernel is about to return to Ring3.

>
>So the proper subject line is:
>
>   KVM: x86: Add infrastructure to handle MSR corruption by broken TDX firmware

I rewrote the commit message:

    KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr

    Several MSRs are constant and only used in userspace. But VMs may have
    different values. KVM uses kvm_set_user_return_msr() to switch to guest's
    values and leverages user return notifier to restore them when kernel is
    to return to userspace. In order to save unnecessary wrmsr, KVM also caches
    the value it wrote to a MSR last time.

    TDX module unconditionally resets some of these MSRs to architectural INIT
    state on TD exit. It makes the cached values in kvm_user_return_msrs are
    inconsistent with values in hardware. This inconsistency needs to be fixed
    otherwise, it may mislead kvm_on_user_return() to skip restoring some MSRs
    to host's values. kvm_set_user_return_msr() can help to correct this case
    but it is not optimal as it always does a wrmsr. So, introduce a variation
    of kvm_set_user_return_msr() to update the cached value but skip the wrmsr.

>
>> The TDX module unconditionally reset 4 host MSRs (MSR_SYSCALL_MASK,
>> MSR_START, MSR_LSTAR, MSR_TSC_AUX) to architectural INIT state on exit from
>> TDX VM to KVM.  KVM needs to save their values before TD enter and restore
>> them on exit to userspace.
>>
>> Reuse current kvm_user_return mechanism and introduce a function to update
>> cached values and register the user return notifier in this new function.
>>
>> The later patch will use the helper function to save/restore 4 host
>> MSRs.
>
>'The later patch ...' is useless information. Of course there will be a
>later patch to make use of this which is implied by 'Add infrastructure
>...'. Can we please get rid of these useless phrases which have no value
>at patch submission time and are even more confusing once the pile is
>merged?

Of course. Will remove all "later patch" phrases.

Thanks
Chao

>
>Thanks,
>
>        tglx
Paolo Bonzini Nov. 26, 2021, 9:40 a.m. UTC | #3
On 11/26/21 10:19, Chao Gao wrote:
> Of course. Will remove all "later patch" phrases.

Chao, see my comment on patch 54, for how to do this.  In this case, 
this patch would be right before the one that enables KVM_RUN on a TDX 
module.

Paolo
Lai Jiangshan Nov. 29, 2021, 7:08 a.m. UTC | #4
On Mon, Nov 29, 2021 at 2:00 AM Chao Gao <chao.gao@intel.com> wrote:
>
> On Thu, Nov 25, 2021 at 09:34:59PM +0100, Thomas Gleixner wrote:
> >On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
> >> From: Chao Gao <chao.gao@intel.com>
> >
> >> $Subject: KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space
> >
> >Which user space are you talking about? This subject line is misleading
>
> Host Ring3.
>
> >at best. The unconditional reset is happening when a TDX VM exits
> >because the SEAM firmware enforces this to prevent unformation leaks.
>
> Yes.
>
> >
> >It also does not matter whether this are four or ten MSR.
>
> Indeed, the number of MSRs doesn't matter.
>
> >Fact is that
> >the SEAM firmware is buggy because it does not save/restore those MSRs.
>
> It is done deliberately. It gives host a chance to do "lazy" restoration.
> "lazy" means don't save/restore them on each TD entry/exit but defer
> restoration to when it is neccesary e.g., when vCPU is scheduled out or
> when kernel is about to return to Ring3.
>
> The TDX module unconditionally reset 4 host MSRs (MSR_SYSCALL_MASK,
> MSR_START, MSR_LSTAR, MSR_TSC_AUX) to architectural INIT state on exit from
> TDX VM to KVM.

I did not find the information in intel-tdx-module-1eas.pdf nor
intel-tdx-cpu-architectural-specification.pdf.

Maybe the version I downloaded is outdated.

I guess that the "lazy" restoration mode is not a valid optimization.
The SEAM module should restore it to the original value when it tries
to reset it to architectural INIT state on exit from TDX VM to KVM
since the SEAM module also does it via wrmsr (correct me if not).

If the SEAM module doesn't know "the original value" of the these
MSRs, it would be mere an optimization to save an rdmsr in SEAM.
But there are a lot of other ways for the host to share the values
to SEAM in zero overhead.

Could you provide more information?
Chao Gao Nov. 29, 2021, 9:26 a.m. UTC | #5
On Mon, Nov 29, 2021 at 03:08:39PM +0800, Lai Jiangshan wrote:
>On Mon, Nov 29, 2021 at 2:00 AM Chao Gao <chao.gao@intel.com> wrote:
>>
>> On Thu, Nov 25, 2021 at 09:34:59PM +0100, Thomas Gleixner wrote:
>> >On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
>> >> From: Chao Gao <chao.gao@intel.com>
>> >
>> >> $Subject: KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space
>> >
>> >Which user space are you talking about? This subject line is misleading
>>
>> Host Ring3.
>>
>> >at best. The unconditional reset is happening when a TDX VM exits
>> >because the SEAM firmware enforces this to prevent unformation leaks.
>>
>> Yes.
>>
>> >
>> >It also does not matter whether this are four or ten MSR.
>>
>> Indeed, the number of MSRs doesn't matter.
>>
>> >Fact is that
>> >the SEAM firmware is buggy because it does not save/restore those MSRs.
>>
>> It is done deliberately. It gives host a chance to do "lazy" restoration.
>> "lazy" means don't save/restore them on each TD entry/exit but defer
>> restoration to when it is neccesary e.g., when vCPU is scheduled out or
>> when kernel is about to return to Ring3.
>>
>> The TDX module unconditionally reset 4 host MSRs (MSR_SYSCALL_MASK,
>> MSR_START, MSR_LSTAR, MSR_TSC_AUX) to architectural INIT state on exit from
>> TDX VM to KVM.
>
>I did not find the information in intel-tdx-module-1eas.pdf nor
>intel-tdx-cpu-architectural-specification.pdf.
>
>Maybe the version I downloaded is outdated.

Hi Jiangshan,

Please refer to Table 22.162 MSRs that may be Modified by TDH.VP.ENTER,
in section 22.2.40 TDH.VP.ENTER leaf.

>
>I guess that the "lazy" restoration mode is not a valid optimization.
>The SEAM module should restore it to the original value when it tries
>to reset it to architectural INIT state on exit from TDX VM to KVM
>since the SEAM module also does it via wrmsr (correct me if not).

Correct.

>
>If the SEAM module doesn't know "the original value" of the these
>MSRs, it would be mere an optimization to save an rdmsr in SEAM.

Yes. Just a rdmsr is saved in TDX module at the cost of host's
restoring a MSR. If restoration (wrmsr) can be done in a lazy fashion
or even the MSR isn't used by host, some CPU cycles can be saved.

>But there are a lot of other ways for the host to share the values
>to SEAM in zero overhead.

I am not sure. Looks it requests a new interface between host and TDX
module. I guess one problem is how/when to verify host's inputs in case
they are invalid.

Thanks
Chao

>
>Could you provide more information?
Lai Jiangshan Nov. 30, 2021, 4:58 a.m. UTC | #6
On Mon, Nov 29, 2021 at 5:16 PM Chao Gao <chao.gao@intel.com> wrote:

> >I did not find the information in intel-tdx-module-1eas.pdf nor
> >intel-tdx-cpu-architectural-specification.pdf.
> >
> >Maybe the version I downloaded is outdated.
>
> Hi Jiangshan,
>
> Please refer to Table 22.162 MSRs that may be Modified by TDH.VP.ENTER,
> in section 22.2.40 TDH.VP.ENTER leaf.

No file in this link:
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
has chapter 22.

>
> >
> >I guess that the "lazy" restoration mode is not a valid optimization.
> >The SEAM module should restore it to the original value when it tries
> >to reset it to architectural INIT state on exit from TDX VM to KVM
> >since the SEAM module also does it via wrmsr (correct me if not).
>
> Correct.
>
> >
> >If the SEAM module doesn't know "the original value" of the these
> >MSRs, it would be mere an optimization to save an rdmsr in SEAM.
>
> Yes. Just a rdmsr is saved in TDX module at the cost of host's
> restoring a MSR. If restoration (wrmsr) can be done in a lazy fashion
> or even the MSR isn't used by host, some CPU cycles can be saved.

But it adds overall overhead because the wrmsr in TDX module
can't be skipped while the unneeded potential overhead of
wrmsr is added in user return path.

If TDX module restores the original MSR value, the host hypervisor
doesn't need to step in.

I think I'm reviewing the code without the code.  It is definitely
wrong design to (ab)use the host's user-return-msr mechanism.

>
> >But there are a lot of other ways for the host to share the values
> >to SEAM in zero overhead.
>
> I am not sure. Looks it requests a new interface between host and TDX
> module. I guess one problem is how/when to verify host's inputs in case
> they are invalid.
>

If the requirement of "lazy restoration" is being added (not seen in the
published document yet), you are changing the ABI between the host and
the TDX module.
Chao Gao Nov. 30, 2021, 8:19 a.m. UTC | #7
On Tue, Nov 30, 2021 at 12:58:47PM +0800, Lai Jiangshan wrote:
>On Mon, Nov 29, 2021 at 5:16 PM Chao Gao <chao.gao@intel.com> wrote:
>
>> >I did not find the information in intel-tdx-module-1eas.pdf nor
>> >intel-tdx-cpu-architectural-specification.pdf.
>> >
>> >Maybe the version I downloaded is outdated.
>>
>> Hi Jiangshan,
>>
>> Please refer to Table 22.162 MSRs that may be Modified by TDH.VP.ENTER,
>> in section 22.2.40 TDH.VP.ENTER leaf.
>
>No file in this link:
>https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
>has chapter 22.

https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf

>
>>
>> >
>> >I guess that the "lazy" restoration mode is not a valid optimization.
>> >The SEAM module should restore it to the original value when it tries
>> >to reset it to architectural INIT state on exit from TDX VM to KVM
>> >since the SEAM module also does it via wrmsr (correct me if not).
>>
>> Correct.
>>
>> >
>> >If the SEAM module doesn't know "the original value" of the these
>> >MSRs, it would be mere an optimization to save an rdmsr in SEAM.
>>
>> Yes. Just a rdmsr is saved in TDX module at the cost of host's
>> restoring a MSR. If restoration (wrmsr) can be done in a lazy fashion
>> or even the MSR isn't used by host, some CPU cycles can be saved.
>
>But it adds overall overhead because the wrmsr in TDX module
>can't be skipped while the unneeded potential overhead of
>wrmsr is added in user return path.

It is a trade-off. In this way, TDX module needn't save some host MSRs.
Of course, host has to restore those MSRs. the overall impact depends
on the frequency of restoring: on every exit from TD VM, on vCPU being
scheduled out or on exit to userspace. In the last two cases, it is
supposed to have less overhead than saving host MSRs and restoring them
on every exit from TD VM.

>
>If TDX module restores the original MSR value, the host hypervisor
>doesn't need to step in.
>
>I think I'm reviewing the code without the code.  It is definitely
>wrong design to (ab)use the host's user-return-msr mechanism.

I cannot follow.

>
>>
>> >But there are a lot of other ways for the host to share the values
>> >to SEAM in zero overhead.
>>
>> I am not sure. Looks it requests a new interface between host and TDX
>> module. I guess one problem is how/when to verify host's inputs in case
>> they are invalid.
>>
>
>If the requirement of "lazy restoration" is being added (not seen in the
>published document yet), you are changing the ABI between the host and
>the TDX module.

No, it is already documented in public spec.

TDX module spec just says some MSRs are reset to INIT state by TDX module
(un)conditionally during TD exit. When to restore these MSRs to host's
values is decided by host.
Lai Jiangshan Nov. 30, 2021, 11:18 a.m. UTC | #8
On Tue, Nov 30, 2021 at 4:10 PM Chao Gao <chao.gao@intel.com> wrote:

>
> No, it is already documented in public spec.

In intel-tdx-module-1.5-base-spec-348549001.pdf, page 79
"recoverability hist",  I think it is a "hint".

>
> TDX module spec just says some MSRs are reset to INIT state by TDX module
> (un)conditionally during TD exit. When to restore these MSRs to host's
> values is decided by host.
>


Sigh, it is quite a common solution to reset a register to a default
value here, for example in VMX, the host GDT.limit and host TSS.limit
is reset after VMEXIT, so load_fixmap_gdt() and invalidate_tss_limit()
have to be called in host in vmx_vcpu_put().

Off-topic: Is it a good idea to also put load_fixmap_gdt() and
invalidate_tss_limit() in user-return-notfier?
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a2027bc14e41..17d6e4bcf84b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1910,6 +1910,7 @@  int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
+void kvm_user_return_update_cache(unsigned int index, u64 val);
 
 static inline bool kvm_is_supported_user_return_msr(u32 msr)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4dd8ec2641a2..09da7cbedb5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -418,6 +418,15 @@  static void kvm_user_return_msr_cpu_online(void)
 	}
 }
 
+static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
+{
+	if (!msrs->registered) {
+		msrs->urn.on_user_return = kvm_on_user_return;
+		user_return_notifier_register(&msrs->urn);
+		msrs->registered = true;
+	}
+}
+
 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 {
 	unsigned int cpu = smp_processor_id();
@@ -432,15 +441,21 @@  int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 		return 1;
 
 	msrs->values[slot].curr = value;
-	if (!msrs->registered) {
-		msrs->urn.on_user_return = kvm_on_user_return;
-		user_return_notifier_register(&msrs->urn);
-		msrs->registered = true;
-	}
+	kvm_user_return_register_notifier(msrs);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
 
+/* Update the cache, "curr", and register the notifier */
+void kvm_user_return_update_cache(unsigned int slot, u64 value)
+{
+	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
+
+	msrs->values[slot].curr = value;
+	kvm_user_return_register_notifier(msrs);
+}
+EXPORT_SYMBOL_GPL(kvm_user_return_update_cache);
+
 static void drop_user_return_notifiers(void)
 {
 	unsigned int cpu = smp_processor_id();