Message ID | 20211208000359.2853257-11-yang.zhong@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | AMX Support in KVM |
First, the MSR should be added to msrs_to_save_all and
kvm_cpu_cap_has(X86_FEATURE_XFD) should be checked in
kvm_init_msr_list.

It seems that RDMSR support is missing, too.

More important, please include:

- documentation for the new KVM_EXIT_* value

- a selftest that explains how userspace should react to it.

This is a strong requirement for any new API (the first has been for
years; but the latter is also almost always respected these days).
This series should not have been submitted without documentation.

Also:

On 12/8/21 01:03, Yang Zhong wrote:
>
> +		if (!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> +			return 1;

This should allow msr->host_initiated always (even if XFD is not
part of CPUID). However, if XFD is nonzero and
kvm_check_guest_realloc_fpstate returns true, then it should return
1.

The selftest should also cover using KVM_GET_MSR/KVM_SET_MSR.

> +		/* Setting unsupported bits causes #GP */
> +		if (~XFEATURE_MASK_USER_DYNAMIC & data) {
> +			kvm_inject_gp(vcpu, 0);
> +			break;
> +		}

This should check

	if (data & ~(XFEATURE_MASK_USER_DYNAMIC &
		     vcpu->arch.guest_supported_xcr0))

instead.

Paolo
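The difference between the #GP check in the patch and the one suggested in review can be seen in a small user-space model. This is not KVM code: it assumes XFEATURE_MASK_USER_DYNAMIC contains only AMX TILEDATA (XCR0 bit 18), and GUEST_XCR0_BASE and both helper functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define GUEST_XCR0_BASE            0x7ULL        /* hypothetical: x87 | SSE | AVX */
#define XFEATURE_MASK_XTILE_CFG    (1ULL << 17)  /* AMX TILECFG */
#define XFEATURE_MASK_XTILE_DATA   (1ULL << 18)  /* AMX TILEDATA */
#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA

/* Hypothetical helper modelling the patch: only bits outside the host's
 * dynamic set raise #GP, regardless of what the guest's XCR0 allows. */
static bool xfd_write_gp_patch(uint64_t data)
{
	return (~XFEATURE_MASK_USER_DYNAMIC & data) != 0;
}

/* Hypothetical helper modelling the review suggestion: a bit is accepted
 * only if it is dynamic AND supported in the guest's XCR0. */
static bool xfd_write_gp_review(uint64_t data, uint64_t guest_supported_xcr0)
{
	return (data & ~(XFEATURE_MASK_USER_DYNAMIC & guest_supported_xcr0)) != 0;
}
```

For a guest whose CPUID does not expose AMX (guest_supported_xcr0 of 0x7 in this model), writing XFEATURE_MASK_XTILE_DATA passes the patch's check but raises #GP under the suggested one; for a guest with AMX enabled, both checks accept it.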
On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> +
> +		/*
> +		 * Update IA32_XFD to the guest value so #NM can be
> +		 * raised properly in the guest. Instead of directly
> +		 * writing the MSR, call a helper to avoid breaking
> +		 * per-cpu cached value in fpu core.
> +		 */
> +		fpregs_lock();
> +		current->thread.fpu.fpstate->xfd = data;
> +		xfd_update_state(current->thread.fpu.fpstate);
> +		fpregs_unlock();
> +		break;

Now looking at the actual callsite the previous patch really should be
something like the below. Why?

It preserves the inline which allows the compiler to generate better
code in the other hot paths and it keeps the FPU internals to the core
code. Hmm?

Thanks,

        tglx

--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -125,8 +125,10 @@ DECLARE_PER_CPU(struct fpu *, fpu_fpregs
 /* Process cleanup */
 #ifdef CONFIG_X86_64
 extern void fpstate_free(struct fpu *fpu);
+extern void fpu_update_xfd_state(u64 xfd);
 #else
 static inline void fpstate_free(struct fpu *fpu) { }
+static inline void fpu_update_xfd_state(u64 xfd) { }
 #endif
 
 /* fpstate-related functions which are exported to KVM */
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -322,6 +322,19 @@ int fpu_swap_kvm_fpstate(struct fpu_gues
 }
 EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpstate);
 
+#ifdef CONFIG_X86_64
+void fpu_update_xfd_state(u64 xfd)
+{
+	struct fpstate *fps = current->thread.fpu.fpstate;
+
+	fpregs_lock();
+	fps->xfd = xfd;
+	xfd_update_state(fps);
+	fpregs_unlock();
+}
+EXPORT_SYMBOL_GPL(fpu_update_xfd_state);
+#endif
+
 void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf,
 				    unsigned int size, u32 pkru)
 {
On 12/11/2021 12:02 AM, Paolo Bonzini wrote:
>
> Also:
>
> On 12/8/21 01:03, Yang Zhong wrote:
> >
> > +		if (!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> > +			return 1;
>
> This should allow msr->host_initiated always (even if XFD is not part of
> CPUID).
Thanks Paolo.

msr->host_initiated handling would be added in next version.

I'd like to ask why always allow msr->host_initiated even if XFD is not part of
CPUID, although guest doesn't care that MSR? We found some MSRs
(e.g. MSR_AMD64_OSVW_STATUS and MSR_AMD64_OSVW_ID_LENGTH)
are specially handled so would like to know the consideration of allowing
msr->host_initiated.

if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
        return 1;

> However, if XFD is nonzero and kvm_check_guest_realloc_fpstate
> returns true, then it should return 1.
>
If XFD is nonzero, kvm_check_guest_realloc_fpstate() won't return true. So
may not need this check here?

Thanks,
Jing
>
> Paolo
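The gate proposed above only rejects guest-initiated accesses, so userspace can write the MSR even for a guest without XFD in CPUID. A minimal model, with the hypothetical name xfd_set_check(), illustrates this; the second (value) check is an assumption added for illustration, limiting accepted bits to a caller-supplied set, and is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an MSR_IA32_XFD write check.
 * Returns 1 to reject (as KVM MSR handlers do) or 0 to accept.
 */
static int xfd_set_check(bool host_initiated, bool guest_has_xfd,
			 uint64_t data, uint64_t guest_supported_xfd)
{
	/* Proposed gate: a guest WRMSR needs XFD in CPUID; userspace may always write. */
	if (!host_initiated && !guest_has_xfd)
		return 1;
	/* Assumed value check: bits the guest cannot use are never accepted. */
	if (data & ~guest_supported_xfd)
		return 1;
	return 0;
}
```

With this shape, a save/restore loop that writes back the value 0 it read for a guest without XFD succeeds, while the same guest's own WRMSR is still rejected.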
On 12/13/21 08:51, Liu, Jing2 wrote:
> On 12/11/2021 12:02 AM, Paolo Bonzini wrote:
>>
>> Also:
>>
>> On 12/8/21 01:03, Yang Zhong wrote:
>>>
>>> +		if (!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
>>> +			return 1;
>>
>> This should allow msr->host_initiated always (even if XFD is not part of
>> CPUID).
> Thanks Paolo.
>
> msr->host_initiated handling would be added in next version.
>
> I'd like to ask why always allow msr->host_initiated even if XFD is not part of
> CPUID, although guest doesn't care that MSR? We found some MSRs
> (e.g. MSR_AMD64_OSVW_STATUS and MSR_AMD64_OSVW_ID_LENGTH)
> are specially handled so would like to know the consideration of allowing
> msr->host_initiated.
>
> if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
>         return 1;

Because it's simpler if userspace can just take the entire list from
KVM_GET_MSR_INDEX_LIST and pass it to KVM_GET/SET_MSR. See for example
vcpu_save_state and vcpu_load_state in
tools/testing/selftests/kvm/lib/x86_64/processor.c.

>> However, if XFD is nonzero and kvm_check_guest_realloc_fpstate
>> returns true, then it should return 1.
>
> If XFD is nonzero, kvm_check_guest_realloc_fpstate() won't return true. So
> may not need this check here?

It can't for now, because there's a single dynamic feature, but here:

+	if ((xfd & xcr0) != xcr0) {
+		u64 request = (xcr0 ^ xfd) & xcr0;
+		struct fpu_guest *guest_fpu = &vcpu->arch.guest_fpu;
+
+		/*
+		 * If requested features haven't been enabled, update
+		 * the request bitmap and tell the caller to request
+		 * dynamic buffer reallocation.
+		 */
+		if ((guest_fpu->user_xfeatures & request) != request) {
+			vcpu->arch.guest_fpu.realloc_request = request;
+			return true;
+		}
+	}

it is certainly possible to return true with nonzero XFD.

Paolo
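Paolo's point can be checked with a user-space copy of the quoted condition. The sketch assumes TILEDATA is XCR0 bit 18 and invents a hypothetical second dynamic feature at bit 19; realloc_needed() is a hypothetical name, and the constants are illustrative, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_XCR0_BASE          0x7ULL        /* hypothetical: x87 | SSE | AVX */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)  /* AMX TILEDATA */
#define XFEATURE_MASK_HYPO_DYN   (1ULL << 19)  /* hypothetical second dynamic feature */

/*
 * Hypothetical model of the quoted condition: features enabled in XCR0
 * and not masked by XFD must already be in user_xfeatures, otherwise a
 * reallocation is requested. Stores the request bitmap on true.
 */
static bool realloc_needed(uint64_t xfd, uint64_t xcr0,
			   uint64_t user_xfeatures, uint64_t *request_out)
{
	if ((xfd & xcr0) != xcr0) {
		/* Equivalent to xcr0 & ~xfd. */
		uint64_t request = (xcr0 ^ xfd) & xcr0;

		if ((user_xfeatures & request) != request) {
			*request_out = request;
			return true;
		}
	}
	return false;
}
```

With TILEDATA as the only dynamic feature, arming it in XFD leaves nothing to request, which matches Jing's observation. With a second dynamic feature, XFD can mask one feature while the other still needs space, so the function returns true even though XFD is nonzero.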
On 12/8/21 01:03, Yang Zhong wrote:
> +		/*
> +		 * Update IA32_XFD to the guest value so #NM can be
> +		 * raised properly in the guest. Instead of directly
> +		 * writing the MSR, call a helper to avoid breaking
> +		 * per-cpu cached value in fpu core.
> +		 */
> +		fpregs_lock();
> +		current->thread.fpu.fpstate->xfd = data;

This is wrong, it should be written in vcpu->arch.guest_fpu.

> +		xfd_update_state(current->thread.fpu.fpstate);

This is okay though, so that KVM_SET_MSR will not write XFD and WRMSR
will.

That said, I think xfd_update_state should not have an argument.
current->thread.fpu.fpstate->xfd is the only fpstate that should be
synced with the xfd_state per-CPU variable.

Paolo

> +		fpregs_unlock();
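Why a direct WRMSR would "break the per-cpu cached value" can be shown with a single-CPU user-space model. It assumes the update helper skips the MSR write when the per-CPU cache already equals the wanted value; struct cpu_model and all helpers here are hypothetical, and the XFD bit value is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFD_AMX (1ULL << 18)  /* assumed: XFD bit for AMX TILEDATA */

/* Hypothetical single-CPU model. */
struct cpu_model {
	uint64_t msr_xfd;    /* value held by the IA32_XFD register */
	uint64_t xfd_state;  /* per-CPU cached copy of msr_xfd */
};

/* Hypothetical model of the cached update: skip the WRMSR when the
 * cache claims the register already holds the wanted value. */
static void xfd_update_state_model(struct cpu_model *cpu, uint64_t xfd)
{
	if (cpu->xfd_state != xfd) {
		cpu->msr_xfd = xfd;
		cpu->xfd_state = xfd;
	}
}

/* Hypothetical raw WRMSR that bypasses the cache. */
static void raw_wrmsr_model(struct cpu_model *cpu, uint64_t xfd)
{
	cpu->msr_xfd = xfd;
}

/* Guest clears XFD, then the CPU switches to a task whose fpstate has
 * XFD_AMX armed. Returns what is left in the register. */
static uint64_t msr_after_switch(bool use_helper)
{
	struct cpu_model cpu = { .msr_xfd = XFD_AMX, .xfd_state = XFD_AMX };

	if (use_helper)
		xfd_update_state_model(&cpu, 0);
	else
		raw_wrmsr_model(&cpu, 0);

	xfd_update_state_model(&cpu, XFD_AMX);
	return cpu.msr_xfd;
}
```

In this model the raw write leaves the cache stale, so the later switch skips the WRMSR and the register keeps the guest's value; going through the helper keeps cache and register in sync.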
On Mon, Dec 13 2021 at 16:06, Paolo Bonzini wrote:
> On 12/8/21 01:03, Yang Zhong wrote:
>> +		/*
>> +		 * Update IA32_XFD to the guest value so #NM can be
>> +		 * raised properly in the guest. Instead of directly
>> +		 * writing the MSR, call a helper to avoid breaking
>> +		 * per-cpu cached value in fpu core.
>> +		 */
>> +		fpregs_lock();
>> +		current->thread.fpu.fpstate->xfd = data;
>
> This is wrong, it should be written in vcpu->arch.guest_fpu.
>
>> +		xfd_update_state(current->thread.fpu.fpstate);
>
> This is okay though, so that KVM_SET_MSR will not write XFD and WRMSR
> will.
>
> That said, I think xfd_update_state should not have an argument.
> current->thread.fpu.fpstate->xfd is the only fpstate that should be
> synced with the xfd_state per-CPU variable.

I'm looking into this right now. The whole restore versus runtime thing
needs to be handled differently.

Thanks,

        tglx
Paolo,

On Mon, Dec 13 2021 at 20:45, Thomas Gleixner wrote:
> On Mon, Dec 13 2021 at 16:06, Paolo Bonzini wrote:
>> That said, I think xfd_update_state should not have an argument.
>> current->thread.fpu.fpstate->xfd is the only fpstate that should be
>> synced with the xfd_state per-CPU variable.
>
> I'm looking into this right now. The whole restore versus runtime thing
> needs to be handled differently.

We need to look at different things here:

   1) XFD MSR write emulation

   2) XFD MSR synchronization when write emulation is disabled

   3) Guest restore

#1 and #2 are in the context of vcpu_run() and

   vcpu->arch.guest_fpu.fpstate == current->thread.fpu.fpstate

while #3 has:

   vcpu->arch.guest_fpu.fpstate != current->thread.fpu.fpstate

#2 is only updating fpstate->xfd and the per CPU shadow. So the state
synchronization wants to be something like this:

void fpu_sync_guest_xfd_state(void)
{
	struct fpstate *fps = current->thread.fpu.fpstate;

	lockdep_assert_irqs_disabled();
	if (fpu_state_size_dynamic()) {
		rdmsrl(MSR_IA32_XFD, fps->xfd);
		__this_cpu_write(xfd_state, fps->xfd);
	}
}
EXPORT_SYMBOL_GPL(fpu_sync_guest_xfd_state);

No wrmsrl() because the MSR is already up to date. The important part
is that fpstate->xfd and the shadow state are updated so that after
reenabling preemption the context switch FPU logic works correctly.

#1 and #3 can trigger a reallocation of guest_fpu.fpstate and can
fail. But this is also true for XSETBV emulation and XCR0 restore.

For #1 modifying fps->xfd in the KVM code before calling into the FPU
code is just _wrong_ because if the guest removes the XFD restriction
then it must be ensured that the buffer is sized correctly _before_
this is updated.

For #3 it's not really important, but I still try to wrap my head
around the whole picture vs. XCR0.

There are two options:

  1) Require strict ordering of XFD and XCR0 update to avoid pointless
     buffer expansion, i.e. XFD before XCR0.

     Because if XCR0 is updated while guest_fpu->fpstate.xfd is still
     in init state (0) and XCR0 contains extended features, then the
     buffer would be expanded because XFD does not mask the extended
     features out. When XFD is restored with a non-zero value, it's too
     late already.

  2) Ignore buffer expansion up to the point where XSTATE restore
     happens and evaluate guest XCR0 and guest_fpu->fpstate.xfd there.

I'm leaning towards #1 because that means we have exactly _ONE_ place
where we need to deal with buffer expansion. If Qemu gets the ordering
wrong it wastes memory per vCPU, *shrug*.

Thanks,

        tglx
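The cost of getting the restore order wrong under option 1) can be illustrated with a small model in which the guest buffer grows (and never shrinks) whenever XCR0 enables a dynamic feature that XFD does not mask. The struct, helpers and constants are hypothetical, and expanding on every XFD or XCR0 update is a simplification of what the FPU core would do.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_XCR0_BASE            0x7ULL        /* hypothetical: x87 | SSE | AVX */
#define XFEATURE_MASK_XTILE_DATA   (1ULL << 18)  /* assumed AMX TILEDATA bit */
#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA

/* Hypothetical model of guest_fpu during restore. */
struct guest_fpu_model {
	uint64_t xfd;        /* starts in init state (0) */
	uint64_t xcr0;
	uint64_t sized_for;  /* dynamic features the buffer has room for */
};

/* Grow the buffer for dynamic features enabled in XCR0 and not masked by XFD. */
static void maybe_expand(struct guest_fpu_model *g)
{
	g->sized_for |= g->xcr0 & ~g->xfd & XFEATURE_MASK_USER_DYNAMIC;
}

static void set_xfd(struct guest_fpu_model *g, uint64_t xfd)
{
	g->xfd = xfd;
	maybe_expand(g);
}

static void set_xcr0(struct guest_fpu_model *g, uint64_t xcr0)
{
	g->xcr0 = xcr0;
	maybe_expand(g);
}

/* Restore a saved (xcr0, xfd) pair in the given order; report which
 * dynamic features ended up allocated. */
static uint64_t restore(uint64_t xcr0, uint64_t xfd, int xfd_first)
{
	struct guest_fpu_model g = { 0 };

	if (xfd_first) {
		set_xfd(&g, xfd);
		set_xcr0(&g, xcr0);
	} else {
		set_xcr0(&g, xcr0);
		set_xfd(&g, xfd);
	}
	return g.sized_for;
}
```

Restoring XCR0 first while XFD is still 0 allocates AMX space that the later non-zero XFD makes unnecessary; restoring XFD first avoids it, and a saved XFD of 0 still allocates in either order.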
Hi, Thomas,

> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Tuesday, December 14, 2021 5:23 AM
>
> Paolo,
>
> On Mon, Dec 13 2021 at 20:45, Thomas Gleixner wrote:
> > On Mon, Dec 13 2021 at 16:06, Paolo Bonzini wrote:
> >> That said, I think xfd_update_state should not have an argument.
> >> current->thread.fpu.fpstate->xfd is the only fpstate that should be
> >> synced with the xfd_state per-CPU variable.
> >
> > I'm looking into this right now. The whole restore versus runtime thing
> > needs to be handled differently.
>

After looking at your series, I think it missed Paolo's comment about
changing xfd_update_state() to accept no argument.

Thanks
Kevin
On Fri, Dec 10, 2021 at 05:02:49PM +0100, Paolo Bonzini wrote:
> First, the MSR should be added to msrs_to_save_all and
> kvm_cpu_cap_has(X86_FEATURE_XFD) should be checked in
> kvm_init_msr_list.
>
> It seems that RDMSR support is missing, too.
>
> More important, please include:
>
> - documentation for the new KVM_EXIT_* value
>
> - a selftest that explains how userspace should react to it.
>
> This is a strong requirement for any new API (the first has been for
> years; but the latter is also almost always respected these days).
> This series should not have been submitted without documentation.
>
> Also:
>
> On 12/8/21 01:03, Yang Zhong wrote:
> >
> >+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> >+			return 1;
>
> This should allow msr->host_initiated always (even if XFD is not
> part of CPUID). However, if XFD is nonzero and
> kvm_check_guest_realloc_fpstate returns true, then it should return
> 1.
>
> The selftest should also cover using KVM_GET_MSR/KVM_SET_MSR.
>

Paolo,

Seems we do not need new KVM_EXIT_* again from below Thomas' new patchset:
git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm

So the selftest still need support KVM_GET_MSR/KVM_SET_MSR for MSR_IA32_XFD
and MSR_IA32_XFD_ERR? If yes, we only do some read/write test with vcpu_set_msr()/
vcpu_get_msr() from new selftest tool? or do wrmsr from guest side and check this value
from selftest side?

I checked some msr selftest reference code, tsc_msrs_test.c, which may be better for this
reference. If you have better suggestion, please share it to me. thanks!

Yang
On 12/14/21 11:26, Yang Zhong wrote:
> Paolo,
> Seems we do not need new KVM_EXIT_* again from below Thomas' new patchset:
> git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm
>
> So the selftest still need support KVM_GET_MSR/KVM_SET_MSR for MSR_IA32_XFD
> and MSR_IA32_XFD_ERR? If yes, we only do some read/write test with vcpu_set_msr()/
> vcpu_get_msr() from new selftest tool? or do wrmsr from guest side and check this value
> from selftest side?

You can write a test similar to state_test.c to cover XCR0, XFD and the
new XSAVE extensions. The test can:

- initialize AMX and write a nonzero value to XFD

- load a matrix into TMM0

- check that #NM is delivered (search for vm_install_exception_handler)
  and that XFD_ERR is correct

- write 0 to XFD

- load again the matrix, and check that #NM is not delivered

- store it back into memory

- compare it with the original data

All of this can be done with a full save&restore after every step
(though I suggest that you first get it working without save&restore,
the relevant code in state_test.c is easy to identify and comment out).

You will have to modify vcpu_load_state, so that it does first
KVM_SET_MSRS, then KVM_SET_XCRS, then KVM_SET_XSAVE. See patch below.

Paolo

> I checked some msr selftest reference code, tsc_msrs_test.c, which may be better for this
> reference. If you have better suggestion, please share it to me. thanks!

------------------ 8< -----------------
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH] selftest: kvm: Reorder vcpu_load_state steps for AMX

For AMX support it is recommended to load XCR0 after XFD, so that
KVM does not see XFD=0, XCR=1 for a save state that will eventually
be disabled (which would lead to premature allocation of the space
required for that save state).

It is also required to load XSAVE data after XCR0 and XFD, so that
KVM can trigger allocation of the extra space required to store AMX
state.

Adjust vcpu_load_state to obey these new requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 82c39db91369..d805f63f7203 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1157,16 +1157,6 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s
 	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
 	int r;
 
-	r = ioctl(vcpu->fd, KVM_SET_XSAVE, &state->xsave);
-	TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XSAVE, r: %i",
-		    r);
-
-	if (kvm_check_cap(KVM_CAP_XCRS)) {
-		r = ioctl(vcpu->fd, KVM_SET_XCRS, &state->xcrs);
-		TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XCRS, r: %i",
-			    r);
-	}
-
 	r = ioctl(vcpu->fd, KVM_SET_SREGS, &state->sregs);
 	TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_SREGS, r: %i",
 		    r);
@@ -1175,6 +1165,16 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s
 	TEST_ASSERT(r == state->msrs.nmsrs, "Unexpected result from KVM_SET_MSRS, r: %i (failed at %x)",
 		    r, r == state->msrs.nmsrs ? -1 : state->msrs.entries[r].index);
 
+	if (kvm_check_cap(KVM_CAP_XCRS)) {
+		r = ioctl(vcpu->fd, KVM_SET_XCRS, &state->xcrs);
+		TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XCRS, r: %i",
+			    r);
+	}
+
+	r = ioctl(vcpu->fd, KVM_SET_XSAVE, &state->xsave);
+	TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XSAVE, r: %i",
+		    r);
+
 	r = ioctl(vcpu->fd, KVM_SET_VCPU_EVENTS, &state->events);
 	TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_VCPU_EVENTS, r: %i",
 		    r);
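The reason KVM_SET_XSAVE moves last can be modelled as a tiny state machine. The key assumption, made only for illustration, is that restoring an XSAVE image containing AMX TILEDATA fails unless the buffer has already been grown for it; the real kernel behaviour is not modelled. All names below are hypothetical, and the KVM_SET_SREGS step is omitted because it does not affect the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_XCR0_BASE          0x7ULL        /* hypothetical: x87 | SSE | AVX */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)  /* assumed AMX TILEDATA bit */

enum step { SET_MSRS, SET_XCRS, SET_XSAVE };  /* hypothetical restore steps */

struct saved_state {
	uint64_t xfd;        /* from KVM_GET_MSRS */
	uint64_t xcr0;       /* from KVM_GET_XCRS */
	uint64_t xstate_bv;  /* components present in the saved XSAVE image */
};

struct vcpu_model {
	uint64_t xfd, xcr0, sized_for;
};

/* Grow the buffer for TILEDATA once XCR0 enables it and XFD does not mask it. */
static void grow(struct vcpu_model *v)
{
	v->sized_for |= v->xcr0 & ~v->xfd & XFEATURE_MASK_XTILE_DATA;
}

/* Apply one hypothetical restore step; returns false if it fails. */
static bool apply(struct vcpu_model *v, enum step s, const struct saved_state *st)
{
	switch (s) {
	case SET_MSRS:
		v->xfd = st->xfd;
		grow(v);
		return true;
	case SET_XCRS:
		v->xcr0 = st->xcr0;
		grow(v);
		return true;
	case SET_XSAVE:
		/* Assumption: AMX state needs space that was already allocated. */
		return (st->xstate_bv & XFEATURE_MASK_XTILE_DATA & ~v->sized_for) == 0;
	}
	return false;
}

static bool load_state(const enum step *order, int n, const struct saved_state *st)
{
	struct vcpu_model v = { 0 };

	for (int i = 0; i < n; i++)
		if (!apply(&v, order[i], st))
			return false;
	return true;
}
```

Under these assumptions the old order (XSAVE, XCRS, MSRS) fails for a guest with live AMX state, while the reordered sequence (MSRS, XCRS, XSAVE) succeeds.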
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 70d86ffbccf7..971d60980d5b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7141,6 +7141,11 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_update_intercept_xfd(struct kvm_vcpu *vcpu)
+{
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_R, false);
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7181,6 +7186,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (cpu_feature_enabled(X86_FEATURE_XFD) && guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+		vmx_update_intercept_xfd(vcpu);
+
 	set_cr4_guest_host_mask(vmx);
 
 	vmx_write_encls_bitmap(vcpu, NULL);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91cc6f69a7ca..c83887cb55ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1873,6 +1873,16 @@ static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
 {
 	u64 msr_reason = kvm_msr_reason(r);
 
+	/*
+	 * MSR emulation may need certain effect triggered in the
+	 * path transitioning to userspace (e.g. fpstate realloction).
+	 * In this case the actual exit reason and completion
+	 * func should have been set by the emulation code before
+	 * this point.
+	 */
+	if (r == KVM_MSR_RET_USERSPACE)
+		return 1;
+
 	/* Check if the user wanted to know about this MSR fault */
 	if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
 		return 0;
@@ -3692,6 +3702,44 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.msr_misc_features_enables = data;
 		break;
+#ifdef CONFIG_X86_64
+	case MSR_IA32_XFD:
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+			return 1;
+
+		/* Setting unsupported bits causes #GP */
+		if (~XFEATURE_MASK_USER_DYNAMIC & data) {
+			kvm_inject_gp(vcpu, 0);
+			break;
+		}
+
+		WARN_ON_ONCE(current->thread.fpu.fpstate !=
+			     vcpu->arch.guest_fpu.fpstate);
+
+		/*
+		 * Check if fpstate reallocate is required. If yes, then
+		 * let the fpu core do reallocation and update xfd;
+		 * otherwise, update xfd here.
+		 */
+		if (kvm_check_guest_realloc_fpstate(vcpu, data)) {
+			vcpu->run->exit_reason = KVM_EXIT_FPU_REALLOC;
+			vcpu->arch.complete_userspace_io =
+				kvm_skip_emulated_instruction;
+			return KVM_MSR_RET_USERSPACE;
+		}
+
+		/*
+		 * Update IA32_XFD to the guest value so #NM can be
+		 * raised properly in the guest. Instead of directly
+		 * writing the MSR, call a helper to avoid breaking
+		 * per-cpu cached value in fpu core.
+		 */
+		fpregs_lock();
+		current->thread.fpu.fpstate->xfd = data;
+		xfd_update_state(current->thread.fpu.fpstate);
+		fpregs_unlock();
+		break;
+#endif
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 24a323980146..446ffa8c7804 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -460,6 +460,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
  */
 #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
 #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
+#define  KVM_MSR_RET_USERSPACE	4	/* Userspace handling */
 
 #define __cr4_reserved_bits(__cpu_has, __c)             \
 ({                                                      \
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..0c7b301c7254 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -270,6 +270,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
 #define KVM_EXIT_RISCV_SBI        35
+#define KVM_EXIT_FPU_REALLOC      36
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */