Message ID | 20220117150542.2176196-1-vkuznets@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN for CPU hotplug | expand |
On Mon, 17 Jan 2022 16:05:38 +0100 Vitaly Kuznetsov <vkuznets@redhat.com> wrote: > Changes since v1: > - Drop the allowlist of items which were allowed to change and just allow > the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly. > - Drop PATCH1 as the exact same change got merged upstream. > > Recently, KVM made it illegal to change CPUID after KVM_RUN but > unfortunately this change is not fully compatible with existing VMMs. > In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it > calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing > KVM_SET_CPUID{,2} with the exact same data. Can you check following scenario: * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present) * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it cleared * hotunplug a VCPU and then replug it again if IA32_TSX_CTRL is reset to initial state, that should re-enable RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference and as Sean pointed out there might be other non constant leafs, where exact match check could leave userspace broken. > Vitaly Kuznetsov (4): > KVM: x86: Do runtime CPUID update before updating > vcpu->arch.cpuid_entries > KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN > KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test' > KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUN > > arch/x86/kvm/cpuid.c | 67 ++++++++++++++++--- > arch/x86/kvm/x86.c | 19 ------ > tools/testing/selftests/kvm/.gitignore | 2 +- > tools/testing/selftests/kvm/Makefile | 4 +- > .../selftests/kvm/include/x86_64/processor.h | 7 ++ > .../selftests/kvm/lib/x86_64/processor.c | 33 +++++++-- > .../x86_64/{get_cpuid_test.c => cpuid_test.c} | 30 +++++++++ > 7 files changed, 126 insertions(+), 36 deletions(-) > rename tools/testing/selftests/kvm/x86_64/{get_cpuid_test.c => cpuid_test.c} (83%) >
On 1/18/22 15:35, Igor Mammedov wrote: > Can you check following scenario: > * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present) > * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI > > that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H > and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it cleared > > * hotunplug a VCPU and then replug it again > if IA32_TSX_CTRL is reset to initial state, that should re-enable > RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference > > and as Sean pointed out there might be other non constant leafs, > where exact match check could leave userspace broken. MSR_IA32_TSX_CTRL is handled differently straight during the CPUID call: if (function == 7 && index == 0) { u64 data; if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) && (data & TSX_CTRL_CPUID_CLEAR)) *ebx &= ~(F(RTM) | F(HLE)); } and I think we should redo all or most of kvm_update_cpuid_runtime the same way. Paolo
Igor Mammedov <imammedo@redhat.com> writes: > On Mon, 17 Jan 2022 16:05:38 +0100 > Vitaly Kuznetsov <vkuznets@redhat.com> wrote: > >> Changes since v1: >> - Drop the allowlist of items which were allowed to change and just allow >> the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly. >> - Drop PATCH1 as the exact same change got merged upstream. >> >> Recently, KVM made it illegal to change CPUID after KVM_RUN but >> unfortunately this change is not fully compatible with existing VMMs. >> In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it >> calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing >> KVM_SET_CPUID{,2} with the exact same data. > > > Can you check following scenario: > * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present) > * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI > > that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H > and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it > cleared Forgive me my ignorance around (not only) TSX :-) I took a "Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz" host which seems to have rtm/hle and booted a guest with 'cpu=host' and with (and without) 'tsx=off' on the kernel command line. I decided to check what's is MSR_IA32_TSX_CTRL but I see the following: # rdmsr 0x122 rdmsr: CPU 0 cannot read MSR 0x00000122 I tried adding 'tsx_ctrl' to my QEMU command line but it complains with qemu-system-x86_64: warning: host doesn't support requested feature: MSR(10AH).tsx-ctrl [bit 7] so I think my host is not good enough :-( Also, I've looked at tsx_clear_cpuid() but it actually writes to MSR_TSX_FORCE_ABORT MSR (0x10F), not MSR_IA32_TSX_CTRL so I'm confused. > > * hotunplug a VCPU and then replug it again > if IA32_TSX_CTRL is reset to initial state, that should re-enable > RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference Could you please teach me this kung-fu, I mean hot to unplug a cold-plugged CPU with QMP? Previoulsy, I only did un-plugging for what I've hotplugged, something like: (QEMU) device_add driver=host-x86_64-cpu socket-id=0 core-id=2 thread-id=0 id=cpu2 {"return": {}} (QEMU) device_del id=cpu2 {"return": {}} What's the ids of the cold-plugged CPUs? > > and as Sean pointed out there might be other non constant leafs, > where exact match check could leave userspace broken. Indeed, while testing your suggestion I've stumbled upon CPUID.(EAX=0x12, ECX=1) (SGX) where we mangle ECX from kvm_vcpu_after_set_cpuid(): best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1); if (best) { best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff; best->edx &= vcpu->arch.guest_supported_xcr0 >> 32; best->ecx |= XFEATURE_MASK_FPSSE; } In theory, we should just move this to __kvm_update_cpuid_runtime()... I'll take a look tomorrow.
On Tue, Jan 18, 2022, Paolo Bonzini wrote: > On 1/18/22 15:35, Igor Mammedov wrote: > > Can you check following scenario: > > * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present) > > * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI > > > > that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H > > and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it cleared > > > > * hotunplug a VCPU and then replug it again > > if IA32_TSX_CTRL is reset to initial state, that should re-enable > > RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference > > > > and as Sean pointed out there might be other non constant leafs, > > where exact match check could leave userspace broken. > > > MSR_IA32_TSX_CTRL is handled differently straight during the CPUID call: > > if (function == 7 && index == 0) { > u64 data; > if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) && > (data & TSX_CTRL_CPUID_CLEAR)) > *ebx &= ~(F(RTM) | F(HLE)); > } > > > and I think we should redo all or most of kvm_update_cpuid_runtime > the same way. Please no. xstate_required_size() requires multiple host CPUID calls, and glibc does CPUID.0xD.0x0 and CPUID.0xD.0x1 as part of its initialization, i.e. launching a new userspace process in the guest will see additional performance overhread due to KVM dynamically computing the XSAVE size instead of caching it based on vCPU state. Nested virtualization would be especially painful as every one of those "host" CPUID invocations will trigger and exit from L1=>L0.
On 1/18/22 17:53, Sean Christopherson wrote: >> >> and I think we should redo all or most of kvm_update_cpuid_runtime >> the same way. > Please no. xstate_required_size() requires multiple host CPUID calls, and glibc > does CPUID.0xD.0x0 and CPUID.0xD.0x1 as part of its initialization, i.e. launching > a new userspace process in the guest will see additional performance overhread due > to KVM dynamically computing the XSAVE size instead of caching it based on vCPU > state. Nested virtualization would be especially painful as every one of those > "host" CPUID invocations will trigger and exit from L1=>L0. > Agreed, but all of the required data is by Linux cached in xstate_offsets, xstate_sizes and xstate_comp_offsets; moving xstate_required_size to xstate.c and skipping the host CPUID would probably be a good idea nevertheless. Paolo
On Tue, 18 Jan 2022 17:34:09 +0100 Vitaly Kuznetsov <vkuznets@redhat.com> wrote: > Igor Mammedov <imammedo@redhat.com> writes: > > > On Mon, 17 Jan 2022 16:05:38 +0100 > > Vitaly Kuznetsov <vkuznets@redhat.com> wrote: > > > >> Changes since v1: > >> - Drop the allowlist of items which were allowed to change and just allow > >> the exact same CPUID data [Sean, Paolo]. Adjust selftest accordingly. > >> - Drop PATCH1 as the exact same change got merged upstream. > >> > >> Recently, KVM made it illegal to change CPUID after KVM_RUN but > >> unfortunately this change is not fully compatible with existing VMMs. > >> In particular, QEMU reuses vCPU fds for CPU hotplug after unplug and it > >> calls KVM_SET_CPUID2. Relax the requirement by implementing an allowing > >> KVM_SET_CPUID{,2} with the exact same data. > > > > > > Can you check following scenario: > > * on host that has IA32_TSX_CTRL and TSX enabled (RTM/HLE cpuid bits present) > > * boot 2 vcpus VM with TSX enabled on VMM side but with tsx=off on kernel CLI > > > > that should cause kernel to set MSR_IA32_TSX_CTRL to 3H from initial 0H > > and clear RTM+HLE bits in CPUID, check that RTM/HLE cpuid it > > cleared > > Forgive me my ignorance around (not only) TSX :-) I took a "Intel(R) > Xeon(R) CPU E3-1270 v5 @ 3.60GHz" host which seems to have rtm/hle and > booted a guest with 'cpu=host' and with (and without) 'tsx=off' on the > kernel command line. I decided to check what's is MSR_IA32_TSX_CTRL but > I see the following: > > # rdmsr 0x122 > rdmsr: CPU 0 cannot read MSR 0x00000122 > > I tried adding 'tsx_ctrl' to my QEMU command line but it complains with > qemu-system-x86_64: warning: host doesn't support requested feature: MSR(10AH).tsx-ctrl [bit 7] > > so I think my host is not good enough :-( I've seen it being available on "COOPER LAKE" Xeon > > Also, I've looked at tsx_clear_cpuid() but it actually writes to > MSR_TSX_FORCE_ABORT MSR (0x10F), not MSR_IA32_TSX_CTRL so I'm confused. look at tsx_disable() > > * hotunplug a VCPU and then replug it again > > if IA32_TSX_CTRL is reset to initial state, that should re-enable > > RTM/HLE cpuid bits and KVM_SET_CPUID2 might fail due to difference > > Could you please teach me this kung-fu, I mean hot to unplug a > cold-plugged CPU with QMP? Previoulsy, I only did un-plugging for what > I've hotplugged, something like: > > (QEMU) device_add driver=host-x86_64-cpu socket-id=0 core-id=2 thread-id=0 id=cpu2 > {"return": {}} > (QEMU) device_del id=cpu2 > {"return": {}} > > What's the ids of the cold-plugged CPUs? it doesn't have to be coldplugged, hot(plug/unplug/plug) sequence is fine as well. fyi you can use qom_path with device _del from 'info hotpluggable-cpus' output > > and as Sean pointed out there might be other non constant leafs, > > where exact match check could leave userspace broken. > > Indeed, while testing your suggestion I've stumbled upon > CPUID.(EAX=0x12, ECX=1) (SGX) where we mangle ECX from > kvm_vcpu_after_set_cpuid(): > > best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1); > if (best) { > best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff; > best->edx &= vcpu->arch.guest_supported_xcr0 >> 32; > best->ecx |= XFEATURE_MASK_FPSSE; > } > > In theory, we should just move this to __kvm_update_cpuid_runtime()... > I'll take a look tomorrow. >