Message ID | 55093B52.5090904@canonical.com (mailing list archive) |
---|---|
State | New, archived |
On 18/03/2015 09:46, Stefan Bader wrote:
>
> Regardless of that, I wonder whether the below (this version untested) sound
> acceptable for upstream? At least it would make debugging much simpler. :)
>
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>          ctl |= vmx_msr_low; /* bit == 1 in low word ==> must be one */
>
>          /* Ensure minimum (required) set of control bits are supported. */
> -        if (ctl_min & ~ctl)
> +        if (ctl_min & ~ctl) {
> +                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
> +                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>                  return -EIO;
> +        }
>
>          *result = ctl;
>          return 0;

Yes, this is nice. Maybe -ENODEV.

Also, a minimal patch for Ubuntu would probably be:

@@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
                         vmx_capability.ept, vmx_capability.vpid);
         }

-        min = 0;
+        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
 #ifdef CONFIG_X86_64
         min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif

but I don't think it's a good idea to add it to stable kernels.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18.03.2015 10:18, Paolo Bonzini wrote:
>
>
> On 18/03/2015 09:46, Stefan Bader wrote:
>>
>> Regardless of that, I wonder whether the below (this version untested) sound
>> acceptable for upstream? At least it would make debugging much simpler. :)
>>
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>>          ctl |= vmx_msr_low; /* bit == 1 in low word ==> must be one */
>>
>>          /* Ensure minimum (required) set of control bits are supported. */
>> -        if (ctl_min & ~ctl)
>> +        if (ctl_min & ~ctl) {
>> +                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
>> +                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>>                  return -EIO;
>> +        }
>>
>>          *result = ctl;
>>          return 0;
>
> Yes, this is nice. Maybe -ENODEV.

Maybe, though I did not change that. Just added to give some kind of hint when
the module would otherwise fail with just an IO error.

>
> Also, a minimal patch for Ubuntu would probably be:
>
> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>                          vmx_capability.ept, vmx_capability.vpid);
>          }
>
> -        min = 0;
> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>  #ifdef CONFIG_X86_64
>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
>
> but I don't think it's a good idea to add it to stable kernels.

Why is that? Because it has a risk of causing the module failing to load on L0
where it did work before? Which would be something I would rather avoid.
Generally I think it would be good to have something that can be generally
applied. Given the speed that cloud service providers tend to move forward (ok
they may not actively push the ability to go nested).

-Stefan

>
> Paolo
>
On 18/03/2015 10:59, Stefan Bader wrote:
>> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>                          vmx_capability.ept, vmx_capability.vpid);
>>          }
>>
>> -        min = 0;
>> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>>  #ifdef CONFIG_X86_64
>>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>  #endif
>>
>> but I don't think it's a good idea to add it to stable kernels.
>
> Why is that? Because it has a risk of causing the module failing to
> load on L0 where it did work before?

Because if we wanted to make 3.14 nested VMX stable-ish we would need
several more, at least these:

  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: nVMX: clean up nested_release_vmcs12 and code around it
  KVM: nVMX: Rework interception of IRQs and NMIs
  KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt
  KVM: nVMX: Disable preemption while reading from shadow VMCS

and for 3.13:

  KVM: nVMX: Leave VMX mode on clearing of feature control MSR

There are also several L2-crash-L1 bugs too in Nadav Amit's patches.

Basically, nested VMX was never considered stable-worthy. Perhaps
that can change soon---but not retroactively.

So I'd rather avoid giving false impressions of the stability of nVMX
in 3.14.

Even if we considered nVMX stable, I'd _really_ not want to consider
the L1<->L2 boundary a secure one for a longer time.

> Which would be something I would rather avoid. Generally I think it
> would be good to have something that can be generally applied.
> Given the speed that cloud service providers tend to move forward
> (ok they may not actively push the ability to go nested).

And if they did, I'd really not want them to do it with a 3.14 kernel.

Paolo
On 18.03.2015 11:27, Paolo Bonzini wrote:
>
>
> On 18/03/2015 10:59, Stefan Bader wrote:
>>> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>                          vmx_capability.ept, vmx_capability.vpid);
>>>          }
>>>
>>> -        min = 0;
>>> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>>>  #ifdef CONFIG_X86_64
>>>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>>  #endif
>>>
>>> but I don't think it's a good idea to add it to stable kernels.
>>
>> Why is that? Because it has a risk of causing the module failing to
>> load on L0 where it did work before?
>
> Because if we wanted to make 3.14 nested VMX stable-ish we would need
> several more, at least these:
>
>   KVM: nVMX: fix lifetime issues for vmcs02
>   KVM: nVMX: clean up nested_release_vmcs12 and code around it
>   KVM: nVMX: Rework interception of IRQs and NMIs
>   KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt
>   KVM: nVMX: Disable preemption while reading from shadow VMCS
>
> and for 3.13:
>
>   KVM: nVMX: Leave VMX mode on clearing of feature control MSR
>
> There are also several L2-crash-L1 bugs too in Nadav Amit's patches.
>
> Basically, nested VMX was never considered stable-worthy. Perhaps
> that can change soon---but not retroactively.
>
> So I'd rather avoid giving false impressions of the stability of nVMX
> in 3.14.
>
> Even if we considered nVMX stable, I'd _really_ not want to consider
> the L1<->L2 boundary a secure one for a longer time.
>
>> Which would be something I would rather avoid. Generally I think it
>> would be good to have something that can be generally applied.
>> Given the speed that cloud service providers tend to move forward
>> (ok they may not actively push the ability to go nested).
>
> And if they did, I'd really not want them to do it with a 3.14 kernel.

3.14... you are optimistic. :) But thanks a lot for the detailed info.

-Stefan

>
> Paolo
>
On 18.03.2015 10:18, Paolo Bonzini wrote:
>
>
> On 18/03/2015 09:46, Stefan Bader wrote:
>>
>> Regardless of that, I wonder whether the below (this version untested) sound
>> acceptable for upstream? At least it would make debugging much simpler. :)
>>
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>>          ctl |= vmx_msr_low; /* bit == 1 in low word ==> must be one */
>>
>>          /* Ensure minimum (required) set of control bits are supported. */
>> -        if (ctl_min & ~ctl)
>> +        if (ctl_min & ~ctl) {
>> +                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
>> +                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>>                  return -EIO;
>> +        }
>>
>>          *result = ctl;
>>          return 0;
>
> Yes, this is nice. Maybe -ENODEV.
>
> Also, a minimal patch for Ubuntu would probably be:
>
> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>                          vmx_capability.ept, vmx_capability.vpid);
>          }
>
> -        min = 0;
> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>  #ifdef CONFIG_X86_64
>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
>
> but I don't think it's a good idea to add it to stable kernels.

Sorry, I got a bit confused on my assumptions. The change above does cause
guests to fail, but my statement that this is caused by host kernels before
this change was made against better knowledge and was wrong. The actual range
is hosts running 3.2, which (maybe not perfectly, but at least well enough)
allowed using nested vmx with guest kernels <3.15; newer guests will break.
But running 3.13 on the host has no issues.

Comparing the rdmsr values of guests between those two host kernels, I found
that on 3.2 the exit control msr was very sparsely initialized. And looking at
the changes between 3.2 and 3.13 I found

commit 33fb20c39e98b90813b5ab2d9a0d6faa6300caca
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Wed Mar 6 15:44:03 2013 +0100

    KVM: nVMX: Fix content of MSR_IA32_VMX_ENTRY/EXIT_CTLS

This was added in 3.10. So the range of kernels affected is <3.10, back to
when nested vmx became somewhat usable. For 3.2, Ben (and obviously us) would
be affected. Apart from that, I believe, it is only 3.4 which has an active
longterm. At least that change looks safer for stable, as it sounds like
correcting things and not adding a feature. I was able to cherry-pick that
into a 3.2 kernel, and then a 3.16 guest can successfully load the kvm-intel
module again, of course with the same shortcomings as before.

-Stefan

>
> Paolo
>
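The rdmsr comparison described above can be sketched in plain C. This is only an illustration, not code from the thread: the helper names `missing_controls` and `report_missing` are hypothetical, the raw MSR value is supplied by the caller (for example, copied from the output of an rdmsr tool run inside the guest), and the two bit definitions are the values the kernel headers use for these VM-exit controls. The sketch assumes the usual layout of a VMX capability MSR, where the low 32 bits list controls that must be 1 and the high 32 bits list controls that may be 1.

```c
#include <stdint.h>
#include <stdio.h>

/* VM-exit control bits, matching the kernel's definitions. */
#define VM_EXIT_SAVE_DEBUG_CONTROLS   0x00000004u
#define VM_EXIT_HOST_ADDR_SPACE_SIZE  0x00000200u

/*
 * Hypothetical helper: given the raw 64-bit value of a VMX capability
 * MSR, return the subset of `required` control bits that cannot be set.
 */
static uint32_t missing_controls(uint64_t raw_msr, uint32_t required)
{
    uint32_t must_be_one = (uint32_t)(raw_msr & 0xffffffffu); /* low word  */
    uint32_t may_be_one  = (uint32_t)(raw_msr >> 32);         /* high word */

    /* A required bit is usable if it is allowed or forced to 1. */
    return required & ~(may_be_one | must_be_one);
}

/* Hypothetical helper: print a one-line verdict, return 0 if all bits fit. */
static int report_missing(uint64_t raw_msr, uint32_t required)
{
    uint32_t missing = missing_controls(raw_msr, required);

    if (missing) {
        printf("missing required controls: %08x\n", (unsigned)missing);
        return -1;
    }
    printf("all required controls available\n");
    return 0;
}
```

With a sparsely populated exit-control MSR (a made-up value that allows only `VM_EXIT_HOST_ADDR_SPACE_SIZE`), asking for `VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_HOST_ADDR_SPACE_SIZE` leaves the debug-controls bit missing, which is the kind of mismatch that makes the newer guest's kvm-intel module refuse to load.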
On 19/03/2015 20:58, Stefan Bader wrote:
> This was added in 3.10. So the range of kernels affected <3.10 back
> to when nested vmx became somewhat usable. For 3.2 Ben (and
> obviously us) would be affected. Apart from that, I believe, it is
> only 3.4 which has an active longterm. At least that change looks
> safer for stable as it sounds like correcting things and not adding
> a feature. I was able to cherry-pick that into a 3.2 kernel and
> then a 3.16 guest successfully can load the kvm-intel module again,
> of course with the same shortcomings as before.

Feel free to backport whatever you want to distro kernels. But I'm
going to NACK for stable@ anything that is related to nested virt.
The code has changed so much that I simply cannot do a meaningful
review of most patches when applied to old codebases.

Paolo
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
         ctl |= vmx_msr_low; /* bit == 1 in low word ==> must be one */

         /* Ensure minimum (required) set of control bits are supported. */
-        if (ctl_min & ~ctl)
+        if (ctl_min & ~ctl) {
+                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
+                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
                 return -EIO;
+        }

         *result = ctl;
         return 0;
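For readers who want to try the check outside the kernel, here is a minimal userspace sketch of the logic the patch touches. It is an approximation under stated assumptions: `adjust_controls_sketch` is a hypothetical name, the two MSR words are passed in as parameters instead of being read with `rdmsr`, `fprintf(stderr, ...)` stands in for `printk`, and the step that clears bits not allowed by the high word is assumed from the shape of the function, since the hunk only shows the low-word line.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Values matching the kernel's definitions. */
#define MSR_IA32_VMX_EXIT_CTLS        0x00000483u
#define VM_EXIT_SAVE_DEBUG_CONTROLS   0x00000004u
#define VM_EXIT_HOST_ADDR_SPACE_SIZE  0x00000200u

/*
 * Userspace approximation of the control adjustment, with the diagnostic
 * from the patch. msr_low/msr_high are the two halves of the capability
 * MSR; in the kernel they come from rdmsr(msr, ...).
 */
static int adjust_controls_sketch(uint32_t ctl_min, uint32_t ctl_opt,
                                  uint32_t msr, uint32_t msr_low,
                                  uint32_t msr_high, uint32_t *result)
{
    uint32_t ctl = ctl_min | ctl_opt;

    assert(result != NULL);

    ctl &= msr_high; /* bit == 0 in high word ==> must be zero (assumed step) */
    ctl |= msr_low;  /* bit == 1 in low word ==> must be one */

    /* Ensure the minimum (required) set of control bits is supported. */
    if (ctl_min & ~ctl) {
        fprintf(stderr,
                "vmx: msr(%08x) does not match requirements. "
                "req=%08x cur=%08x\n",
                (unsigned)msr, (unsigned)ctl_min, (unsigned)ctl);
        return -EIO;
    }

    *result = ctl;
    return 0;
}
```

Calling it with `ctl_min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_HOST_ADDR_SPACE_SIZE` and a high word that only permits `VM_EXIT_HOST_ADDR_SPACE_SIZE` takes the error path and prints the MSR number together with the requested and achievable bits, which is the hint the patch aims to provide instead of a bare I/O error.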