Message ID: 1395286089-5406-4-git-send-email-bsd@redhat.com (mailing list archive)
State: New, archived
On 2014-03-20 04:28, Bandan Das wrote:
> Some L1 hypervisors such as Xen seem to be calling invept after
> vmclear or before vmptrld on L2. In this case, proceed with
> falling through and syncing roots as a case where
> context wide invalidation can't be supported

Can we also base this behaviour on a statement in the SDM? But on first
glance, I do not find anything like this over there.

Jan

>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c707389..b407b3a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>
>  	switch (type) {
>  	case VMX_EPT_EXTENT_CONTEXT:
> -		if ((operand.eptp & eptp_mask) !=
> -				(nested_ept_get_cr3(vcpu) & eptp_mask))
> +		if (get_vmcs12(vcpu) &&
> +		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
> +		    eptp_mask)))
> 			break;
>  	case VMX_EPT_EXTENT_GLOBAL:
> 		kvm_mmu_sync_roots(vcpu);
>
On 2014-03-20 04:28, Bandan Das wrote:
> Some L1 hypervisors such as Xen seem to be calling invept after
> vmclear or before vmptrld on L2. In this case, proceed with
> falling through and syncing roots as a case where
> context wide invalidation can't be supported
>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c707389..b407b3a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>
>  	switch (type) {
>  	case VMX_EPT_EXTENT_CONTEXT:
> -		if ((operand.eptp & eptp_mask) !=
> -				(nested_ept_get_cr3(vcpu) & eptp_mask))
> +		if (get_vmcs12(vcpu) &&
> +		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
> +		    eptp_mask)))
> 			break;
>  	case VMX_EPT_EXTENT_GLOBAL:
> 		kvm_mmu_sync_roots(vcpu);
>

Please add a /* fall through */ comment as well.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 2014-03-20 04:28, Bandan Das wrote:
>> Some L1 hypervisors such as Xen seem to be calling invept after
>> vmclear or before vmptrld on L2. In this case, proceed with
>> falling through and syncing roots as a case where
>> context wide invalidation can't be supported
>
> Can we also base this behaviour on a statement in the SDM? But on first
> glance, I do not find anything like this over there.

The SDM has nothing of this sort explicitly mentioned, but 28.3.3.1
"Operations that Invalidate Cached Mappings" does mention that
the instruction may invalidate mappings associated with other
EP4TAs (even in single context).

Note that I based this on what we currently do for context invalidation:

static inline void ept_sync_context(u64 eptp)
{
	if (enable_ept) {
		if (cpu_has_vmx_invept_context())
			__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
		else
			ept_sync_global();
	}
}

Seemed easier and cleaner than having a cached eptp after vmcs12 is
long gone :)

If you prefer, I can modify the commit message to reflect this.

> Jan
>
>>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c707389..b407b3a 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>>
>>  	switch (type) {
>>  	case VMX_EPT_EXTENT_CONTEXT:
>> -		if ((operand.eptp & eptp_mask) !=
>> -				(nested_ept_get_cr3(vcpu) & eptp_mask))
>> +		if (get_vmcs12(vcpu) &&
>> +		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
>> +		    eptp_mask)))
>> 			break;
>>  	case VMX_EPT_EXTENT_GLOBAL:
>> 		kvm_mmu_sync_roots(vcpu);
>>
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 2014-03-20 04:28, Bandan Das wrote:
>> Some L1 hypervisors such as Xen seem to be calling invept after
>> vmclear or before vmptrld on L2. In this case, proceed with
>> falling through and syncing roots as a case where
>> context wide invalidation can't be supported
>>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c707389..b407b3a 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>>
>>  	switch (type) {
>>  	case VMX_EPT_EXTENT_CONTEXT:
>> -		if ((operand.eptp & eptp_mask) !=
>> -				(nested_ept_get_cr3(vcpu) & eptp_mask))
>> +		if (get_vmcs12(vcpu) &&
>> +		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
>> +		    eptp_mask)))
>> 			break;
>>  	case VMX_EPT_EXTENT_GLOBAL:
>> 		kvm_mmu_sync_roots(vcpu);
>>
>
> Please add a /* fall through */ comment as well.

Thanks for reviewing, Paolo; will fix in v2.

> Paolo
On 2014-03-20 21:58, Bandan Das wrote:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 2014-03-20 04:28, Bandan Das wrote:
>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>> vmclear or before vmptrld on L2. In this case, proceed with
>>> falling through and syncing roots as a case where
>>> context wide invalidation can't be supported
>>
>> Can we also base this behaviour on a statement in the SDM? But on first
>> glance, I do not find anything like this over there.
>
> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1
> "Operations that Invalidate Cached Mappings" does mention that
> the instruction may invalidate mappings associated with other
> EP4TAs (even in single context).

Yes, "may". So we are implementing undefined behavior in order to please
a broken hypervisor that relies on it? Then please state this in the
patch and probably also inform Xen about their issue.

>
> Note that I based this on what we currently do for context invalidation:
>
> static inline void ept_sync_context(u64 eptp)
> {
> 	if (enable_ept) {
> 		if (cpu_has_vmx_invept_context())
> 			__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
> 		else
> 			ept_sync_global();
> 	}
> }

Don't get your point. This test is about testing for CPU support for
context invalidation, then falling back to global invalidation if there
is no support.

Jan

>
> Seemed easier and cleaner than having a cached eptp after vmcs12 is
> long gone :)
>
> If you prefer, I can modify the commit message to reflect this.
>
>> Jan
>>
>>>
>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>> ---
>>>  arch/x86/kvm/vmx.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c707389..b407b3a 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>>>
>>>  	switch (type) {
>>>  	case VMX_EPT_EXTENT_CONTEXT:
>>> -		if ((operand.eptp & eptp_mask) !=
>>> -				(nested_ept_get_cr3(vcpu) & eptp_mask))
>>> +		if (get_vmcs12(vcpu) &&
>>> +		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
>>> +		    eptp_mask)))
>>> 			break;
>>>  	case VMX_EPT_EXTENT_GLOBAL:
>>> 		kvm_mmu_sync_roots(vcpu);
>>>
Jan Kiszka <jan.kiszka@web.de> writes:

> On 2014-03-20 21:58, Bandan Das wrote:
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>> falling through and syncing roots as a case where
>>>> context wide invalidation can't be supported
>>>
>>> Can we also base this behaviour on a statement in the SDM? But on first
>>> glance, I do not find anything like this over there.
>>
>> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1
>> "Operations that Invalidate Cached Mappings" does mention that
>> the instruction may invalidate mappings associated with other
>> EP4TAs (even in single context).
>
> Yes, "may". So we are implementing undefined behavior in order to please
> a broken hypervisor that relies on it? Then please state this in the
> patch and probably also inform Xen about their issue.

Why undefined behavior? We don't do anything specific for
the single context invalidation case anyway, i.e. if the eptp matches what
vmcs12 has, single context invalidation does fall through to the global
invalidation case already. All this change does is add the "L1 calls
invept after vmclear and before vmptrld" case to the list of cases that fall
through to global invalidation, since nvmx doesn't have any knowledge of
the current eptp in that situation.

Or do you think we should rethink this approach?

>> Note that I based this on what we currently do for context invalidation:
>>
>> static inline void ept_sync_context(u64 eptp)
>> {
>> 	if (enable_ept) {
>> 		if (cpu_has_vmx_invept_context())
>> 			__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
>> 		else
>> 			ept_sync_global();
>> 	}
>> }
>
> Don't get your point. This test is about testing for CPU support for
> context invalidation, then falling back to global invalidation if there
> is no support.

Sorry if this was confusing. All I was trying to say is that switching
to global invalidation if we can't do single context invalidation for
some reason is not unusual.

Thanks,
Bandan

> Jan
>
>>
>> Seemed easier and cleaner than having a cached eptp after vmcs12 is
>> long gone :)
>>
>> If you prefer, I can modify the commit message to reflect this.
>>
>>> Jan
>>>
>>>>
>>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>>> ---
>>>>  arch/x86/kvm/vmx.c | 5 +++--
>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index c707389..b407b3a 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>>>>
>>>>  	switch (type) {
>>>>  	case VMX_EPT_EXTENT_CONTEXT:
>>>> -		if ((operand.eptp & eptp_mask) !=
>>>> -				(nested_ept_get_cr3(vcpu) & eptp_mask))
>>>> +		if (get_vmcs12(vcpu) &&
>>>> +		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
>>>> +		    eptp_mask)))
>>>> 			break;
>>>>  	case VMX_EPT_EXTENT_GLOBAL:
>>>> 		kvm_mmu_sync_roots(vcpu);
>>>>
On 2014-03-22 17:43, Bandan Das wrote:
> Jan Kiszka <jan.kiszka@web.de> writes:
>
>> On 2014-03-20 21:58, Bandan Das wrote:
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>> falling through and syncing roots as a case where
>>>>> context wide invalidation can't be supported
>>>>
>>>> Can we also base this behaviour on a statement in the SDM? But on first
>>>> glance, I do not find anything like this over there.
>>>
>>> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1
>>> "Operations that Invalidate Cached Mappings" does mention that
>>> the instruction may invalidate mappings associated with other
>>> EP4TAs (even in single context).
>>
>> Yes, "may". So we are implementing undefined behavior in order to please
>> a broken hypervisor that relies on it? Then please state this in the
>> patch and probably also inform Xen about their issue.
>
> Why undefined behavior? We don't do anything specific for
> the single context invalidation case anyway, i.e. if the eptp matches what
> vmcs12 has, single context invalidation does fall through to the global
> invalidation case already. All this change does is add the "L1 calls
> invept after vmclear and before vmptrld" case to the list of cases that fall
> through to global invalidation, since nvmx doesn't have any knowledge of
> the current eptp in that situation.

OK, I think I misunderstood what the guest expects and how we currently
achieve this: we do not track the mapping between guest and host eptp,
thus cannot properly emulate its behaviour. We therefore need to flush
everything.

>
> Or do you think we should rethink this approach?

Well, I wonder if we should expose single-context invept support at all.

I'm also wondering if we are returning proper flags on

	if ((operand.eptp & eptp_mask) !=
			(nested_ept_get_cr3(vcpu) & eptp_mask))
		break;

Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
condition evaluates to true.

Jan
Jan Kiszka <jan.kiszka@web.de> writes:

> On 2014-03-22 17:43, Bandan Das wrote:
>> Jan Kiszka <jan.kiszka@web.de> writes:
>>
>>> On 2014-03-20 21:58, Bandan Das wrote:
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>>> falling through and syncing roots as a case where
>>>>>> context wide invalidation can't be supported
>>>>>
>>>>> Can we also base this behaviour on a statement in the SDM? But on first
>>>>> glance, I do not find anything like this over there.
>>>>
>>>> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1
>>>> "Operations that Invalidate Cached Mappings" does mention that
>>>> the instruction may invalidate mappings associated with other
>>>> EP4TAs (even in single context).
>>>
>>> Yes, "may". So we are implementing undefined behavior in order to please
>>> a broken hypervisor that relies on it? Then please state this in the
>>> patch and probably also inform Xen about their issue.
>>
>> Why undefined behavior? We don't do anything specific for
>> the single context invalidation case anyway, i.e. if the eptp matches what
>> vmcs12 has, single context invalidation does fall through to the global
>> invalidation case already. All this change does is add the "L1 calls
>> invept after vmclear and before vmptrld" case to the list of cases that fall
>> through to global invalidation, since nvmx doesn't have any knowledge of
>> the current eptp in that situation.
>
> OK, I think I misunderstood what the guest expects and how we currently
> achieve this: we do not track the mapping between guest and host eptp,
> thus cannot properly emulate its behaviour. We therefore need to flush
> everything.
>
>>
>> Or do you think we should rethink this approach?
>
> Well, I wonder if we should expose single-context invept support at all.
>
> I'm also wondering if we are returning proper flags on
>
> 	if ((operand.eptp & eptp_mask) !=
> 			(nested_ept_get_cr3(vcpu) & eptp_mask))
> 		break;

That does sound plausible, but then we would have to get rid of this
little optimization in the code you have quoted above. We would have
to flush and sync roots for all invept calls. So, my preference is to
keep it.

> Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
> condition evaluates to true.

It appears we should always call nested_vmx_succeed(). If you are ok,
I will send a v2.

Thanks,
Bandan

> Jan
On 2014-03-26 21:22, Bandan Das wrote:
> Jan Kiszka <jan.kiszka@web.de> writes:
>
>> On 2014-03-22 17:43, Bandan Das wrote:
>>> Jan Kiszka <jan.kiszka@web.de> writes:
>>>
>>>> On 2014-03-20 21:58, Bandan Das wrote:
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>>>> falling through and syncing roots as a case where
>>>>>>> context wide invalidation can't be supported
>>>>>>
>>>>>> Can we also base this behaviour on a statement in the SDM? But on first
>>>>>> glance, I do not find anything like this over there.
>>>>>
>>>>> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1
>>>>> "Operations that Invalidate Cached Mappings" does mention that
>>>>> the instruction may invalidate mappings associated with other
>>>>> EP4TAs (even in single context).
>>>>
>>>> Yes, "may". So we are implementing undefined behavior in order to please
>>>> a broken hypervisor that relies on it? Then please state this in the
>>>> patch and probably also inform Xen about their issue.
>>>
>>> Why undefined behavior? We don't do anything specific for
>>> the single context invalidation case anyway, i.e. if the eptp matches what
>>> vmcs12 has, single context invalidation does fall through to the global
>>> invalidation case already. All this change does is add the "L1 calls
>>> invept after vmclear and before vmptrld" case to the list of cases that fall
>>> through to global invalidation, since nvmx doesn't have any knowledge of
>>> the current eptp in that situation.
>>
>> OK, I think I misunderstood what the guest expects and how we currently
>> achieve this: we do not track the mapping between guest and host eptp,
>> thus cannot properly emulate its behaviour. We therefore need to flush
>> everything.
>>
>>>
>>> Or do you think we should rethink this approach?
>>
>> Well, I wonder if we should expose single-context invept support at all.
>>
>> I'm also wondering if we are returning proper flags on
>>
>> 	if ((operand.eptp & eptp_mask) !=
>> 			(nested_ept_get_cr3(vcpu) & eptp_mask))
>> 		break;
>
> That does sound plausible, but then we would have to get rid of this
> little optimization in the code you have quoted above. We would have
> to flush and sync roots for all invept calls. So, my preference is to
> keep it.

Do you have any numbers on how many per-context invept calls we are able
to ignore? At least for KVM this should be 0, as we only flush on
changes to the current EPT structures.

Jan

>
>> Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
>> condition evaluates to true.
>
> It appears we should always call nested_vmx_succeed(). If you are ok,
> I will send a v2.
>
> Thanks,
> Bandan
>
>> Jan
Jan Kiszka <jan.kiszka@web.de> writes:

> On 2014-03-26 21:22, Bandan Das wrote:
>> Jan Kiszka <jan.kiszka@web.de> writes:
>>
>>> On 2014-03-22 17:43, Bandan Das wrote:
>>>> Jan Kiszka <jan.kiszka@web.de> writes:
>>>>
>>>>> On 2014-03-20 21:58, Bandan Das wrote:
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>>>>> falling through and syncing roots as a case where
>>>>>>>> context wide invalidation can't be supported
>>>>>>>
>>>>>>> Can we also base this behaviour on a statement in the SDM? But on first
>>>>>>> glance, I do not find anything like this over there.
>>>>>>
>>>>>> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1
>>>>>> "Operations that Invalidate Cached Mappings" does mention that
>>>>>> the instruction may invalidate mappings associated with other
>>>>>> EP4TAs (even in single context).
>>>>>
>>>>> Yes, "may". So we are implementing undefined behavior in order to please
>>>>> a broken hypervisor that relies on it? Then please state this in the
>>>>> patch and probably also inform Xen about their issue.
>>>>
>>>> Why undefined behavior? We don't do anything specific for
>>>> the single context invalidation case anyway, i.e. if the eptp matches what
>>>> vmcs12 has, single context invalidation does fall through to the global
>>>> invalidation case already. All this change does is add the "L1 calls
>>>> invept after vmclear and before vmptrld" case to the list of cases that fall
>>>> through to global invalidation, since nvmx doesn't have any knowledge of
>>>> the current eptp in that situation.
>>>
>>> OK, I think I misunderstood what the guest expects and how we currently
>>> achieve this: we do not track the mapping between guest and host eptp,
>>> thus cannot properly emulate its behaviour. We therefore need to flush
>>> everything.
>>>
>>>>
>>>> Or do you think we should rethink this approach?
>>>
>>> Well, I wonder if we should expose single-context invept support at all.
>>>
>>> I'm also wondering if we are returning proper flags on
>>>
>>> 	if ((operand.eptp & eptp_mask) !=
>>> 			(nested_ept_get_cr3(vcpu) & eptp_mask))
>>> 		break;
>>
>> That does sound plausible, but then we would have to get rid of this
>> little optimization in the code you have quoted above. We would have
>> to flush and sync roots for all invept calls. So, my preference is to
>> keep it.
>
> Do you have any numbers on how many per-context invept calls we are able
> to ignore? At least for KVM this should be 0, as we only flush on
> changes to the current EPT structures.

Just instrumented a Debian L2 bootup under L1 Xen and there were 0
per-context invept calls (with a valid vmcs12). I will spin out a
version for testing where we don't advertise single-context invept
and, if everything works, put in a comment as to why we are removing
support.

Thanks,
Bandan

> Jan
>
>>
>>> Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
>>> condition evaluates to true.
>>
>> It appears we should always call nested_vmx_succeed(). If you are ok,
>> I will send a v2.
>>
>> Thanks,
>> Bandan
>>
>>> Jan
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c707389..b407b3a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6435,8 +6435,9 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 
 	switch (type) {
 	case VMX_EPT_EXTENT_CONTEXT:
-		if ((operand.eptp & eptp_mask) !=
-				(nested_ept_get_cr3(vcpu) & eptp_mask))
+		if (get_vmcs12(vcpu) &&
+		    ((operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) &
+		    eptp_mask)))
 			break;
 	case VMX_EPT_EXTENT_GLOBAL:
 		kvm_mmu_sync_roots(vcpu);
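For readers following the discussion, the dispatch logic of the patched handle_invept() can be modeled as a small self-contained C sketch. This is illustrative only: the function name invept_syncs_roots, the boolean parameters, and the ~0xfff mask are our inventions (the kernel derives eptp_mask from the guest's physical-address width and operates on struct kvm_vcpu), but the control flow mirrors the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extent encodings, mirroring the VMX names used in the patch. */
enum { VMX_EPT_EXTENT_CONTEXT = 1, VMX_EPT_EXTENT_GLOBAL = 2 };

/* Returns true when the emulation falls through and syncs the EPT
 * roots (kvm_mmu_sync_roots() in the real code), false when the
 * request is ignored. */
bool invept_syncs_roots(int type, uint64_t operand_eptp,
                        bool vmcs12_loaded, uint64_t current_eptp)
{
	const uint64_t eptp_mask = ~0xfffull; /* compare EP4TA bits only (sketch) */

	switch (type) {
	case VMX_EPT_EXTENT_CONTEXT:
		/* Ignore the request only when a vmcs12 is loaded AND its
		 * EPTP differs from the operand. Without a loaded vmcs12
		 * (invept after vmclear / before vmptrld), the current
		 * context is unknown, so fall through conservatively. */
		if (vmcs12_loaded &&
		    (operand_eptp & eptp_mask) != (current_eptp & eptp_mask))
			return false;
		/* fall through */
	case VMX_EPT_EXTENT_GLOBAL:
		return true;
	default:
		return false;
	}
}
```

Per the review feedback above, a v2 would also call nested_vmx_succeed() on every exit path rather than returning without setting the VMX result flags.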
Some L1 hypervisors such as Xen seem to be calling invept after
vmclear or before vmptrld on L2. In this case, proceed with
falling through and syncing roots as a case where
context wide invalidation can't be supported

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/vmx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
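A note on why the comparison in the patch masks both EPTPs: an EPTP is not just a pointer. Its low bits carry configuration (memory type in bits 2:0, EPT page-walk length minus one in bits 5:3), while the EP4TA, the physical address of the EPT PML4 table, occupies bits 12 and up. Two EPTPs naming the same translation context can therefore differ bit-for-bit. A hedged sketch (the macro names are ours, and ~0xfff stands in for the kernel's maxphyaddr-derived eptp_mask):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode the EPTP configuration fields (layout per the VMX spec). */
#define EPTP_MEMTYPE(eptp)   ((eptp) & 0x7ull)            /* bits 2:0 */
#define EPTP_WALK_LEN(eptp)  ((((eptp) >> 3) & 0x7) + 1)  /* bits 5:3 hold length-1 */
#define EPTP_EP4TA(eptp)     ((eptp) & ~0xfffull)         /* bits 12+ (model mask) */

/* Two EPTPs refer to the same EPT context when their root addresses
 * match, regardless of the low configuration bits -- the same idea as
 * (operand.eptp & eptp_mask) != (nested_ept_get_cr3(vcpu) & eptp_mask)
 * in handle_invept(). */
bool same_ept_context(uint64_t a, uint64_t b)
{
	return EPTP_EP4TA(a) == EPTP_EP4TA(b);
}
```

This is why a raw 64-bit equality test on the EPTPs would spuriously treat a matching context as a mismatch whenever the configuration bits differ.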