Message ID | 1529710522-28315-23-git-send-email-liran.alon@oracle.com (mailing list archive)
---|---
State | New, archived
On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
> From: Jim Mattson <jmattson@google.com>
>
> Haswell and later hardware masks off the irrelevant bits of the guest
> access rights fields on vmwrite, storing only the 13 relevant
> bits. This masking isn't documented anywhere. When using VMCS
> shadowing for these fields, these fields will be masked when written
> to the shadow vmcs12. For consistency, mask these fields when the
> vmwrite is handled in software.

Is there software that actively relies on this hardware behavior
or is this more of a cosmetic issue, e.g. it causes failures in a
fuzzer or memory checker of some form?

Not that it really matters, but I think it'd be more correct to
model this behavior in the VMREAD path.

> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 4b63d6bae6bd..262029c6e3e5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8263,6 +8263,13 @@ static inline int vmcs12_write_any(struct vmcs12 *vmcs12,
>  	if (offset < 0)
>  		return offset;
>
> +	/*
> +	 * For compatibility with Haswell and later, mask off the
> +	 * irrelevant bits of the guest access rights fields.
> +	 */
> +	if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
> +		field_value &= 0x1f0ff;
> +
>  	switch (vmcs_field_width(field)) {
>  	case VMCS_FIELD_WIDTH_U16:
>  		*(u16 *)p = field_value;
> --
> 1.9.1
On 28/06/18 00:41, Sean Christopherson wrote:
> On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
>> From: Jim Mattson <jmattson@google.com>
>>
>> Haswell and later hardware masks off the irrelevant bits of the guest
>> access rights fields on vmwrite, storing only the 13 relevant
>> bits. This masking isn't documented anywhere. When using VMCS
>> shadowing for these fields, these fields will be masked when written
>> to the shadow vmcs12. For consistency, mask these fields when the
>> vmwrite is handled in software.
>
> Is there software that actively relies on this hardware behavior
> or is this more of a cosmetic issue, e.g. it causes failures in a
> fuzzer or memory checker of some form?
>
> Not that it really matters, but I think it'd be more correct to
> model this behavior in the VMREAD path.

Because you think that the CPU writes the full content to the access
rights field in VMCS memory, but VMREAD is the one that actually masks
the content?
On Wed, Jun 27, 2018 at 3:13 PM, Liran Alon <liran.alon@oracle.com> wrote:
>
> On 28/06/18 00:41, Sean Christopherson wrote:
>>
>> On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
>>>
>>> From: Jim Mattson <jmattson@google.com>
>>>
>>> Haswell and later hardware masks off the irrelevant bits of the guest
>>> access rights fields on vmwrite, storing only the 13 relevant
>>> bits. This masking isn't documented anywhere. When using VMCS
>>> shadowing for these fields, these fields will be masked when written
>>> to the shadow vmcs12. For consistency, mask these fields when the
>>> vmwrite is handled in software.
>>
>> Is there software that actively relies on this hardware behavior
>> or is this more of a cosmetic issue, e.g. it causes failures in a
>> fuzzer or memory checker of some form?
>>
>> Not that it really matters, but I think it'd be more correct to
>> model this behavior in the VMREAD path.
>
> Because you think that the CPU writes the full content to the access
> rights field in VMCS memory, but VMREAD is the one that actually masks
> the content?

VMWRITE definitely does not write the full content to memory. The VMCS
format for Haswell and later packs the access rights fields. (You can
verify this by doing a VMCLEAR.)
On Wed, Jun 27, 2018 at 03:19:09PM -0700, Jim Mattson wrote:
> On Wed, Jun 27, 2018 at 3:13 PM, Liran Alon <liran.alon@oracle.com> wrote:
> >
> > On 28/06/18 00:41, Sean Christopherson wrote:
> >>
> >> On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
> >>>
> >>> From: Jim Mattson <jmattson@google.com>
> >>>
> >>> Haswell and later hardware masks off the irrelevant bits of the guest
> >>> access rights fields on vmwrite, storing only the 13 relevant
> >>> bits. This masking isn't documented anywhere. When using VMCS
> >>> shadowing for these fields, these fields will be masked when written
> >>> to the shadow vmcs12. For consistency, mask these fields when the
> >>> vmwrite is handled in software.
> >>
> >> Is there software that actively relies on this hardware behavior
> >> or is this more of a cosmetic issue, e.g. it causes failures in a
> >> fuzzer or memory checker of some form?
> >>
> >> Not that it really matters, but I think it'd be more correct to
> >> model this behavior in the VMREAD path.
> >
> > Because you think that the CPU writes the full content to the access
> > rights field in VMCS memory, but VMREAD is the one that actually masks
> > the content?
>
> VMWRITE definitely does not write the full content to memory. The VMCS
> format for Haswell and later packs the access rights fields. (You can
> verify this by doing a VMCLEAR.)

My thought process regarding VMREAD vs VMWRITE was that the behavior
would only be architecturally visible on VMREAD, but I forgot that the
Haswell behavior is also visible at VMEnter since dropping the reserved
bits makes it impossible to generate a consistency check due to said
bits being non-zero.

Back to my question, what is the motivation for this change? If this
patch was prompted by a test failure then I think it makes more sense
to "fix" the test, i.e. eliminate the AR byte test case.

Dropping the AR byte reserved bits unconditionally makes KVM consistent
with Haswell but inconsistent with respect to prior CPUs since this
code path also affects non-shadowed VMWRITE.

As for documentation, I'll submit an SDM bug to try and get a blurb
added stating that the reserved bits may be dropped on some CPUs.
On Thu, Jun 28, 2018 at 7:28 AM, Sean Christopherson
<sean.j.christopherson@intel.com> wrote:

> Back to my question, what is the motivation for this change?

The motivation was consistent behavior between software emulation and
hardware behavior, since they could potentially be mixed in the
execution of the same VM (particularly once live migration gets checked
in, and the VM migrates between pre- and post-Haswell systems).

> Dropping the AR byte reserved bits unconditionally makes KVM consistent
> with Haswell but inconsistent with respect to prior CPUs since this
> code path also affects non-shadowed VMWRITE.

True, but on older CPUs, VMWRITE is only possible through software
emulation, so there isn't any need to be consistent with hardware. This
change does introduce an inconsistency with older versions of kvm, but
since live migration support for nested VMX isn't checked in yet, that
shouldn't be a problem.
On Thu, Jun 28, 2018 at 09:03:29AM -0700, Jim Mattson wrote:
> On Thu, Jun 28, 2018 at 7:28 AM, Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
>
> > Back to my question, what is the motivation for this change?
>
> The motivation was consistent behavior between software emulation and
> hardware behavior, since they could potentially be mixed in the
> execution of the same VM (particularly once live migration gets
> checked in, and the VM migrates between pre- and post-Haswell
> systems).
>
> > Dropping the AR byte reserved bits unconditionally makes KVM consistent
> > with Haswell but inconsistent with respect to prior CPUs since this
> > code path also affects non-shadowed VMWRITE.
>
> True, but on older CPUs, VMWRITE is only possible through software
> emulation, so there isn't any need to be consistent with hardware.
> This change does introduce an inconsistency with older versions of
> kvm, but since live migration support for nested VMX isn't checked in
> yet, that shouldn't be a problem.

Gotcha. I think it's worth updating the comment and/or commit message
to explain the intended behavior in depth, otherwise people like me
might think we're arbitrarily choosing to emulate Haswell+. E.g.:

	/*
	 * Some Intel CPUs drop the reserved bits of the AR byte
	 * fields on VMWRITE, which can lead to divergent behavior
	 * after migrating a VM.  E.g. a nested VMRESUME might
	 * unexpectedly fail after migrating from hardware that
	 * drops the bits to hardware that preserves the bits and
	 * thus performs the reserved bits consistency check.
	 * Explicitly clear the reserved bits to ensure consistent
	 * KVM behavior regardless of the underlying hardware.
	 */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b63d6bae6bd..262029c6e3e5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8263,6 +8263,13 @@ static inline int vmcs12_write_any(struct vmcs12 *vmcs12,
 	if (offset < 0)
 		return offset;
 
+	/*
+	 * For compatibility with Haswell and later, mask off the
+	 * irrelevant bits of the guest access rights fields.
+	 */
+	if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
+		field_value &= 0x1f0ff;
+
 	switch (vmcs_field_width(field)) {
 	case VMCS_FIELD_WIDTH_U16:
 		*(u16 *)p = field_value;