
[22/22] KVM: nVMX: Mask guest access rights fields

Message ID 1529710522-28315-23-git-send-email-liran.alon@oracle.com (mailing list archive)
State New, archived

Commit Message

Liran Alon June 22, 2018, 11:35 p.m. UTC
From: Jim Mattson <jmattson@google.com>

Haswell and later hardware masks off the irrelevant bits of the guest
access rights fields on VMWRITE, storing only the 13 relevant
bits. This masking isn't documented anywhere. When VMCS shadowing is
used for these fields, they are masked when written to the shadow
vmcs12. For consistency, apply the same masking when the VMWRITE is
handled in software.

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)
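
For reference, a minimal sketch (not part of the patch) of how the
0x1f0ff mask used in the hunk below maps onto the VMX segment
access-rights layout.  The macro names here are illustrative only,
not the kernel's VMX_AR_* definitions:

/* Segment access-rights layout; bits 11:8 and 31:17 are reserved. */
#define AR_TYPE_MASK     0x0000f  /* bits 3:0   segment type          */
#define AR_S             0x00010  /* bit 4      descriptor type (S)   */
#define AR_DPL_MASK      0x00060  /* bits 6:5   DPL                   */
#define AR_P             0x00080  /* bit 7      present               */
#define AR_AVL           0x01000  /* bit 12     available to software */
#define AR_L             0x02000  /* bit 13     64-bit code segment   */
#define AR_DB            0x04000  /* bit 14     default/big (D/B)     */
#define AR_G             0x08000  /* bit 15     granularity           */
#define AR_UNUSABLE      0x10000  /* bit 16     segment unusable      */

/* The 13 relevant bits retained by Haswell and later on VMWRITE. */
#define AR_RELEVANT_MASK (AR_TYPE_MASK | AR_S | AR_DPL_MASK | AR_P | \
			  AR_AVL | AR_L | AR_DB | AR_G | AR_UNUSABLE)
					/* == 0x1f0ff */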

Comments

Sean Christopherson June 27, 2018, 9:41 p.m. UTC | #1
On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
> From: Jim Mattson <jmattson@google.com>
> 
> Haswell and later hardware masks off the irrelevant bits of the guest
> access rights fields on VMWRITE, storing only the 13 relevant
> bits. This masking isn't documented anywhere. When VMCS shadowing is
> used for these fields, they are masked when written to the shadow
> vmcs12. For consistency, apply the same masking when the VMWRITE is
> handled in software.

Is there software that actively relies on this hardware behavior
or is this more of a cosmetic issue, e.g. it causes failures in a
fuzzer or memory checker of some form?

Not that it really matters, but I think it'd be more correct to
model this behavior in the VMREAD path.  
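
A purely hypothetical illustration of what that read-side masking
might look like (the helper name is made up, not existing KVM code;
the GUEST_*_AR_BYTES range check mirrors the hunk below):

static inline u64 mask_ar_bytes_on_read(unsigned long field, u64 field_value)
{
	/* Drop the reserved AR-byte bits when they are read back. */
	if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
		field_value &= 0x1f0ff;

	return field_value;
}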
 
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 4b63d6bae6bd..262029c6e3e5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8263,6 +8263,13 @@ static inline int vmcs12_write_any(struct vmcs12 *vmcs12,
>  	if (offset < 0)
>  		return offset;
>  
> +	/*
> +	 * For compatibility with Haswell and later, mask off the
> +	 * irrelevant bits of the guest access rights fields.
> +	 */
> +	if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
> +		field_value &= 0x1f0ff;
> +
>  	switch (vmcs_field_width(field)) {
>  	case VMCS_FIELD_WIDTH_U16:
>  		*(u16 *)p = field_value;
> -- 
> 1.9.1
>
Liran Alon June 27, 2018, 10:13 p.m. UTC | #2
On 28/06/18 00:41, Sean Christopherson wrote:
> On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
>> From: Jim Mattson <jmattson@google.com>
>>
>> Haswell and later hardware masks off the irrelevant bits of the guest
>> access rights fields on VMWRITE, storing only the 13 relevant
>> bits. This masking isn't documented anywhere. When VMCS shadowing is
>> used for these fields, they are masked when written to the shadow
>> vmcs12. For consistency, apply the same masking when the VMWRITE is
>> handled in software.
> Is there software that actively relies on this hardware behavior
> or is this more of a cosmetic issue, e.g. it causes failures in a
> fuzzer or memory checker of some form?
>
> Not that it really matters, but I think it'd be more correct to
> model this behavior in the VMREAD path.
>   

Is that because you think the CPU writes the full content of the access
rights field to VMCS memory, but VMREAD is the one that actually masks
the content?
Jim Mattson June 27, 2018, 10:19 p.m. UTC | #3
On Wed, Jun 27, 2018 at 3:13 PM, Liran Alon <liran.alon@oracle.com> wrote:
>
> On 28/06/18 00:41, Sean Christopherson wrote:
>>
>> On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
>>>
>>> From: Jim Mattson <jmattson@google.com>
>>>
>>> Haswell and later hardware masks off the irrelevant bits of the guest
>>> access rights fields on VMWRITE, storing only the 13 relevant
>>> bits. This masking isn't documented anywhere. When VMCS shadowing is
>>> used for these fields, they are masked when written to the shadow
>>> vmcs12. For consistency, apply the same masking when the VMWRITE is
>>> handled in software.
>>
>> Is there software that actively relies on this hardware behavior
>> or is this more of a cosmetic issue, e.g. it causes failures in a
>> fuzzer or memory checker of some form?
>>
>> Not that it really matters, but I think it'd be more correct to
>> model this behavior in the VMREAD path.
>>
>
>
> Is that because you think the CPU writes the full content of the access
> rights field to VMCS memory, but VMREAD is the one that actually masks
> the content?

VMWRITE definitely does not write the full content to memory. The VMCS
format for Haswell and later packs the access rights fields. (You can
verify this by doing a VMCLEAR.)
Sean Christopherson June 28, 2018, 2:28 p.m. UTC | #4
On Wed, Jun 27, 2018 at 03:19:09PM -0700, Jim Mattson wrote:
> On Wed, Jun 27, 2018 at 3:13 PM, Liran Alon <liran.alon@oracle.com> wrote:
> >
> > On 28/06/18 00:41, Sean Christopherson wrote:
> >>
> >> On Sat, Jun 23, 2018 at 02:35:22AM +0300, Liran Alon wrote:
> >>>
> >>> From: Jim Mattson <jmattson@google.com>
> >>>
> >>> Haswell and later hardware masks off the irrelevant bits of the guest
> >>> access rights fields on VMWRITE, storing only the 13 relevant
> >>> bits. This masking isn't documented anywhere. When VMCS shadowing is
> >>> used for these fields, they are masked when written to the shadow
> >>> vmcs12. For consistency, apply the same masking when the VMWRITE is
> >>> handled in software.
> >>
> >> Is there software that actively relies on this hardware behavior
> >> or is this more of a cosmetic issue, e.g. it causes failures in a
> >> fuzzer or memory checker of some form?
> >>
> >> Not that it really matters, but I think it'd be more correct to
> >> model this behavior in the VMREAD path.
> >>
> >
> >
> > Is that because you think the CPU writes the full content of the access
> > rights field to VMCS memory, but VMREAD is the one that actually masks
> > the content?
> 
> VMWRITE definitely does not write the full content to memory. The VMCS
> format for Haswell and later packs the access rights fields. (You can
> verify this by doing a VMCLEAR.)

My thought process regarding VMREAD vs VMWRITE was that the behavior
would only be architecturally visible on VMREAD, but I forgot that
the Haswell behavior is also visible at VM-entry, since dropping the
reserved bits makes it impossible to trigger a consistency-check
failure due to said bits being non-zero.
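
Roughly, the VM-entry check in question (a sketch only, assuming the
SDM rule that the reserved AR bits must be zero for usable segments;
this is not KVM code):

static bool guest_ar_reserved_bits_ok(u32 ar)
{
	/* Unusable segments are exempt from the reserved-bit checks. */
	if (ar & (1u << 16))
		return true;

	/* Bits 11:8 and 31:17 are reserved and must be zero. */
	return (ar & ~0x1f0ffu) == 0;
}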

Back to my question, what is the motivation for this change?  If this
patch was prompted by a test failure then I think it makes more sense
to "fix" the test, i.e. eliminate the AR byte test case.  Dropping
the AR byte reserved bits unconditionally makes KVM consistent with
Haswell but inconsistent with respect to prior CPUs since this code
path also affects non-shadowed VMWRITE.

As for documentation, I'll submit an SDM bug to try and get a blurb
added stating that the reserved bits may be dropped on some CPUs.
Jim Mattson June 28, 2018, 4:03 p.m. UTC | #5
On Thu, Jun 28, 2018 at 7:28 AM, Sean Christopherson
<sean.j.christopherson@intel.com> wrote:

> Back to my question, what is the motivation for this change?

The motivation was consistency between software emulation and hardware
behavior, since the two could potentially be mixed in the execution of
the same VM (particularly once live migration gets checked in and a VM
migrates between pre- and post-Haswell systems).

> Dropping the AR byte reserved bits unconditionally makes KVM consistent with
> Haswell but inconsistent with respect to prior CPUs since this code
> path also affects non-shadowed VMWRITE.

True, but on older CPUs, VMWRITE is only possible through software
emulation, so there isn't any need to be consistent with hardware.
This change does introduce an inconsistency with older versions of
kvm, but since live migration support for nested VMX isn't checked in
yet, that shouldn't be a problem.
Sean Christopherson June 28, 2018, 4:25 p.m. UTC | #6
On Thu, Jun 28, 2018 at 09:03:29AM -0700, Jim Mattson wrote:
> On Thu, Jun 28, 2018 at 7:28 AM, Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> 
> > Back to my question, what is the motivation for this change?
> 
> The motivation was consistent behavior between software emulation and
> hardware behavior, since they could potentially be mixed in the
> execution of the same VM (particularly once live migration gets
> checked in, and the VM migrates between pre- and post- Haswell
> systems).
> 
> > Dropping the AR byte reserved bits unconditionally makes KVM consistent with
> > Haswell but inconsistent with respect to prior CPUs since this code
> > path also affects non-shadowed VMWRITE.
> 
> True, but on older CPUs, VMWRITE is only possible through software
> emulation, so there isn't any need to be consistent with hardware.
> This change does introduce an inconsistency with older versions of
> kvm, but since live migration support for nested VMX isn't checked in
> yet, that shouldn't be a problem.

Gotcha.  I think it's worth updating the comment and/or commit message
to explain the intended behavior in depth, otherwise people like me
might think we're arbitrarily choosing to emulate Haswell+.

E.g.: 
	/*
	 * Some Intel CPUs drop the reserved bits of the AR byte
	 * fields on VMWRITE, which can lead to divergent behavior
	 * after migrating a VM.  E.g. a nested VMRESUME might
	 * unexpectedly fail after migrating from hardware that
	 * drops the bits to hardware that preserves the bits and
	 * thus performs the reserved bits consistency check.
	 * Explicitly clear the reserved bits to ensure consistent
	 * KVM behavior regardless of the underlying hardware.
	 */

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b63d6bae6bd..262029c6e3e5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8263,6 +8263,13 @@  static inline int vmcs12_write_any(struct vmcs12 *vmcs12,
 	if (offset < 0)
 		return offset;
 
+	/*
+	 * For compatibility with Haswell and later, mask off the
+	 * irrelevant bits of the guest access rights fields.
+	 */
+	if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
+		field_value &= 0x1f0ff;
+
 	switch (vmcs_field_width(field)) {
 	case VMCS_FIELD_WIDTH_U16:
 		*(u16 *)p = field_value;