diff mbox series

KVM: x86: Do not raise #GP on write to MSR_IA32_MCG_CTL which is not 0 or all 1s

Message ID 20190102192522.13158-1-liran.alon@oracle.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Do not raise #GP on write to MSR_IA32_MCG_CTL which is not 0 or all 1s | expand

Commit Message

Liran Alon Jan. 2, 2019, 7:25 p.m. UTC
Only 0 or all 1s can be written to IA32_MCG_CTL.
SDM specifies other values as undefined and/or implementation specific.

However, some guest kernels write different values.
One such example is WinNT 4 SP6 which uses a value of 0xffffffff.

Prefer to silently accept these writes to avoid an uncatched #GP in the guest.
We will define our implementation specific behaviour as any value other than 0
to be treated as all 1s.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/x86.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Jim Mattson Jan. 2, 2019, 11:23 p.m. UTC | #1
On Wed, Jan 2, 2019 at 11:25 AM Liran Alon <liran.alon@oracle.com> wrote:
>
> Only 0 or all 1s can be written to IA32_MCG_CTL.
> SDM specifies other values as undefined and/or implementation specific.
>
> However, some guest kernels write different values.
> One such example is WinNT 4 SP6 which uses a value of 0xffffffff.
>
> Prefer to silently accept these writes to avoid an uncatched #GP in the guest.
> We will define our implementation specific behaviour as any value other than 0
> to be treated as all 1s.
>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>  arch/x86/kvm/x86.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 02c8e095a239..a06e4e892a9d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2287,9 +2287,21 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 if (!(mcg_cap & MCG_CTL_P) &&
>                     (data || !msr_info->host_initiated))
>                         return 1;
> -               if (data != 0 && data != ~(u64)0)
> -                       return 1;
> -               vcpu->arch.mcg_ctl = data;
> +               /*
> +                * Only 0 or all 1s can be written to IA32_MCG_CTL.
> +                * SDM specifies other values as undefined and/or
> +                * implementation specific.
> +                *
> +                * However, some guest kernels write different values.
> +                * One such example is WinNT 4 SP6 which uses a value
> +                * of 0xffffffff.
> +                *
> +                * Prefer to silently accept these writes to avoid an
> +                * uncatched #GP in the guest. We will define our
> +                * implementation specific behaviour as any value
> +                * other than 0 to be treated as all 1s.
> +                */

Isn't an "implementation" essentially a "family, model, and stepping"?
I don't think you can define your own implementation-specific behavior
unless you have F/M/S different from any actual hardware produced by
Intel.

> +               vcpu->arch.mcg_ctl = data ? ~(u64)0 : 0;
>                 break;
>         default:
>                 if (msr >= MSR_IA32_MC0_CTL &&
> --
> 2.16.1
>
Sean Christopherson Jan. 2, 2019, 11:26 p.m. UTC | #2
On Wed, Jan 02, 2019 at 09:25:22PM +0200, Liran Alon wrote:
> Only 0 or all 1s can be written to IA32_MCG_CTL.
> SDM specifies other values as undefined and/or implementation specific.
> 
> However, some guest kernels write different values.
> One such example is WinNT 4 SP6 which uses a value of 0xffffffff.
> 
> Prefer to silently accept these writes to avoid an uncatched #GP in the guest.
> We will define our implementation specific behaviour as any value other than 0
> to be treated as all 1s.

I'm assuming this is a 32-bit guest, so what about going with a more
precise hackaround and explicitly allowing 0xffffffff for 32-bit guests,
e.g. sign-extending bit 31 when the value isn't already 0 or -1?

It's worth keeping the #GP behavior for modern kernels, e.g. for testing
and debug.  MSRs 0x0 and 0x1 are aliased to MSRs 0x400 and 0x401 for
historical reasons, i.e. WRMSR without setting ECX can easily write
MSR_IA32_MC0_CTL.

> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>  arch/x86/kvm/x86.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 02c8e095a239..a06e4e892a9d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2287,9 +2287,21 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (!(mcg_cap & MCG_CTL_P) &&
>  		    (data || !msr_info->host_initiated))
>  			return 1;
> -		if (data != 0 && data != ~(u64)0)
> -			return 1;
> -		vcpu->arch.mcg_ctl = data;
> +		/*
> +		 * Only 0 or all 1s can be written to IA32_MCG_CTL.
> +		 * SDM specifies other values as undefined and/or
> +		 * implementation specific.
> +		 *
> +		 * However, some guest kernels write different values.
> +		 * One such example is WinNT 4 SP6 which uses a value
> +		 * of 0xffffffff.
> +		 *
> +		 * Prefer to silently accept these writes to avoid an
> +		 * uncatched #GP in the guest. We will define our
> +		 * implementation specific behaviour as any value
> +		 * other than 0 to be treated as all 1s.
> +		 */
> +		vcpu->arch.mcg_ctl = data ? ~(u64)0 : 0;
>  		break;
>  	default:
>  		if (msr >= MSR_IA32_MC0_CTL &&
> -- 
> 2.16.1
>
Sean Christopherson Jan. 2, 2019, 11:34 p.m. UTC | #3
On Wed, Jan 02, 2019 at 03:26:30PM -0800, Sean Christopherson wrote:
> On Wed, Jan 02, 2019 at 09:25:22PM +0200, Liran Alon wrote:
> > Only 0 or all 1s can be written to IA32_MCG_CTL.
> > SDM specifies other values as undefined and/or implementation specific.
> > 
> > However, some guest kernels write different values.
> > One such example is WinNT 4 SP6 which uses a value of 0xffffffff.
> > 
> > Prefer to silently accept these writes to avoid an uncatched #GP in the guest.
> > We will define our implementation specific behaviour as any value other than 0
> > to be treated as all 1s.
> 
> I'm assuming this is a 32-bit guest, so what about going with a more
> precise hackaround and explicitly allowing 0xffffffff for 32-bit guests,
> e.g. sign-extending bit 31 when the value isn't already 0 or -1?
> 
> It's worth keeping the #GP behavior for modern kernels, e.g. for testing
> and debug.  MSRs 0x0 and 0x1 are aliased to MSRs 0x400 and 0x401 for
> historical reasons, i.e. WRMSR without setting ECX can easily write
> MSR_IA32_MC0_CTL.

Of course KVM probably doesn't emulate the aliasing so it'd likely #GP
anyways, but I still think we should go with a more surgical hack :)
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02c8e095a239..a06e4e892a9d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2287,9 +2287,21 @@  static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!(mcg_cap & MCG_CTL_P) &&
 		    (data || !msr_info->host_initiated))
 			return 1;
-		if (data != 0 && data != ~(u64)0)
-			return 1;
-		vcpu->arch.mcg_ctl = data;
+		/*
+		 * Only 0 or all 1s can be written to IA32_MCG_CTL.
+		 * SDM specifies other values as undefined and/or
+		 * implementation specific.
+		 *
+		 * However, some guest kernels write different values.
+		 * One such example is WinNT 4 SP6 which uses a value
+		 * of 0xffffffff.
+		 *
+		 * Prefer to silently accept these writes to avoid an
+		 * uncatched #GP in the guest. We will define our
+		 * implementation specific behaviour as any value
+		 * other than 0 to be treated as all 1s.
+		 */
+		vcpu->arch.mcg_ctl = data ? ~(u64)0 : 0;
 		break;
 	default:
 		if (msr >= MSR_IA32_MC0_CTL &&