KVM : Fix read/write to IA32_FEATURE_CONTROL MSR in nested virt

Message ID 1372858868-24755-1-git-send-email-yzt356@gmail.com (mailing list archive)
State New, archived

Commit Message

Arthur Chunqi Li July 3, 2013, 1:41 p.m. UTC
Fix reads and writes of the IA32_FEATURE_CONTROL MSR in a nested environment.
Simply return 0x5 on reads and generate #GP(0) on writes.
Delete the handling code in vmx_set_vmx_msr() so that the write is left
unhandled and handle_wrmsr() generates #GP(0).

Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
---
 arch/x86/kvm/vmx.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
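
For reference, the value 0x5 corresponds to bit 0 (lock) plus bit 2 (VMXON enabled outside SMX) in the Intel SDM layout of IA32_FEATURE_CONTROL. The following is a minimal decoding sketch, not part of the patch; the macro, struct and function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout per the Intel SDM; the macro names are illustrative only. */
#define FC_LOCKED            (1ULL << 0)  /* MSR is locked against further writes */
#define FC_VMXON_INSIDE_SMX  (1ULL << 1)  /* VMXON allowed in SMX operation */
#define FC_VMXON_OUTSIDE_SMX (1ULL << 2)  /* VMXON allowed outside SMX operation */

/* Hypothetical helper type holding the decoded bits. */
struct fc_bits {
	bool locked;
	bool vmxon_inside_smx;
	bool vmxon_outside_smx;
};

/* Split a raw IA32_FEATURE_CONTROL value into the three bits discussed here. */
static void fc_decode(uint64_t value, struct fc_bits *out)
{
	assert(out);
	out->locked = (value & FC_LOCKED) != 0;
	out->vmxon_inside_smx = (value & FC_VMXON_INSIDE_SMX) != 0;
	out->vmxon_outside_smx = (value & FC_VMXON_OUTSIDE_SMX) != 0;
}
```

Decoding 0x5 with this helper yields locked and vmxon_outside_smx set and vmxon_inside_smx clear, which matches a configuration without SMX support.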

Comments

Paolo Bonzini July 4, 2013, 7 a.m. UTC | #1
On 03/07/2013 15:41, Arthur Chunqi Li wrote:
> Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
> Simply return 0x5 when read and generate #GP(0) when write.
> Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
> handle_wrmsr().
> 
> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
> ---
>  arch/x86/kvm/vmx.c |    5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 260a919..e125f94 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  
>  	switch (msr_index) {
>  	case MSR_IA32_FEATURE_CONTROL:
> -		*pdata = 0;
> +		*pdata = 0x5;
>  		break;

This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
range, so you must check nested_vmx_allowed and return 0 if it is false.

Otherwise looks good.

Paolo

>  	case MSR_IA32_VMX_BASIC:
>  		/*
> @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>  	if (!nested_vmx_allowed(vcpu))
>  		return 0;
>  
> -	if (msr_index == MSR_IA32_FEATURE_CONTROL)
> -		/* TODO: the right thing. */
> -		return 1;
>  	/*
>  	 * No need to treat VMX capability MSRs specially: If we don't handle
>  	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 4, 2013, 7:10 a.m. UTC | #2
On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
> > Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
> > Simply return 0x5 when read and generate #GP(0) when write.
> > Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
> > handle_wrmsr().
> > 
> > Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
> > ---
> >  arch/x86/kvm/vmx.c |    5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 260a919..e125f94 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >  
> >  	switch (msr_index) {
> >  	case MSR_IA32_FEATURE_CONTROL:
> > -		*pdata = 0;
> > +		*pdata = 0x5;
> >  		break;
> 
> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
> range, so you must check nested_vmx_allowed and return 0 if it is false.
> 
Or 1?

> Otherwise looks good.
> 
> Paolo
> 
> >  	case MSR_IA32_VMX_BASIC:
> >  		/*
> > @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
Also this function is no longer needed. You can drop it.

And what about Nadav's patch Bandan pointed to? It is not entirely
correct, but it is close to real HW.

> >  	if (!nested_vmx_allowed(vcpu))
> >  		return 0;
> >  
> > -	if (msr_index == MSR_IA32_FEATURE_CONTROL)
> > -		/* TODO: the right thing. */
> > -		return 1;
> >  	/*
> >  	 * No need to treat VMX capability MSRs specially: If we don't handle
> >  	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
> > 

--
			Gleb.
Arthur Chunqi Li July 4, 2013, 7:21 a.m. UTC | #3
On Thu, Jul 4, 2013 at 3:10 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
>> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
>> > Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
>> > Simply return 0x5 when read and generate #GP(0) when write.
>> > Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
>> > handle_wrmsr().
>> >
>> > Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
>> > ---
>> >  arch/x86/kvm/vmx.c |    5 +----
>> >  1 file changed, 1 insertion(+), 4 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index 260a919..e125f94 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>> >
>> >     switch (msr_index) {
>> >     case MSR_IA32_FEATURE_CONTROL:
>> > -           *pdata = 0;
>> > +           *pdata = 0x5;
>> >             break;
>>
>> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
>> range, so you must check nested_vmx_allowed and return 0 if it is false.
>>
> Or 1?
I think 1 is better here, because a read then reports the lock bit, which
tells the OS not to write (if the OS performs such a check).
>
>> Otherwise looks good.
>>
>> Paolo
>>
>> >     case MSR_IA32_VMX_BASIC:
>> >             /*
>> > @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> Also this function is no longer needed. You can drop it.
>
> And what about Nadav's patch Bandan pointed to? It is not entirely
> correct, but it is close to real HW.
I think Nadav's patch is much closer to the HW scenario. However, I
think we don't need to make things complex, since KVM doesn't support SMX
now and this MSR is always set to 0x5.

Arthur
>
>> >     if (!nested_vmx_allowed(vcpu))
>> >             return 0;
>> >
>> > -   if (msr_index == MSR_IA32_FEATURE_CONTROL)
>> > -           /* TODO: the right thing. */
>> > -           return 1;
>> >     /*
>> >      * No need to treat VMX capability MSRs specially: If we don't handle
>> >      * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
>> >
>
> --
>                         Gleb.
Gleb Natapov July 4, 2013, 7:24 a.m. UTC | #4
On Thu, Jul 04, 2013 at 03:21:15PM +0800, Arthur Chunqi Li wrote:
> On Thu, Jul 4, 2013 at 3:10 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
> >> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
> >> > Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
> >> > Simply return 0x5 when read and generate #GP(0) when write.
> >> > Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
> >> > handle_wrmsr().
> >> >
> >> > Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
> >> > ---
> >> >  arch/x86/kvm/vmx.c |    5 +----
> >> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> > index 260a919..e125f94 100644
> >> > --- a/arch/x86/kvm/vmx.c
> >> > +++ b/arch/x86/kvm/vmx.c
> >> > @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >> >
> >> >     switch (msr_index) {
> >> >     case MSR_IA32_FEATURE_CONTROL:
> >> > -           *pdata = 0;
> >> > +           *pdata = 0x5;
> >> >             break;
> >>
> >> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
> >> range, so you must check nested_vmx_allowed and return 0 if it is false.
> >>
> > Or 1?
> I think 1 is better here because this may return LOCK message when
> query and tell OS not to write (if OS does such logical check)
> >
> >> Otherwise looks good.
> >>
> >> Paolo
> >>
> >> >     case MSR_IA32_VMX_BASIC:
> >> >             /*
> >> > @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > Also this function is no longer needed. You can drop it.
> >
> > And what about Nadav's patch Bandan pointed to? It is not entirely
> > correct, but it is close to real HW.
> I think Nadav's patch is much closer to the HW scenario. However, I
> think we don't need to make things complex since KVM doesn't support SMX
> now and this MSR is always set to 0x5.
> 
Set to 0x5 by BIOS on real HW. This way BIOS can control if VMX is
exposed to an OS.

> Arthur
> >
> >> >     if (!nested_vmx_allowed(vcpu))
> >> >             return 0;
> >> >
> >> > -   if (msr_index == MSR_IA32_FEATURE_CONTROL)
> >> > -           /* TODO: the right thing. */
> >> > -           return 1;
> >> >     /*
> >> >      * No need to treat VMX capability MSRs specially: If we don't handle
> >> >      * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
> >> >
> >
> > --
> >                         Gleb.

--
			Gleb.
Arthur Chunqi Li July 4, 2013, 8:16 a.m. UTC | #5
On 2013-7-4, at 15:24, Gleb Natapov <gleb@redhat.com> wrote:

> On Thu, Jul 04, 2013 at 03:21:15PM +0800, Arthur Chunqi Li wrote:
>> On Thu, Jul 4, 2013 at 3:10 PM, Gleb Natapov <gleb@redhat.com> wrote:
>>> On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
>>>> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
>>>>> Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
>>>>> Simply return 0x5 when read and generate #GP(0) when write.
>>>>> Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
>>>>> handle_wrmsr().
>>>>> 
>>>>> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
>>>>> ---
>>>>> arch/x86/kvm/vmx.c |    5 +----
>>>>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>>>> 
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 260a919..e125f94 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>>>>> 
>>>>>    switch (msr_index) {
>>>>>    case MSR_IA32_FEATURE_CONTROL:
>>>>> -           *pdata = 0;
>>>>> +           *pdata = 0x5;
>>>>>            break;
>>>> 
>>>> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
>>>> range, so you must check nested_vmx_allowed and return 0 if it is false.
>>>> 
>>> Or 1?
>> I think 1 is better here because this may return LOCK message when
>> query and tell OS not to write (if OS does such logical check)
>>> 
>>>> Otherwise looks good.
>>>> 
>>>> Paolo
>>>> 
>>>>>    case MSR_IA32_VMX_BASIC:
>>>>>            /*
>>>>> @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
>>> Also this function is no longer needed. You can drop it.
>>> 
>>> And what about Nadav's patch Bandan pointed to? It is not entirely
>>> correct, but it is close to real HW.
>> I think Nadav's patch is much closer to the HW scenario. However, I
>> think we don't need to make things complex since KVM doesn't support SMX
>> now and this MSR is always set to 0x5.
>> 
> Set to 0x5 by BIOS on real HW. This way BIOS can control if VMX is
> exposed to an OS.
I know. So if we don't use a solution like Nadav's patch, some third-party
BIOS emulators (if there are any) may get an error, since we simply generate
#GP(0) on writes to this MSR. We can correct the SIPI reset handling in
Nadav's patch and add initialization code to SeaBIOS; then the entire logic
matches real HW.

Arthur
> 
>> Arthur
>>> 
>>>>>    if (!nested_vmx_allowed(vcpu))
>>>>>            return 0;
>>>>> 
>>>>> -   if (msr_index == MSR_IA32_FEATURE_CONTROL)
>>>>> -           /* TODO: the right thing. */
>>>>> -           return 1;
>>>>>    /*
>>>>>     * No need to treat VMX capability MSRs specially: If we don't handle
>>>>>     * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
>>>>> 
>>> 
>>> --
>>>                        Gleb.
> 
> --
>            Gleb.
Gleb Natapov July 4, 2013, 10:43 a.m. UTC | #6
On Thu, Jul 04, 2013 at 04:16:25PM +0800, Gmail wrote:
> On 2013-7-4, at 15:24, Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Thu, Jul 04, 2013 at 03:21:15PM +0800, Arthur Chunqi Li wrote:
> >> On Thu, Jul 4, 2013 at 3:10 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >>> On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
> >>>> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
> >>>>> Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
> >>>>> Simply return 0x5 when read and generate #GP(0) when write.
> >>>>> Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
> >>>>> handle_wrmsr().
> >>>>> 
> >>>>> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
> >>>>> ---
> >>>>> arch/x86/kvm/vmx.c |    5 +----
> >>>>> 1 file changed, 1 insertion(+), 4 deletions(-)
> >>>>> 
> >>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>> index 260a919..e125f94 100644
> >>>>> --- a/arch/x86/kvm/vmx.c
> >>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>> @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >>>>> 
> >>>>>    switch (msr_index) {
> >>>>>    case MSR_IA32_FEATURE_CONTROL:
> >>>>> -           *pdata = 0;
> >>>>> +           *pdata = 0x5;
> >>>>>            break;
> >>>> 
> >>>> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
> >>>> range, so you must check nested_vmx_allowed and return 0 if it is false.
> >>>> 
> >>> Or 1?
> >> I think 1 is better here because this may return LOCK message when
> >> query and tell OS not to write (if OS does such logical check)
> >>> 
> >>>> Otherwise looks good.
> >>>> 
> >>>> Paolo
> >>>> 
> >>>>>    case MSR_IA32_VMX_BASIC:
> >>>>>            /*
> >>>>> @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> >>> Also this function is no longer needed. You can drop it.
> >>> 
> >>> And what about Nadav's patch Bandan pointed to? It is not entirely
> >>> correct, but it is close to real HW.
> >> I think Nadav's patch is much closer to the HW scenario. However, I
> >> think we don't need to make things complex since KVM doesn't support SMX
> >> now and this MSR is always set to 0x5.
> >> 
> > Set to 0x5 by BIOS on real HW. This way BIOS can control if VMX is
> > exposed to an OS.
> I know. So if we don't use solutions like Nadav's patch, some third-party BIOSes emulator (if they are) may get error since we simply generate #GP(0) when write to this MSR. We can correct SIPI reset in Nadav's patch and add initial codes to seabios, then the entire logical can fit real HW.
> 
We do not support third-party BIOSes, we just try to be as close to real
HW as possible. Fixing Nadav's code sounds best.

> Arthur
> > 
> >> Arthur
> >>> 
> >>>>>    if (!nested_vmx_allowed(vcpu))
> >>>>>            return 0;
> >>>>> 
> >>>>> -   if (msr_index == MSR_IA32_FEATURE_CONTROL)
> >>>>> -           /* TODO: the right thing. */
> >>>>> -           return 1;
> >>>>>    /*
> >>>>>     * No need to treat VMX capability MSRs specially: If we don't handle
> >>>>>     * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
> >>>>> 
> >>> 
> >>> --
> >>>                        Gleb.
> > 
> > --
> >            Gleb.

--
			Gleb.
Paolo Bonzini July 4, 2013, 11:01 a.m. UTC | #7
On 04/07/2013 09:10, Gleb Natapov wrote:
> On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
>> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
>>> Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
>>> Simply return 0x5 when read and generate #GP(0) when write.
>>> Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
>>> handle_wrmsr().
>>>
>>> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
>>> ---
>>>  arch/x86/kvm/vmx.c |    5 +----
>>>  1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 260a919..e125f94 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>>>  
>>>  	switch (msr_index) {
>>>  	case MSR_IA32_FEATURE_CONTROL:
>>> -		*pdata = 0;
>>> +		*pdata = 0x5;
>>>  		break;
>>
>> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
>> range, so you must check nested_vmx_allowed and return 0 if it is false.
>
> Or 1?

"Return 0 from the whole function" and hence #GP(0) on reads.  The MSR
doesn't exist if VMX=SMX=0.
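
The convention described above (a handler return value of 0 means "not handled", which ends in #GP(0) for the guest) can be sketched as follows. This is a simplified userspace model under stated assumptions, not the kernel code: `struct model_vcpu`, `model_get_vmx_msr()` and `model_rdmsr_faults()` are hypothetical names, and the sketch collapses any further MSR lookup into an immediate fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_FEATURE_CONTROL 0x3a  /* architectural index of IA32_FEATURE_CONTROL */

/* Hypothetical stand-in for the vCPU state; only the nested-VMX flag matters. */
struct model_vcpu {
	bool nested_vmx_allowed;
};

/*
 * Model of the read handler: returns 1 when the MSR was handled and
 * *pdata is valid, 0 when it was not handled.
 */
static int model_get_vmx_msr(const struct model_vcpu *vcpu, uint32_t msr_index,
			     uint64_t *pdata)
{
	assert(vcpu && pdata);

	/* Without nested VMX the MSR is treated as nonexistent. */
	if (!vcpu->nested_vmx_allowed)
		return 0;

	switch (msr_index) {
	case MSR_FEATURE_CONTROL:
		*pdata = 0x5;	/* lock + VMXON enabled outside SMX */
		return 1;
	default:
		return 0;
	}
}

/* Simplification: an unhandled read is reported as a fault right away. */
static bool model_rdmsr_faults(const struct model_vcpu *vcpu, uint32_t msr_index,
			       uint64_t *pdata)
{
	return model_get_vmx_msr(vcpu, msr_index, pdata) == 0;
}
```

With nested VMX allowed the read succeeds and yields 0x5; with it disallowed the read is reported as faulting and the output value is left untouched.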

>> Otherwise looks good.
>>
>> Paolo
>>
>>>  	case MSR_IA32_VMX_BASIC:
>>>  		/*
>>> @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> Also this function is no longer needed. You can drop it.
> 
> And what about Nadav's patch Bandan pointed to? It is not entirely
> correct, but it is close to real HW.

I don't like that it requires a firmware change in order to use nested
VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
"better emulation + new firmware" are indistinguishable from the point of
view of anyone except the firmware.

IMO there is no reason for a better emulation that no one would care
about _and_ could look like a regression when updating to a newer kernel.

Paolo

>>>  	if (!nested_vmx_allowed(vcpu))
>>>  		return 0;
>>>  
>>> -	if (msr_index == MSR_IA32_FEATURE_CONTROL)
>>> -		/* TODO: the right thing. */
>>> -		return 1;
>>>  	/*
>>>  	 * No need to treat VMX capability MSRs specially: If we don't handle
>>>  	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
>>>
> 
> --
> 			Gleb.
> 

Gleb Natapov July 4, 2013, 11:12 a.m. UTC | #8
On Thu, Jul 04, 2013 at 01:01:15PM +0200, Paolo Bonzini wrote:
> Il 04/07/2013 09:10, Gleb Natapov ha scritto:
> > On Thu, Jul 04, 2013 at 09:00:09AM +0200, Paolo Bonzini wrote:
> >> Il 03/07/2013 15:41, Arthur Chunqi Li ha scritto:
> >>> Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
> >>> Simply return 0x5 when read and generate #GP(0) when write.
> >>> Delete handling codes in vmx_set_vmx_msr() and generate #GP(0) in
> >>> handle_wrmsr().
> >>>
> >>> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
> >>> ---
> >>>  arch/x86/kvm/vmx.c |    5 +----
> >>>  1 file changed, 1 insertion(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>> index 260a919..e125f94 100644
> >>> --- a/arch/x86/kvm/vmx.c
> >>> +++ b/arch/x86/kvm/vmx.c
> >>> @@ -2277,7 +2277,7 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >>>  
> >>>  	switch (msr_index) {
> >>>  	case MSR_IA32_FEATURE_CONTROL:
> >>> -		*pdata = 0;
> >>> +		*pdata = 0x5;
> >>>  		break;
> >>
> >> This is not in the MSR_IA32_VMX_BASIC..MSR_IA32_VMX_TRUE_ENTRY_CTLS
> >> range, so you must check nested_vmx_allowed and return 0 if it is false.
> >
> > Or 1?
> 
> "Return 0 from the whole function" and hence #GP(0) on reads.  The MSR
> doesn't exist if VMX=SMX=0.
> 
Right.

> >> Otherwise looks good.
> >>
> >> Paolo
> >>
> >>>  	case MSR_IA32_VMX_BASIC:
> >>>  		/*
> >>> @@ -2356,9 +2356,6 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
> > Also this function is no longer needed. You can drop it.
> > 
> > And what about Nadav's patch Bandan pointed to? It is not entirely
> > correct, but it is close to real HW.
> 
> I don't like that it requires a firmware change in order to use nested
> VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
> "better emulation + new firmware" are indistinguishable from the point of
> view of anyone except the firmware.
> 
> IMO there is no reason for a better emulation that no one would care
> about _and_ could look like a regression when updating to a newer kernel.
> 
That is why now is a good time to do it, since nested VMX is not
widely used. Once it is widely used, the change will be impossible
to make, for the reason you are giving. So it is now or never.

> Paolo
> 
> >>>  	if (!nested_vmx_allowed(vcpu))
> >>>  		return 0;
> >>>  
> >>> -	if (msr_index == MSR_IA32_FEATURE_CONTROL)
> >>> -		/* TODO: the right thing. */
> >>> -		return 1;
> >>>  	/*
> >>>  	 * No need to treat VMX capability MSRs specially: If we don't handle
> >>>  	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
> >>>
> > 
> > --
> > 			Gleb.
> > 

--
			Gleb.
Paolo Bonzini July 4, 2013, 11:21 a.m. UTC | #9
On 04/07/2013 13:12, Gleb Natapov wrote:
> > I don't like that it requires a firmware change in order to use nested
> > VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
> > "better emulation + new firmware" are indistinguishable from the point of
> > view of anyone except the firmware.
> > 
> > IMO there is no reason for a better emulation that no one would care
> > about _and_ could look like a regression when updating to a newer kernel.
>
> That is why now is the good time to do that since nested vmx is not
> widely used. When it will be widely used the change will be impossible
> to do for reason you are giving. So it is now or never.

I think it is a can of worms.  For example, should this be
conditionalized on running under QEMU?  Under UEFI, TianoCore should be
doing it, not SeaBIOS.  And for CoreBoot, should it be done by CoreBoot
or SeaBIOS?  (How do people use KVM together with CoreBoot?)

So I still prefer never... :)

Paolo
Gleb Natapov July 4, 2013, 11:31 a.m. UTC | #10
On Thu, Jul 04, 2013 at 01:21:51PM +0200, Paolo Bonzini wrote:
> Il 04/07/2013 13:12, Gleb Natapov ha scritto:
> > > I don't like that it requires a firmware change in order to use nested
> > > VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
> > > "better emulation + new firmware" are indistinguishable from the point of
> > > view of anyone except the firmware.
> > > 
> > > IMO there is no reason for a better emulation that no one would care
> > > about _and_ could look like a regression when updating to a newer kernel.
> >
> > That is why now is the good time to do that since nested vmx is not
> > widely used. When it will be widely used the change will be impossible
> > to do for reason you are giving. So it is now or never.
> 
> I think it is a can of worms.  For example, should this be
> conditionalized on running under QEMU?  Under UEFI, TianoCore should be
> doing it, not SeaBIOS.  And for CoreBoot, should it be done by CoreBoot
> or SeaBIOS?  (How do people use KVM together with CoreBoot?)
> 
This is not the first thing that firmware needs to initialize. I'll let
the firmware guys fight over who does it; we just model HW. FWIW, the
SeaBIOS patch would be trivial.

> So I still prefer never... :)
> 
This is the "can of worms", IMO. What will we decide to initialize in KVM
next to relieve firmware of its duty? That is the "other hypervisor" way;
in KVM we just model HW.

--
			Gleb.
Paolo Bonzini July 4, 2013, 12:34 p.m. UTC | #11
On 04/07/2013 13:31, Gleb Natapov wrote:
> On Thu, Jul 04, 2013 at 01:21:51PM +0200, Paolo Bonzini wrote:
>> Il 04/07/2013 13:12, Gleb Natapov ha scritto:
>>>> I don't like that it requires a firmware change in order to use nested
>>>> VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
>>>> "better emulation + new firmware" are indistinguishable from the point of
>>>> view of anyone except the firmware.
>>>>
>>>> IMO there is no reason for a better emulation that no one would care
>>>> about _and_ could look like a regression when updating to a newer kernel.
>>>
>>> That is why now is the good time to do that since nested vmx is not
>>> widely used. When it will be widely used the change will be impossible
>>> to do for reason you are giving. So it is now or never.
>>
>> I think it is a can of worms.  For example, should this be
>> conditionalized on running under QEMU?  Under UEFI, TianoCore should be
>> doing it, not SeaBIOS.  And for CoreBoot, should it be done by CoreBoot
>> or SeaBIOS?  (How do people use KVM together with CoreBoot?)
>>
> This is not the first thing that firmware need to initialize. I let
> firmware guys fight over who is doing it, we just model HW. FWIW for
> Seabios patch would be trivial.

Trivial but still depending on the question "who is doing it".  If
CoreBoot should (also) be doing it, the SeaBIOS patch should be
conditional on CONFIG_QEMU.  Also, should it be unconditional or depend
on some external configuration knob (as on a bare-metal firmware)?

Actually KVM probes MSR_IA32_FEATURE_CONTROL itself and sets the bits,
so we can sidestep the whole firmware thing, and go with a fixed version
of Nadav's patch.
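
The "probe and set the bits" behavior mentioned above can be modeled in userspace as follows. This is a sketch under stated assumptions, not KVM's or Nadav's code: all names are hypothetical, the MSR is assumed to read 0 after reset, reserved bits are ignored, and the only rule modeled is the SDM one that writes fault once the lock bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout per the Intel SDM; the macro names are illustrative only. */
#define FC_LOCKED            (1ULL << 0)
#define FC_VMXON_OUTSIDE_SMX (1ULL << 2)

/* Hypothetical model of the emulated MSR; value is assumed 0 after reset. */
struct model_msr {
	uint64_t value;
};

/* Write-once semantics: returns false (a #GP(0) in the guest) once locked. */
static bool model_wrmsr(struct model_msr *m, uint64_t data)
{
	assert(m);
	if (m->value & FC_LOCKED)
		return false;
	m->value = data;
	return true;
}

/*
 * Probe as a hypervisor might do it when firmware left the MSR unset:
 * accept an already suitable value, otherwise enable and lock if still
 * unlocked. Returns false when the MSR is locked with VMX disabled.
 */
static bool model_probe_enable_vmx(struct model_msr *m)
{
	const uint64_t want = FC_LOCKED | FC_VMXON_OUTSIDE_SMX;

	assert(m);
	if ((m->value & want) == want)
		return true;
	if (m->value & FC_LOCKED)
		return false;
	return model_wrmsr(m, m->value | want);
}
```

Starting from 0, the probe leaves the model at 0x5 and any later write is rejected, so a guest hypervisor that performs this probe works whether or not firmware initialized the MSR first.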

>> So I still prefer never... :)
>
> This is a "can of worms" IMO. What we decide to init in KVM next to
> relieve firmware from its duty? This is "other hypervisor" way, in KVM
> we just model HW.

(FWIW, I now checked Xen nested VMX and it just returns 5, but this has
nothing to do with paravirtualization.)

Paolo
Gleb Natapov July 4, 2013, 12:43 p.m. UTC | #12
On Thu, Jul 04, 2013 at 02:34:11PM +0200, Paolo Bonzini wrote:
> Il 04/07/2013 13:31, Gleb Natapov ha scritto:
> > On Thu, Jul 04, 2013 at 01:21:51PM +0200, Paolo Bonzini wrote:
> >> Il 04/07/2013 13:12, Gleb Natapov ha scritto:
> >>>> I don't like that it requires a firmware change in order to use nested
> >>>> VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
> >>>> "better emulation + new firmware" are indistinguishable from the point of
> >>>> view of anyone except the firmware.
> >>>>
> >>>> IMO there is no reason for a better emulation that no one would care
> >>>> about _and_ could look like a regression when updating to a newer kernel.
> >>>
> >>> That is why now is the good time to do that since nested vmx is not
> >>> widely used. When it will be widely used the change will be impossible
> >>> to do for reason you are giving. So it is now or never.
> >>
> >> I think it is a can of worms.  For example, should this be
> >> conditionalized on running under QEMU?  Under UEFI, TianoCore should be
> >> doing it, not SeaBIOS.  And for CoreBoot, should it be done by CoreBoot
> >> or SeaBIOS?  (How do people use KVM together with CoreBoot?)
> >>
> > This is not the first thing that firmware need to initialize. I let
> > firmware guys fight over who is doing it, we just model HW. FWIW for
> > Seabios patch would be trivial.
> 
> Trivial but still depending on the question "who is doing it".  If
> CoreBoot should (also) be doing it, the SeaBIOS patch should be
> conditional on CONFIG_QEMU.  Also, should it be unconditional or depend
> on some external configuration knob (as on a bare-metal firmware)?
> 
Let firmware developers solve firmware problems. They have all the same
problems when running on real HW and they will have to figure out a
solution regardless. Making things different on virt will only cause
people to treat virt differently (remember irqbalance?).

> Actually KVM probes MSR_IA32_FEATURE_CONTROL itself and sets the bits,
> so we can sidestep the whole firmware thing, and go with a fixed version
> of Nadav's patch.
> 
Indeed, so no regression will be seen, even temporarily.

> >> So I still prefer never... :)
> >
> > This is a "can of worms" IMO. What we decide to init in KVM next to
> > relieve firmware from its duty? This is "other hypervisor" way, in KVM
> > we just model HW.
> 
> FWIW, I now checked Xen nested VMX and it just returns 5, but this has
> nothing to do with paravirtualization).
> 
> Paolo

--
			Gleb.
Arthur Chunqi Li July 5, 2013, 3:26 a.m. UTC | #13
On Thu, Jul 4, 2013 at 8:43 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Thu, Jul 04, 2013 at 02:34:11PM +0200, Paolo Bonzini wrote:
>> Il 04/07/2013 13:31, Gleb Natapov ha scritto:
>> > On Thu, Jul 04, 2013 at 01:21:51PM +0200, Paolo Bonzini wrote:
>> >> Il 04/07/2013 13:12, Gleb Natapov ha scritto:
>> >>>> I don't like that it requires a firmware change in order to use nested
>> >>>> VMX (at least for hypervisors that read the MSR).  "Worse emulation" and
>> >>>> "better emulation + new firmware" are indistinguishable from the point of
>> >>>> view of anyone except the firmware.
>> >>>>
>> >>>> IMO there is no reason for a better emulation that no one would care
>> >>>> about _and_ could look like a regression when updating to a newer kernel.
>> >>>
>> >>> That is why now is the good time to do that since nested vmx is not
>> >>> widely used. When it will be widely used the change will be impossible
>> >>> to do for reason you are giving. So it is now or never.
>> >>
>> >> I think it is a can of worms.  For example, should this be
>> >> conditionalized on running under QEMU?  Under UEFI, TianoCore should be
>> >> doing it, not SeaBIOS.  And for CoreBoot, should it be done by CoreBoot
>> >> or SeaBIOS?  (How do people use KVM together with CoreBoot?)
>> >>
>> > This is not the first thing that firmware need to initialize. I let
>> > firmware guys fight over who is doing it, we just model HW. FWIW for
>> > Seabios patch would be trivial.
>>
>> Trivial but still depending on the question "who is doing it".  If
>> CoreBoot should (also) be doing it, the SeaBIOS patch should be
>> conditional on CONFIG_QEMU.  Also, should it be unconditional or depend
>> on some external configuration knob (as on a bare-metal firmware)?
>>
> Let firmware developers solve firmware problems. They have all the same
> problems when running on real HW and they will have to figure out a
> solution regardless. Making things different on virt will only cause
> people to treat virt differently (remember irqbalance?).
I prefer Gleb's idea. Since nested virt tries to provide the same
framework as the HW, we just need to model HW. If initialization is
done in KVM, responsibilities get blurred and some features become
hard to extend in the future. E.g., if we later want to add SMX
support and let the BIOS configure MSR_IA32_FEATURE_CONTROL, that may
be hard to handle.

Arthur

>
>> Actually KVM probes MSR_IA32_FEATURE_CONTROL itself and sets the bits,
>> so we can sidestep the whole firmware thing, and go with a fixed version
>> of Nadav's patch.
>>
> Indeed, so no regression will be seen even temporary.
>
>> >> So I still prefer never... :)
>> >
>> > This is a "can of worms" IMO. What we decide to init in KVM next to
>> > relieve firmware from its duty? This is "other hypervisor" way, in KVM
>> > we just model HW.
>>
>> FWIW, I now checked Xen nested VMX and it just returns 5, but this has
>> nothing to do with paravirtualization).
>>
>> Paolo
>
> --
>                         Gleb.

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 260a919..e125f94 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2277,7 +2277,7 @@  static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 
 	switch (msr_index) {
 	case MSR_IA32_FEATURE_CONTROL:
-		*pdata = 0;
+		*pdata = 0x5;
 		break;
 	case MSR_IA32_VMX_BASIC:
 		/*
@@ -2356,9 +2356,6 @@  static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 	if (!nested_vmx_allowed(vcpu))
 		return 0;
 
-	if (msr_index == MSR_IA32_FEATURE_CONTROL)
-		/* TODO: the right thing. */
-		return 1;
 	/*
 	 * No need to treat VMX capability MSRs specially: If we don't handle
 	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)