[v2,5/8] KVM: nVMX: Fix guest CR3 read-back on VM-exit

Message ID 52011AE6.2010006@siemens.com (mailing list archive)
State New, archived

Commit Message

Jan Kiszka Aug. 6, 2013, 3:48 p.m. UTC
On 2013-08-06 17:04, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-08-06:
>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>> Gleb Natapov wrote on 2013-08-06:
>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>
>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>
>>>>>> I have a headache from trying to think about it already, but shouldn't
>>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
>>>>> Here is my understanding:
>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target vcpu is
>>>>> enabling paging. When L2 running in real mode, then target vcpu is not
>>>>> enabling paging and it will use L0's identity map for L2. If you read
>>>>> GUEST_CR3 from VMCS, then you may get the L2's identity map not L1's.
>>>>>
>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>>> L0 and L1 use the same identity map address. When I changed identity
>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>>> If I zero L1 identity map page L2 still works.
>>>>
>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>>> if L2 is in real mode.
>>>
>> That not the point. It may work accidentally for kvm on kvm, but what
>> if other hypervisor plays different tricks and builds different ident map for its guest?
> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
> Anyway, current logic is definitely wrong. It should use L1's identity map instead L0's.

So something like this is rather needed?


Jan

Comments

Gleb Natapov Aug. 6, 2013, 3:53 p.m. UTC | #1
On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
> On 2013-08-06 17:04, Zhang, Yang Z wrote:
> > Gleb Natapov wrote on 2013-08-06:
> >> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
> >>> Gleb Natapov wrote on 2013-08-06:
> >>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
> >>>>> Gleb Natapov wrote on 2013-08-06:
> >>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> >>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>>>
> >>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
> >>>>>>> exits. We therefore have to read the current value from the VMCS
> >>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
> >>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
> >>>>>>> So we need to retrieve CR3 from the architectural state after
> >>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
> >>>>>>>
> >>>>>> I have a headache from trying to think about it already, but
> >>>>>> shouldn't
> >>>>>> L1 be the one who setups identity map for L2? I traced what
> >>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
> >>>>>> see
> >>>>> Here is my understanding:
> >>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
> >>>>> vcpu is enabling
> >>>> paging. When L2 running in real mode, then target vcpu is not
> >>>> enabling paging and it will use L0's identity map for L2. If you
> >>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
> >>>> not
> >> L1's.
> >>>>>
> >>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
> >>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
> >>>> L0 and L1 use the same identity map address. When I changed identity
> >>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
> >>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
> >>>> If I zero L1 identity map page L2 still works.
> >>>>
> >>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
> >>> if L2 is in real mode.
> >>>
> >> That not the point. It may work accidentally for kvm on kvm, but what
> >> if other hypervisor plays different tricks and builds different ident map for its guest?
> > Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
> > Anyway, current logic is definitely wrong. It should use L1's identity map instead L0's.
> 
> So something like this is rather needed?
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 44494ed..60a3644 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	if (enable_ept) {
>  		eptp = construct_eptp(cr3);
>  		vmcs_write64(EPT_POINTER, eptp);
> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
> -			vcpu->kvm->arch.ept_identity_map_addr;
> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
> +			guest_cr3 = kvm_read_cr3(vcpu);
> +		else
> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>  		ept_load_pdptrs(vcpu);
>  	}
>  
That's what I am thinking; I will think about it some more tomorrow.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka Aug. 6, 2013, 3:57 p.m. UTC | #2
On 2013-08-06 17:53, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
>> On 2013-08-06 17:04, Zhang, Yang Z wrote:
>>> Gleb Natapov wrote on 2013-08-06:
>>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>
>>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>>>
>>>>>>>> I have a headache from trying to think about it already, but
>>>>>>>> shouldn't
>>>>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>>>>>>> see
>>>>>>> Here is my understanding:
>>>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
>>>>>>> vcpu is enabling
>>>>>> paging. When L2 running in real mode, then target vcpu is not
>>>>>> enabling paging and it will use L0's identity map for L2. If you
>>>>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
>>>>>> not
>>>> L1's.
>>>>>>>
>>>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>>>>> L0 and L1 use the same identity map address. When I changed identity
>>>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>>>>> If I zero L1 identity map page L2 still works.
>>>>>>
>>>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>>>>> if L2 is in real mode.
>>>>>
>>>> That not the point. It may work accidentally for kvm on kvm, but what
>>>> if other hypervisor plays different tricks and builds different ident map for its guest?
>>> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
>>> Anyway, current logic is definitely wrong. It should use L1's identity map instead L0's.
>>
>> So something like this is rather needed?
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 44494ed..60a3644 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>  	if (enable_ept) {
>>  		eptp = construct_eptp(cr3);
>>  		vmcs_write64(EPT_POINTER, eptp);
>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>> -			vcpu->kvm->arch.ept_identity_map_addr;
>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>> +			guest_cr3 = kvm_read_cr3(vcpu);
>> +		else
>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>  		ept_load_pdptrs(vcpu);
>>  	}
>>  
> That what I am thinking, will think about it some more tomorrow.

OK, and I'll feed it into a local test.

Jan
Gleb Natapov Aug. 7, 2013, 12:39 p.m. UTC | #3
On Tue, Aug 06, 2013 at 05:57:02PM +0200, Jan Kiszka wrote:
> On 2013-08-06 17:53, Gleb Natapov wrote:
> > On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
> >> On 2013-08-06 17:04, Zhang, Yang Z wrote:
> >>> Gleb Natapov wrote on 2013-08-06:
> >>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
> >>>>> Gleb Natapov wrote on 2013-08-06:
> >>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
> >>>>>>> Gleb Natapov wrote on 2013-08-06:
> >>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> >>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>>>>>
> >>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
> >>>>>>>>> exits. We therefore have to read the current value from the VMCS
> >>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
> >>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
> >>>>>>>>> So we need to retrieve CR3 from the architectural state after
> >>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
> >>>>>>>>>
> >>>>>>>> I have a headache from trying to think about it already, but
> >>>>>>>> shouldn't
> >>>>>>>> L1 be the one who setups identity map for L2? I traced what
> >>>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
> >>>>>>>> see
> >>>>>>> Here is my understanding:
> >>>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
> >>>>>>> vcpu is enabling
> >>>>>> paging. When L2 running in real mode, then target vcpu is not
> >>>>>> enabling paging and it will use L0's identity map for L2. If you
> >>>>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
> >>>>>> not
> >>>> L1's.
> >>>>>>>
> >>>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
> >>>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
> >>>>>> L0 and L1 use the same identity map address. When I changed identity
> >>>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
> >>>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
> >>>>>> If I zero L1 identity map page L2 still works.
> >>>>>>
> >>>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
> >>>>> if L2 is in real mode.
> >>>>>
> >>>> That not the point. It may work accidentally for kvm on kvm, but what
> >>>> if other hypervisor plays different tricks and builds different ident map for its guest?
> >>> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
> >>> Anyway, current logic is definitely wrong. It should use L1's identity map instead L0's.
> >>
> >> So something like this is rather needed?
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 44494ed..60a3644 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >>  	if (enable_ept) {
> >>  		eptp = construct_eptp(cr3);
> >>  		vmcs_write64(EPT_POINTER, eptp);
> >> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
> >> -			vcpu->kvm->arch.ept_identity_map_addr;
> >> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
> >> +			guest_cr3 = kvm_read_cr3(vcpu);
> >> +		else
> >> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
> >>  		ept_load_pdptrs(vcpu);
> >>  	}
> >>  
> > That what I am thinking, will think about it some more tomorrow.
> 
> OK, and I'll feed it into a local test.
> 
Thought about it some more. Without nested unrestricted guest (nUG),
is_paging() will always be true (since without nUG, guest entry is not
possible otherwise) and the guest's cr3 will be used. With nUG, the identity
map is not used (that is why L2 still works even though the wrong identity
map pointer is assigned to cr3), so the code here just corrupts the nested
guest's cr3 for no reason, and that is why you had to use kvm_read_cr3()
in prepare_vmcs12() to get the correct cr3 value. The patch above should be
used instead of the original one IMO. How is testing going?
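To make the corruption concrete, here is a minimal standalone sketch in plain C. It is not kernel code: `struct nug_vcpu`, `model_set_cr3()` and `vmcs_readback_matches()` are hypothetical names that only model the branch discussed above. The `vmcs_guest_cr3` member stands in for the GUEST_CR3 field and `arch_cr3` for what kvm_read_cr3() would return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved; not the real struct kvm_vcpu. */
struct nug_vcpu {
	bool paging;                  /* models is_paging(vcpu) */
	bool guest_mode;              /* models is_guest_mode(vcpu), i.e. L2 is running */
	unsigned long arch_cr3;       /* models the value kvm_read_cr3(vcpu) returns */
	unsigned long l0_ident_map;   /* models kvm->arch.ept_identity_map_addr */
	unsigned long vmcs_guest_cr3; /* models the GUEST_CR3 field in the VMCS */
};

/* Models only the EPT branch of vmx_set_cr3(), before and after the patch. */
static void model_set_cr3(struct nug_vcpu *v, bool patched)
{
	bool use_arch_cr3 = v->paging || (patched && v->guest_mode);

	v->vmcs_guest_cr3 = use_arch_cr3 ? v->arch_cr3 : v->l0_ident_map;
}

/* True if reading GUEST_CR3 back on a nested exit would yield L2's real CR3. */
static bool vmcs_readback_matches(const struct nug_vcpu *v)
{
	return v->vmcs_guest_cr3 == v->arch_cr3;
}
```

In this model, a non-paging L2 (the nUG case) ends up with L0's identity map address in the modeled GUEST_CR3 field when `patched` is false, so a read-back does not match the architectural value. With `patched` set to true the two agree.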

--
			Gleb.
Jan Kiszka Aug. 7, 2013, 12:46 p.m. UTC | #4
On 2013-08-07 14:39, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 05:57:02PM +0200, Jan Kiszka wrote:
>> On 2013-08-06 17:53, Gleb Natapov wrote:
>>> On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
>>>> On 2013-08-06 17:04, Zhang, Yang Z wrote:
>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>>>
>>>>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>>>>>
>>>>>>>>>> I have a headache from trying to think about it already, but
>>>>>>>>>> shouldn't
>>>>>>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>>>>>>>>> see
>>>>>>>>> Here is my understanding:
>>>>>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
>>>>>>>>> vcpu is enabling
>>>>>>>> paging. When L2 running in real mode, then target vcpu is not
>>>>>>>> enabling paging and it will use L0's identity map for L2. If you
>>>>>>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
>>>>>>>> not
>>>>>> L1's.
>>>>>>>>>
>>>>>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>>>>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>>>>>>> L0 and L1 use the same identity map address. When I changed identity
>>>>>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>>>>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>>>>>>> If I zero L1 identity map page L2 still works.
>>>>>>>>
>>>>>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>>>>>>> if L2 is in real mode.
>>>>>>>
>>>>>> That not the point. It may work accidentally for kvm on kvm, but what
>>>>>> if other hypervisor plays different tricks and builds different ident map for its guest?
>>>>> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
>>>>> Anyway, current logic is definitely wrong. It should use L1's identity map instead L0's.
>>>>
>>>> So something like this is rather needed?
>>>>
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 44494ed..60a3644 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>>>  	if (enable_ept) {
>>>>  		eptp = construct_eptp(cr3);
>>>>  		vmcs_write64(EPT_POINTER, eptp);
>>>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>> -			vcpu->kvm->arch.ept_identity_map_addr;
>>>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>> +			guest_cr3 = kvm_read_cr3(vcpu);
>>>> +		else
>>>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>  		ept_load_pdptrs(vcpu);
>>>>  	}
>>>>  
>>> That what I am thinking, will think about it some more tomorrow.
>>
>> OK, and I'll feed it into a local test.
>>
> Thought about is some more. So without nested unrestricted guest (nUG)
> is_paging() will always be true (since without nUG guest entry is not
> possible otherwise) and guest's cr3 will be used, but with nUG identity
> map is not used (that is why L2 still works even though wrong identity
> map pointer is assigned to cr3), so the code here just corrupts nested
> guest's cr3 for no reason and that is why you had to use kvm_read_cr3()
> in prepare_vmcs12() to get correct cr3 value. The patch above should be
> used instead of original one IMO. How is testing going?

Yes, testing worked fine. I've queued the above patch and will send it out
in the next round.

Jan
Paolo Bonzini Aug. 7, 2013, 1:32 p.m. UTC | #5
On 08/07/2013 02:46 PM, Jan Kiszka wrote:
> On 2013-08-07 14:39, Gleb Natapov wrote:
>> On Tue, Aug 06, 2013 at 05:57:02PM +0200, Jan Kiszka wrote:
>>> On 2013-08-06 17:53, Gleb Natapov wrote:
>>>> On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
>>>>> On 2013-08-06 17:04, Zhang, Yang Z wrote:
>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>>>>
>>>>>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>>>>>>
>>>>>>>>>>> I have a headache from trying to think about it already, but
>>>>>>>>>>> shouldn't
>>>>>>>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>>>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>>>>>>>>>> see
>>>>>>>>>> Here is my understanding:
>>>>>>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
>>>>>>>>>> vcpu is enabling
>>>>>>>>> paging. When L2 running in real mode, then target vcpu is not
>>>>>>>>> enabling paging and it will use L0's identity map for L2. If you
>>>>>>>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
>>>>>>>>> not
>>>>>>> L1's.
>>>>>>>>>>
>>>>>>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>>>>>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>>>>>>>> L0 and L1 use the same identity map address. When I changed identity
>>>>>>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>>>>>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>>>>>>>> If I zero L1 identity map page L2 still works.
>>>>>>>>>
>>>>>>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>>>>>>>> if L2 is in real mode.
>>>>>>>>
>>>>>>> That not the point. It may work accidentally for kvm on kvm, but what
>>>>>>> if other hypervisor plays different tricks and builds different ident map for its guest?
>>>>>> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
>>>>>> Anyway, current logic is definitely wrong. It should use L1's identity map instead L0's.
>>>>>
>>>>> So something like this is rather needed?
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 44494ed..60a3644 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>>>>   	if (enable_ept) {
>>>>>   		eptp = construct_eptp(cr3);
>>>>>   		vmcs_write64(EPT_POINTER, eptp);
>>>>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>>> -			vcpu->kvm->arch.ept_identity_map_addr;
>>>>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>>> +			guest_cr3 = kvm_read_cr3(vcpu);
>>>>> +		else
>>>>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>>   		ept_load_pdptrs(vcpu);
>>>>>   	}
>>>>>
>>>> That what I am thinking, will think about it some more tomorrow.
>>>
>>> OK, and I'll feed it into a local test.
>>>
>> Thought about is some more. So without nested unrestricted guest (nUG)
>> is_paging() will always be true (since without nUG guest entry is not
>> possible otherwise) and guest's cr3 will be used, but with nUG identity
>> map is not used (that is why L2 still works even though wrong identity
>> map pointer is assigned to cr3), so the code here just corrupts nested
>> guest's cr3 for no reason and that is why you had to use kvm_read_cr3()
>> in prepare_vmcs12() to get correct cr3 value. The patch above should be
>> used instead of original one IMO. How is testing going?
>
> Yes, testing worked fine. I've queued above patch and will send it out
> within the next round.

Just reply here with the commit message you desire and Signed-off-by, so 
I can queue it for people who wish to play with nEPT.

Paolo

Gleb Natapov Aug. 7, 2013, 1:38 p.m. UTC | #6
On Wed, Aug 07, 2013 at 03:32:37PM +0200, Paolo Bonzini wrote:
> >>>>>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>index 44494ed..60a3644 100644
> >>>>>--- a/arch/x86/kvm/vmx.c
> >>>>>+++ b/arch/x86/kvm/vmx.c
> >>>>>@@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >>>>>  	if (enable_ept) {
> >>>>>  		eptp = construct_eptp(cr3);
> >>>>>  		vmcs_write64(EPT_POINTER, eptp);
> >>>>>-		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
> >>>>>-			vcpu->kvm->arch.ept_identity_map_addr;
> >>>>>+		if (is_paging(vcpu) || is_guest_mode(vcpu))
> >>>>>+			guest_cr3 = kvm_read_cr3(vcpu);
> >>>>>+		else
> >>>>>+			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
> >>>>>  		ept_load_pdptrs(vcpu);
> >>>>>  	}
> >>>>>
> >>>>That what I am thinking, will think about it some more tomorrow.
> >>>
> >>>OK, and I'll feed it into a local test.
> >>>
> >>Thought about is some more. So without nested unrestricted guest (nUG)
> >>is_paging() will always be true (since without nUG guest entry is not
> >>possible otherwise) and guest's cr3 will be used, but with nUG identity
> >>map is not used (that is why L2 still works even though wrong identity
> >>map pointer is assigned to cr3), so the code here just corrupts nested
> >>guest's cr3 for no reason and that is why you had to use kvm_read_cr3()
> >>in prepare_vmcs12() to get correct cr3 value. The patch above should be
> >>used instead of original one IMO. How is testing going?
> >
> >Yes, testing worked fine. I've queued above patch and will send it out
> >within the next round.
> 
> Just reply here with the commit message you desire and
> Signed-off-by, so I can queue it for people who wish to play with
> nEPT.
> 
I would love to have a comment there too :)

--
			Gleb.
Paolo Bonzini Aug. 7, 2013, 1:54 p.m. UTC | #7
On 08/07/2013 03:38 PM, Gleb Natapov wrote:
> On Wed, Aug 07, 2013 at 03:32:37PM +0200, Paolo Bonzini wrote:
>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>> index 44494ed..60a3644 100644
>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>>>>>>   	if (enable_ept) {
>>>>>>>   		eptp = construct_eptp(cr3);
>>>>>>>   		vmcs_write64(EPT_POINTER, eptp);
>>>>>>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>>>>> -			vcpu->kvm->arch.ept_identity_map_addr;
>>>>>>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>>>>> +			guest_cr3 = kvm_read_cr3(vcpu);
>>>>>>> +		else
>>>>>>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>>>>   		ept_load_pdptrs(vcpu);
>>>>>>>   	}
>>>>>>>
>>>>>> That what I am thinking, will think about it some more tomorrow.
>>>>>
>>>>> OK, and I'll feed it into a local test.
>>>>>
>>>> Thought about is some more. So without nested unrestricted guest (nUG)
>>>> is_paging() will always be true (since without nUG guest entry is not
>>>> possible otherwise) and guest's cr3 will be used, but with nUG identity
>>>> map is not used (that is why L2 still works even though wrong identity
>>>> map pointer is assigned to cr3), so the code here just corrupts nested
>>>> guest's cr3 for no reason and that is why you had to use kvm_read_cr3()
>>>> in prepare_vmcs12() to get correct cr3 value. The patch above should be
>>>> used instead of original one IMO. How is testing going?
>>>
>>> Yes, testing worked fine. I've queued above patch and will send it out
>>> within the next round.
>>
>> Just reply here with the commit message you desire and
>> Signed-off-by, so I can queue it for people who wish to play with
>> nEPT.
>
> I would love to have a comment there too :)

Ok, then it can wait since it is only needed with nested unrestricted 
guest.  On the other hand, it should come before patch 4 on the next 
submission.

Paolo
Jan Kiszka Aug. 7, 2013, 1:59 p.m. UTC | #8
On 2013-08-07 15:54, Paolo Bonzini wrote:
> On 08/07/2013 03:38 PM, Gleb Natapov wrote:
>> On Wed, Aug 07, 2013 at 03:32:37PM +0200, Paolo Bonzini wrote:
>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>>> index 44494ed..60a3644 100644
>>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu
>>>>>>>> *vcpu, unsigned long cr3)
>>>>>>>>       if (enable_ept) {
>>>>>>>>           eptp = construct_eptp(cr3);
>>>>>>>>           vmcs_write64(EPT_POINTER, eptp);
>>>>>>>> -        guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>>>>>> -            vcpu->kvm->arch.ept_identity_map_addr;
>>>>>>>> +        if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>>>>>> +            guest_cr3 = kvm_read_cr3(vcpu);
>>>>>>>> +        else
>>>>>>>> +            guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>>>>>           ept_load_pdptrs(vcpu);
>>>>>>>>       }
>>>>>>>>
>>>>>>> That what I am thinking, will think about it some more tomorrow.
>>>>>>
>>>>>> OK, and I'll feed it into a local test.
>>>>>>
>>>>> Thought about is some more. So without nested unrestricted guest (nUG)
>>>>> is_paging() will always be true (since without nUG guest entry is not
>>>>> possible otherwise) and guest's cr3 will be used, but with nUG
>>>>> identity
>>>>> map is not used (that is why L2 still works even though wrong identity
>>>>> map pointer is assigned to cr3), so the code here just corrupts nested
>>>>> guest's cr3 for no reason and that is why you had to use
>>>>> kvm_read_cr3()
>>>>> in prepare_vmcs12() to get correct cr3 value. The patch above
>>>>> should be
>>>>> used instead of original one IMO. How is testing going?
>>>>
>>>> Yes, testing worked fine. I've queued above patch and will send it out
>>>> within the next round.
>>>
>>> Just reply here with the commit message you desire and
>>> Signed-off-by, so I can queue it for people who wish to play with
>>> nEPT.
>>
>> I would love to have a comment there too :)
> 
> Ok, then it can wait since it is only needed with nested unrestricted
> guest.

Yes, it's related to that feature.

> On the other hand, it should come before patch 4 on the next
> submission.

I'll reorder the whole series, moving the feature enabling to the end.
The ordering still reflects the history more than the dependencies.

Jan

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 44494ed..60a3644 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3375,8 +3375,10 @@  static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	if (enable_ept) {
 		eptp = construct_eptp(cr3);
 		vmcs_write64(EPT_POINTER, eptp);
-		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
-			vcpu->kvm->arch.ept_identity_map_addr;
+		if (is_paging(vcpu) || is_guest_mode(vcpu))
+			guest_cr3 = kvm_read_cr3(vcpu);
+		else
+			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
 		ept_load_pdptrs(vcpu);
 	}
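
The effect of the hunk can be sketched outside the kernel as a small truth table over is_paging() and is_guest_mode(). The following is only an illustrative model in plain C: `struct model_vcpu`, `guest_cr3_before()` and `guest_cr3_after()` are made-up names, and the struct members merely stand in for is_paging(vcpu), is_guest_mode(vcpu), kvm_read_cr3(vcpu) and ept_identity_map_addr.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vCPU state the hunk consults. */
struct model_vcpu {
	bool paging;             /* models is_paging(vcpu) */
	bool guest_mode;         /* models is_guest_mode(vcpu), i.e. L2 is running */
	unsigned long cr3;       /* models kvm_read_cr3(vcpu) */
	unsigned long ident_map; /* models vcpu->kvm->arch.ept_identity_map_addr */
};

/* Selection as in the removed lines: only is_paging() is considered. */
static unsigned long guest_cr3_before(const struct model_vcpu *v)
{
	return v->paging ? v->cr3 : v->ident_map;
}

/* Selection as in the added lines: a nested guest always keeps its own CR3. */
static unsigned long guest_cr3_after(const struct model_vcpu *v)
{
	if (v->paging || v->guest_mode)
		return v->cr3;
	return v->ident_map;
}
```

Of the four combinations, the two functions differ only for paging == false with guest_mode == true, which per the discussion above can only occur with nested unrestricted guest. A non-nested guest without paging still gets L0's identity map in both versions.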