[v2,5/8] KVM: nVMX: Fix guest CR3 read-back on VM-exit

Message ID 0816baee846f9c8f4d54c6738b2582a95f9c56a3.1375778397.git.jan.kiszka@web.de (mailing list archive)
State New, archived

Commit Message

Jan Kiszka Aug. 6, 2013, 8:39 a.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

If nested EPT is enabled, the L2 guest may change CR3 without any exits.
We therefore have to read the current value from the VMCS when switching
to L1. However, if paging wasn't enabled, L0 tracks L2's CR3, and
GUEST_CR3 rather contains the real-mode identity map. So we need to
retrieve CR3 from the architectural state after conditionally updating
it - and this is what kvm_read_cr3 does.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
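
A minimal standalone sketch, not kernel code, of why the two read-back paths can disagree. The names toy_vcpu, toy_read_cr3 and toy_read_back_old are hypothetical, and the model assumes what the commit message describes: the software-tracked CR3 is refreshed from the VMCS only while the guest has paging enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model for illustration only; every name here is hypothetical. */
struct toy_vcpu {
	bool paging_enabled;   /* guest CR0.PG as seen by L0 */
	uint64_t arch_cr3;     /* CR3 value L0 tracks in software */
	uint64_t hw_guest_cr3; /* stand-in for the GUEST_CR3 VMCS field */
};

/*
 * Models the post-patch read: refresh the tracked value from the
 * hardware field only when the guest is paging (it may have changed
 * CR3 without an exit), then return the tracked value.
 */
static uint64_t toy_read_cr3(struct toy_vcpu *v)
{
	assert(v != NULL);
	if (v->paging_enabled)
		v->arch_cr3 = v->hw_guest_cr3;
	return v->arch_cr3;
}

/* Models the pre-patch read: always trust the hardware field. */
static uint64_t toy_read_back_old(const struct toy_vcpu *v)
{
	assert(v != NULL);
	return v->hw_guest_cr3;
}
```

In this model the two functions agree whenever paging is enabled. With paging disabled, toy_read_back_old() returns whatever was placed in the hardware field (the identity map address), while toy_read_cr3() returns the value the guest itself loaded.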

Comments

Gleb Natapov Aug. 6, 2013, 10:12 a.m. UTC | #1
On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
> We therefore have to read the current value from the VMCS when switching
> to L1. However, if paging wasn't enabled, L0 tracks L2's CR3, and
> GUEST_CR3 rather contains the real-mode identity map. So we need to
> retrieve CR3 from the architectural state after conditionally updating
> it - and this is what kvm_read_cr3 does.
> 
I have a headache from trying to think about it already, but shouldn't
L1 be the one that sets up the identity map for L2? I traced what
vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
different values in real mode.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/kvm/vmx.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b482d47..09666aa 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8106,7 +8106,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	 * Additionally, restore L2's PDPTR to vmcs12.
>  	 */
>  	if (enable_ept) {
> -		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
> +		vmcs12->guest_cr3 = kvm_read_cr3(vcpu);
>  		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
>  		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
>  		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
> -- 
> 1.7.3.4

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka Aug. 6, 2013, 10:25 a.m. UTC | #2
On 2013-08-06 12:12, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
>> We therefore have to read the current value from the VMCS when switching
>> to L1. However, if paging wasn't enabled, L0 tracks L2's CR3, and
>> GUEST_CR3 rather contains the real-mode identity map. So we need to
>> retrieve CR3 from the architectural state after conditionally updating
>> it - and this is what kvm_read_cr3 does.
>>
> I have a headache from trying to think about it already, but shouldn't
> L1 be the one who setups identity map for L2? I traced what
> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
> different values in real mode.

Did you try with my patches applied and unrestricted guest mode in use?

Jan

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/kvm/vmx.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b482d47..09666aa 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -8106,7 +8106,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  	 * Additionally, restore L2's PDPTR to vmcs12.
>>  	 */
>>  	if (enable_ept) {
>> -		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
>> +		vmcs12->guest_cr3 = kvm_read_cr3(vcpu);
>>  		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
>>  		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
>>  		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
>> -- 
>> 1.7.3.4
> 
> --
> 			Gleb.
>
Gleb Natapov Aug. 6, 2013, 10:31 a.m. UTC | #3
On Tue, Aug 06, 2013 at 12:25:55PM +0200, Jan Kiszka wrote:
> On 2013-08-06 12:12, Gleb Natapov wrote:
> > On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
> >> We therefore have to read the current value from the VMCS when switching
> >> to L1. However, if paging wasn't enabled, L0 tracks L2's CR3, and
> >> GUEST_CR3 rather contains the real-mode identity map. So we need to
> >> retrieve CR3 from the architectural state after conditionally updating
> >> it - and this is what kvm_read_cr3 does.
> >>
> > I have a headache from trying to think about it already, but shouldn't
> > L1 be the one who setups identity map for L2? I traced what
> > vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
> > different values in real mode.
> 
> Did you try with my patches applied and unrestricted guest mode in use?
> 
No, for that I first need to set up a nested environment on a machine
that supports unrestricted guest :)

> Jan
> 
> > 
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >>  arch/x86/kvm/vmx.c |    2 +-
> >>  1 files changed, 1 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index b482d47..09666aa 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -8106,7 +8106,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> >>  	 * Additionally, restore L2's PDPTR to vmcs12.
> >>  	 */
> >>  	if (enable_ept) {
> >> -		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
> >> +		vmcs12->guest_cr3 = kvm_read_cr3(vcpu);
> >>  		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
> >>  		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
> >>  		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
> >> -- 
> >> 1.7.3.4
> > 
> > --
> > 			Gleb.
> > 
> 
> 



--
			Gleb.
Gleb Natapov Aug. 6, 2013, 11:44 a.m. UTC | #4
On Tue, Aug 06, 2013 at 01:31:03PM +0300, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 12:25:55PM +0200, Jan Kiszka wrote:
> > On 2013-08-06 12:12, Gleb Natapov wrote:
> > > On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > >>
> > >> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
> > >> We therefore have to read the current value from the VMCS when switching
> > >> to L1. However, if paging wasn't enabled, L0 tracks L2's CR3, and
> > >> GUEST_CR3 rather contains the real-mode identity map. So we need to
> > >> retrieve CR3 from the architectural state after conditionally updating
> > >> it - and this is what kvm_read_cr3 does.
> > >>
> > > I have a headache from trying to think about it already, but shouldn't
> > > L1 be the one who setups identity map for L2? I traced what
> > > vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
> > > different values in real mode.
> > 
> > Did you try with my patches applied and unrestricted guest mode in use?
> > 
> No, for that I need to setup nested environment on the machine that
> support unrestricted guest first :)
> 
Did that. I see unrestricted guest is enabled in L1, but still do not
see different values.

--
			Gleb.
Zhang, Yang Z Aug. 6, 2013, 11:44 a.m. UTC | #5
Gleb Natapov wrote on 2013-08-06:
> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>> 
>> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
>> We therefore have to read the current value from the VMCS when
>> switching to L1. However, if paging wasn't enabled, L0 tracks L2's
>> CR3, and
>> GUEST_CR3 rather contains the real-mode identity map. So we need to
>> retrieve CR3 from the architectural state after conditionally
>> updating it - and this is what kvm_read_cr3 does.
>> 
> I have a headache from trying to think about it already, but shouldn't
> L1 be the one who setups identity map for L2? I traced what
> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
> different values in real mode.

Here is my understanding:
In vmx_set_cr3(), if EPT is enabled, it checks whether the target vcpu has paging enabled. When L2 is running in real mode, the target vcpu does not have paging enabled, so L0's identity map is used for L2. If you then read GUEST_CR3 from the VMCS, you may get the identity map that L0 uses for L2, not L1's.

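
The selection described in this message can be sketched as a small pure function. This is an illustrative model under stated assumptions, not the kernel's vmx_set_cr3(): the helper name toy_hw_cr3 and its parameters are hypothetical, and only the EPT-enabled case is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the value that would end up in the
 * hardware GUEST_CR3 field for a guest running under EPT.
 */
static uint64_t toy_hw_cr3(bool ept, bool guest_paging,
			   uint64_t guest_cr3, uint64_t l0_ident_map)
{
	/* Precondition: the shadow-paging case is outside this sketch. */
	assert(ept);

	/*
	 * A paging guest gets its own CR3; a non-paging guest gets the
	 * address of the identity map that L0 maintains.
	 */
	return guest_paging ? guest_cr3 : l0_ident_map;
}
```

In this model the CR3 value supplied for a non-paging guest (for L2, the value L1 wrote into its VMCS) never reaches the hardware field, which is why a direct read of that field cannot recover it.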
> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/kvm/vmx.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b482d47..09666aa 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -8106,7 +8106,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  	 * Additionally, restore L2's PDPTR to vmcs12.
>>  	 */
>>  	if (enable_ept) {
>> -		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
>> +		vmcs12->guest_cr3 = kvm_read_cr3(vcpu);
>>  		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
>>  		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
>>  		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
>> --
>> 1.7.3.4
>


Best regards,
Yang

Gleb Natapov Aug. 6, 2013, 2:02 p.m. UTC | #6
On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-08-06:
> > On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >> 
> >> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
> >> We therefore have to read the current value from the VMCS when
> >> switching to L1. However, if paging wasn't enabled, L0 tracks L2's
> >> CR3, and
> >> GUEST_CR3 rather contains the real-mode identity map. So we need to
> >> retrieve CR3 from the architectural state after conditionally
> >> updating it - and this is what kvm_read_cr3 does.
> >> 
> > I have a headache from trying to think about it already, but shouldn't
> > L1 be the one who setups identity map for L2? I traced what
> > vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not see
> Here is my understanding:
> In vmx_set_cr3(), if enabled ept, it will check whether target vcpu is enabling paging. When L2 running in real mode, then target vcpu is not enabling paging and it will use L0's identity map for L2. If you read GUEST_CR3 from VMCS, then you may get the L2's identity map not L1's.
> 
Yes, but why does it make sense to use L0's identity map for L2? I didn't
see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because L0
and L1 use the same identity map address. When I changed the identity map
address that L1 configures, vmcs_read64(GUEST_CR3) and kvm_read_cr3(vcpu)
are indeed different, but the real CR3 that L2 uses points to L0's
identity map. If I zero L1's identity map page, L2 still works.

--
			Gleb.
Zhang, Yang Z Aug. 6, 2013, 2:12 p.m. UTC | #7
Gleb Natapov wrote on 2013-08-06:
> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2013-08-06:
>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>> 
>>>> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
>>>> We therefore have to read the current value from the VMCS when
>>>> switching to L1. However, if paging wasn't enabled, L0 tracks
>>>> L2's CR3, and
>>>> GUEST_CR3 rather contains the real-mode identity map. So we need
>>>> to retrieve CR3 from the architectural state after conditionally
>>>> updating it - and this is what kvm_read_cr3 does.
>>>> 
>>> I have a headache from trying to think about it already, but
>>> shouldn't
>>> L1 be the one who setups identity map for L2? I traced what
>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>> see
>> Here is my understanding:
>> In vmx_set_cr3(), if enabled ept, it will check whether target vcpu
>> is enabling paging. When L2 running in real mode, then target vcpu is
>> not enabling paging and it will use L0's identity map for L2. If you
>> read GUEST_CR3 from VMCS, then you may get the L2's identity map not
>> L1's.
>> 
> Yes, but why it makes sense to use L0 identity map for L2? I didn't see
> different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because L0
> and L1 use the same identity map address. When I changed identity
> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
> indeed different, but the real CR3 L2 uses points to L0 identity map. If
> I zero L1 identity map page L2 still works.
>
If L2 is in real mode, then L2PA == L1PA, so L0's identity map also works in that case.

Best regards,
Yang


Gleb Natapov Aug. 6, 2013, 2:41 p.m. UTC | #8
On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-08-06:
> > On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
> >> Gleb Natapov wrote on 2013-08-06:
> >>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
> >>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>> 
> >>>> If nested EPT is enabled, the L2 guest may change CR3 without any exits.
> >>>> We therefore have to read the current value from the VMCS when
> >>>> switching to L1. However, if paging wasn't enabled, L0 tracks
> >>>> L2's CR3, and
> >>>> GUEST_CR3 rather contains the real-mode identity map. So we need
> >>>> to retrieve CR3 from the architectural state after conditionally
> >>>> updating it - and this is what kvm_read_cr3 does.
> >>>> 
> >>> I have a headache from trying to think about it already, but
> >>> shouldn't
> >>> L1 be the one who setups identity map for L2? I traced what
> >>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
> >>> see
> >> Here is my understanding:
> >> In vmx_set_cr3(), if enabled ept, it will check whether target vcpu
> >> is enabling paging. When L2 running in real mode, then target vcpu is
> >> not enabling paging and it will use L0's identity map for L2. If you
> >> read GUEST_CR3 from VMCS, then you may get the L2's identity map not
> >> L1's.
> >> 
> > Yes, but why it makes sense to use L0 identity map for L2? I didn't see
> > different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because L0
> > and L1 use the same identity map address. When I changed identity
> > address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
> > indeed different, but the real CR3 L2 uses points to L0 identity map. If
> > I zero L1 identity map page L2 still works.
> >
> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works if L2 is in real mode.
> 
That's not the point. It may work by accident for KVM on KVM, but what
if another hypervisor plays different tricks and builds a different
identity map for its guest?
 
--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang, Yang Z Aug. 6, 2013, 3:04 p.m. UTC | #9
Gleb Natapov wrote on 2013-08-06:
> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2013-08-06:
>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>> Gleb Natapov wrote on 2013-08-06:
>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> 
>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>> 
>>>>> I have a headache from trying to think about it already, but
>>>>> shouldn't
>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>>>> see
>>>> Here is my understanding:
>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
>>>> vcpu is enabling paging. When L2 running in real mode, then target
>>>> vcpu is not enabling paging and it will use L0's identity map for
>>>> L2. If you read GUEST_CR3 from VMCS, then you may get the L2's
>>>> identity map not L1's.
>>>> 
>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>> L0 and L1 use the same identity map address. When I changed identity
>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>> If I zero L1 identity map page L2 still works.
>>> 
>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>> if L2 is in real mode.
>> 
> That not the point. It may work accidentally for kvm on kvm, but what
> if other hypervisor plays different tricks and builds different ident map for its guest?
Yes, if another hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor would do this or what the purpose would be.
Anyway, the current logic is definitely wrong. It should use L1's identity map instead of L0's.

Best regards,
Yang


Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b482d47..09666aa 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8106,7 +8106,7 @@  static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * Additionally, restore L2's PDPTR to vmcs12.
 	 */
 	if (enable_ept) {
-		vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
+		vmcs12->guest_cr3 = kvm_read_cr3(vcpu);
 		vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
 		vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
 		vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);