
[v1,1/2] KVM: nVMX: get rid of nested_get_page()

Message ID 20170803153601.GH32403@flask (mailing list archive)
State New, archived

Commit Message

Radim Krčmář Aug. 3, 2017, 3:36 p.m. UTC
2017-08-03 16:09+0200, David Hildenbrand:
> nested_get_page() just sounds confusing. All we want is a page from G1.
> This is even unrelated to nested.
> 
> Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy
> lines.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

I like the cleanup, but there's a subtle change in behavior that makes me wary:

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -9535,15 +9528,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>  		 */
>  		if (vmx->nested.apic_access_page) /* shouldn't happen */
>  			nested_release_page(vmx->nested.apic_access_page);
> -		vmx->nested.apic_access_page =
> -			nested_get_page(vcpu, vmcs12->apic_access_addr);
> +		page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);

If what shouldn't happen happened and then kvm_vcpu_gpa_to_page()
failed, we'd be calling put_page() twice on the same page.  I think the
situation currently can happen if VM entry fails after this point.

Assigning 'vmx->nested.apic_access_page = NULL' when releasing the page
sounds safer.

Unless I'm reading something wrong, the "shouldn't happen" really
shouldn't happen if we did something like this:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e34373838b31..d26e6693f748 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10462,8 +10462,6 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 		return 1;
 	}
 
-	nested_get_vmcs12_pages(vcpu, vmcs12);
-
 	msr_entry_idx = nested_vmx_load_msr(vcpu,
 					    vmcs12->vm_entry_msr_load_addr,
 					    vmcs12->vm_entry_msr_load_count);
@@ -10475,6 +10473,8 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 		return 1;
 	}
 
+	nested_get_vmcs12_pages(vcpu, vmcs12);
+
 	/*
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet


>  		/*
>  		 * If translation failed, no matter: This feature asks
>  		 * to exit when accessing the given address, and if it
>  		 * can never be accessed, this feature won't do
>  		 * anything anyway.
>  		 */
> -		if (vmx->nested.apic_access_page) {
> +		if (!is_error_page(page)) {
> +			vmx->nested.apic_access_page = page;
>  			hpa = page_to_phys(vmx->nested.apic_access_page);
>  			vmcs_write64(APIC_ACCESS_ADDR, hpa);
>  		} else {
> @@ -9560,8 +9553,7 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>  	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
>  		if (vmx->nested.virtual_apic_page) /* shouldn't happen */
>  			nested_release_page(vmx->nested.virtual_apic_page);

Ditto,

thanks.

> -		vmx->nested.virtual_apic_page =
> -			nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
> +		page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->virtual_apic_page_addr);
>  
>  		/*
>  		 * If translation failed, VM entry will fail because

Comments

David Hildenbrand Aug. 3, 2017, 3:58 p.m. UTC | #1
On 03.08.2017 17:36, Radim Krčmář wrote:
> 2017-08-03 16:09+0200, David Hildenbrand:
>> nested_get_page() just sounds confusing. All we want is a page from G1.
>> This is even unrelated to nested.
>>
>> Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy
>> lines.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
> 
> I like the cleanup, but there's a subtle change in behavior that makes me wary:
> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -9535,15 +9528,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>>  		 */
>>  		if (vmx->nested.apic_access_page) /* shouldn't happen */
>>  			nested_release_page(vmx->nested.apic_access_page);
>> -		vmx->nested.apic_access_page =
>> -			nested_get_page(vcpu, vmcs12->apic_access_addr);
>> +		page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
> 
> If what shouldn't happen happened and then kvm_vcpu_gpa_to_page()
> failed, we'd be calling put_page() twice on the same page.  I think the
> situation currently can happen if VM entry fails after this point.
> 
> Assigning 'vmx->nested.apic_access_page = NULL' when releasing the page
> sounds safer.
> 
> Unless I'm reading something wrong, the "shouldn't happen" really
> shouldn't happen if we did something like this:
> 

Very good point. I'll just clear the respective field whenever we
release a page; that way we are on the safe side.

Good catch! Thanks!

Jim Mattson Aug. 3, 2017, 4:05 p.m. UTC | #2
On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-08-03 16:09+0200, David Hildenbrand:
>> nested_get_page() just sounds confusing. All we want is a page from G1.
>> This is even unrelated to nested.
>>
>> Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy
>> lines.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>
> I like the cleanup, but there's a subtle change in behavior that makes me wary:
>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -9535,15 +9528,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>>                */
>>               if (vmx->nested.apic_access_page) /* shouldn't happen */
>>                       nested_release_page(vmx->nested.apic_access_page);
>> -             vmx->nested.apic_access_page =
>> -                     nested_get_page(vcpu, vmcs12->apic_access_addr);
>> +             page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
>
> If what shouldn't happen happened and then kvm_vcpu_gpa_to_page()
> failed, we'd be calling put_page() twice on the same page.  I think the
> situation currently can happen if VM entry fails after this point.
>
> Assigning 'vmx->nested.apic_access_page = NULL' when releasing the page
> sounds safer.
>
> Unless I'm reading something wrong, the "shouldn't happen" really
> shouldn't happen if we did something like this:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e34373838b31..d26e6693f748 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10462,8 +10462,6 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>                 return 1;
>         }
>
> -       nested_get_vmcs12_pages(vcpu, vmcs12);
> -
>         msr_entry_idx = nested_vmx_load_msr(vcpu,
>                                             vmcs12->vm_entry_msr_load_addr,
>                                             vmcs12->vm_entry_msr_load_count);
> @@ -10475,6 +10473,8 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>                 return 1;
>         }
>
> +       nested_get_vmcs12_pages(vcpu, vmcs12);
> +
>         /*
>          * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>          * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
>
>>               /*
>>                * If translation failed, no matter: This feature asks
>>                * to exit when accessing the given address, and if it
>>                * can never be accessed, this feature won't do
>>                * anything anyway.
>>                */

This comment is incorrect. On real hardware, the APIC access page
doesn't have to exist (i.e. be backed by actual memory), because the
APIC access page is never accessed. Think of the APIC access page as a
sentinel value that the hypervisor can put in the page tables (EPT
page tables if they are in use, x86 page tables otherwise) to trigger
APIC virtualization. If there is an access, it is to the page at the
virtual APIC address, not the APIC access page.

Similarly, in a VM, there need not be a mapping for the APIC access
page for the feature to work as architected. (Or, at least, that's the
way it should work. :-)

>> -             if (vmx->nested.apic_access_page) {
>> +             if (!is_error_page(page)) {
>> +                     vmx->nested.apic_access_page = page;
>>                       hpa = page_to_phys(vmx->nested.apic_access_page);
>>                       vmcs_write64(APIC_ACCESS_ADDR, hpa);
>>               } else {
>> @@ -9560,8 +9553,7 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
>>       if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
>>               if (vmx->nested.virtual_apic_page) /* shouldn't happen */
>>                       nested_release_page(vmx->nested.virtual_apic_page);
>
> Ditto,
>
> thanks.
>
>> -             vmx->nested.virtual_apic_page =
>> -                     nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
>> +             page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
>>
>>               /*
>>                * If translation failed, VM entry will fail because
>
Radim Krčmář Aug. 3, 2017, 5:41 p.m. UTC | #3
2017-08-03 09:05-0700, Jim Mattson:
> On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2017-08-03 16:09+0200, David Hildenbrand:
> >>               /*
> >>                * If translation failed, no matter: This feature asks
> >>                * to exit when accessing the given address, and if it
> >>                * can never be accessed, this feature won't do
> >>                * anything anyway.
> >>                */
> 
> This comment is incorrect. On real hardware, the APIC access page
> doesn't have to exist (i.e. be backed by actual memory), because the
> APIC access page is never accessed. Think of the APIC access page as a
> sentinel value that the hypervisor can put in the page tables (EPT
> page tables if they are in use, x86 page tables otherwise) to trigger
> APIC virtualization. If there is an access, it is to the page at the
> virtual APIC address, not the APIC access page.

Right,

> Similarly, in a VM, there need not be a mapping for the APIC access
> page for the feature to work as architected. (Or, at least, that's the
> way it should work. :-)

the APIC_ACCESS_ADDR is always an L0 physical address, so we somehow need
to map the L1 physical address somewhere in order to recognize accesses
from L2.

I think the correct way would be to create a new mapping if the
chosen L1 physical address has no L0 physical address yet.
The code was made for the common case where hypervisors select a page
that is mapped by KVM ...

Do you wish to send patches? :)

Thanks.

Jim Mattson Aug. 3, 2017, 6:42 p.m. UTC | #4
On Thu, Aug 3, 2017 at 10:41 AM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-08-03 09:05-0700, Jim Mattson:
>> On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> > 2017-08-03 16:09+0200, David Hildenbrand:
>> >>               /*
>> >>                * If translation failed, no matter: This feature asks
>> >>                * to exit when accessing the given address, and if it
>> >>                * can never be accessed, this feature won't do
>> >>                * anything anyway.
>> >>                */
>>
>> This comment is incorrect. On real hardware, the APIC access page
>> doesn't have to exist (i.e. be backed by actual memory), because the
>> APIC access page is never accessed. Think of the APIC access page as a
>> sentinel value that the hypervisor can put in the page tables (EPT
>> page tables if they are in use, x86 page tables otherwise) to trigger
>> APIC virtualization. If there is an access, it is to the page at the
>> virtual APIC address, not the APIC access page.
>
> Right,
>
>> Similarly, in a VM, there need not be a mapping for the APIC access
>> page for the feature to work as architected. (Or, at least, that's the
>> way it should work. :-)
>
> the APIC_ACCESS_ADDR is always an L0 physical address, so we somehow need
> to map the L1 physical address somewhere in order to recognize accesses
> from L2.
>
> I think the correct way would be to create a new mapping if the
> chosen L1 physical address has no L0 physical address yet.
> The code was made for the common case where hypervisors select a page
> that is mapped by KVM ...

Yes, I think that's safest.

> Do you wish to send patches? :)

Unless someone beats me to it!