Message ID | 20170803153601.GH32403@flask (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 03.08.2017 17:36, Radim Krčmář wrote: > 2017-08-03 16:09+0200, David Hildenbrand: >> nested_get_page() just sounds confusing. All we want is a page from G1. >> This is even unrelated to nested. >> >> Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy >> lines. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- > > I like the cleanup, but a subtle change in behavior that makes me wary: > >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> @@ -9535,15 +9528,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu, >> */ >> if (vmx->nested.apic_access_page) /* shouldn't happen */ >> nested_release_page(vmx->nested.apic_access_page); >> - vmx->nested.apic_access_page = >> - nested_get_page(vcpu, vmcs12->apic_access_addr); >> + page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr); > > If what shouldn't happen happened and then kvm_vcpu_gpa_to_page() > failed, we'd be calling put_page() twice on the same page. I think the > situation currently can happen if VM entry fails after this point. > > Assigning 'vmx->nested.apic_access_page = NULL' when releasing the page > sounds safer. > > Unless I'm reading something wrong, the "shouldn't happen" really > shouldn't happen if we did something like this: > Very good point. I'll just clear the respective field whenever we release a page, this way we are on the safe side. Good catch! Thanks!
On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > 2017-08-03 16:09+0200, David Hildenbrand: >> nested_get_page() just sounds confusing. All we want is a page from G1. >> This is even unrelated to nested. >> >> Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy >> lines. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- > > I like the cleanup, but a subtle change in behavior that makes me wary: > >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> @@ -9535,15 +9528,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu, >> */ >> if (vmx->nested.apic_access_page) /* shouldn't happen */ >> nested_release_page(vmx->nested.apic_access_page); >> - vmx->nested.apic_access_page = >> - nested_get_page(vcpu, vmcs12->apic_access_addr); >> + page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr); > > If what shouldn't happen happened and then kvm_vcpu_gpa_to_page() > failed, we'd be calling put_page() twice on the same page. I think the > situation currently can happen if VM entry fails after this point. > > Assigning 'vmx->nested.apic_access_page = NULL' when releasing the page > sounds safer. > > Unless I'm reading something wrong, the "shouldn't happen" really > shouldn't happen if we did something like this: > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index e34373838b31..d26e6693f748 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -10462,8 +10462,6 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) > return 1; > } > > - nested_get_vmcs12_pages(vcpu, vmcs12); > - > msr_entry_idx = nested_vmx_load_msr(vcpu, > vmcs12->vm_entry_msr_load_addr, > vmcs12->vm_entry_msr_load_count); > @@ -10475,6 +10473,8 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) > return 1; > } > > + nested_get_vmcs12_pages(vcpu, vmcs12); > + > /* > * Note no nested_vmx_succeed or nested_vmx_fail here. At this point > * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet > >> /* >> * If translation failed, no matter: This feature asks >> * to exit when accessing the given address, and if it >> * can never be accessed, this feature won't do >> * anything anyway. >> */ This comment is incorrect. On real hardware, the APIC access page doesn't have to exist (i.e. be backed by actual memory), because the APIC access page is never accessed. Think of the APIC access page as a sentinel value that the hypervisor can put in the page tables (EPT page tables if they are in use, x86 page tables otherwise) to trigger APIC virtualization. If there is an access, it is to the page at the virtual APIC address, not the APIC access page. Similarly, in a VM, there need not be a mapping for the APIC access page for the feature to work as architected. (Or, at least, that's the way it should work. :-) >> - if (vmx->nested.apic_access_page) { >> + if (!is_error_page(page)) { >> + vmx->nested.apic_access_page = page; >> hpa = page_to_phys(vmx->nested.apic_access_page); >> vmcs_write64(APIC_ACCESS_ADDR, hpa); >> } else { >> @@ -9560,8 +9553,7 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu, >> if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) { >> if (vmx->nested.virtual_apic_page) /* shouldn't happen */ >> nested_release_page(vmx->nested.virtual_apic_page); > > Ditto, > > thanks. > >> - vmx->nested.virtual_apic_page = >> - nested_get_page(vcpu, vmcs12->virtual_apic_page_addr); >> + page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr); >> >> /* >> * If translation failed, VM entry will fail because >
2017-08-03 09:05-0700, Jim Mattson: > On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > > 2017-08-03 16:09+0200, David Hildenbrand: > >> /* > >> * If translation failed, no matter: This feature asks > >> * to exit when accessing the given address, and if it > >> * can never be accessed, this feature won't do > >> * anything anyway. > >> */ > > This comment is incorrect. On real hardware, the APIC access page > doesn't have to exist (i.e. be backed by actual memory), because the > APIC access page is never accessed. Think of the APIC access page as a > sentinel value that the hypervisor can put in the page tables (EPT > page tables if they are in use, x86 page tables otherwise) to trigger > APIC virtualization. If there is an access, it is to the page at the > virtual APIC address, not the APIC access page. Right, > Similarly, in a VM, there need not be a mapping for the APIC access > page for the feature to work as architected. (Or, at least, that's the > way it should work. :-) the APIC_ACCESS_ADDR is always L0 physical address, so we somehow need to map the L1 physical address somewhere in order to recognize accesses from L2. I think the correct way would be to should create a new mapping if the chosen L1 physical address has no L0 physical address yet. The code was made for the common case where hypervisors select a page that is mapped by KVM ... Do you wish to send patches? :) Thanks.
On Thu, Aug 3, 2017 at 10:41 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: > 2017-08-03 09:05-0700, Jim Mattson: >> On Thu, Aug 3, 2017 at 8:36 AM, Radim Krčmář <rkrcmar@redhat.com> wrote: >> > 2017-08-03 16:09+0200, David Hildenbrand: >> >> /* >> >> * If translation failed, no matter: This feature asks >> >> * to exit when accessing the given address, and if it >> >> * can never be accessed, this feature won't do >> >> * anything anyway. >> >> */ >> >> This comment is incorrect. On real hardware, the APIC access page >> doesn't have to exist (i.e. be backed by actual memory), because the >> APIC access page is never accessed. Think of the APIC access page as a >> sentinel value that the hypervisor can put in the page tables (EPT >> page tables if they are in use, x86 page tables otherwise) to trigger >> APIC virtualization. If there is an access, it is to the page at the >> virtual APIC address, not the APIC access page. > > Right, > >> Similarly, in a VM, there need not be a mapping for the APIC access >> page for the feature to work as architected. (Or, at least, that's the >> way it should work. :-) > > the APIC_ACCESS_ADDR is always L0 physical address, so we somehow need > to map the L1 physical address somewhere in order to recognize accesses > from L2. > > I think the correct way would be to should create a new mapping if the > chosen L1 physical address has no L0 physical address yet. > The code was made for the common case where hypervisors select a page > that is mapped by KVM ... Yes, I think that's safest. > Do you wish to send patches? :) Unless someone beats me to it!
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e34373838b31..d26e6693f748 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -10462,8 +10462,6 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) return 1; } - nested_get_vmcs12_pages(vcpu, vmcs12); - msr_entry_idx = nested_vmx_load_msr(vcpu, vmcs12->vm_entry_msr_load_addr, vmcs12->vm_entry_msr_load_count); @@ -10475,6 +10473,8 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) return 1; } + nested_get_vmcs12_pages(vcpu, vmcs12); + /* * Note no nested_vmx_succeed or nested_vmx_fail here. At this point * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet