Message ID | 1b6a411b-84e7-bfb1-647e-511a13df838c@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | VMX: use a single, global APIC access page | expand |
On 10/02/2021 16:48, Jan Beulich wrote: > The address of this page is used by the CPU only to recognize when to > instead access the virtual APIC page instead. No accesses would ever go > to this page. It only needs to be present in the (CPU) page tables so > that address translation will produce its address as result for > respective accesses. > > By making this page global, we also eliminate the need to refcount it, > or to assign it to any domain in the first place. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> How certain are you about this? It's definitely not true on AMD's AVIC - writes very definitely end up in the backing page if they miss the APIC registers. This concern was why we didn't use a global page originally, IIRC. ~Andrew
On 10.02.2021 18:00, Andrew Cooper wrote: > On 10/02/2021 16:48, Jan Beulich wrote: >> The address of this page is used by the CPU only to recognize when to >> instead access the virtual APIC page instead. No accesses would ever go >> to this page. It only needs to be present in the (CPU) page tables so >> that address translation will produce its address as result for >> respective accesses. >> >> By making this page global, we also eliminate the need to refcount it, >> or to assign it to any domain in the first place. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > How certain are you about this? The documentation (I'm inclined to say: unexpectedly) is very clear about this; I don't think it had been this clear back at the time. I'm hoping for Kevin to shout if he's aware of issues here. I've also gone through many of our emulation code paths (I think I caught all relevant ones), and found them all having suitable guards in place. (This process was why took until the evening to have a patch ready.) Jan
On 10.02.2021 18:00, Andrew Cooper wrote: > On 10/02/2021 16:48, Jan Beulich wrote: >> The address of this page is used by the CPU only to recognize when to >> instead access the virtual APIC page instead. No accesses would ever go >> to this page. It only needs to be present in the (CPU) page tables so >> that address translation will produce its address as result for >> respective accesses. >> >> By making this page global, we also eliminate the need to refcount it, >> or to assign it to any domain in the first place. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > How certain are you about this? > > It's definitely not true on AMD's AVIC - writes very definitely end up > in the backing page if they miss the APIC registers. Doesn't this require a per-vCPU page then anyway? With a per-domain one things can't work correctly in the case you describe. Jan
On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: > I did further consider not allocating any real page at all, but just > using the address of some unpopulated space (which would require > announcing this page as reserved to Dom0, so it wouldn't put any PCI > MMIO BARs there). But I thought this would be too controversial, because > of the possible risks associated with this. No, Xen is not capable of allocating a suitable unpopulated page IMO, so let's not go down that route. Wasting one RAM page seems perfectly fine to me. > Perhaps the change to p2m_get_iommu_flags() should be in a separate > patch. Maybe, I'm still not fully convinced that's a change we want to do TBH. > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -1007,6 +1007,8 @@ int arch_domain_soft_reset(struct domain > > void arch_domain_creation_finished(struct domain *d) > { > + if ( is_hvm_domain(d) ) > + hvm_domain_creation_finished(d); > } > > #define xen_vcpu_guest_context vcpu_guest_context > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -66,8 +66,7 @@ boolean_param("force-ept", opt_force_ept > static void vmx_ctxt_switch_from(struct vcpu *v); > static void vmx_ctxt_switch_to(struct vcpu *v); > > -static int vmx_alloc_vlapic_mapping(struct domain *d); > -static void vmx_free_vlapic_mapping(struct domain *d); > +static int alloc_vlapic_mapping(void); > static void vmx_install_vlapic_mapping(struct vcpu *v); > static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr, > unsigned int flags); > @@ -78,6 +77,8 @@ static int vmx_msr_read_intercept(unsign > static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); > static void vmx_invlpg(struct vcpu *v, unsigned long linear); > > +static mfn_t __read_mostly apic_access_mfn; > + > /* Values for domain's ->arch.hvm_domain.pi_ops.flags. */ > #define PI_CSW_FROM (1u << 0) > #define PI_CSW_TO (1u << 1) > @@ -401,7 +402,6 @@ static int vmx_domain_initialise(struct > .to = vmx_ctxt_switch_to, > .tail = vmx_do_resume, > }; > - int rc; > > d->arch.ctxt_switch = &csw; > > @@ -411,21 +411,16 @@ static int vmx_domain_initialise(struct > */ > d->arch.hvm.vmx.exec_sp = is_hardware_domain(d) || opt_ept_exec_sp; > > - if ( !has_vlapic(d) ) > - return 0; > - > - if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 ) > - return rc; > - > return 0; > } > > -static void vmx_domain_relinquish_resources(struct domain *d) > +static void domain_creation_finished(struct domain *d) > { > - if ( !has_vlapic(d) ) > - return; > > - vmx_free_vlapic_mapping(d); > + if ( !mfn_eq(apic_access_mfn, _mfn(0)) && > + set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), > + apic_access_mfn, PAGE_ORDER_4K) ) > + domain_crash(d); > } > > static void vmx_init_ipt(struct vcpu *v) > @@ -2407,7 +2402,7 @@ static struct hvm_function_table __initd > .cpu_up_prepare = vmx_cpu_up_prepare, > .cpu_dead = vmx_cpu_dead, > .domain_initialise = vmx_domain_initialise, > - .domain_relinquish_resources = vmx_domain_relinquish_resources, > + .domain_creation_finished = domain_creation_finished, > .vcpu_initialise = vmx_vcpu_initialise, > .vcpu_destroy = vmx_vcpu_destroy, > .save_cpu_ctxt = vmx_save_vmcs_ctxt, > @@ -2653,7 +2648,7 @@ const struct hvm_function_table * __init > { > set_in_cr4(X86_CR4_VMXE); > > - if ( vmx_vmcs_init() ) > + if ( vmx_vmcs_init() || alloc_vlapic_mapping() ) > { > printk("VMX: failed to initialise.\n"); > return NULL; > @@ -3208,7 +3203,7 @@ gp_fault: > return X86EMUL_EXCEPTION; > } > > -static int vmx_alloc_vlapic_mapping(struct domain *d) > +static int __init alloc_vlapic_mapping(void) > { > struct page_info *pg; > mfn_t mfn; > @@ -3216,53 +3211,28 @@ static int vmx_alloc_vlapic_mapping(stru > if ( !cpu_has_vmx_virtualize_apic_accesses ) > return 0; > > - pg = alloc_domheap_page(d, MEMF_no_refcount); > + pg = alloc_domheap_page(NULL, 0); > if ( !pg ) > return -ENOMEM; > > - if ( !get_page_and_type(pg, d, PGT_writable_page) ) > - { > - /* > - * The domain can't possibly know about this page yet, so failure > - * here is a clear indication of something fishy going on. > - */ > - domain_crash(d); > - return -ENODATA; > - } > - > mfn = page_to_mfn(pg); > clear_domain_page(mfn); > - d->arch.hvm.vmx.apic_access_mfn = mfn; > + apic_access_mfn = mfn; > > - return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn, > - PAGE_ORDER_4K); > -} > - > -static void vmx_free_vlapic_mapping(struct domain *d) > -{ > - mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn; > - > - d->arch.hvm.vmx.apic_access_mfn = _mfn(0); > - if ( !mfn_eq(mfn, _mfn(0)) ) > - { > - struct page_info *pg = mfn_to_page(mfn); > - > - put_page_alloc_ref(pg); > - put_page_and_type(pg); > - } > + return 0; > } > > static void vmx_install_vlapic_mapping(struct vcpu *v) > { > paddr_t virt_page_ma, apic_page_ma; > > - if ( mfn_eq(v->domain->arch.hvm.vmx.apic_access_mfn, _mfn(0)) ) > + if ( mfn_eq(apic_access_mfn, _mfn(0)) ) You should check if the domain has a vlapic I think, since now apic_access_mfn is global and will be set regardless of whether the domain has vlapic enabled or not. Previously apic_access_mfn was only allocated if the domain had vlapic enabled. > return; > > ASSERT(cpu_has_vmx_virtualize_apic_accesses); > > virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page); > - apic_page_ma = mfn_to_maddr(v->domain->arch.hvm.vmx.apic_access_mfn); > + apic_page_ma = mfn_to_maddr(apic_access_mfn); > > vmx_vmcs_enter(v); > __vmwrite(VIRTUAL_APIC_PAGE_ADDR, virt_page_ma); I would consider setting up the vmcs and adding the page to the p2m in the same function, and likely call it from vlapic_init. We could have a domain_setup_apic in hvm_function_table that takes care of all this. > --- a/xen/include/asm-x86/hvm/hvm.h > +++ b/xen/include/asm-x86/hvm/hvm.h > @@ -106,6 +106,7 @@ struct hvm_function_table { > * Initialise/destroy HVM domain/vcpu resources > */ > int (*domain_initialise)(struct domain *d); > + void (*domain_creation_finished)(struct domain *d); > void (*domain_relinquish_resources)(struct domain *d); > void (*domain_destroy)(struct domain *d); > int (*vcpu_initialise)(struct vcpu *v); > @@ -390,6 +391,12 @@ static inline bool hvm_has_set_descripto > return hvm_funcs.set_descriptor_access_exiting; > } > > +static inline void hvm_domain_creation_finished(struct domain *d) > +{ > + if ( hvm_funcs.domain_creation_finished ) > + alternative_vcall(hvm_funcs.domain_creation_finished, d); > +} > + > static inline int > hvm_guest_x86_mode(struct vcpu *v) > { > @@ -765,6 +772,11 @@ static inline void hvm_invlpg(const stru > { > ASSERT_UNREACHABLE(); > } > + > +static inline void hvm_domain_creation_finished(struct domain *d) > +{ > + ASSERT_UNREACHABLE(); > +} > > /* > * Shadow code needs further cleanup to eliminate some HVM-only paths. For > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h > @@ -58,7 +58,6 @@ struct ept_data { > #define _VMX_DOMAIN_PML_ENABLED 0 > #define VMX_DOMAIN_PML_ENABLED (1ul << _VMX_DOMAIN_PML_ENABLED) > struct vmx_domain { > - mfn_t apic_access_mfn; > /* VMX_DOMAIN_* */ > unsigned int status; > > --- a/xen/include/asm-x86/p2m.h > +++ b/xen/include/asm-x86/p2m.h > @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu > flags = IOMMUF_readable; > if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) > flags |= IOMMUF_writable; > + /* VMX'es APIC access page is global and hence has no owner. */ > + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) > + flags = 0; Is it fine to have this page accessible to devices if the page tables are shared between the CPU and the IOMMU? Is it possible for devices to write to it? I still think both the CPU and the IOMMU page tables should be the same unless there's a technical reason for them not to be, rather than us just wanting to hide things from devices. Thanks, Roger.
On 11.02.2021 09:45, Roger Pau Monné wrote: > On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: >> I did further consider not allocating any real page at all, but just >> using the address of some unpopulated space (which would require >> announcing this page as reserved to Dom0, so it wouldn't put any PCI >> MMIO BARs there). But I thought this would be too controversial, because >> of the possible risks associated with this. > > No, Xen is not capable of allocating a suitable unpopulated page IMO, > so let's not go down that route. Wasting one RAM page seems perfectly > fine to me. Why would Xen not be able to, in principle? It may be difficult, but there may also be pages we could declare we know we can use for this purpose. >> @@ -3216,53 +3211,28 @@ static int vmx_alloc_vlapic_mapping(stru >> if ( !cpu_has_vmx_virtualize_apic_accesses ) >> return 0; >> >> - pg = alloc_domheap_page(d, MEMF_no_refcount); >> + pg = alloc_domheap_page(NULL, 0); >> if ( !pg ) >> return -ENOMEM; >> >> - if ( !get_page_and_type(pg, d, PGT_writable_page) ) >> - { >> - /* >> - * The domain can't possibly know about this page yet, so failure >> - * here is a clear indication of something fishy going on. >> - */ >> - domain_crash(d); >> - return -ENODATA; >> - } >> - >> mfn = page_to_mfn(pg); >> clear_domain_page(mfn); >> - d->arch.hvm.vmx.apic_access_mfn = mfn; >> + apic_access_mfn = mfn; >> >> - return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn, >> - PAGE_ORDER_4K); >> -} >> - >> -static void vmx_free_vlapic_mapping(struct domain *d) >> -{ >> - mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn; >> - >> - d->arch.hvm.vmx.apic_access_mfn = _mfn(0); >> - if ( !mfn_eq(mfn, _mfn(0)) ) >> - { >> - struct page_info *pg = mfn_to_page(mfn); >> - >> - put_page_alloc_ref(pg); >> - put_page_and_type(pg); >> - } >> + return 0; >> } >> >> static void vmx_install_vlapic_mapping(struct vcpu *v) >> { >> paddr_t virt_page_ma, apic_page_ma; >> >> - if ( mfn_eq(v->domain->arch.hvm.vmx.apic_access_mfn, _mfn(0)) ) >> + if ( mfn_eq(apic_access_mfn, _mfn(0)) ) > > You should check if the domain has a vlapic I think, since now > apic_access_mfn is global and will be set regardless of whether the > domain has vlapic enabled or not. > > Previously apic_access_mfn was only allocated if the domain had vlapic > enabled. Oh, indeed - thanks for spotting. And also in domain_creation_finished(). >> return; >> >> ASSERT(cpu_has_vmx_virtualize_apic_accesses); >> >> virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page); >> - apic_page_ma = mfn_to_maddr(v->domain->arch.hvm.vmx.apic_access_mfn); >> + apic_page_ma = mfn_to_maddr(apic_access_mfn); >> >> vmx_vmcs_enter(v); >> __vmwrite(VIRTUAL_APIC_PAGE_ADDR, virt_page_ma); > > I would consider setting up the vmcs and adding the page to the p2m in > the same function, and likely call it from vlapic_init. We could have > a domain_setup_apic in hvm_function_table that takes care of all this. Well, I'd prefer to do this just once per domain without needing to special case this on e.g. vCPU 0. >> --- a/xen/include/asm-x86/p2m.h >> +++ b/xen/include/asm-x86/p2m.h >> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu >> flags = IOMMUF_readable; >> if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) >> flags |= IOMMUF_writable; >> + /* VMX'es APIC access page is global and hence has no owner. */ >> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) >> + flags = 0; > > Is it fine to have this page accessible to devices if the page tables > are shared between the CPU and the IOMMU? No, it's not, but what do you do? As said elsewhere, devices gaining more access than is helpful is the price we pay for being able to share page tables. But ... > Is it possible for devices to write to it? ... thinking about it - they would be able to access it only when interrupt remapping is off. Otherwise the entire range 0xFEExxxxx gets treated differently altogether by the IOMMU, and is not subject to DMA remapping. IOW it shouldn't even matter what flags we put in there (and I'd be far less concerned about the no-intremap case). What this change matters for then is only to avoid an unnecessary call to iommu_map() (and, for small enough domain, extra intermediate page tables to be allocated). But let me really split this off of this patch. Jan
On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: > On 11.02.2021 09:45, Roger Pau Monné wrote: > > On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: > >> I did further consider not allocating any real page at all, but just > >> using the address of some unpopulated space (which would require > >> announcing this page as reserved to Dom0, so it wouldn't put any PCI > >> MMIO BARs there). But I thought this would be too controversial, because > >> of the possible risks associated with this. > > > > No, Xen is not capable of allocating a suitable unpopulated page IMO, > > so let's not go down that route. Wasting one RAM page seems perfectly > > fine to me. > > Why would Xen not be able to, in principle? It may be difficult, > but there may also be pages we could declare we know we can use > for this purpose. I was under the impression that there could always be bits in ACPI dynamic tables that could report MMIO ranges that Xen wasn't aware of, but those should already be marked as reserved in the memory map anyway for good behaved systems. > >> return; > >> > >> ASSERT(cpu_has_vmx_virtualize_apic_accesses); > >> > >> virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page); > >> - apic_page_ma = mfn_to_maddr(v->domain->arch.hvm.vmx.apic_access_mfn); > >> + apic_page_ma = mfn_to_maddr(apic_access_mfn); > >> > >> vmx_vmcs_enter(v); > >> __vmwrite(VIRTUAL_APIC_PAGE_ADDR, virt_page_ma); > > > > I would consider setting up the vmcs and adding the page to the p2m in > > the same function, and likely call it from vlapic_init. We could have > > a domain_setup_apic in hvm_function_table that takes care of all this. > > Well, I'd prefer to do this just once per domain without needing > to special case this on e.g. vCPU 0. I seems more natural to me to do this setup together with the rest of the vlapic initialization, but I'm not going to insist, I also understand your point about calling the function only once. > >> --- a/xen/include/asm-x86/p2m.h > >> +++ b/xen/include/asm-x86/p2m.h > >> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu > >> flags = IOMMUF_readable; > >> if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) > >> flags |= IOMMUF_writable; > >> + /* VMX'es APIC access page is global and hence has no owner. */ > >> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) > >> + flags = 0; > > > > Is it fine to have this page accessible to devices if the page tables > > are shared between the CPU and the IOMMU? > > No, it's not, but what do you do? As said elsewhere, devices > gaining more access than is helpful is the price we pay for > being able to share page tables. But ... I'm concerned about allowing devices to write to this shared page, as could be used as an unintended way to exchange information between domains? > > Is it possible for devices to write to it? > > ... thinking about it - they would be able to access it only > when interrupt remapping is off. Otherwise the entire range > 0xFEExxxxx gets treated differently altogether by the IOMMU, Now that I think of it, the range 0xFEExxxxx must always be special handled for device accesses, regardless of whether interrupt remapping is enabled or not, or else they won't be capable of delivering MSI messages? So I assume that whatever gets mapped in the IOMMU pages tables at 0xFEExxxxx simply gets ignored? Or else mapping the lapic access page when using shared page tables would have prevented CPU#0 from receiving MSI messages. Thanks, Roger.
On 11.02.2021 12:16, Roger Pau Monné wrote: > On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: >> On 11.02.2021 09:45, Roger Pau Monné wrote: >>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: >>>> --- a/xen/include/asm-x86/p2m.h >>>> +++ b/xen/include/asm-x86/p2m.h >>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu >>>> flags = IOMMUF_readable; >>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) >>>> flags |= IOMMUF_writable; >>>> + /* VMX'es APIC access page is global and hence has no owner. */ >>>> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) >>>> + flags = 0; >>> >>> Is it fine to have this page accessible to devices if the page tables >>> are shared between the CPU and the IOMMU? >> >> No, it's not, but what do you do? As said elsewhere, devices >> gaining more access than is helpful is the price we pay for >> being able to share page tables. But ... > > I'm concerned about allowing devices to write to this shared page, as > could be used as an unintended way to exchange information between > domains? Well, such an abuse would be possible, but it wouldn't be part of an ABI and hence could break at any time. Similarly I wouldn't consider it an information leak if a guest abused this. >>> Is it possible for devices to write to it? >> >> ... thinking about it - they would be able to access it only >> when interrupt remapping is off. Otherwise the entire range >> 0xFEExxxxx gets treated differently altogether by the IOMMU, > > Now that I think of it, the range 0xFEExxxxx must always be special > handled for device accesses, regardless of whether interrupt remapping > is enabled or not, or else they won't be capable of delivering MSI > messages? I don't think I know how exactly chipsets handle MSI in this case, but yes, presumably these accesses need to be routed a different path even in that case. > So I assume that whatever gets mapped in the IOMMU pages tables at > 0xFEExxxxx simply gets ignored? This would be the implication, aiui. > Or else mapping the lapic access page when using shared page tables > would have prevented CPU#0 from receiving MSI messages. I guess I don't understand this part. In particular I see nothing CPU#0 specific here. Jan
On Thu, Feb 11, 2021 at 12:22:41PM +0100, Jan Beulich wrote: > On 11.02.2021 12:16, Roger Pau Monné wrote: > > On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: > >> On 11.02.2021 09:45, Roger Pau Monné wrote: > >>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: > >>>> --- a/xen/include/asm-x86/p2m.h > >>>> +++ b/xen/include/asm-x86/p2m.h > >>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu > >>>> flags = IOMMUF_readable; > >>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) > >>>> flags |= IOMMUF_writable; > >>>> + /* VMX'es APIC access page is global and hence has no owner. */ > >>>> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) > >>>> + flags = 0; > >>> > >>> Is it fine to have this page accessible to devices if the page tables > >>> are shared between the CPU and the IOMMU? > >> > >> No, it's not, but what do you do? As said elsewhere, devices > >> gaining more access than is helpful is the price we pay for > >> being able to share page tables. But ... > > > > I'm concerned about allowing devices to write to this shared page, as > > could be used as an unintended way to exchange information between > > domains? > > Well, such an abuse would be possible, but it wouldn't be part > of an ABI and hence could break at any time. Similarly I > wouldn't consider it an information leak if a guest abused > this. Hm, I'm kind of worried about having such shared page accessible to guests. Could Intel confirm whether pages in the 0xFEExxxxx range are accessible to devices in any way when using IOMMU shared page tables? > >>> Is it possible for devices to write to it? > >> > >> ... thinking about it - they would be able to access it only > >> when interrupt remapping is off. Otherwise the entire range > >> 0xFEExxxxx gets treated differently altogether by the IOMMU, > > > > Now that I think of it, the range 0xFEExxxxx must always be special > > handled for device accesses, regardless of whether interrupt remapping > > is enabled or not, or else they won't be capable of delivering MSI > > messages? > > I don't think I know how exactly chipsets handle MSI in this > case, but yes, presumably these accesses need to be routed a > different path even in that case. > > > So I assume that whatever gets mapped in the IOMMU pages tables at > > 0xFEExxxxx simply gets ignored? > > This would be the implication, aiui. > > > Or else mapping the lapic access page when using shared page tables > > would have prevented CPU#0 from receiving MSI messages. > > I guess I don't understand this part. In particular I see > nothing CPU#0 specific here. Well, the default APIC address is 0xfee00000 which matches the MSI doorbell for APIC ID 0. APIC ID 1 would instead use 0xfee01000 and thus the APIC access page mapping won't shadow it anyway. Thanks, Roger.
On 11/02/2021 10:36, Jan Beulich wrote: > On 11.02.2021 09:45, Roger Pau Monné wrote: >> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: >>> I did further consider not allocating any real page at all, but just >>> using the address of some unpopulated space (which would require >>> announcing this page as reserved to Dom0, so it wouldn't put any PCI >>> MMIO BARs there). But I thought this would be too controversial, because >>> of the possible risks associated with this. >> No, Xen is not capable of allocating a suitable unpopulated page IMO, >> so let's not go down that route. Wasting one RAM page seems perfectly >> fine to me. > Why would Xen not be able to, in principle? It may be difficult, > but there may also be pages we could declare we know we can use > for this purpose. There are frames we could use for this purpose, but their locations are platform specific (holes around TSEG). I'm also not sure about the implications of their use as a DMA target. > >>> static void vmx_install_vlapic_mapping(struct vcpu *v) >>> { >>> paddr_t virt_page_ma, apic_page_ma; >>> >>> - if ( mfn_eq(v->domain->arch.hvm.vmx.apic_access_mfn, _mfn(0)) ) >>> + if ( mfn_eq(apic_access_mfn, _mfn(0)) ) >> You should check if the domain has a vlapic I think, since now >> apic_access_mfn is global and will be set regardless of whether the >> domain has vlapic enabled or not. >> >> Previously apic_access_mfn was only allocated if the domain had vlapic >> enabled. > Oh, indeed - thanks for spotting. And also in domain_creation_finished(). Honestly - PVH without LAPIC was a short sighted move. Its a prohibited combination at the moment from XSA-256 and I don't see any value in lifting the restriction, considering that TPR acceleration has been around since practically the first generation of HVM support. ~Andrew
> From: Jan Beulich <jbeulich@suse.com> > Sent: Thursday, February 11, 2021 1:04 AM > > On 10.02.2021 18:00, Andrew Cooper wrote: > > On 10/02/2021 16:48, Jan Beulich wrote: > >> The address of this page is used by the CPU only to recognize when to > >> instead access the virtual APIC page instead. No accesses would ever go > >> to this page. It only needs to be present in the (CPU) page tables so > >> that address translation will produce its address as result for > >> respective accesses. > >> > >> By making this page global, we also eliminate the need to refcount it, > >> or to assign it to any domain in the first place. > >> > >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > > > How certain are you about this? > > The documentation (I'm inclined to say: unexpectedly) is very > clear about this; I don't think it had been this clear back at > the time. I'm hoping for Kevin to shout if he's aware of issues > here. > No, I didn't see an issue here. Thanks Kevin
> From: Roger Pau Monné <roger.pau@citrix.com> > Sent: Thursday, February 11, 2021 8:27 PM > > On Thu, Feb 11, 2021 at 12:22:41PM +0100, Jan Beulich wrote: > > On 11.02.2021 12:16, Roger Pau Monné wrote: > > > On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: > > >> On 11.02.2021 09:45, Roger Pau Monné wrote: > > >>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: > > >>>> --- a/xen/include/asm-x86/p2m.h > > >>>> +++ b/xen/include/asm-x86/p2m.h > > >>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu > > >>>> flags = IOMMUF_readable; > > >>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) > > >>>> flags |= IOMMUF_writable; > > >>>> + /* VMX'es APIC access page is global and hence has no owner. > */ > > >>>> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) > > >>>> + flags = 0; > > >>> > > >>> Is it fine to have this page accessible to devices if the page tables > > >>> are shared between the CPU and the IOMMU? > > >> > > >> No, it's not, but what do you do? As said elsewhere, devices > > >> gaining more access than is helpful is the price we pay for > > >> being able to share page tables. But ... > > > > > > I'm concerned about allowing devices to write to this shared page, as > > > could be used as an unintended way to exchange information between > > > domains? > > > > Well, such an abuse would be possible, but it wouldn't be part > > of an ABI and hence could break at any time. Similarly I > > wouldn't consider it an information leak if a guest abused > > this. > > Hm, I'm kind of worried about having such shared page accessible to > guests. Could Intel confirm whether pages in the 0xFEExxxxx range are > accessible to devices in any way when using IOMMU shared page > tables? 0xFEExxxxx range is special. Requests to this range are not subject to DMA remapping (even if a valid mapping for this range exists in the IOMMU page table). And this special treatment is true regardless of whether interrupt remapping is enabled (which comes only after an interrupt message to this range is recognized). Thanks Kevin
On 01.03.2021 03:18, Tian, Kevin wrote: >> From: Roger Pau Monné <roger.pau@citrix.com> >> Sent: Thursday, February 11, 2021 8:27 PM >> >> On Thu, Feb 11, 2021 at 12:22:41PM +0100, Jan Beulich wrote: >>> On 11.02.2021 12:16, Roger Pau Monné wrote: >>>> On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: >>>>> On 11.02.2021 09:45, Roger Pau Monné wrote: >>>>>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: >>>>>>> --- a/xen/include/asm-x86/p2m.h >>>>>>> +++ b/xen/include/asm-x86/p2m.h >>>>>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu >>>>>>> flags = IOMMUF_readable; >>>>>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) >>>>>>> flags |= IOMMUF_writable; >>>>>>> + /* VMX'es APIC access page is global and hence has no owner. >> */ >>>>>>> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) >>>>>>> + flags = 0; >>>>>> >>>>>> Is it fine to have this page accessible to devices if the page tables >>>>>> are shared between the CPU and the IOMMU? >>>>> >>>>> No, it's not, but what do you do? As said elsewhere, devices >>>>> gaining more access than is helpful is the price we pay for >>>>> being able to share page tables. But ... >>>> >>>> I'm concerned about allowing devices to write to this shared page, as >>>> could be used as an unintended way to exchange information between >>>> domains? >>> >>> Well, such an abuse would be possible, but it wouldn't be part >>> of an ABI and hence could break at any time. Similarly I >>> wouldn't consider it an information leak if a guest abused >>> this. >> >> Hm, I'm kind of worried about having such shared page accessible to >> guests. Could Intel confirm whether pages in the 0xFEExxxxx range are >> accessible to devices in any way when using IOMMU shared page >> tables? > > 0xFEExxxxx range is special. Requests to this range are not subject to > DMA remapping (even if a valid mapping for this range exists in the > IOMMU page table). And this special treatment is true regardless of > whether interrupt remapping is enabled (which comes only after an > interrupt message to this range is recognized). For my/our education, could you outline what happens to device accesses to that range when interrupt remapping is off? And perhaps also what happens to accesses to this range that don't match the pattern of an MSI initiation (dword write)? I don't think I've been able to spot anything to this effect in the docs. Jan
> From: Jan Beulich <jbeulich@suse.com> > Sent: Monday, March 1, 2021 4:16 PM > > On 01.03.2021 03:18, Tian, Kevin wrote: > >> From: Roger Pau Monné <roger.pau@citrix.com> > >> Sent: Thursday, February 11, 2021 8:27 PM > >> > >> On Thu, Feb 11, 2021 at 12:22:41PM +0100, Jan Beulich wrote: > >>> On 11.02.2021 12:16, Roger Pau Monné wrote: > >>>> On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: > >>>>> On 11.02.2021 09:45, Roger Pau Monné wrote: > >>>>>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: > >>>>>>> --- a/xen/include/asm-x86/p2m.h > >>>>>>> +++ b/xen/include/asm-x86/p2m.h > >>>>>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu > >>>>>>> flags = IOMMUF_readable; > >>>>>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, > mfn_x(mfn)) ) > >>>>>>> flags |= IOMMUF_writable; > >>>>>>> + /* VMX'es APIC access page is global and hence has no owner. > >> */ > >>>>>>> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) > >>>>>>> + flags = 0; > >>>>>> > >>>>>> Is it fine to have this page accessible to devices if the page tables > >>>>>> are shared between the CPU and the IOMMU? > >>>>> > >>>>> No, it's not, but what do you do? As said elsewhere, devices > >>>>> gaining more access than is helpful is the price we pay for > >>>>> being able to share page tables. But ... > >>>> > >>>> I'm concerned about allowing devices to write to this shared page, as > >>>> could be used as an unintended way to exchange information between > >>>> domains? > >>> > >>> Well, such an abuse would be possible, but it wouldn't be part > >>> of an ABI and hence could break at any time. Similarly I > >>> wouldn't consider it an information leak if a guest abused > >>> this. > >> > >> Hm, I'm kind of worried about having such shared page accessible to > >> guests. Could Intel confirm whether pages in the 0xFEExxxxx range are > >> accessible to devices in any way when using IOMMU shared page > >> tables? > > > > 0xFEExxxxx range is special. Requests to this range are not subject to > > DMA remapping (even if a valid mapping for this range exists in the > > IOMMU page table). And this special treatment is true regardless of > > whether interrupt remapping is enabled (which comes only after an > > interrupt message to this range is recognized). > > For my/our education, could you outline what happens to device > accesses to that range when interrupt remapping is off? And > perhaps also what happens to accesses to this range that don't > match the pattern of an MSI initiation (dword write)? I don't > think I've been able to spot anything to this effect in the docs. > In VT-d spec "3.14 Handling Requests to Interrupt Address Range" -- On Intel® architecture platforms, physical address range 0xFEEx_xxxx is designated as the interrupt address range. Requests without PASID to this range are not subjected to DMA remapping (even if translation structures specify a mapping for this range). -- The following types of requests to this range are illegal requests. They are blocked and reported as Interrupt Remapping faults. • Read requests without PASID that are not ZLR. • Atomics requests without PASID. • Non-DWORD length write requests without PASID. -- Interrupt remapping decides how to interpret the format of the recognized interrupt message and whether to go through IRTE, as explained in "5.1.4 Interrupt-Remapping Hardware Operation": -- An interrupt request is identified by hardware as a DWORD sized write request to interrupt address ranges 0xFEEx_xxxx. • When interrupt-remapping is not enabled (IRES field Clear in Global Status Register), all interrupt requests are processed per the Compatibility interrupt request format described in Section 5.1.2.1. • When interrupt-remapping is enabled (IRES field Set in Global Status Register), interrupt requests are processed as follows: ... -- Thanks Kevin
On 01.03.2021 09:30, Tian, Kevin wrote: >> From: Jan Beulich <jbeulich@suse.com> >> Sent: Monday, March 1, 2021 4:16 PM >> >> On 01.03.2021 03:18, Tian, Kevin wrote: >>>> From: Roger Pau Monné <roger.pau@citrix.com> >>>> Sent: Thursday, February 11, 2021 8:27 PM >>>> >>>> On Thu, Feb 11, 2021 at 12:22:41PM +0100, Jan Beulich wrote: >>>>> On 11.02.2021 12:16, Roger Pau Monné wrote: >>>>>> On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: >>>>>>> On 11.02.2021 09:45, Roger Pau Monné wrote: >>>>>>>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: >>>>>>>>> --- a/xen/include/asm-x86/p2m.h >>>>>>>>> +++ b/xen/include/asm-x86/p2m.h >>>>>>>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu >>>>>>>>> flags = IOMMUF_readable; >>>>>>>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, >> mfn_x(mfn)) ) >>>>>>>>> flags |= IOMMUF_writable; >>>>>>>>> + /* VMX'es APIC access page is global and hence has no owner. >>>> */ >>>>>>>>> + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) >>>>>>>>> + flags = 0; >>>>>>>> >>>>>>>> Is it fine to have this page accessible to devices if the page tables >>>>>>>> are shared between the CPU and the IOMMU? >>>>>>> >>>>>>> No, it's not, but what do you do? As said elsewhere, devices >>>>>>> gaining more access than is helpful is the price we pay for >>>>>>> being able to share page tables. But ... >>>>>> >>>>>> I'm concerned about allowing devices to write to this shared page, as >>>>>> could be used as an unintended way to exchange information between >>>>>> domains? >>>>> >>>>> Well, such an abuse would be possible, but it wouldn't be part >>>>> of an ABI and hence could break at any time. Similarly I >>>>> wouldn't consider it an information leak if a guest abused >>>>> this. >>>> >>>> Hm, I'm kind of worried about having such shared page accessible to >>>> guests. Could Intel confirm whether pages in the 0xFEExxxxx range are >>>> accessible to devices in any way when using IOMMU shared page >>>> tables? >>> >>> 0xFEExxxxx range is special. Requests to this range are not subject to >>> DMA remapping (even if a valid mapping for this range exists in the >>> IOMMU page table). And this special treatment is true regardless of >>> whether interrupt remapping is enabled (which comes only after an >>> interrupt message to this range is recognized). >> >> For my/our education, could you outline what happens to device >> accesses to that range when interrupt remapping is off? And >> perhaps also what happens to accesses to this range that don't >> match the pattern of an MSI initiation (dword write)? I don't >> think I've been able to spot anything to this effect in the docs. >> > > In VT-d spec "3.14 Handling Requests to Interrupt Address Range" > -- > On Intel® architecture platforms, physical address range 0xFEEx_xxxx is > designated as the interrupt address range. Requests without PASID to > this range are not subjected to DMA remapping (even if translation > structures specify a mapping for this range). > -- > The following types of requests to this range are illegal requests. > They are blocked and reported as Interrupt Remapping faults. > • Read requests without PASID that are not ZLR. > • Atomics requests without PASID. > • Non-DWORD length write requests without PASID. > -- Ah, I see. That's (according to the change bars) a relatively recent addition. So the above clarifies things for the !PASID case. Am I interpreting "Requests-with-PASID with input address in range 0xFEEx_xxxx are translated normally like any other request-with-PASID through DMA-remapping hardware. However, if such a request is processed using pass-through translation, it will be blocked as described in the paragraph below. Software must not program paging-structure entries to remap any address to the interrupt address range. Untranslated requests and translation requests that result in an address in the interrupt range will be blocked with condition code LGN.4 or SGN.8. Translated requests with an address in the interrupt address range are treated as Unsupported Request (UR)." right in that _with_ PASID translation entries for this range would still be used, so long as they translate to an area outside of the FEExxxxx range? If so this would mean that with PASID (whenever we get to enabling this mode of operation) we'd need to avoid sharing page tables, and we'd need to suppress mirroring of EPT insertions for this range in the IOMMU page tables. (All of this independent of the particular choice of the APIC access page.) Jan
> From: Jan Beulich <jbeulich@suse.com> > Sent: Monday, March 1, 2021 5:59 PM > > On 01.03.2021 09:30, Tian, Kevin wrote: > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: Monday, March 1, 2021 4:16 PM > >> > >> On 01.03.2021 03:18, Tian, Kevin wrote: > >>>> From: Roger Pau Monné <roger.pau@citrix.com> > >>>> Sent: Thursday, February 11, 2021 8:27 PM > >>>> > >>>> On Thu, Feb 11, 2021 at 12:22:41PM +0100, Jan Beulich wrote: > >>>>> On 11.02.2021 12:16, Roger Pau Monné wrote: > >>>>>> On Thu, Feb 11, 2021 at 11:36:59AM +0100, Jan Beulich wrote: > >>>>>>> On 11.02.2021 09:45, Roger Pau Monné wrote: > >>>>>>>> On Wed, Feb 10, 2021 at 05:48:26PM +0100, Jan Beulich wrote: > >>>>>>>>> --- a/xen/include/asm-x86/p2m.h > >>>>>>>>> +++ b/xen/include/asm-x86/p2m.h > >>>>>>>>> @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu > >>>>>>>>> flags = IOMMUF_readable; > >>>>>>>>> if ( !rangeset_contains_singleton(mmio_ro_ranges, > >> mfn_x(mfn)) ) > >>>>>>>>> flags |= IOMMUF_writable; > >>>>>>>>> + /* VMX'es APIC access page is global and hence has no > owner. > >>>> */ > >>>>>>>>> + if ( mfn_valid(mfn) > && !page_get_owner(mfn_to_page(mfn)) ) > >>>>>>>>> + flags = 0; > >>>>>>>> > >>>>>>>> Is it fine to have this page accessible to devices if the page tables > >>>>>>>> are shared between the CPU and the IOMMU? > >>>>>>> > >>>>>>> No, it's not, but what do you do? As said elsewhere, devices > >>>>>>> gaining more access than is helpful is the price we pay for > >>>>>>> being able to share page tables. But ... > >>>>>> > >>>>>> I'm concerned about allowing devices to write to this shared page, as > >>>>>> could be used as an unintended way to exchange information > between > >>>>>> domains? > >>>>> > >>>>> Well, such an abuse would be possible, but it wouldn't be part > >>>>> of an ABI and hence could break at any time. Similarly I > >>>>> wouldn't consider it an information leak if a guest abused > >>>>> this. > >>>> > >>>> Hm, I'm kind of worried about having such shared page accessible to > >>>> guests. Could Intel confirm whether pages in the 0xFEExxxxx range are > >>>> accessible to devices in any way when using IOMMU shared page > >>>> tables? > >>> > >>> 0xFEExxxxx range is special. Requests to this range are not subject to > >>> DMA remapping (even if a valid mapping for this range exists in the > >>> IOMMU page table). And this special treatment is true regardless of > >>> whether interrupt remapping is enabled (which comes only after an > >>> interrupt message to this range is recognized). > >> > >> For my/our education, could you outline what happens to device > >> accesses to that range when interrupt remapping is off? And > >> perhaps also what happens to accesses to this range that don't > >> match the pattern of an MSI initiation (dword write)? I don't > >> think I've been able to spot anything to this effect in the docs. > >> > > > > In VT-d spec "3.14 Handling Requests to Interrupt Address Range" > > -- > > On Intel® architecture platforms, physical address range 0xFEEx_xxxx is > > designated as the interrupt address range. Requests without PASID to > > this range are not subjected to DMA remapping (even if translation > > structures specify a mapping for this range). > > -- > > The following types of requests to this range are illegal requests. > > They are blocked and reported as Interrupt Remapping faults. > > • Read requests without PASID that are not ZLR. > > • Atomics requests without PASID. > > • Non-DWORD length write requests without PASID. > > -- > > Ah, I see. That's (according to the change bars) a relatively recent > addition. So the above clarifies things for the !PASID case. Am I > interpreting > > "Requests-with-PASID with input address in range 0xFEEx_xxxx are > translated normally like any other request-with-PASID through > DMA-remapping hardware. However, if such a request is processed > using pass-through translation, it will be blocked as described > in the paragraph below. > > Software must not program paging-structure entries to remap any > address to the interrupt address range. Untranslated requests and > translation requests that result in an address in the interrupt > range will be blocked with condition code LGN.4 or SGN.8. > Translated requests with an address in the interrupt address > range are treated as Unsupported Request (UR)." > > right in that _with_ PASID translation entries for this range would > still be used, so long as they translate to an area outside of the > FEExxxxx range? If so this would mean that with PASID (whenever we yes > get to enabling this mode of operation) we'd need to avoid sharing > page tables, and we'd need to suppress mirroring of EPT insertions > for this range in the IOMMU page tables. (All of this independent > of the particular choice of the APIC access page.) > Or you can still share page tables as long as no DMA will target at this range (which should be the normal case and any inadvertent or malicious attempt is blocked anyway). Thanks Kevin
--- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1007,6 +1007,8 @@ int arch_domain_soft_reset(struct domain void arch_domain_creation_finished(struct domain *d) { + if ( is_hvm_domain(d) ) + hvm_domain_creation_finished(d); } #define xen_vcpu_guest_context vcpu_guest_context --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -66,8 +66,7 @@ boolean_param("force-ept", opt_force_ept static void vmx_ctxt_switch_from(struct vcpu *v); static void vmx_ctxt_switch_to(struct vcpu *v); -static int vmx_alloc_vlapic_mapping(struct domain *d); -static void vmx_free_vlapic_mapping(struct domain *d); +static int alloc_vlapic_mapping(void); static void vmx_install_vlapic_mapping(struct vcpu *v); static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr, unsigned int flags); @@ -78,6 +77,8 @@ static int vmx_msr_read_intercept(unsign static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); static void vmx_invlpg(struct vcpu *v, unsigned long linear); +static mfn_t __read_mostly apic_access_mfn; + /* Values for domain's ->arch.hvm_domain.pi_ops.flags. */ #define PI_CSW_FROM (1u << 0) #define PI_CSW_TO (1u << 1) @@ -401,7 +402,6 @@ static int vmx_domain_initialise(struct .to = vmx_ctxt_switch_to, .tail = vmx_do_resume, }; - int rc; d->arch.ctxt_switch = &csw; @@ -411,21 +411,16 @@ static int vmx_domain_initialise(struct */ d->arch.hvm.vmx.exec_sp = is_hardware_domain(d) || opt_ept_exec_sp; - if ( !has_vlapic(d) ) - return 0; - - if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 ) - return rc; - return 0; } -static void vmx_domain_relinquish_resources(struct domain *d) +static void domain_creation_finished(struct domain *d) { - if ( !has_vlapic(d) ) - return; - vmx_free_vlapic_mapping(d); + if ( !mfn_eq(apic_access_mfn, _mfn(0)) && + set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), + apic_access_mfn, PAGE_ORDER_4K) ) + domain_crash(d); } static void vmx_init_ipt(struct vcpu *v) @@ -2407,7 +2402,7 @@ static struct hvm_function_table __initd .cpu_up_prepare = vmx_cpu_up_prepare, .cpu_dead = vmx_cpu_dead, .domain_initialise = vmx_domain_initialise, - .domain_relinquish_resources = vmx_domain_relinquish_resources, + .domain_creation_finished = domain_creation_finished, .vcpu_initialise = vmx_vcpu_initialise, .vcpu_destroy = vmx_vcpu_destroy, .save_cpu_ctxt = vmx_save_vmcs_ctxt, @@ -2653,7 +2648,7 @@ const struct hvm_function_table * __init { set_in_cr4(X86_CR4_VMXE); - if ( vmx_vmcs_init() ) + if ( vmx_vmcs_init() || alloc_vlapic_mapping() ) { printk("VMX: failed to initialise.\n"); return NULL; @@ -3208,7 +3203,7 @@ gp_fault: return X86EMUL_EXCEPTION; } -static int vmx_alloc_vlapic_mapping(struct domain *d) +static int __init alloc_vlapic_mapping(void) { struct page_info *pg; mfn_t mfn; @@ -3216,53 +3211,28 @@ static int vmx_alloc_vlapic_mapping(stru if ( !cpu_has_vmx_virtualize_apic_accesses ) return 0; - pg = alloc_domheap_page(d, MEMF_no_refcount); + pg = alloc_domheap_page(NULL, 0); if ( !pg ) return -ENOMEM; - if ( !get_page_and_type(pg, d, PGT_writable_page) ) - { - /* - * The domain can't possibly know about this page yet, so failure - * here is a clear indication of something fishy going on. - */ - domain_crash(d); - return -ENODATA; - } - mfn = page_to_mfn(pg); clear_domain_page(mfn); - d->arch.hvm.vmx.apic_access_mfn = mfn; + apic_access_mfn = mfn; - return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn, - PAGE_ORDER_4K); -} - -static void vmx_free_vlapic_mapping(struct domain *d) -{ - mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn; - - d->arch.hvm.vmx.apic_access_mfn = _mfn(0); - if ( !mfn_eq(mfn, _mfn(0)) ) - { - struct page_info *pg = mfn_to_page(mfn); - - put_page_alloc_ref(pg); - put_page_and_type(pg); - } + return 0; } static void vmx_install_vlapic_mapping(struct vcpu *v) { paddr_t virt_page_ma, apic_page_ma; - if ( mfn_eq(v->domain->arch.hvm.vmx.apic_access_mfn, _mfn(0)) ) + if ( mfn_eq(apic_access_mfn, _mfn(0)) ) return; ASSERT(cpu_has_vmx_virtualize_apic_accesses); virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page); - apic_page_ma = mfn_to_maddr(v->domain->arch.hvm.vmx.apic_access_mfn); + apic_page_ma = mfn_to_maddr(apic_access_mfn); vmx_vmcs_enter(v); __vmwrite(VIRTUAL_APIC_PAGE_ADDR, virt_page_ma); --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -106,6 +106,7 @@ struct hvm_function_table { * Initialise/destroy HVM domain/vcpu resources */ int (*domain_initialise)(struct domain *d); + void (*domain_creation_finished)(struct domain *d); void (*domain_relinquish_resources)(struct domain *d); void (*domain_destroy)(struct domain *d); int (*vcpu_initialise)(struct vcpu *v); @@ -390,6 +391,12 @@ static inline bool hvm_has_set_descripto return hvm_funcs.set_descriptor_access_exiting; } +static inline void hvm_domain_creation_finished(struct domain *d) +{ + if ( hvm_funcs.domain_creation_finished ) + alternative_vcall(hvm_funcs.domain_creation_finished, d); +} + static inline int hvm_guest_x86_mode(struct vcpu *v) { @@ -765,6 +772,11 @@ static inline void hvm_invlpg(const stru { ASSERT_UNREACHABLE(); } + +static inline void hvm_domain_creation_finished(struct domain *d) +{ + ASSERT_UNREACHABLE(); +} /* * Shadow code needs further cleanup to eliminate some HVM-only paths. For --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -58,7 +58,6 @@ struct ept_data { #define _VMX_DOMAIN_PML_ENABLED 0 #define VMX_DOMAIN_PML_ENABLED (1ul << _VMX_DOMAIN_PML_ENABLED) struct vmx_domain { - mfn_t apic_access_mfn; /* VMX_DOMAIN_* */ unsigned int status; --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -935,6 +935,9 @@ static inline unsigned int p2m_get_iommu flags = IOMMUF_readable; if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) ) flags |= IOMMUF_writable; + /* VMX'es APIC access page is global and hence has no owner. */ + if ( mfn_valid(mfn) && !page_get_owner(mfn_to_page(mfn)) ) + flags = 0; break; default: flags = 0;
The address of this page is used by the CPU only to recognize when to instead access the virtual APIC page instead. No accesses would ever go to this page. It only needs to be present in the (CPU) page tables so that address translation will produce its address as result for respective accesses. By making this page global, we also eliminate the need to refcount it, or to assign it to any domain in the first place. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- Hooking p2m insertion onto arch_domain_creation_finished() isn't very nice, but I couldn't find any better hook (nor a good place where to introduce a new one). In particular there look to be no hvm_funcs hooks being used on a domain-wide basis (except for init/destroy of course). I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but considered this no better, the more that the tool stack could be smarter and avoid setting that param when not needed. I did further consider not allocating any real page at all, but just using the address of some unpopulated space (which would require announcing this page as reserved to Dom0, so it wouldn't put any PCI MMIO BARs there). But I thought this would be too controversial, because of the possible risks associated with this. Perhaps the change to p2m_get_iommu_flags() should be in a separate patch.