diff mbox series

[v2] VMX: use a single, global APIC access page

Message ID a895386d-db14-2743-d8f9-09f9509a510a@suse.com (mailing list archive)
State Superseded
Headers show
Series [v2] VMX: use a single, global APIC access page | expand

Commit Message

Jan Beulich Feb. 19, 2021, 4:45 p.m. UTC
The address of this page is used by the CPU only to recognize when to
access the virtual APIC page instead. No accesses would ever go to this
page. It only needs to be present in the (CPU) page tables so that
address translation will produce its address as result for respective
accesses.

By making this page global, we also eliminate the need to refcount it,
or to assign it to any domain in the first place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Avoid insertion when !has_vlapic(). Split off change to
    p2m_get_iommu_flags().
---
Hooking p2m insertion onto arch_domain_creation_finished() isn't very
nice, but I couldn't find any better hook (nor a good place where to
introduce a new one). In particular there look to be no hvm_funcs hooks
being used on a domain-wide basis (except for init/destroy of course).
I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but
considered this no better, the more that the tool stack could be smarter
and avoid setting that param when not needed.

I did further consider not allocating any real page at all, but just
using the address of some unpopulated space (which would require
announcing this page as reserved to Dom0, so it wouldn't put any PCI
MMIO BARs there). But I thought this would be too controversial, because
of the possible risks associated with this.

Comments

Jan Beulich Feb. 19, 2021, 4:59 p.m. UTC | #1
On 19.02.2021 17:45, Jan Beulich wrote:
> The address of this page is used by the CPU only to recognize when to
> access the virtual APIC page instead. No accesses would ever go to this
> page. It only needs to be present in the (CPU) page tables so that
> address translation will produce its address as result for respective
> accesses.
> 
> By making this page global, we also eliminate the need to refcount it,
> or to assign it to any domain in the first place.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Avoid insertion when !has_vlapic(). Split off change to
>     p2m_get_iommu_flags().
> ---
> Hooking p2m insertion onto arch_domain_creation_finished() isn't very
> nice, but I couldn't find any better hook (nor a good place where to
> introduce a new one). In particular there look to be no hvm_funcs hooks
> being used on a domain-wide basis (except for init/destroy of course).
> I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but
> considered this no better, the more that the tool stack could be smarter
> and avoid setting that param when not needed.

While this patch was triggered not just by Julien's observation of
the early p2m insertion being a problem, but also many earlier
times of running into this odd code, it is - especially at this
stage - perhaps a possible option to split the change into just
the movement of the set_mmio_p2m_entry() invocation and all the
rest, in order to defer that rest until after 4.15.

Jan
Ian Jackson Feb. 19, 2021, 5:05 p.m. UTC | #2
Jan Beulich writes ("[PATCH v2] VMX: use a single, global APIC access page"):
> The address of this page is used by the CPU only to recognize when to
> access the virtual APIC page instead. No accesses would ever go to this
> page. It only needs to be present in the (CPU) page tables so that
> address translation will produce its address as result for respective
> accesses.
> 
> By making this page global, we also eliminate the need to refcount it,
> or to assign it to any domain in the first place.

Thanks for this.

From this:

Jan Beulich writes ("Re: [PATCH v2] VMX: use a single, global APIC access page"):
> While this patch was triggered not just by Julien's observation of
> the early p2m insertion being a problem, but also many earlier
> times of running into this odd code, it is - especially at this
> stage - perhaps a possible option to split the change into just
> the movement of the set_mmio_p2m_entry() invocation and all the
> rest, in order to defer that rest until after 4.15.

I infer that this contains a bugfix, but perhaps other
changes/improvements too.

George, I think you're our expert on this refcounting stuff - what do
you think of this ?

I guess my key question is whether this change will introduce risk by
messing with the complex refcounting machineryt - or remove it by
removing an interaction with the refcounting.

Thanks,
Ian.
Jan Beulich Feb. 22, 2021, 7:51 a.m. UTC | #3
On 19.02.2021 18:05, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH v2] VMX: use a single, global APIC access page"):
>> While this patch was triggered not just by Julien's observation of
>> the early p2m insertion being a problem, but also many earlier
>> times of running into this odd code, it is - especially at this
>> stage - perhaps a possible option to split the change into just
>> the movement of the set_mmio_p2m_entry() invocation and all the
>> rest, in order to defer that rest until after 4.15.
> 
> I infer that this contains a bugfix, but perhaps other
> changes/improvements too.
> 
> George, I think you're our expert on this refcounting stuff - what do
> you think of this ?
> 
> I guess my key question is whether this change will introduce risk by
> messing with the complex refcounting machineryt - or remove it by
> removing an interaction with the refcounting.

If anything, then the latter, but largely neither afaict - there's no
change in this regard here at all as far as the guest could affect
behavior, due to the page getting inserted as p2m_mmio_direct, and
guest_remove_page() having

    if ( p2mt == p2m_mmio_direct )
    {
        rc = clear_mmio_p2m_entry(d, gmfn, mfn, PAGE_ORDER_4K);
        goto out_put_gfn;
    }

before any refcounting logic is reached. The removal of interaction
is because now the page doesn't get associated with a domain (and
hence doesn't become subject to refcounting) at all.

The risk of the change stems from going from using a per-domain
page to using a single, system-wide one, which indeed was the subject
of v1 discussion. In any event the consideration towards splitting
the change would cover either concern. Perhaps I should really do so
and submit as v3 ...

Jan
Roger Pau Monné Feb. 22, 2021, 10:09 a.m. UTC | #4
On Mon, Feb 22, 2021 at 08:51:59AM +0100, Jan Beulich wrote:
> On 19.02.2021 18:05, Ian Jackson wrote:
> > Jan Beulich writes ("Re: [PATCH v2] VMX: use a single, global APIC access page"):
> >> While this patch was triggered not just by Julien's observation of
> >> the early p2m insertion being a problem, but also many earlier
> >> times of running into this odd code, it is - especially at this
> >> stage - perhaps a possible option to split the change into just
> >> the movement of the set_mmio_p2m_entry() invocation and all the
> >> rest, in order to defer that rest until after 4.15.
> > 
> > I infer that this contains a bugfix, but perhaps other
> > changes/improvements too.
> > 
> > George, I think you're our expert on this refcounting stuff - what do
> > you think of this ?
> > 
> > I guess my key question is whether this change will introduce risk by
> > messing with the complex refcounting machineryt - or remove it by
> > removing an interaction with the refcounting.
> 
> If anything, then the latter, but largely neither afaict - there's no
> change in this regard here at all as far as the guest could affect
> behavior, due to the page getting inserted as p2m_mmio_direct, and
> guest_remove_page() having
> 
>     if ( p2mt == p2m_mmio_direct )
>     {
>         rc = clear_mmio_p2m_entry(d, gmfn, mfn, PAGE_ORDER_4K);
>         goto out_put_gfn;
>     }
> 
> before any refcounting logic is reached. The removal of interaction
> is because now the page doesn't get associated with a domain (and
> hence doesn't become subject to refcounting) at all.
> 
> The risk of the change stems from going from using a per-domain
> page to using a single, system-wide one, which indeed was the subject
> of v1 discussion. In any event the consideration towards splitting
> the change would cover either concern. Perhaps I should really do so
> and submit as v3 ...

I agree it would be less risky to keep using a per-domain page, and
switch to a global one after the release. From the discussion in v1 I
don't think we where able to spot any specific issues apart from
guests possibly being able to access shared data in this page from
passthrough devices. I would at least feel more confortable with
that approach given the point we are in the release process.

Thanks, Roger.
George Dunlap Feb. 22, 2021, 10:26 a.m. UTC | #5
> On Feb 22, 2021, at 7:51 AM, Jan Beulich <JBeulich@suse.com> wrote:
> 
> On 19.02.2021 18:05, Ian Jackson wrote:
>> Jan Beulich writes ("Re: [PATCH v2] VMX: use a single, global APIC access page"):
>>> While this patch was triggered not just by Julien's observation of
>>> the early p2m insertion being a problem, but also many earlier
>>> times of running into this odd code, it is - especially at this
>>> stage - perhaps a possible option to split the change into just
>>> the movement of the set_mmio_p2m_entry() invocation and all the
>>> rest, in order to defer that rest until after 4.15.
>> 
>> I infer that this contains a bugfix, but perhaps other
>> changes/improvements too.
>> 
>> George, I think you're our expert on this refcounting stuff - what do
>> you think of this ?
>> 
>> I guess my key question is whether this change will introduce risk by
>> messing with the complex refcounting machineryt - or remove it by
>> removing an interaction with the refcounting.
> 
> If anything, then the latter, but largely neither afaict

Does it actually contain a bugfix?  It’s not at all clear to me from reading the description that it’s anything other than a clean-up.  If there’s something else that needs to be called out explicitly.

It should indeed theoretically make things safer long-term; the current vlapic_page allocation is using special-case of the refcounting rules, making it much more prone to being the subject of an “oversight”.  But at this point in the release we don’t have much time at all to shake out any potential bugs in the new implementation; as such I’d consider anything other than the minimum necessary to fix a bug to be not worth it.

 -George
Jan Beulich Feb. 22, 2021, 10:50 a.m. UTC | #6
On 22.02.2021 11:26, George Dunlap wrote:
> 
> 
>> On Feb 22, 2021, at 7:51 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>
>> On 19.02.2021 18:05, Ian Jackson wrote:
>>> Jan Beulich writes ("Re: [PATCH v2] VMX: use a single, global APIC access page"):
>>>> While this patch was triggered not just by Julien's observation of
>>>> the early p2m insertion being a problem, but also many earlier
>>>> times of running into this odd code, it is - especially at this
>>>> stage - perhaps a possible option to split the change into just
>>>> the movement of the set_mmio_p2m_entry() invocation and all the
>>>> rest, in order to defer that rest until after 4.15.
>>>
>>> I infer that this contains a bugfix, but perhaps other
>>> changes/improvements too.
>>>
>>> George, I think you're our expert on this refcounting stuff - what do
>>> you think of this ?
>>>
>>> I guess my key question is whether this change will introduce risk by
>>> messing with the complex refcounting machineryt - or remove it by
>>> removing an interaction with the refcounting.
>>
>> If anything, then the latter, but largely neither afaict
> 
> Does it actually contain a bugfix?  It’s not at all clear to me from reading the description that it’s anything other than a clean-up.  If there’s something else that needs to be called out explicitly.

Hmm, yes. The change wanted making anyway, for a long time imo. Hence
when putting together the patch I forgot to call out that as a side
effect it addresses a memory leak, as reported by Julien. With the
splitting of the two changes that'll be necessarily mentioned. I'm
about to submit v3.

Jan

> It should indeed theoretically make things safer long-term; the current vlapic_page allocation is using special-case of the refcounting rules, making it much more prone to being the subject of an “oversight”.  But at this point in the release we don’t have much time at all to shake out any potential bugs in the new implementation; as such I’d consider anything other than the minimum necessary to fix a bug to be not worth it.
> 
>  -George
>
Ian Jackson Feb. 22, 2021, 11:22 a.m. UTC | #7
Roger Pau Monné writes ("Re: [PATCH for-4.15 v2] VMX: use a single, global APIC access page"):
> On Mon, Feb 22, 2021 at 08:51:59AM +0100, Jan Beulich wrote:
> > If anything, then the latter, but largely neither afaict - there's no
> > change in this regard here at all as far as the guest could affect
> > behavior, due to the page getting inserted as p2m_mmio_direct, and
> > guest_remove_page() having
> > 
> >     if ( p2mt == p2m_mmio_direct )
> >     {
> >         rc = clear_mmio_p2m_entry(d, gmfn, mfn, PAGE_ORDER_4K);
> >         goto out_put_gfn;
> >     }
> > 
> > before any refcounting logic is reached. The removal of interaction
> > is because now the page doesn't get associated with a domain (and
> > hence doesn't become subject to refcounting) at all.
> > 
> > The risk of the change stems from going from using a per-domain
> > page to using a single, system-wide one, which indeed was the subject
> > of v1 discussion. In any event the consideration towards splitting
> > the change would cover either concern. Perhaps I should really do so
> > and submit as v3 ...
> 
> I agree it would be less risky to keep using a per-domain page, and
> switch to a global one after the release. From the discussion in v1 I
> don't think we where able to spot any specific issues apart from
> guests possibly being able to access shared data in this page from
> passthrough devices. I would at least feel more confortable with
> that approach given the point we are in the release process.

Thanks to Roger and Jan for these comments which were very helpful to
me.

Jan Beulich writes ("Re: [PATCH for-4.15 v2] VMX: use a single, global APIC access page"):
> Hmm, yes. The change wanted making anyway, for a long time imo. Hence
> when putting together the patch I forgot to call out that as a side
> effect it addresses a memory leak, as reported by Julien. With the
> splitting of the two changes that'll be necessarily mentioned. I'm
> about to submit v3.

Great, I see it now.

Thanks,
Ian.
diff mbox series

Patch

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1007,6 +1007,8 @@  int arch_domain_soft_reset(struct domain
 
 void arch_domain_creation_finished(struct domain *d)
 {
+    if ( is_hvm_domain(d) )
+        hvm_domain_creation_finished(d);
 }
 
 #define xen_vcpu_guest_context vcpu_guest_context
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -66,8 +66,7 @@  boolean_param("force-ept", opt_force_ept
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
 
-static int  vmx_alloc_vlapic_mapping(struct domain *d);
-static void vmx_free_vlapic_mapping(struct domain *d);
+static int alloc_vlapic_mapping(void);
 static void vmx_install_vlapic_mapping(struct vcpu *v);
 static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
                                 unsigned int flags);
@@ -78,6 +77,8 @@  static int vmx_msr_read_intercept(unsign
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg(struct vcpu *v, unsigned long linear);
 
+static mfn_t __read_mostly apic_access_mfn;
+
 /* Values for domain's ->arch.hvm_domain.pi_ops.flags. */
 #define PI_CSW_FROM (1u << 0)
 #define PI_CSW_TO   (1u << 1)
@@ -401,7 +402,6 @@  static int vmx_domain_initialise(struct
         .to   = vmx_ctxt_switch_to,
         .tail = vmx_do_resume,
     };
-    int rc;
 
     d->arch.ctxt_switch = &csw;
 
@@ -411,21 +411,16 @@  static int vmx_domain_initialise(struct
      */
     d->arch.hvm.vmx.exec_sp = is_hardware_domain(d) || opt_ept_exec_sp;
 
-    if ( !has_vlapic(d) )
-        return 0;
-
-    if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
-        return rc;
-
     return 0;
 }
 
-static void vmx_domain_relinquish_resources(struct domain *d)
+static void domain_creation_finished(struct domain *d)
 {
-    if ( !has_vlapic(d) )
-        return;
 
-    vmx_free_vlapic_mapping(d);
+    if ( has_vlapic(d) && !mfn_eq(apic_access_mfn, _mfn(0)) &&
+         set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
+                            apic_access_mfn, PAGE_ORDER_4K) )
+        domain_crash(d);
 }
 
 static void vmx_init_ipt(struct vcpu *v)
@@ -2407,7 +2402,7 @@  static struct hvm_function_table __initd
     .cpu_up_prepare       = vmx_cpu_up_prepare,
     .cpu_dead             = vmx_cpu_dead,
     .domain_initialise    = vmx_domain_initialise,
-    .domain_relinquish_resources = vmx_domain_relinquish_resources,
+    .domain_creation_finished = domain_creation_finished,
     .vcpu_initialise      = vmx_vcpu_initialise,
     .vcpu_destroy         = vmx_vcpu_destroy,
     .save_cpu_ctxt        = vmx_save_vmcs_ctxt,
@@ -2653,7 +2548,7 @@  const struct hvm_function_table * __init
 {
     set_in_cr4(X86_CR4_VMXE);
 
-    if ( vmx_vmcs_init() )
+    if ( vmx_vmcs_init() || alloc_vlapic_mapping() )
     {
         printk("VMX: failed to initialise.\n");
         return NULL;
@@ -3208,7 +3203,7 @@  gp_fault:
     return X86EMUL_EXCEPTION;
 }
 
-static int vmx_alloc_vlapic_mapping(struct domain *d)
+static int __init alloc_vlapic_mapping(void)
 {
     struct page_info *pg;
     mfn_t mfn;
@@ -3216,53 +3211,28 @@  static int vmx_alloc_vlapic_mapping(stru
     if ( !cpu_has_vmx_virtualize_apic_accesses )
         return 0;
 
-    pg = alloc_domheap_page(d, MEMF_no_refcount);
+    pg = alloc_domheap_page(NULL, 0);
     if ( !pg )
         return -ENOMEM;
 
-    if ( !get_page_and_type(pg, d, PGT_writable_page) )
-    {
-        /*
-         * The domain can't possibly know about this page yet, so failure
-         * here is a clear indication of something fishy going on.
-         */
-        domain_crash(d);
-        return -ENODATA;
-    }
-
     mfn = page_to_mfn(pg);
     clear_domain_page(mfn);
-    d->arch.hvm.vmx.apic_access_mfn = mfn;
+    apic_access_mfn = mfn;
 
-    return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn,
-                              PAGE_ORDER_4K);
-}
-
-static void vmx_free_vlapic_mapping(struct domain *d)
-{
-    mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
-
-    d->arch.hvm.vmx.apic_access_mfn = _mfn(0);
-    if ( !mfn_eq(mfn, _mfn(0)) )
-    {
-        struct page_info *pg = mfn_to_page(mfn);
-
-        put_page_alloc_ref(pg);
-        put_page_and_type(pg);
-    }
+    return 0;
 }
 
 static void vmx_install_vlapic_mapping(struct vcpu *v)
 {
     paddr_t virt_page_ma, apic_page_ma;
 
-    if ( mfn_eq(v->domain->arch.hvm.vmx.apic_access_mfn, _mfn(0)) )
+    if ( !has_vlapic(v->domain) || mfn_eq(apic_access_mfn, _mfn(0)) )
         return;
 
     ASSERT(cpu_has_vmx_virtualize_apic_accesses);
 
     virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page);
-    apic_page_ma = mfn_to_maddr(v->domain->arch.hvm.vmx.apic_access_mfn);
+    apic_page_ma = mfn_to_maddr(apic_access_mfn);
 
     vmx_vmcs_enter(v);
     __vmwrite(VIRTUAL_APIC_PAGE_ADDR, virt_page_ma);
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -106,6 +106,7 @@  struct hvm_function_table {
      * Initialise/destroy HVM domain/vcpu resources
      */
     int  (*domain_initialise)(struct domain *d);
+    void (*domain_creation_finished)(struct domain *d);
     void (*domain_relinquish_resources)(struct domain *d);
     void (*domain_destroy)(struct domain *d);
     int  (*vcpu_initialise)(struct vcpu *v);
@@ -390,6 +391,12 @@  static inline bool hvm_has_set_descripto
     return hvm_funcs.set_descriptor_access_exiting;
 }
 
+static inline void hvm_domain_creation_finished(struct domain *d)
+{
+    if ( hvm_funcs.domain_creation_finished )
+        alternative_vcall(hvm_funcs.domain_creation_finished, d);
+}
+
 static inline int
 hvm_guest_x86_mode(struct vcpu *v)
 {
@@ -765,6 +772,11 @@  static inline void hvm_invlpg(const stru
 {
     ASSERT_UNREACHABLE();
 }
+
+static inline void hvm_domain_creation_finished(struct domain *d)
+{
+    ASSERT_UNREACHABLE();
+}
 
 /*
  * Shadow code needs further cleanup to eliminate some HVM-only paths. For
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -58,7 +58,6 @@  struct ept_data {
 #define _VMX_DOMAIN_PML_ENABLED    0
 #define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
 struct vmx_domain {
-    mfn_t apic_access_mfn;
     /* VMX_DOMAIN_* */
     unsigned int status;