
[v3,1/2,4.15] VMX: delay p2m insertion of APIC access page

Message ID: 90271e69-c07e-a32c-5531-a79b10ef03dd@suse.com
State: New, archived
Series: VMX: apic access page handling adjustments

Commit Message

Jan Beulich Feb. 22, 2021, 10:56 a.m. UTC
Inserting the mapping at domain creation time leads to a memory leak
when the creation fails later on and the domain uses separate CPU and
IOMMU page tables - the latter requires intermediate page tables to be
allocated, but there's no freeing of them at present in this case. Since
we don't need the p2m insertion to happen this early, avoid the problem
altogether by deferring it until the last possible point. This comes at
the price of not being able to handle an error other than by crashing
the domain.

Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New (split out).
---
Hooking p2m insertion onto arch_domain_creation_finished() isn't very
nice, but I couldn't find any better hook (nor a good place to
introduce a new one). In particular there appear to be no hvm_funcs
hooks used on a domain-wide basis (except for init/destroy of course).
I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but
considered this no better, especially since the tool stack could be
smarter and avoid setting that param when not needed.
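
In outline, the deferral amounts to the following call chain (condensed
from the Patch section at the end of this thread; as noted above, the
only remaining error handling is crashing the domain):

    void arch_domain_creation_finished(struct domain *d)
    {
        if ( is_hvm_domain(d) )
            hvm_domain_creation_finished(d);    /* dispatches via hvm_funcs */
    }

    /* VMX hook: the p2m insertion previously done in vmx_alloc_vlapic_mapping() */
    static void domain_creation_finished(struct domain *d)
    {
        if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, _mfn(0)) &&
             set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
                                d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) )
            domain_crash(d);    /* the only way left to report failure */
    }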

Comments

Ian Jackson Feb. 22, 2021, 11:25 a.m. UTC | #1
Jan Beulich writes ("[PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"):
> Inserting the mapping at domain creation time leads to a memory leak
> when the creation fails later on and the domain uses separate CPU and
> IOMMU page tables - the latter requires intermediate page tables to be
> allocated, but there's no freeing of them at present in this case. Since
> we don't need the p2m insertion to happen this early, avoid the problem
> altogether by deferring it until the last possible point.

Thanks.

>   This comes at
> the price of not being able to handle an error other than by crashing
> the domain.

How worried should I be about this?

Ian.
Roger Pau Monné Feb. 22, 2021, 12:15 p.m. UTC | #2
On Mon, Feb 22, 2021 at 11:56:58AM +0100, Jan Beulich wrote:
> Inserting the mapping at domain creation time leads to a memory leak
> when the creation fails later on and the domain uses separate CPU and
> IOMMU page tables - the latter requires intermediate page tables to be
> allocated, but there's no freeing of them at present in this case. Since
> we don't need the p2m insertion to happen this early, avoid the problem
> altogether by deferring it until the last possible point. This comes at
> the price of not being able to handle an error other than by crashing
> the domain.
> 
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v3: New (split out).
> ---
> Hooking p2m insertion onto arch_domain_creation_finished() isn't very
> nice, but I couldn't find any better hook (nor a good place to
> introduce a new one). In particular there appear to be no hvm_funcs
> hooks used on a domain-wide basis (except for init/destroy of course).
> I did consider connecting this to the setting of HVM_PARAM_IDENT_PT,
> but considered this no better, especially since the tool stack could
> be smarter and avoid setting that param when not needed.

I'm not especially fond of allocating the page in one hook, mapping it
into the p2m in another hook, and finally setting up the VMCS fields in
yet another hook.

I would prefer to have a single place where, for the BSP, the page is
allocated and mapped into the p2m, while for APs just the VMCS fields
are set. IMO it's slightly difficult to follow the setup when it's
split across so many different places.

Thanks, Roger.
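
For illustration, a rough sketch of the consolidation suggested above;
the helper name and its call site are hypothetical and not part of this
series, though the individual calls mirror those in the existing code:

    /* Hypothetical per-vCPU hook: the BSP allocates and maps the shared
     * page; every vCPU (BSP and APs alike) then points its VMCS at it. */
    static int vmx_vcpu_setup_vlapic_access(struct vcpu *v)
    {
        struct domain *d = v->domain;

        if ( v->vcpu_id == 0 )
        {
            struct page_info *pg = alloc_domheap_page(d, MEMF_no_refcount);
            int rc;

            if ( !pg )
                return -ENOMEM;

            clear_domain_page(page_to_mfn(pg));
            d->arch.hvm.vmx.apic_access_mfn = page_to_mfn(pg);

            rc = set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
                                    page_to_mfn(pg), PAGE_ORDER_4K);
            if ( rc )
                return rc;
        }

        /* Per-vCPU VMCS field setup, for BSP and APs alike. */
        vmx_vmcs_enter(v);
        __vmwrite(APIC_ACCESS_ADDR,
                  mfn_to_maddr(d->arch.hvm.vmx.apic_access_mfn));
        vmx_vmcs_exit(v);

        return 0;
    }
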
Jan Beulich Feb. 22, 2021, 2:05 p.m. UTC | #3
On 22.02.2021 12:25, Ian Jackson wrote:
> Jan Beulich writes ("[PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"):
>> Inserting the mapping at domain creation time leads to a memory leak
>> when the creation fails later on and the domain uses separate CPU and
>> IOMMU page tables - the latter requires intermediate page tables to be
>> allocated, but there's no freeing of them at present in this case. Since
>> we don't need the p2m insertion to happen this early, avoid the problem
>> altogether by deferring it until the last possible point.
> 
> Thanks.
> 
>>   This comes at
>> the price of not being able to handle an error other than by crashing
>> the domain.
> 
> How worried should I be about this?

Not overly much, I would say. The difference is between a failure
(-ENOMEM) during domain creation vs the domain getting crashed
before it is first scheduled. The latter is certainly less friendly
to the user, but lack of memory shouldn't typically happen when
creating domains. Plus, the memory in question gets provided
explicitly to the domain (the p2m pool), rather than coming from a
system-wide pool.

Jan
Ian Jackson Feb. 22, 2021, 5:17 p.m. UTC | #4
Jan Beulich writes ("Re: [PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"):
> On 22.02.2021 12:25, Ian Jackson wrote:
> > Jan Beulich writes ("[PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"):
> >> Inserting the mapping at domain creation time leads to a memory leak
> >> when the creation fails later on and the domain uses separate CPU and
> >> IOMMU page tables - the latter requires intermediate page tables to be
> >> allocated, but there's no freeing of them at present in this case. Since
> >> we don't need the p2m insertion to happen this early, avoid the problem
> >> altogether by deferring it until the last possible point.
> > 
> > Thanks.
> > 
> >>   This comes at
> >> the price of not being able to handle an error other than by crashing
> >> the domain.
> > 
> > How worried should I be about this?
> 
> Not overly much, I would say. The difference is between a failure
> (-ENOMEM) during domain creation vs the domain getting crashed
> before it is first scheduled. The latter is certainly less friendly
> to the user, but lack of memory shouldn't typically happen when
> creating domains. Plus, the memory in question gets provided
> explicitly to the domain (the p2m pool), rather than coming from a
> system-wide pool.

OK, thanks.

Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich Feb. 25, 2021, 8:44 a.m. UTC | #5
On 22.02.2021 11:56, Jan Beulich wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -428,6 +428,14 @@ static void vmx_domain_relinquish_resour
>      vmx_free_vlapic_mapping(d);
>  }
>  
> +static void domain_creation_finished(struct domain *d)
> +{
> +    if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, _mfn(0)) &&
> +         set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
> +                            d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) )
> +        domain_crash(d);
> +}

Having noticed that in patch 2 I also need to arrange for
ept_get_entry_emt() to continue to return WB for this page, I'm
inclined to add a corresponding assertion here. Would anyone object
to me doing so?

Kevin, Jun - I'd like this to also serve as a ping for an ack
(with or without the suggested ASSERT() addition).

Jan
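
For reference, a sketch of what the function might look like with the
suggested assertion added; the exact name and signature of the EMT
query helper, and the WRBACK comparison, are assumptions here rather
than something taken from the posted series:

    static void domain_creation_finished(struct domain *d)
    {
        gfn_t gfn = gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE);
        mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
        bool ipat;

        if ( !has_vlapic(d) || mfn_eq(mfn, _mfn(0)) )
            return;

        /* Patch 2 relies on this page continuing to be mapped WB. */
        ASSERT(epte_get_entry_emt(d, gfn_x(gfn), mfn, 0, &ipat, true) ==
               MTRR_TYPE_WRBACK);

        if ( set_mmio_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K) )
            domain_crash(d);
    }
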
Tian, Kevin Feb. 26, 2021, 7:06 a.m. UTC | #6
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, February 25, 2021 4:44 PM
> 
> On 22.02.2021 11:56, Jan Beulich wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -428,6 +428,14 @@ static void vmx_domain_relinquish_resour
> >      vmx_free_vlapic_mapping(d);
> >  }
> >
> > +static void domain_creation_finished(struct domain *d)
> > +{
> > +    if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, _mfn(0)) &&
> > +         set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
> > +                            d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) )
> > +        domain_crash(d);
> > +}
> 
> Having noticed that in patch 2 I also need to arrange for
> ept_get_entry_emt() to continue to return WB for this page, I'm
> inclined to add a respective assertion here. Would anyone object
> to me doing so?
> 
> Kevin, Jun - I'd like this to also serve as a ping for an ack
> (with or without the suggested ASSERT() addition).
> 

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Patch

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1007,6 +1007,8 @@  int arch_domain_soft_reset(struct domain
 
 void arch_domain_creation_finished(struct domain *d)
 {
+    if ( is_hvm_domain(d) )
+        hvm_domain_creation_finished(d);
 }
 
 #define xen_vcpu_guest_context vcpu_guest_context
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -428,6 +428,14 @@  static void vmx_domain_relinquish_resour
     vmx_free_vlapic_mapping(d);
 }
 
+static void domain_creation_finished(struct domain *d)
+{
+    if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, _mfn(0)) &&
+         set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
+                            d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) )
+        domain_crash(d);
+}
+
 static void vmx_init_ipt(struct vcpu *v)
 {
     unsigned int size = v->domain->vmtrace_size;
@@ -2408,6 +2416,7 @@  static struct hvm_function_table __initd
     .cpu_dead             = vmx_cpu_dead,
     .domain_initialise    = vmx_domain_initialise,
     .domain_relinquish_resources = vmx_domain_relinquish_resources,
+    .domain_creation_finished = domain_creation_finished,
     .vcpu_initialise      = vmx_vcpu_initialise,
     .vcpu_destroy         = vmx_vcpu_destroy,
     .save_cpu_ctxt        = vmx_save_vmcs_ctxt,
@@ -3234,8 +3243,7 @@  static int vmx_alloc_vlapic_mapping(stru
     clear_domain_page(mfn);
     d->arch.hvm.vmx.apic_access_mfn = mfn;
 
-    return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn,
-                              PAGE_ORDER_4K);
+    return 0;
 }
 
 static void vmx_free_vlapic_mapping(struct domain *d)
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -106,6 +106,7 @@  struct hvm_function_table {
      * Initialise/destroy HVM domain/vcpu resources
      */
     int  (*domain_initialise)(struct domain *d);
+    void (*domain_creation_finished)(struct domain *d);
     void (*domain_relinquish_resources)(struct domain *d);
     void (*domain_destroy)(struct domain *d);
     int  (*vcpu_initialise)(struct vcpu *v);
@@ -390,6 +391,12 @@  static inline bool hvm_has_set_descripto
     return hvm_funcs.set_descriptor_access_exiting;
 }
 
+static inline void hvm_domain_creation_finished(struct domain *d)
+{
+    if ( hvm_funcs.domain_creation_finished )
+        alternative_vcall(hvm_funcs.domain_creation_finished, d);
+}
+
 static inline int
 hvm_guest_x86_mode(struct vcpu *v)
 {
@@ -765,6 +772,11 @@  static inline void hvm_invlpg(const stru
 {
     ASSERT_UNREACHABLE();
 }
+
+static inline void hvm_domain_creation_finished(struct domain *d)
+{
+    ASSERT_UNREACHABLE();
+}
 
 /*
  * Shadow code needs further cleanup to eliminate some HVM-only paths. For