Message ID | 90271e69-c07e-a32c-5531-a79b10ef03dd@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | VMX: apic access page handling adjustments | expand |
Jan Beulich writes ("[PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"): > Inserting the mapping at domain creation time leads to a memory leak > when the creation fails later on and the domain uses separate CPU and > IOMMU page tables - the latter requires intermediate page tables to be > allocated, but there's no freeing of them at present in this case. Since > we don't need the p2m insertion to happen this early, avoid the problem > altogether by deferring it until the last possible point. Thanks. > This comes at > the price of not being able to handle an error other than by crashing > the domain. How worried should I be about this ? Ian.
On Mon, Feb 22, 2021 at 11:56:58AM +0100, Jan Beulich wrote: > Inserting the mapping at domain creation time leads to a memory leak > when the creation fails later on and the domain uses separate CPU and > IOMMU page tables - the latter requires intermediate page tables to be > allocated, but there's no freeing of them at present in this case. Since > we don't need the p2m insertion to happen this early, avoid the problem > altogether by deferring it until the last possible point. This comes at > the price of not being able to handle an error other than by crashing > the domain. > > Reported-by: Julien Grall <julien@xen.org> > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> > --- > v3: New (split out). > --- > Hooking p2m insertion onto arch_domain_creation_finished() isn't very > nice, but I couldn't find any better hook (nor a good place where to > introduce a new one). In particular there look to be no hvm_funcs hooks > being used on a domain-wide basis (except for init/destroy of course). > I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but > considered this no better, the more that the tool stack could be smarter > and avoid setting that param when not needed. I'm not specially found of allocating the page in one hook, mapping it into the p2m in another hook and finally setting up the vmcs fields in yet another hook. I would rather prefer to have a single place where for the BSP the page is allocated and mapped into the p2m, while for APs just the vmcs fields are set. It's IMO slightly difficult to follow the setup when it's split into so many different places. Thanks, Roger.
On 22.02.2021 12:25, Ian Jackson wrote: > Jan Beulich writes ("[PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"): >> Inserting the mapping at domain creation time leads to a memory leak >> when the creation fails later on and the domain uses separate CPU and >> IOMMU page tables - the latter requires intermediate page tables to be >> allocated, but there's no freeing of them at present in this case. Since >> we don't need the p2m insertion to happen this early, avoid the problem >> altogether by deferring it until the last possible point. > > Thanks. > >> This comes at >> the price of not being able to handle an error other than by crashing >> the domain. > > How worried should I be about this ? Not overly much I would say. The difference is between a failure (-ENOMEM) during domain creation vs the domain getting crashed before it gets first scheduled. This is certainly less friendly to the user, but lack of memory shouldn't typically happen when creating domains. Plus the memory talked about here is such that gets provided explicitly to the domain (the p2m pool), rather than a system wide pool. Jan
Jan Beulich writes ("Re: [PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"): > On 22.02.2021 12:25, Ian Jackson wrote: > > Jan Beulich writes ("[PATCH v3 1/2][4.15] VMX: delay p2m insertion of APIC access page"): > >> Inserting the mapping at domain creation time leads to a memory leak > >> when the creation fails later on and the domain uses separate CPU and > >> IOMMU page tables - the latter requires intermediate page tables to be > >> allocated, but there's no freeing of them at present in this case. Since > >> we don't need the p2m insertion to happen this early, avoid the problem > >> altogether by deferring it until the last possible point. > > > > Thanks. > > > >> This comes at > >> the price of not being able to handle an error other than by crashing > >> the domain. > > > > How worried should I be about this ? > > Not overly much I would say. The difference is between a failure > (-ENOMEM) during domain creation vs the domain getting crashed > before it gets first scheduled. This is certainly less friendly > to the user, but lack of memory shouldn't typically happen when > creating domains. Plus the memory talked about here is such that > gets provided explicitly to the domain (the p2m pool), rather > than a system wide pool. OK, thanks. Release-Acked-by: Ian Jackson <iwj@xenproject.org>
On 22.02.2021 11:56, Jan Beulich wrote: > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -428,6 +428,14 @@ static void vmx_domain_relinquish_resour > vmx_free_vlapic_mapping(d); > } > > +static void domain_creation_finished(struct domain *d) > +{ > + if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, _mfn(0)) && > + set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), > + d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) ) > + domain_crash(d); > +} Having noticed that in patch 2 I also need to arrange for ept_get_entry_emt() to continue to return WB for this page, I'm inclined to add a respective assertion here. Would anyone object to me doing so? Kevin, Jun - I'd like this to also serve as a ping for an ack (with or without the suggested ASSERT() addition). Jan
> From: Jan Beulich <jbeulich@suse.com> > Sent: Thursday, February 25, 2021 4:44 PM > > On 22.02.2021 11:56, Jan Beulich wrote: > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -428,6 +428,14 @@ static void vmx_domain_relinquish_resour > > vmx_free_vlapic_mapping(d); > > } > > > > +static void domain_creation_finished(struct domain *d) > > +{ > > + if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, > _mfn(0)) && > > + set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), > > + d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) ) > > + domain_crash(d); > > +} > > Having noticed that in patch 2 I also need to arrange for > ept_get_entry_emt() to continue to return WB for this page, I'm > inclined to add a respective assertion here. Would anyone object > to me doing so? > > Kevin, Jun - I'd like this to also serve as a ping for an ack > (with or without the suggested ASSERT() addition). > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
--- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1007,6 +1007,8 @@ int arch_domain_soft_reset(struct domain void arch_domain_creation_finished(struct domain *d) { + if ( is_hvm_domain(d) ) + hvm_domain_creation_finished(d); } #define xen_vcpu_guest_context vcpu_guest_context --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -428,6 +428,14 @@ static void vmx_domain_relinquish_resour vmx_free_vlapic_mapping(d); } +static void domain_creation_finished(struct domain *d) +{ + if ( has_vlapic(d) && !mfn_eq(d->arch.hvm.vmx.apic_access_mfn, _mfn(0)) && + set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), + d->arch.hvm.vmx.apic_access_mfn, PAGE_ORDER_4K) ) + domain_crash(d); +} + static void vmx_init_ipt(struct vcpu *v) { unsigned int size = v->domain->vmtrace_size; @@ -2408,6 +2416,7 @@ static struct hvm_function_table __initd .cpu_dead = vmx_cpu_dead, .domain_initialise = vmx_domain_initialise, .domain_relinquish_resources = vmx_domain_relinquish_resources, + .domain_creation_finished = domain_creation_finished, .vcpu_initialise = vmx_vcpu_initialise, .vcpu_destroy = vmx_vcpu_destroy, .save_cpu_ctxt = vmx_save_vmcs_ctxt, @@ -3234,8 +3243,7 @@ static int vmx_alloc_vlapic_mapping(stru clear_domain_page(mfn); d->arch.hvm.vmx.apic_access_mfn = mfn; - return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn, - PAGE_ORDER_4K); + return 0; } static void vmx_free_vlapic_mapping(struct domain *d) --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -106,6 +106,7 @@ struct hvm_function_table { * Initialise/destroy HVM domain/vcpu resources */ int (*domain_initialise)(struct domain *d); + void (*domain_creation_finished)(struct domain *d); void (*domain_relinquish_resources)(struct domain *d); void (*domain_destroy)(struct domain *d); int (*vcpu_initialise)(struct vcpu *v); @@ -390,6 +391,12 @@ static inline bool hvm_has_set_descripto return hvm_funcs.set_descriptor_access_exiting; } +static inline void hvm_domain_creation_finished(struct domain *d) +{ + if ( hvm_funcs.domain_creation_finished ) + alternative_vcall(hvm_funcs.domain_creation_finished, d); +} + static inline int hvm_guest_x86_mode(struct vcpu *v) { @@ -765,6 +772,11 @@ static inline void hvm_invlpg(const stru { ASSERT_UNREACHABLE(); } + +static inline void hvm_domain_creation_finished(struct domain *d) +{ + ASSERT_UNREACHABLE(); +} /* * Shadow code needs further cleanup to eliminate some HVM-only paths. For
Inserting the mapping at domain creation time leads to a memory leak when the creation fails later on and the domain uses separate CPU and IOMMU page tables - the latter requires intermediate page tables to be allocated, but there's no freeing of them at present in this case. Since we don't need the p2m insertion to happen this early, avoid the problem altogether by deferring it until the last possible point. This comes at the price of not being able to handle an error other than by crashing the domain. Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v3: New (split out). --- Hooking p2m insertion onto arch_domain_creation_finished() isn't very nice, but I couldn't find any better hook (nor a good place where to introduce a new one). In particular there look to be no hvm_funcs hooks being used on a domain-wide basis (except for init/destroy of course). I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but considered this no better, the more that the tool stack could be smarter and avoid setting that param when not needed.