Message ID | 348063358f2ded237334b3cec7498a36296fd408.1676351034.git.tamas@tklengyel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] x86: Perform mem_sharing teardown before paging teardown | expand |
On 14.02.2023 06:07, Tamas K Lengyel wrote: > When VM forking is initiated a VM is not supposed to try to perform mem_sharing > yet as the fork process hasn't completed all required steps. However, the vCPU > bring-up paths trigger guest memory accesses that can lead to such premature > sharing ops. However, the gating check to see whether a VM is a fork is set > already (ie. the domain has a parent). I find this confusing, and I had to read the patch to understand what's meant. Since there are a total of three VMs involved here (the one driving the fork, the one being forked, and the new clone), "a VM" is insufficient. The sentence further reads as if that VM is performing sharing operations on itself, which according to my understanding is impossible. Furthermore "the vCPU bringup paths" is also not specific enough imo. The forked VM as well as the new clone are both paused if I'm not mistaken, so neither can be in the process of bringing up any vCPU-s. I assume you refer to bring_up_vcpus(), but I'm afraid I view this function as misnamed. vCPU-s are being _created_ there, not brought up. Furthermore there are no guest memory accesses from underneath bring_up_vcpus(), so with those accesses you may be referring to copy_settings() instead? Which in turn - I'm sorry for my ignorance - raises the question why (implicit) hypervisor side accesses to guest memory might trigger sharing: So far I was working from the assumption that it's only control tool requests which do. Otoh you talk of "sharing ops", which suggests it may indeed be requests coming from the control tool. Yet that's also what invoked the request to fork, so shouldn't it coordinate with itself and avoid issuing sharing ops prematurely? > --- a/xen/arch/x86/include/asm/hvm/domain.h > +++ b/xen/arch/x86/include/asm/hvm/domain.h > @@ -33,6 +33,14 @@ struct mem_sharing_domain > { > bool enabled, block_interrupts; > > + /* > + * We need to avoid trying to nominate pages for forking until the > + * fork operation is completely finished. As parts of fork creation > + * is restartable we mark here if the process started to skip the > + * non-restartable portion. > + */ > + bool fork_started; "non-restartable portion" reads incorrect to me: The issue there is with that portion being one-time initialization. I think this should also be said this way. Furthermore (nit) it wants to be either "parts" and "are" or "part" and "is", and I think a comma after "started" would help readability (or else it can be read as "the process started to skip <whatever>"). Finally maybe better "... this flag indicates that ..." (in particular because we definitely do not mark here, but in fork()). > @@ -1905,7 +1906,7 @@ static int fork(struct domain *cd, struct domain *d) > *cd->arch.cpuid = *d->arch.cpuid; > *cd->arch.msr = *d->arch.msr; > cd->vmtrace_size = d->vmtrace_size; > - cd->parent = d; > + msd->fork_started = 1; "true" please (and "false" below). Jan
On Wed, Feb 15, 2023 at 6:03 AM Jan Beulich <jbeulich@suse.com> wrote: > > On 14.02.2023 06:07, Tamas K Lengyel wrote: > > When VM forking is initiated a VM is not supposed to try to perform mem_sharing > > yet as the fork process hasn't completed all required steps. However, the vCPU > > bring-up paths trigger guest memory accesses that can lead to such premature > > sharing ops. However, the gating check to see whether a VM is a fork is set > > already (ie. the domain has a parent). > > I find this confusing, and I had to read the patch to understand what's > meant. Since there are a total of three VMs involved here (the one > driving the fork, the one being forked, and the new clone), "a VM" is > insufficient. The sentence further reads as if that VM is performing > sharing operations on itself, which according to my understanding is > impossible. > > Furthermore "the vCPU bringup paths" is also not specific enough imo. > The forked VM as well as the new clone are both paused if I'm not > mistaken, so neither can be in the process of bringing up any vCPU-s. > I assume you refer to bring_up_vcpus(), but I'm afraid I view this > function as misnamed. vCPU-s are being _created_ there, not brought up. > Furthermore there are no guest memory accesses from underneath > bring_up_vcpus(), so with those accesses you may be referring to > copy_settings() instead? Which in turn - I'm sorry for my ignorance - > raises the question why (implicit) hypervisor side accesses to guest > memory might trigger sharing: So far I was working from the assumption > that it's only control tool requests which do. Otoh you talk of > "sharing ops", which suggests it may indeed be requests coming from > the control tool. Yet that's also what invoked the request to fork, > so shouldn't it coordinate with itself and avoid issuing sharing ops > prematurely? I went back to double-check and here is the memory access during vcpu_create: (XEN) Xen call trace: (XEN) [<ffff82d0402ebf38>] R mem_sharing.c#mem_sharing_gfn_alloc+0x5c/0x5e (XEN) [<ffff82d0402ecb12>] F mem_sharing.c#add_to_physmap+0x175/0x233 (XEN) [<ffff82d0402ee81d>] F mem_sharing_fork_page+0x4ee/0x51e (XEN) [<ffff82d0402f244f>] F p2m_get_gfn_type_access+0x119/0x1a7 (XEN) [<ffff82d0402e67ef>] F hap.c#hap_update_paging_modes+0xbe/0x2ee (XEN) [<ffff82d0402942a0>] F vmx_create_vmcs+0x981/0xb71 (XEN) [<ffff82d040298884>] F vmx.c#vmx_vcpu_initialise+0x64/0x1a0 (XEN) [<ffff82d0402acc59>] F hvm_vcpu_initialise+0x97/0x19e (XEN) [<ffff82d0403168db>] F arch_vcpu_create+0xf3/0x1db (XEN) [<ffff82d040206c69>] F vcpu_create+0x211/0x35d (XEN) [<ffff82d0402f00b7>] F mem_sharing_memop+0x16a9/0x1869 (XEN) [<ffff82d0403317c5>] F subarch_memory_op+0x298/0x2c4 (XEN) [<ffff82d04032ca26>] F arch_memory_op+0xa9f/0xaa4 (XEN) [<ffff82d040221e66>] F do_memory_op+0x2150/0x2297 (XEN) [<ffff82d04030bcfb>] F pv_do_multicall_call+0x22c/0x4c7 (XEN) [<ffff82d040319727>] F arch_do_multicall_call+0x23/0x2c (XEN) [<ffff82d04022231f>] F do_multicall+0xd3/0x417 (XEN) [<ffff82d04030c0e4>] F pv_hypercall+0xea/0x3a0 (XEN) [<ffff82d040201292>] F lstar_enter+0x122/0x130 Any time any page is requested in a fork that's not present it gets populated from the parent (if the parent has it). What I was worried about is nominate_page being called on the fork but I was mistaken, it was called on the parent before add_to_physmap is called on the fork - which is fine and the expected behavior. We can drop this patch, sorry for the noise. Tamas
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h index 698455444e..76a08b55f9 100644 --- a/xen/arch/x86/include/asm/hvm/domain.h +++ b/xen/arch/x86/include/asm/hvm/domain.h @@ -33,6 +33,14 @@ struct mem_sharing_domain { bool enabled, block_interrupts; + /* + * We need to avoid trying to nominate pages for forking until the + * fork operation is completely finished. As parts of fork creation + * is restartable we mark here if the process started to skip the + * non-restartable portion. + */ + bool fork_started; + /* * When releasing shared gfn's in a preemptible manner, recall where * to resume the search. diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 649d93dc54..b55c5bccdd 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1888,11 +1888,12 @@ static int copy_settings(struct domain *cd, struct domain *d) static int fork(struct domain *cd, struct domain *d) { int rc = -EBUSY; + struct mem_sharing_domain *msd = &cd->arch.hvm.mem_sharing; if ( !cd->controller_pause_count ) return rc; - if ( !cd->parent ) + if ( !msd->fork_started ) { if ( !get_domain(d) ) { @@ -1905,7 +1906,7 @@ static int fork(struct domain *cd, struct domain *d) *cd->arch.cpuid = *d->arch.cpuid; *cd->arch.msr = *d->arch.msr; cd->vmtrace_size = d->vmtrace_size; - cd->parent = d; + msd->fork_started = 1; } /* This is preemptible so it's the first to get done */ @@ -1918,8 +1919,11 @@ static int fork(struct domain *cd, struct domain *d) rc = copy_settings(cd, d); done: - if ( rc && rc != -ERESTART ) + if ( !rc ) + cd->parent = d; + else if ( rc != -ERESTART ) { + msd->fork_started = 0; cd->parent = NULL; domain_unpause(d); put_domain(d);
When VM forking is initiated a VM is not supposed to try to perform mem_sharing yet as the fork process hasn't completed all required steps. However, the vCPU bring-up paths trigger guest memory accesses that can lead to such premature sharing ops. However, the gating check to see whether a VM is a fork is set already (ie. the domain has a parent). In this patch we introduce a separate variable to store the information that forking has been initiated so that the non-restartable part of the forking op is not repeated and we avoid the premature sharing ops from triggering. Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> --- xen/arch/x86/include/asm/hvm/domain.h | 8 ++++++++ xen/arch/x86/mm/mem_sharing.c | 10 +++++++--- 2 files changed, 15 insertions(+), 3 deletions(-)