diff mbox series

[2/2] x86/mem_sharing: Add extra variable to track fork progress

Message ID 348063358f2ded237334b3cec7498a36296fd408.1676351034.git.tamas@tklengyel.com (mailing list archive)
State Superseded
Headers show
Series [1/2] x86: Perform mem_sharing teardown before paging teardown | expand

Commit Message

Tamas K Lengyel Feb. 14, 2023, 5:07 a.m. UTC
When VM forking is initiated a VM is not supposed to try to perform mem_sharing
yet as the fork process hasn't completed all required steps. However, the vCPU
bring-up paths trigger guest memory accesses that can lead to such premature
sharing ops. However, the gating check to see whether a VM is a fork is set
already (ie. the domain has a parent).

In this patch we introduce a separate variable to store the information that
forking has been initiated so that the non-restartable part of the forking op
is not repeated and we avoid the premature sharing ops from triggering.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/arch/x86/include/asm/hvm/domain.h |  8 ++++++++
 xen/arch/x86/mm/mem_sharing.c         | 10 +++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

Comments

Jan Beulich Feb. 15, 2023, 11:03 a.m. UTC | #1
On 14.02.2023 06:07, Tamas K Lengyel wrote:
> When VM forking is initiated a VM is not supposed to try to perform mem_sharing
> yet as the fork process hasn't completed all required steps. However, the vCPU
> bring-up paths trigger guest memory accesses that can lead to such premature
> sharing ops. However, the gating check to see whether a VM is a fork is set
> already (ie. the domain has a parent).

I find this confusing, and I had to read the patch to understand what's
meant. Since there are a total of three VMs involved here (the one
driving the fork, the one being forked, and the new clone), "a VM" is
insufficient. The sentence further reads as if that VM is performing
sharing operations on itself, which according to my understanding is
impossible.

Furthermore "the vCPU bringup paths" is also not specific enough imo.
The forked VM as well as the new clone are both paused if I'm not
mistaken, so neither can be in the process of bringing up any vCPU-s.
I assume you refer to bring_up_vcpus(), but I'm afraid I view this
function as misnamed. vCPU-s are being _created_ there, not brought up.
Furthermore there are no guest memory accesses from underneath
bring_up_vcpus(), so with those accesses you may be referring to
copy_settings() instead? Which in turn - I'm sorry for my ignorance -
raises the question why (implicit) hypervisor side accesses to guest
memory might trigger sharing: So far I was working from the assumption
that it's only control tool requests which do. Otoh you talk of
"sharing ops", which suggests it may indeed be requests coming from
the control tool. Yet that's also what invoked the request to fork,
so shouldn't it coordinate with itself and avoid issuing sharing ops
prematurely?

> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -33,6 +33,14 @@ struct mem_sharing_domain
>  {
>      bool enabled, block_interrupts;
>  
> +    /*
> +     * We need to avoid trying to nominate pages for forking until the
> +     * fork operation is completely finished. As parts of fork creation
> +     * is restartable we mark here if the process started to skip the
> +     * non-restartable portion.
> +     */
> +    bool fork_started;

"non-restartable portion" reads incorrect to me: The issue there is with
that portion being one-time initialization. I think this should also be
said this way. Furthermore (nit) it wants to be either "parts" and "are"
or "part" and "is", and I think a comma after "started" would help
readability (or else it can be read as "the process started to skip
<whatever>"). Finally maybe better "... this flag indicates that ..."
(in particular because we definitely do not mark here, but in fork()).

> @@ -1905,7 +1906,7 @@ static int fork(struct domain *cd, struct domain *d)
>          *cd->arch.cpuid = *d->arch.cpuid;
>          *cd->arch.msr = *d->arch.msr;
>          cd->vmtrace_size = d->vmtrace_size;
> -        cd->parent = d;
> +        msd->fork_started = 1;

"true" please (and "false" below).

Jan
Tamas K Lengyel Feb. 15, 2023, 5:02 p.m. UTC | #2
On Wed, Feb 15, 2023 at 6:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.02.2023 06:07, Tamas K Lengyel wrote:
> > When VM forking is initiated a VM is not supposed to try to perform
mem_sharing
> > yet as the fork process hasn't completed all required steps. However,
the vCPU
> > bring-up paths trigger guest memory accesses that can lead to such
premature
> > sharing ops. However, the gating check to see whether a VM is a fork is
set
> > already (ie. the domain has a parent).
>
> I find this confusing, and I had to read the patch to understand what's
> meant. Since there are a total of three VMs involved here (the one
> driving the fork, the one being forked, and the new clone), "a VM" is
> insufficient. The sentence further reads as if that VM is performing
> sharing operations on itself, which according to my understanding is
> impossible.
>
> Furthermore "the vCPU bringup paths" is also not specific enough imo.
> The forked VM as well as the new clone are both paused if I'm not
> mistaken, so neither can be in the process of bringing up any vCPU-s.
> I assume you refer to bring_up_vcpus(), but I'm afraid I view this
> function as misnamed. vCPU-s are being _created_ there, not brought up.
> Furthermore there are no guest memory accesses from underneath
> bring_up_vcpus(), so with those accesses you may be referring to
> copy_settings() instead? Which in turn - I'm sorry for my ignorance -
> raises the question why (implicit) hypervisor side accesses to guest
> memory might trigger sharing: So far I was working from the assumption
> that it's only control tool requests which do. Otoh you talk of
> "sharing ops", which suggests it may indeed be requests coming from
> the control tool. Yet that's also what invoked the request to fork,
> so shouldn't it coordinate with itself and avoid issuing sharing ops
> prematurely?

I went back to double-check and here is the memory access during
vcpu_create:

(XEN) Xen call trace:
(XEN)    [<ffff82d0402ebf38>] R
mem_sharing.c#mem_sharing_gfn_alloc+0x5c/0x5e
(XEN)    [<ffff82d0402ecb12>] F mem_sharing.c#add_to_physmap+0x175/0x233
(XEN)    [<ffff82d0402ee81d>] F mem_sharing_fork_page+0x4ee/0x51e
(XEN)    [<ffff82d0402f244f>] F p2m_get_gfn_type_access+0x119/0x1a7
(XEN)    [<ffff82d0402e67ef>] F hap.c#hap_update_paging_modes+0xbe/0x2ee
(XEN)    [<ffff82d0402942a0>] F vmx_create_vmcs+0x981/0xb71
(XEN)    [<ffff82d040298884>] F vmx.c#vmx_vcpu_initialise+0x64/0x1a0
(XEN)    [<ffff82d0402acc59>] F hvm_vcpu_initialise+0x97/0x19e
(XEN)    [<ffff82d0403168db>] F arch_vcpu_create+0xf3/0x1db
(XEN)    [<ffff82d040206c69>] F vcpu_create+0x211/0x35d
(XEN)    [<ffff82d0402f00b7>] F mem_sharing_memop+0x16a9/0x1869
(XEN)    [<ffff82d0403317c5>] F subarch_memory_op+0x298/0x2c4
(XEN)    [<ffff82d04032ca26>] F arch_memory_op+0xa9f/0xaa4
(XEN)    [<ffff82d040221e66>] F do_memory_op+0x2150/0x2297
(XEN)    [<ffff82d04030bcfb>] F pv_do_multicall_call+0x22c/0x4c7
(XEN)    [<ffff82d040319727>] F arch_do_multicall_call+0x23/0x2c
(XEN)    [<ffff82d04022231f>] F do_multicall+0xd3/0x417
(XEN)    [<ffff82d04030c0e4>] F pv_hypercall+0xea/0x3a0
(XEN)    [<ffff82d040201292>] F lstar_enter+0x122/0x130

Any time any page is requested in a fork that's not present it gets
populated from the parent (if the parent has it). What I was worried about
is nominate_page being called on the fork but I was mistaken, it was called
on the parent before add_to_physmap is called on the fork - which is fine
and the expected behavior.

We can drop this patch, sorry for the noise.

Tamas
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index 698455444e..76a08b55f9 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -33,6 +33,14 @@  struct mem_sharing_domain
 {
     bool enabled, block_interrupts;
 
+    /*
+     * We need to avoid trying to nominate pages for forking until the
+     * fork operation is completely finished. As parts of fork creation
+     * is restartable we mark here if the process started to skip the
+     * non-restartable portion.
+     */
+    bool fork_started;
+
     /*
      * When releasing shared gfn's in a preemptible manner, recall where
      * to resume the search.
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 649d93dc54..b55c5bccdd 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1888,11 +1888,12 @@  static int copy_settings(struct domain *cd, struct domain *d)
 static int fork(struct domain *cd, struct domain *d)
 {
     int rc = -EBUSY;
+    struct mem_sharing_domain *msd = &cd->arch.hvm.mem_sharing;
 
     if ( !cd->controller_pause_count )
         return rc;
 
-    if ( !cd->parent )
+    if ( !msd->fork_started )
     {
         if ( !get_domain(d) )
         {
@@ -1905,7 +1906,7 @@  static int fork(struct domain *cd, struct domain *d)
         *cd->arch.cpuid = *d->arch.cpuid;
         *cd->arch.msr = *d->arch.msr;
         cd->vmtrace_size = d->vmtrace_size;
-        cd->parent = d;
+        msd->fork_started = 1;
     }
 
     /* This is preemptible so it's the first to get done */
@@ -1918,8 +1919,11 @@  static int fork(struct domain *cd, struct domain *d)
     rc = copy_settings(cd, d);
 
  done:
-    if ( rc && rc != -ERESTART )
+    if ( !rc )
+        cd->parent = d;
+    else if ( rc != -ERESTART )
     {
+        msd->fork_started = 0;
         cd->parent = NULL;
         domain_unpause(d);
         put_domain(d);