diff mbox series

[v2,1/3] x86/mem_sharing: option to enforce fork starting with empty p2m

Message ID ab6bb88e90e5649c60e08a1680b3a2390441031b.1648561546.git.tamas.lengyel@intel.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/3] x86/mem_sharing: option to enforce fork starting with empty p2m | expand

Commit Message

Tamas K Lengyel March 29, 2022, 2:03 p.m. UTC
Add an option to the fork memop to enforce starting the fork with an empty p2m.
Pre-populating special pages into the fork tends to be necessary only when
setting forks up to be fully functional with a toolstack, or when the fork
makes use of them in some way. For short-lived forks these pages are optional,
and starting with an empty p2m has advantages both in terms of reset
performance and in terms of easier reasoning about the state of the fork after
creation.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v2: rename flag to empty_p2m, add an assert at the end, and move the
     vAPIC page mapping-skipping logic to where the page is mapped
---
 tools/include/xenctrl.h               |  3 ++-
 tools/libs/ctrl/xc_memshr.c           |  5 +++-
 xen/arch/x86/hvm/vmx/vmx.c            |  5 ++++
 xen/arch/x86/include/asm/hvm/domain.h |  4 +++-
 xen/arch/x86/mm/mem_sharing.c         | 33 +++++++++++++++++----------
 xen/include/public/memory.h           |  4 ++--
 6 files changed, 37 insertions(+), 17 deletions(-)

Comments

Jan Beulich March 29, 2022, 3:42 p.m. UTC | #1
On 29.03.2022 16:03, Tamas K Lengyel wrote:
> Add an option to the fork memop to enforce starting the fork with an empty
> p2m. Pre-populating special pages into the fork tends to be necessary only
> when setting forks up to be fully functional with a toolstack, or when the
> fork makes use of them in some way. For short-lived forks these pages are
> optional, and starting with an empty p2m has advantages both in terms of
> reset performance and in terms of easier reasoning about the state of the
> fork after creation.

I'm afraid I don't consider this enough of an explanation: Why would these
pages be optional? Where does the a priori knowledge come from that the guest
wouldn't manage to access the vCPU info pages or the APIC access one?

> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -31,7 +31,9 @@
>  #ifdef CONFIG_MEM_SHARING
>  struct mem_sharing_domain
>  {
> -    bool enabled, block_interrupts;
> +    bool enabled;
> +    bool block_interrupts;
> +    bool empty_p2m;

While the name of the field is perhaps fine as is, it would be helpful to
have a comment here clarifying that this is only about the guest's initial
and reset state; this specifically does not indicate the p2m has to remain
empty (aiui).

> @@ -1856,7 +1860,13 @@ static int fork(struct domain *cd, struct domain *d)
>      if ( (rc = bring_up_vcpus(cd, d)) )
>          goto done;
>  
> -    rc = copy_settings(cd, d);
> +    if ( !(rc = copy_settings(cd, d, empty_p2m)) )
> +    {
> +        cd->arch.hvm.mem_sharing.block_interrupts = block_interrupts;
> +
> +        if ( (cd->arch.hvm.mem_sharing.empty_p2m = empty_p2m) )

Is there a reason you don't do the assignment earlier, thus avoiding the
need to pass around the extra function argument?

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -543,10 +543,10 @@ struct xen_mem_sharing_op {
>          } debug;
>          struct mem_sharing_op_fork {      /* OP_FORK */
>              domid_t parent_domain;        /* IN: parent's domain id */
> -/* Only makes sense for short-lived forks */
> +/* These flags only makes sense for short-lived forks */

Nit: s/makes/make/.

Jan
Tamas K Lengyel March 29, 2022, 4:10 p.m. UTC | #2
On Tue, Mar 29, 2022 at 11:42 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.03.2022 16:03, Tamas K Lengyel wrote:
> > Add an option to the fork memop to enforce starting the fork with an empty
> > p2m. Pre-populating special pages into the fork tends to be necessary only
> > when setting forks up to be fully functional with a toolstack, or when the
> > fork makes use of them in some way. For short-lived forks these pages are
> > optional, and starting with an empty p2m has advantages both in terms of
> > reset performance and in terms of easier reasoning about the state of the
> > fork after creation.
>
> I'm afraid I don't consider this enough of an explanation: Why would these
> pages be optional? Where does the a priori knowledge come from that the
> guest wouldn't manage to access the vCPU info pages or the APIC access one?

By knowing what code you are fuzzing. The code you are fuzzing is
clearly marked by harnesses and that's the only code you execute while
fuzzing. If you know the code doesn't use them, there is no need to map
them in. They haven't been needed in any of the fuzzing setups we have
had so far, so I'm planning to make this the default when fuzzing.

> > --- a/xen/arch/x86/include/asm/hvm/domain.h
> > +++ b/xen/arch/x86/include/asm/hvm/domain.h
> > @@ -31,7 +31,9 @@
> >  #ifdef CONFIG_MEM_SHARING
> >  struct mem_sharing_domain
> >  {
> > -    bool enabled, block_interrupts;
> > +    bool enabled;
> > +    bool block_interrupts;
> > +    bool empty_p2m;
>
> While the name of the field is perhaps fine as is, it would be helpful to
> have a comment here clarifying that this is only about the guest's initial
> and reset state; this specifically does not indicate the p2m has to remain
> empty (aiui).

Sure.

>
> > @@ -1856,7 +1860,13 @@ static int fork(struct domain *cd, struct domain *d)
> >      if ( (rc = bring_up_vcpus(cd, d)) )
> >          goto done;
> >
> > -    rc = copy_settings(cd, d);
> > +    if ( !(rc = copy_settings(cd, d, empty_p2m)) )
> > +    {
> > +        cd->arch.hvm.mem_sharing.block_interrupts = block_interrupts;
> > +
> > +        if ( (cd->arch.hvm.mem_sharing.empty_p2m = empty_p2m) )
>
> Is there a reason you don't do the assignment earlier, thus avoiding the
> need to pass around the extra function argument?

Yes, I prefer only setting these values when the fork is complete, to
avoid having them be dangling in case the fork failed. It's
ultimately not a requirement, since if the fork failed we just destroy
the domain that was destined to be the fork from the toolstack. If the
fork failed half-way through, all bets are off anyway since we don't do
any "unfork" to roll back the changes that were already applied, so
having these set early wouldn't make things worse than they already
are. But still, I prefer not adding more things that would need to be
cleaned up if I don't have to.

>
> > --- a/xen/include/public/memory.h
> > +++ b/xen/include/public/memory.h
> > @@ -543,10 +543,10 @@ struct xen_mem_sharing_op {
> >          } debug;
> >          struct mem_sharing_op_fork {      /* OP_FORK */
> >              domid_t parent_domain;        /* IN: parent's domain id */
> > -/* Only makes sense for short-lived forks */
> > +/* These flags only makes sense for short-lived forks */
>
> Nit: s/makes/make/.

Ack.

Tamas
Jan Beulich March 30, 2022, 6:46 a.m. UTC | #3
On 29.03.2022 18:10, Tamas K Lengyel wrote:
> On Tue, Mar 29, 2022 at 11:42 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 29.03.2022 16:03, Tamas K Lengyel wrote:
>>> Add an option to the fork memop to enforce starting the fork with an
>>> empty p2m. Pre-populating special pages into the fork tends to be
>>> necessary only when setting forks up to be fully functional with a
>>> toolstack, or when the fork makes use of them in some way. For
>>> short-lived forks these pages are optional, and starting with an empty
>>> p2m has advantages both in terms of reset performance and in terms of
>>> easier reasoning about the state of the fork after creation.
>>
>> I'm afraid I don't consider this enough of an explanation: Why would
>> these pages be optional? Where does the a priori knowledge come from
>> that the guest wouldn't manage to access the vCPU info pages or the
>> APIC access one?
> 
> By knowing what code you are fuzzing. The code you are fuzzing is
> clearly marked by harnesses and that's the only code you execute while
> fuzzing. If you know the code doesn't use them, there is no need to map
> them in. They haven't been needed in any of the fuzzing setups we have
> had so far, so I'm planning to make this the default when fuzzing.

But isn't it the very nature of what you do fuzzing for that unexpected
code paths may be taken? By not having in place what is expected to be
there, yet more unexpected behavior might then result.

Plus - how do you bound how far the guest executes in a single attempt?

Jan
Tamas K Lengyel March 30, 2022, 12:23 p.m. UTC | #4
On Wed, Mar 30, 2022, 2:47 AM Jan Beulich <jbeulich@suse.com> wrote:

> On 29.03.2022 18:10, Tamas K Lengyel wrote:
> > On Tue, Mar 29, 2022 at 11:42 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 29.03.2022 16:03, Tamas K Lengyel wrote:
> >>> Add an option to the fork memop to enforce starting the fork with an
> >>> empty p2m. Pre-populating special pages into the fork tends to be
> >>> necessary only when setting forks up to be fully functional with a
> >>> toolstack, or when the fork makes use of them in some way. For
> >>> short-lived forks these pages are optional, and starting with an
> >>> empty p2m has advantages both in terms of reset performance and in
> >>> terms of easier reasoning about the state of the fork after creation.
> >>
> >> I'm afraid I don't consider this enough of an explanation: Why would
> >> these pages be optional? Where does the a priori knowledge come from
> >> that the guest wouldn't manage to access the vCPU info pages or the
> >> APIC access one?
> >
> > By knowing what code you are fuzzing. The code you are fuzzing is
> > clearly marked by harnesses and that's the only code you execute while
> > fuzzing. If you know the code doesn't use them, there is no need to map
> > them in. They haven't been needed in any of the fuzzing setups we have
> > had so far, so I'm planning to make this the default when fuzzing.
>
> But isn't it the very nature of what you do fuzzing for that unexpected
> code paths may be taken? By not having in place what is expected to be
> there, yet more unexpected behavior might then result.
>

You don't get totally arbitrary execution, no. If you do, that means
having instability and non-reproducible runs, which makes the fuzzing
inefficient. So if you know that the part of the code being fuzzed has no
reasonable path to reach code using these pages, then you can get rid of
them. This is an option for cases where you can make that call. That's
all, just an option.


> Plus - how do you bound how far the guest executes in a single attempt?
>

We use a cpuid instruction or a breakpoint to signal when the code has
reached the end point. The start point is where the parent got paused
(also usually via a magic cpuid).

Tamas


Patch

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 95bd5eca67..26766ec19f 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2281,7 +2281,8 @@  int xc_memshr_fork(xc_interface *xch,
                    uint32_t source_domain,
                    uint32_t client_domain,
                    bool allow_with_iommu,
-                   bool block_interrupts);
+                   bool block_interrupts,
+                   bool empty_p2m);
 
 /*
  * Note: this function is only intended to be used on short-lived forks that
diff --git a/tools/libs/ctrl/xc_memshr.c b/tools/libs/ctrl/xc_memshr.c
index a6cfd7dccf..0143f9ddea 100644
--- a/tools/libs/ctrl/xc_memshr.c
+++ b/tools/libs/ctrl/xc_memshr.c
@@ -240,7 +240,8 @@  int xc_memshr_debug_gref(xc_interface *xch,
 }
 
 int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid,
-                   bool allow_with_iommu, bool block_interrupts)
+                   bool allow_with_iommu, bool block_interrupts,
+                   bool empty_p2m)
 {
     xen_mem_sharing_op_t mso;
 
@@ -253,6 +254,8 @@  int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid,
         mso.u.fork.flags |= XENMEM_FORK_WITH_IOMMU_ALLOWED;
     if ( block_interrupts )
         mso.u.fork.flags |= XENMEM_FORK_BLOCK_INTERRUPTS;
+    if ( empty_p2m )
+        mso.u.fork.flags |= XENMEM_FORK_EMPTY_P2M;
 
     return xc_memshr_memop(xch, domid, &mso);
 }
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c075370f64..5e60c92d5c 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -424,6 +424,11 @@  static void cf_check domain_creation_finished(struct domain *d)
     if ( !has_vlapic(d) || mfn_eq(apic_access_mfn, INVALID_MFN) )
         return;
 
+#ifdef CONFIG_MEM_SHARING
+    if ( d->arch.hvm.mem_sharing.empty_p2m )
+        return;
+#endif
+
     ASSERT(epte_get_entry_emt(d, gfn, apic_access_mfn, 0, &ipat,
                               p2m_mmio_direct) == MTRR_TYPE_WRBACK);
     ASSERT(ipat);
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index 698455444e..22a17c36c5 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -31,7 +31,9 @@ 
 #ifdef CONFIG_MEM_SHARING
 struct mem_sharing_domain
 {
-    bool enabled, block_interrupts;
+    bool enabled;
+    bool block_interrupts;
+    bool empty_p2m;
 
     /*
      * When releasing shared gfn's in a preemptible manner, recall where
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 15e6a7ed81..ef67285a98 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1643,7 +1643,8 @@  static int bring_up_vcpus(struct domain *cd, struct domain *d)
     return 0;
 }
 
-static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
+static int copy_vcpu_settings(struct domain *cd, const struct domain *d,
+                              bool empty_p2m)
 {
     unsigned int i;
     struct p2m_domain *p2m = p2m_get_hostp2m(cd);
@@ -1660,7 +1661,7 @@  static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
 
         /* Copy & map in the vcpu_info page if the guest uses one */
         vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
-        if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
+        if ( !empty_p2m && !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
         {
             mfn_t new_vcpu_info_mfn = cd_vcpu->vcpu_info_mfn;
 
@@ -1807,17 +1808,18 @@  static int copy_special_pages(struct domain *cd, struct domain *d)
     return 0;
 }
 
-static int copy_settings(struct domain *cd, struct domain *d)
+static int copy_settings(struct domain *cd, struct domain *d,
+                         bool empty_p2m)
 {
     int rc;
 
-    if ( (rc = copy_vcpu_settings(cd, d)) )
+    if ( (rc = copy_vcpu_settings(cd, d, empty_p2m)) )
         return rc;
 
     if ( (rc = hvm_copy_context_and_params(cd, d)) )
         return rc;
 
-    if ( (rc = copy_special_pages(cd, d)) )
+    if ( !empty_p2m && (rc = copy_special_pages(cd, d)) )
         return rc;
 
     copy_tsc(cd, d);
@@ -1826,9 +1828,11 @@  static int copy_settings(struct domain *cd, struct domain *d)
     return rc;
 }
 
-static int fork(struct domain *cd, struct domain *d)
+static int fork(struct domain *cd, struct domain *d, uint16_t flags)
 {
     int rc = -EBUSY;
+    bool block_interrupts = flags & XENMEM_FORK_BLOCK_INTERRUPTS;
+    bool empty_p2m = flags & XENMEM_FORK_EMPTY_P2M;
 
     if ( !cd->controller_pause_count )
         return rc;
@@ -1856,7 +1860,13 @@  static int fork(struct domain *cd, struct domain *d)
     if ( (rc = bring_up_vcpus(cd, d)) )
         goto done;
 
-    rc = copy_settings(cd, d);
+    if ( !(rc = copy_settings(cd, d, empty_p2m)) )
+    {
+        cd->arch.hvm.mem_sharing.block_interrupts = block_interrupts;
+
+        if ( (cd->arch.hvm.mem_sharing.empty_p2m = empty_p2m) )
+            ASSERT(page_list_empty(&cd->page_list));
+    }
 
  done:
     if ( rc && rc != -ERESTART )
@@ -1920,7 +1930,7 @@  static int mem_sharing_fork_reset(struct domain *d)
     }
     spin_unlock_recursive(&d->page_alloc_lock);
 
-    rc = copy_settings(d, pd);
+    rc = copy_settings(d, pd, d->arch.hvm.mem_sharing.empty_p2m);
 
     domain_unpause(d);
 
@@ -2190,7 +2200,8 @@  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         if ( mso.u.fork.pad )
             goto out;
         if ( mso.u.fork.flags &
-             ~(XENMEM_FORK_WITH_IOMMU_ALLOWED | XENMEM_FORK_BLOCK_INTERRUPTS) )
+             ~(XENMEM_FORK_WITH_IOMMU_ALLOWED | XENMEM_FORK_BLOCK_INTERRUPTS |
+               XENMEM_FORK_EMPTY_P2M) )
             goto out;
 
         rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
@@ -2212,14 +2223,12 @@  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             goto out;
         }
 
-        rc = fork(d, pd);
+        rc = fork(d, pd, mso.u.fork.flags);
 
         if ( rc == -ERESTART )
             rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
                                                "lh", XENMEM_sharing_op,
                                                arg);
-        else if ( !rc && (mso.u.fork.flags & XENMEM_FORK_BLOCK_INTERRUPTS) )
-            d->arch.hvm.mem_sharing.block_interrupts = true;
 
         rcu_unlock_domain(pd);
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index a1a0f0233a..d44c256b3c 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -543,10 +543,10 @@  struct xen_mem_sharing_op {
         } debug;
         struct mem_sharing_op_fork {      /* OP_FORK */
             domid_t parent_domain;        /* IN: parent's domain id */
-/* Only makes sense for short-lived forks */
+/* These flags only makes sense for short-lived forks */
 #define XENMEM_FORK_WITH_IOMMU_ALLOWED (1u << 0)
-/* Only makes sense for short-lived forks */
 #define XENMEM_FORK_BLOCK_INTERRUPTS   (1u << 1)
+#define XENMEM_FORK_EMPTY_P2M          (1u << 2)
             uint16_t flags;               /* IN: optional settings */
             uint32_t pad;                 /* Must be set to 0 */
         } fork;