diff mbox series

[v2,1/7] xen/mm: lift 32 item limit from mfn/gfn arrays

Message ID 1060400786.9820894.1592523480084.JavaMail.zimbra@cert.pl (mailing list archive)
State Superseded
Headers show
Series Implement support for external IPT monitoring | expand

Commit Message

Michał Leszczyński June 18, 2020, 11:38 p.m. UTC
Replace on-stack array allocation with heap allocation
in order to lift the limit of 32 items in mfn/gfn arrays
when calling acquire_resource.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
---
 xen/common/memory.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

Comments

Roger Pau Monné June 19, 2020, 11:34 a.m. UTC | #1
On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
> Replace on-stack array allocation with heap allocation
> in order to lift the limit of 32 items in mfn/gfn arrays
> when calling acquire_resource.

I'm afraid this is not correct, you cannot allow unbounded amounts of
items to be processed like this, it's likely that you manage to
trigger the watchdog if the list is long enough, specially when doing
set_foreign_p2m_entry.

You need to process the items in batches (32 was IMO a good start), and
then add support for hypercall continuations. Take a look at how
XENMEM_populate_physmap just a couple of lines below makes use of
hypercall_create_continuation.

After processing every batch you need to check if
hypercall_preempt_check returns true and if so use
hypercall_create_continuation in order to encode a continuation.

Thanks, Roger.
Michał Leszczyński June 19, 2020, 11:36 a.m. UTC | #2
----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>> Replace on-stack array allocation with heap allocation
>> in order to lift the limit of 32 items in mfn/gfn arrays
>> when calling acquire_resource.
> 
> I'm afraid this is not correct, you cannot allow unbounded amounts of
> items to be processed like this, it's likely that you manage to
> trigger the watchdog if the list is long enough, specially when doing
> set_foreign_p2m_entry.
> 
> You need to process the items in batches (32 was IMO a good start), and
> then add support for hypercall continuations. Take a look at how
> XENMEM_populate_physmap just a couple of lines below makes use of
> hypercall_create_continuation.
> 
> After processing every batch you need to check if
> hypercall_preempt_check returns true and if so use
> hypercall_create_continuation in order to encode a continuation.
> 
> Thanks, Roger.


Somebody previously suggested that this limit could be lifted this way,
so I would like to hear some more opinions on that.


Best regards,
Michał Leszczyński
CERT Polska
Jan Beulich June 19, 2020, 11:48 a.m. UTC | #3
On 19.06.2020 13:36, Michał Leszczyński wrote:
> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>> Replace on-stack array allocation with heap allocation
>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>> when calling acquire_resource.
>>
>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>> items to be processed like this, it's likely that you manage to
>> trigger the watchdog if the list is long enough, specially when doing
>> set_foreign_p2m_entry.
>>
>> You need to process the items in batches (32 was IMO a good start), and
>> then add support for hypercall continuations. Take a look at how
>> XENMEM_populate_physmap just a couple of lines below makes use of
>> hypercall_create_continuation.
>>
>> After processing every batch you need to check if
>> hypercall_preempt_check returns true and if so use
>> hypercall_create_continuation in order to encode a continuation.
>>
>> Thanks, Roger.
> 
> 
> Somebody previously suggested that this limit could be lifted this way,
> so I would like to hear some more opinions on that.

I did suggest the limit can be lifted, but not by processing all
pieces in one go. Whether batches of 32 or 64 or 128 are chosen
is a different thing, but you can't do arbitrary amounts without
any preemption checks.

Jan
Michał Leszczyński June 19, 2020, 11:51 a.m. UTC | #4
----- 19 cze 2020 o 13:48, Jan Beulich jbeulich@suse.com napisał(a):

> On 19.06.2020 13:36, Michał Leszczyński wrote:
>> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
>> 
>>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>>> Replace on-stack array allocation with heap allocation
>>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>>> when calling acquire_resource.
>>>
>>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>>> items to be processed like this, it's likely that you manage to
>>> trigger the watchdog if the list is long enough, specially when doing
>>> set_foreign_p2m_entry.
>>>
>>> You need to process the items in batches (32 was IMO a good start), and
>>> then add support for hypercall continuations. Take a look at how
>>> XENMEM_populate_physmap just a couple of lines below makes use of
>>> hypercall_create_continuation.
>>>
>>> After processing every batch you need to check if
>>> hypercall_preempt_check returns true and if so use
>>> hypercall_create_continuation in order to encode a continuation.
>>>
>>> Thanks, Roger.
>> 
>> 
>> Somebody previously suggested that this limit could be lifted this way,
>> so I would like to hear some more opinions on that.
> 
> I did suggest the limit can be lifted, but not by processing all
> pieces in one go. Whether batches of 32 or 64 or 128 are chosen
> is a different thing, but you can't do arbitrary amounts without
> any preemption checks.
> 
> Jan


Okay. I will try to correct it within v3.

Best regards,
Michał Leszczyński
CERT Polska
Michał Leszczyński June 19, 2020, 12:35 p.m. UTC | #5
----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>> Replace on-stack array allocation with heap allocation
>> in order to lift the limit of 32 items in mfn/gfn arrays
>> when calling acquire_resource.
> 
> I'm afraid this is not correct, you cannot allow unbounded amounts of
> items to be processed like this, it's likely that you manage to
> trigger the watchdog if the list is long enough, specially when doing
> set_foreign_p2m_entry.
> 
> You need to process the items in batches (32 was IMO a good start), and
> then add support for hypercall continuations. Take a look at how
> XENMEM_populate_physmap just a couple of lines below makes use of
> hypercall_create_continuation.
> 
> After processing every batch you need to check if
> hypercall_preempt_check returns true and if so use
> hypercall_create_continuation in order to encode a continuation.
> 
> Thanks, Roger.


One more question. Are these continuations transparent from the caller side,
or do I also need to add something on the invoker side to properly handle these
continuations?


Best regards,
Michał Leszczyński
CERT Polska
Jan Beulich June 19, 2020, 12:39 p.m. UTC | #6
On 19.06.2020 14:35, Michał Leszczyński wrote:
> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
> 
>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>> Replace on-stack array allocation with heap allocation
>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>> when calling acquire_resource.
>>
>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>> items to be processed like this, it's likely that you manage to
>> trigger the watchdog if the list is long enough, specially when doing
>> set_foreign_p2m_entry.
>>
>> You need to process the items in batches (32 was IMO a good start), and
>> then add support for hypercall continuations. Take a look at how
>> XENMEM_populate_physmap just a couple of lines below makes use of
>> hypercall_create_continuation.
>>
>> After processing every batch you need to check if
>> hypercall_preempt_check returns true and if so use
>> hypercall_create_continuation in order to encode a continuation.
> 
> One more question. Are these continuations transparent from the caller side,
> or do I also need to add something on the invoker side to properly handle these
> continuations?

They are (mostly) transparent to the guest, yes. "Mostly" because we
have cases (iirc) where the continuation data is stored in a way that
a guest could observe it. But it still wouldn't need to do anything
in order for the hypercall to get continued until it completes (which
may be "fails", faod).

Jan
Michał Leszczyński June 22, 2020, 3 a.m. UTC | #7
----- 19 cze 2020 o 14:39, Jan Beulich jbeulich@suse.com napisał(a):

> On 19.06.2020 14:35, Michał Leszczyński wrote:
>> ----- 19 cze 2020 o 13:34, Roger Pau Monné roger.pau@citrix.com napisał(a):
>> 
>>> On Fri, Jun 19, 2020 at 01:38:00AM +0200, Michał Leszczyński wrote:
>>>> Replace on-stack array allocation with heap allocation
>>>> in order to lift the limit of 32 items in mfn/gfn arrays
>>>> when calling acquire_resource.
>>>
>>> I'm afraid this is not correct, you cannot allow unbounded amounts of
>>> items to be processed like this, it's likely that you manage to
>>> trigger the watchdog if the list is long enough, specially when doing
>>> set_foreign_p2m_entry.
>>>
>>> You need to process the items in batches (32 was IMO a good start), and
>>> then add support for hypercall continuations. Take a look at how
>>> XENMEM_populate_physmap just a couple of lines below makes use of
>>> hypercall_create_continuation.
>>>
>>> After processing every batch you need to check if
>>> hypercall_preempt_check returns true and if so use
>>> hypercall_create_continuation in order to encode a continuation.
>> 
>> One more question. Are these continuations transparent from the caller side,
>> or do I also need to add something on the invoker side to properly handle these
>> continuations?
> 
> They are (mostly) transparent to the guest, yes. "Mostly" because we
> have cases (iirc) where the continuation data is stored in a way that
> a guest could observe it. But it still wouldn't need to do anything
> in order for the hypercall to get continued until it completes (which
> may be "fails", faod).
> 
> Jan


Okay, I've managed to implement continuations while still having these array small.
The operation could simply process max. 32 elements at the time and creates continuation
until everything gets processed.

This will be in patch v3.


Best regards,
Michał Leszczyński
CERT Polska
diff mbox series

Patch

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..e02606ebe5 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1050,12 +1050,7 @@  static int acquire_resource(
 {
     struct domain *d, *currd = current->domain;
     xen_mem_acquire_resource_t xmar;
-    /*
-     * The mfn_list and gfn_list (below) arrays are ok on stack for the
-     * moment since they are small, but if they need to grow in future
-     * use-cases then per-CPU arrays or heap allocations may be required.
-     */
-    xen_pfn_t mfn_list[32];
+    xen_pfn_t *mfn_list;
     int rc;
 
     if ( copy_from_guest(&xmar, arg, 1) )
@@ -1064,25 +1059,17 @@  static int acquire_resource(
     if ( xmar.pad != 0 )
         return -EINVAL;
 
-    if ( guest_handle_is_null(xmar.frame_list) )
-    {
-        if ( xmar.nr_frames )
-            return -EINVAL;
-
-        xmar.nr_frames = ARRAY_SIZE(mfn_list);
-
-        if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
-            return -EFAULT;
-
-        return 0;
-    }
+    mfn_list = xmalloc_array(xen_pfn_t, xmar.nr_frames);
 
-    if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-        return -E2BIG;
+    if ( ! mfn_list )
+        return -EFAULT;
 
     rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
     if ( rc )
+    {
+        xfree(mfn_list);
         return rc;
+    }
 
     rc = xsm_domain_resource_map(XSM_DM_PRIV, d);
     if ( rc )
@@ -1111,7 +1098,7 @@  static int acquire_resource(
     }
     else
     {
-        xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
+        xen_pfn_t *gfn_list;
         unsigned int i;
 
         /*
@@ -1120,7 +1107,12 @@  static int acquire_resource(
          *        resource pages unless the caller is the hardware domain.
          */
         if ( !is_hardware_domain(currd) )
-            return -EACCES;
+        {
+            rc = -EACCES;
+            goto out;
+        }
+
+        gfn_list = xmalloc_array(xen_pfn_t, xmar.nr_frames);
 
         if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
             rc = -EFAULT;
@@ -1133,9 +1125,12 @@  static int acquire_resource(
             if ( rc && i )
                 rc = -EIO;
         }
+
+        xfree(gfn_list);
     }
 
  out:
+    xfree(mfn_list);
     rcu_unlock_domain(d);
 
     return rc;