
[3/4] mm: Use put_old_guest_table for relinquish_pages

Message ID 20191223164329.3113378-4-george.dunlap@citrix.com (mailing list archive)
State New, archived
Series x86: Remove force-invalidate loop from relinquish_memory

Commit Message

George Dunlap Dec. 23, 2019, 4:43 p.m. UTC
relinquish_memory() deals with interrupted de-validation in a fairly
ad-hoc way, by either re-setting PGT_pinned (in the case of -EINTR) or
letting the page "fall through" to the "force invalidate" loop below.
This requires an extensive comment describing what needs to happen to
the type and count in each case, and why each works.  Additionally, it
turns out that at this point, the "force invalidate" loop is only
required to handle this ad-hoc continuation.

Replace this with the 'standard' way of dealing with pages whose
de-validation must be restarted: old_guest_table.  Call
put_old_guest_table(current) at the top of the
function, and set current->arch.old_guest_table* as appropriate.  This
code is simpler, and mirrors other old_guest_table code in mm.c.  It
will also allow us to remove the force-invalidate loop entirely in a
subsequent patch.

While here, make the refcounting logic a bit easier to follow: We
always drop the general reference held by PGT_pinned, regardless of
what happens to the type count.  Rather than manually re-dropping the
refcount if put_page_and_type_preemptible() fails, just drop the
refcount unconditionally, and call put_page_type_preemptible()
instead.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain.c | 50 +++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 28 deletions(-)

Comments

Jan Beulich Jan. 3, 2020, 3:43 p.m. UTC | #1
On 23.12.2019 17:43, George Dunlap wrote:
> @@ -1967,42 +1971,32 @@ static int relinquish_memory(
>          }
>  
>          if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
> -            ret = put_page_and_type_preemptible(page);
> +        {
> +            /* Always drop the page ref associated with PGT_pinned */
> +            put_page(page);
> +            ret = put_page_type_preemptible(page);
> +        }
>          switch ( ret )
>          {
>          case 0:
>              break;
> -        case -ERESTART:
>          case -EINTR:
> -            /*
> -             * -EINTR means PGT_validated has been re-set; re-set
> -             * PGT_pinned again so that it gets picked up next time
> -             * around.
> -             *
> -             * -ERESTART, OTOH, means PGT_partial is set instead.  Put
> -             * it back on the list, but don't set PGT_pinned; the
> -             * section below will finish off de-validation.  But we do
> -             * need to drop the general ref associated with
> -             * PGT_pinned, since put_page_and_type_preemptible()
> -             * didn't do it.
> -             *
> -             * NB we can do an ASSERT for PGT_validated, since we
> -             * "own" the type ref; but theoretically, the PGT_partial
> -             * could be cleared by someone else.
> -             */
> -            if ( ret == -EINTR )
> -            {
> -                ASSERT(page->u.inuse.type_info & PGT_validated);
> -                set_bit(_PGT_pinned, &page->u.inuse.type_info);
> -            }
> -            else
> -                put_page(page);
> +            ASSERT(page->u.inuse.type_info & PGT_validated);
> +            /* Fallthrough */
> +        case -ERESTART:
> +            current->arch.old_guest_ptpg = NULL;
> +            current->arch.old_guest_table = page;
> +            current->arch.old_guest_table_partial = (ret == -ERESTART);
>  
>              ret = -ERESTART;
>  
> -            /* Put the page back on the list and drop the ref we grabbed above */
> -            page_list_add(page, list);
> -            put_page(page);
> +            /* Make sure we don't lose track of the page */
> +            page_list_add_tail(page, &d->arch.relmem_list);

Why at the tail? The prior page_list_add() made sure we'd encounter
this page first on the subsequent continuation. No need to keep
(perhaps very) many pages in partially destructed state. With this
changed back (or the tail insertion suitably explained in the
description)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

On top of this please consider latching current into a local
variable.

Jan

Patch

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d9c63379cd..b7968463cb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1953,6 +1953,10 @@  static int relinquish_memory(
     unsigned long     x, y;
     int               ret = 0;
 
+    ret = put_old_guest_table(current);
+    if ( ret )
+        return ret;
+
     /* Use a recursive lock, as we may enter 'free_domheap_page'. */
     spin_lock_recursive(&d->page_alloc_lock);
 
@@ -1967,42 +1971,32 @@  static int relinquish_memory(
         }
 
         if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
-            ret = put_page_and_type_preemptible(page);
+        {
+            /* Always drop the page ref associated with PGT_pinned */
+            put_page(page);
+            ret = put_page_type_preemptible(page);
+        }
         switch ( ret )
         {
         case 0:
             break;
-        case -ERESTART:
         case -EINTR:
-            /*
-             * -EINTR means PGT_validated has been re-set; re-set
-             * PGT_pinned again so that it gets picked up next time
-             * around.
-             *
-             * -ERESTART, OTOH, means PGT_partial is set instead.  Put
-             * it back on the list, but don't set PGT_pinned; the
-             * section below will finish off de-validation.  But we do
-             * need to drop the general ref associated with
-             * PGT_pinned, since put_page_and_type_preemptible()
-             * didn't do it.
-             *
-             * NB we can do an ASSERT for PGT_validated, since we
-             * "own" the type ref; but theoretically, the PGT_partial
-             * could be cleared by someone else.
-             */
-            if ( ret == -EINTR )
-            {
-                ASSERT(page->u.inuse.type_info & PGT_validated);
-                set_bit(_PGT_pinned, &page->u.inuse.type_info);
-            }
-            else
-                put_page(page);
+            ASSERT(page->u.inuse.type_info & PGT_validated);
+            /* Fallthrough */
+        case -ERESTART:
+            current->arch.old_guest_ptpg = NULL;
+            current->arch.old_guest_table = page;
+            current->arch.old_guest_table_partial = (ret == -ERESTART);
 
             ret = -ERESTART;
 
-            /* Put the page back on the list and drop the ref we grabbed above */
-            page_list_add(page, list);
-            put_page(page);
+            /* Make sure we don't lose track of the page */
+            page_list_add_tail(page, &d->arch.relmem_list);
+
+            /*
+             * NB that we've transferred the general ref acquired at
+             * the top of the loop to old_guest_table.
+             */
             goto out;
         default:
             BUG();