
[4/4] x86/mm: Remove force-invalidate loop

Message ID 20191223164329.3113378-5-george.dunlap@citrix.com
State New, archived
Series x86: Remove force-invalidate loop from relinquish_memory

Commit Message

George Dunlap Dec. 23, 2019, 4:43 p.m. UTC
The comment about the "force-invalidate" loop gives two reasons for
its existence:

1. Breaking circular "linear pagetable" references

2. Cleaning up partially-validated pages.

The first reason has been invalid since XSA-240: Since then it hasn't been
possible to generate circular linear pagetable references.

The second reason has been invalid since long before that: Before
calling relinquish_memory(), domain_relinquish_resources() calls
vcpu_destroy_pagetables() on each vcpu; this will in turn call
put_old_guest_table() on each vcpu.  The result should be that it's
not possible to have top-level partially-devalidated pagetables by the
time we get to relinquish_memory().

The loop, it turns out, was effectively there only to pick up
interrupted UNPIN operations of relinquish_memory() itself.  Now that
these are being handled by put_old_guest_table(), we can remove this
loop entirely.
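
For illustration, the ordering this argument relies on looks roughly
like the sketch below.  This is a simplified sketch rather than the
actual code: the real domain_relinquish_resources() is a preemptible
state machine, and the list/type arguments passed to
relinquish_memory() here are assumptions; only the call ordering
matters.

    /* Sketch: teardown ordering assumed above (simplified, no preemption). */
    static int relinquish_sketch(struct domain *d)
    {
        struct vcpu *v;
        int ret;

        /*
         * Step 1: drop each vcpu's top-level pagetables.  A table whose
         * devalidation gets interrupted is queued in
         * v->arch.old_guest_table and finished by put_old_guest_table().
         */
        for_each_vcpu ( d, v )
        {
            ret = vcpu_destroy_pagetables(v);
            if ( ret )
                return ret;
        }

        /*
         * Step 2: only now walk the page lists.  No top-level pagetable
         * can still be partially devalidated here, so the
         * force-invalidate loop removed below has nothing left to do.
         */
        return relinquish_memory(d, &d->page_list, PGT_l4_page_table);
    }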

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
This patch has the advantage that it no longer hides mis-accounted
partial pages, but the disadvantage that if there *is* such a
mis-accounting, it will become a resource leak (and thus an
XSA).  It might be a good idea to add another "clean up partial pages"
pass after all other pages have been cleaned up, with a suitable
ASSERT().  That way we have a higher chance of catching mis-accounting
during development, without risking opening up a memory / domain leak
in production.
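
Such a pass might look roughly like the hypothetical sketch below.
It is not part of this patch: it reuses the type_info / PGT_partial
handling from the hunk removed below, and the choice of
d->arch.relmem_list as the list to walk (plus the page_list_for_each()
iterator) is an assumption.

    /*
     * Hypothetical final pass: by this point no page should still have
     * PGT_partial set.  Shout in debug builds; in release builds drop
     * the type ref and general ref that PGT_partial holds, so that a
     * mis-accounted page can still be freed.
     */
    page_list_for_each ( page, &d->arch.relmem_list )
    {
        unsigned long t = page->u.inuse.type_info;

        ASSERT(!(t & PGT_partial));
        if ( unlikely(t & PGT_partial) )
        {
            page->u.inuse.type_info &= ~PGT_partial;
            page->u.inuse.type_info--; /* drop the type ref */
            put_page(page);            /* drop the general ref */
        }
    }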

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain.c | 71 -------------------------------------------
 1 file changed, 71 deletions(-)

Comments

Jan Beulich Jan. 3, 2020, 3:51 p.m. UTC | #1
On 23.12.2019 17:43, George Dunlap wrote:
> The comment about the "force-invalidate" loop gives two reasons for
> its existence:
> 
> 1. Breaking circular "linear pagetable" references
> 
> 2. Cleaning up partially-validated pages.
> 
> The first reason has been invalid since XSA-240: Since then it hasn't been
> possible to generate circular linear pagetable references.
> 
> The second reason has been invalid since long before that: Before
> calling relinquish_memory(), domain_relinquish_resources() calls
> vcpu_destroy_pagetables() on each vcpu; this will in turn call
> put_old_guest_table() on each vcpu.  The result should be that it's
> not possible to have top-level partially-devalidated pagetables by the
> time we get to relinquish_memory().

There's a subtle difference: Up in the enumeration you correctly
say "partially-validated pages". Down here you also correctly state
that put_old_guest_table() deals with partially-devalidated
pagetables. The latter indeed shouldn't be able to make it here,
but what about ones that have been partially validated (and hence
not put into ->arch.old_guest_table) while the domain was still
alive? (I think the code change is still correct, because
partially validated pages ought to be taken care of correctly, but
I'd prefer if the description matched the change.)

Jan
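
For reference, the entry condition of the removed loop (inverted from
the hunk below) shows that it did catch partially validated pages as
well as fully validated ones: anything of the relevant pagetable type
with PGT_validated or PGT_partial still set.

    /* Excerpt: the removed loop's continue-condition (see hunk below). */
    if ( (x & PGT_type_mask) == type &&
         (x & (PGT_validated | PGT_partial)) )
    {
        /* ... it would then force-devalidate via devalidate_page(page, x, 1) ... */
    }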

Patch

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b7968463cb..847a70302c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1950,7 +1950,6 @@  static int relinquish_memory(
     struct domain *d, struct page_list_head *list, unsigned long type)
 {
     struct page_info  *page;
-    unsigned long     x, y;
     int               ret = 0;
 
     ret = put_old_guest_table(current);
@@ -2004,76 +2003,6 @@  static int relinquish_memory(
 
         put_page_alloc_ref(page);
 
-        /*
-         * Forcibly invalidate top-most, still valid page tables at this point
-         * to break circular 'linear page table' references as well as clean up
-         * partially validated pages. This is okay because MMU structures are
-         * not shared across domains and this domain is now dead. Thus top-most
-         * valid tables are not in use so a non-zero count means circular
-         * reference or partially validated.
-         */
-        y = page->u.inuse.type_info;
-        for ( ; ; )
-        {
-            x = y;
-            if ( likely((x & PGT_type_mask) != type) ||
-                 likely(!(x & (PGT_validated|PGT_partial))) )
-                break;
-
-            y = cmpxchg(&page->u.inuse.type_info, x,
-                        x & ~(PGT_validated|PGT_partial));
-            if ( likely(y == x) )
-            {
-                /* No need for atomic update of type_info here: noone else updates it. */
-                switch ( ret = devalidate_page(page, x, 1) )
-                {
-                case 0:
-                    break;
-                case -EINTR:
-                    page_list_add(page, list);
-                    page->u.inuse.type_info |= PGT_validated;
-                    if ( x & PGT_partial )
-                        put_page(page);
-                    put_page(page);
-                    ret = -ERESTART;
-                    goto out;
-                case -ERESTART:
-                    page_list_add(page, list);
-                    /*
-                     * PGT_partial holds a type ref and a general ref.
-                     * If we came in with PGT_partial set, then we 1)
-                     * don't need to grab an extra type count, and 2)
-                     * do need to drop the extra page ref we grabbed
-                     * at the top of the loop.  If we didn't come in
-                     * with PGT_partial set, we 1) do need to drab an
-                     * extra type count, but 2) can transfer the page
-                     * ref we grabbed above to it.
-                     *
-                     * Note that we must increment type_info before
-                     * setting PGT_partial.  Theoretically it should
-                     * be safe to drop the page ref before setting
-                     * PGT_partial, but do it afterwards just to be
-                     * extra safe.
-                     */
-                    if ( !(x & PGT_partial) )
-                        page->u.inuse.type_info++;
-                    smp_wmb();
-                    page->u.inuse.type_info |= PGT_partial;
-                    if ( x & PGT_partial )
-                        put_page(page);
-                    goto out;
-                default:
-                    BUG();
-                }
-                if ( x & PGT_partial )
-                {
-                    page->u.inuse.type_info--;
-                    put_page(page);
-                }
-                break;
-            }
-        }
-
         /* Put the page on the list and /then/ potentially free it. */
         page_list_add_tail(page, &d->arch.relmem_list);
         put_page(page);