From patchwork Fri May 19 15:50:33 2017
X-Patchwork-Submitter: Boris Ostrovsky
X-Patchwork-Id: 9737509
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: xen-devel@lists.xen.org
Cc: sstabellini@kernel.org, wei.liu2@citrix.com, George.Dunlap@eu.citrix.com,
 andrew.cooper3@citrix.com, ian.jackson@eu.citrix.com, tim@xen.org,
 jbeulich@suse.com, Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date: Fri, 19 May 2017 11:50:33 -0400
Message-Id: <1495209040-11101-2-git-send-email-boris.ostrovsky@oracle.com>
In-Reply-To: <1495209040-11101-1-git-send-email-boris.ostrovsky@oracle.com>
References: <1495209040-11101-1-git-send-email-boris.ostrovsky@oracle.com>
Subject: [Xen-devel] [PATCH v4 1/8] mm: Place unscrubbed pages at the end of
 pagelist

. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using first_dirty
field. For now it can have two values, 0 (whole buddy needs scrubbing) or
INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
will allow scrubbing to be interrupted, resulting in first_dirty taking any
value.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Instead of using a bool dirty_head in page_info use int first_dirty.
  - Keep track of first_dirty in free_heap_pages()
* Alias PGC_need_scrub flag to PGC_allocated

 xen/common/page_alloc.c  | 175 ++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  10 +++
 xen/include/asm-x86/mm.h |  10 +++
 3 files changed, 163 insertions(+), 32 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e41fb4..c65d214 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+        page_list_add_tail(pg, &heap(node, zone, order));
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -689,7 +705,7 @@ static struct page_info *alloc_heap_pages(
     unsigned long request = 1UL << order;
     struct page_info *pg;
     nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
+    bool need_scrub, need_tlbflush = 0;
     uint32_t tlbflush_timestamp = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
@@ -798,11 +814,18 @@ static struct page_info *alloc_heap_pages(
     return NULL;
 
  found:
+    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
+        /*
+         * Some of the sub-chunks may be clean but we will mark them
+         * as dirty (if need_scrub is set) to avoid traversing the
+         * list here.
+         */
+        page_list_add_scrub(pg, node, zone, --j,
+                            need_scrub ? 0 : INVALID_DIRTY_IDX);
         pg += 1 << j;
     }
 
@@ -851,11 +874,20 @@ static int reserve_offlined_page(struct page_info *head)
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
     int cur_order;
+    bool need_scrub;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    need_scrub = (head->u.free.first_dirty != INVALID_DIRTY_IDX);
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -873,6 +905,8 @@ static int reserve_offlined_page(struct page_info *head)
 
         while ( cur_order < head_order )
         {
+            unsigned int first_dirty = INVALID_DIRTY_IDX;
+
             next_order = cur_order + 1;
 
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
@@ -892,8 +926,20 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages indeed need scrubbing. */
+                if ( need_scrub )
+                {
+                    for ( i = 0; i < (1 << cur_order); i++ )
+                        if ( test_bit(_PGC_need_scrub,
+                                      &cur_head[i].count_info) )
+                        {
+                            first_dirty = i;
+                            break;
+                        }
+                }
+                page_list_add_scrub(cur_head, node, zone,
+                                    cur_order, first_dirty);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -919,9 +965,52 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1050,20 @@ static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -977,35 +1076,54 @@ static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
+                need_scrub = true;
+                /* ... and keep predecessor's first_dirty. */
+            else if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
+            pg->u.free.first_dirty = INVALID_DIRTY_IDX;
+            pg = predecessor;
         }
         else
         {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+            page_list_del(successor, &heap(node, zone, order));
+
+            need_scrub |= (successor->u.free.first_dirty != INVALID_DIRTY_IDX);
+            successor->u.free.first_dirty = INVALID_DIRTY_IDX;
         }
 
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1226,7 +1344,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1295,7 +1413,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1622,7 +1740,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1676,12 +1794,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1790,7 +1905,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
 
@@ -1858,11 +1973,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index f6915ad..38d4fba 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -43,6 +43,9 @@ struct page_info
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+            /* Index of the first *possibly* unscrubbed page in the buddy. */
+#define INVALID_DIRTY_IDX -1U
+            unsigned int first_dirty;
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
@@ -115,6 +118,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<