
[v4,1/8] mm: Place unscrubbed pages at the end of pagelist

Message ID 1495209040-11101-2-git-send-email-boris.ostrovsky@oracle.com (mailing list archive)
State New, archived

Commit Message

Boris Ostrovsky May 19, 2017, 3:50 p.m. UTC
... so that it is easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the first_dirty
field. For now it can take one of two values: 0 (the whole buddy needs scrubbing)
or INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
will allow scrubbing to be interrupted, resulting in first_dirty taking any
value.
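
For illustration only (a minimal sketch, not part of the commit message or the
patch itself), this is how the convention is meant to be read:

    /* Sketch: whether a free buddy still contains pages to be scrubbed. */
    static bool buddy_needs_scrub(const struct page_info *pg)
    {
        /* INVALID_DIRTY_IDX means every page in the buddy is already clean. */
        return pg->u.free.first_dirty != INVALID_DIRTY_IDX;
    }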

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Instead of using a bool dirty_head in page_info use int first_dirty.
  - Keep track of first_dirty in free_heap_pages()
* Alias PGC_need_scrub flag to PGC_allocated


 xen/common/page_alloc.c  | 175 ++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  10 +++
 xen/include/asm-x86/mm.h |  10 +++
 3 files changed, 163 insertions(+), 32 deletions(-)

Comments

Jan Beulich June 9, 2017, 2:50 p.m. UTC | #1
>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
>  static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>  #define heap(node, zone, order) ((*_heap[node])[zone][order])
>  
> +static unsigned long node_need_scrub[MAX_NUMNODES];

Just as a remark - I think it is inefficient to store per-node data
this way (equally applies to _heap[]); this would better be put
into per-node memory. Since we don't have (and likely also don't
want to have) this_node() / per_node(), perhaps we could
(ab)use per-CPU data for this.
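
A very rough sketch of that idea (illustrative only; the counter name, helper
and the NUMA/cpumask accessors used here are assumptions, not checked against
the tree):

    /* Hypothetical: per-node scrub counter kept in per-CPU (node-local) memory. */
    static DEFINE_PER_CPU(unsigned long, scrub_count);

    static unsigned long *node_need_scrub_ptr(unsigned int node)
    {
        /* Reach node-local storage through any CPU that belongs to the node. */
        unsigned int cpu = cpumask_any(&node_to_cpumask(node));

        return &per_cpu(scrub_count, cpu);
    }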

> @@ -798,11 +814,18 @@ static struct page_info *alloc_heap_pages(
>      return NULL;
>  
>   found: 
> +    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
> +
>      /* We may have to halve the chunk a number of times. */
>      while ( j != order )
>      {
> -        PFN_ORDER(pg) = --j;
> -        page_list_add_tail(pg, &heap(node, zone, j));
> +        /*
> +         * Some of the sub-chunks may be clean but we will mark them
> +         * as dirty (if need_scrub is set) to avoid traversing the
> +         * list here.
> +         */
> +        page_list_add_scrub(pg, node, zone, --j,
> +                            need_scrub ? 0 : INVALID_DIRTY_IDX);

I suppose this is going to be improved in subsequent patches,
and hence the comment will be gone by the end of the series?

> @@ -851,11 +874,20 @@ static int reserve_offlined_page(struct page_info *head)
>      int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
>      struct page_info *cur_head;
>      int cur_order;
> +    bool need_scrub;
>  
>      ASSERT(spin_is_locked(&heap_lock));
>  
>      cur_head = head;
>  
> +    /*
> +     * We may break the buddy so let's mark the head as clean. Then, when
> +     * merging chunks back into the heap, we will see whether the chunk has
> +     * unscrubbed pages and set its first_dirty properly.
> +     */
> +    need_scrub = (head->u.free.first_dirty != INVALID_DIRTY_IDX);
> +    head->u.free.first_dirty = INVALID_DIRTY_IDX;
> +
>      page_list_del(head, &heap(node, zone, head_order));
>  
>      while ( cur_head < (head + (1 << head_order)) )
> @@ -873,6 +905,8 @@ static int reserve_offlined_page(struct page_info *head)
>  
>          while ( cur_order < head_order )
>          {
> +            unsigned int first_dirty = INVALID_DIRTY_IDX;
> +
>              next_order = cur_order + 1;
>  
>              if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
> @@ -892,8 +926,20 @@ static int reserve_offlined_page(struct page_info *head)
>              {
>              merge:
>                  /* We don't consider merging outside the head_order. */
> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
> -                PFN_ORDER(cur_head) = cur_order;
> +
> +                /* See if any of the pages indeed need scrubbing. */
> +                if ( need_scrub )
> +                {
> +                    for ( i = 0; i < (1 << cur_order); i++ )
> +                        if ( test_bit(_PGC_need_scrub,
> +                                      &cur_head[i].count_info) )
> +                        {
> +                            first_dirty = i;
> +                            break;
> +                        }
> +                }

This is inefficient - at the point you set need_scrub you could
instead latch the first page needing to be scrubbed, and start
the loop only there (and in some cases you would get away
without entering the loop altogether).
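
For illustration, a sketch of that suggestion for reserve_offlined_page() (not
part of the posted patch; head_first_dirty stands for the head's first_dirty
value latched before it is reset, and off is the sub-chunk's offset within the
head):

    unsigned int off = cur_head - head;
    unsigned int first_dirty = INVALID_DIRTY_IDX;

    /* Only scan sub-chunks that can overlap the latched dirty range. */
    if ( need_scrub && (head_first_dirty < off + (1U << cur_order)) )
    {
        for ( i = (head_first_dirty > off) ? head_first_dirty - off : 0;
              i < (1 << cur_order); i++ )
            if ( test_bit(_PGC_need_scrub, &cur_head[i].count_info) )
            {
                first_dirty = i;
                break;
            }
    }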

> @@ -919,9 +965,52 @@ static int reserve_offlined_page(struct page_info *head)
>      return count;
>  }
>  
> +static void scrub_free_pages(unsigned int node)
> +{
> +    struct page_info *pg;
> +    unsigned int zone;
> +
> +    ASSERT(spin_is_locked(&heap_lock));
> +
> +    if ( !node_need_scrub[node] )
> +        return;
> +
> +    for ( zone = 0; zone < NR_ZONES; zone++ )
> +    {
> +        unsigned int order = MAX_ORDER;
> +        do {

Blank line between declaration(s) and statement(s) please.

Also, considering that this entire function runs with the heap lock
held, I think we really need to split that one into per-node ones.
But of course, as with the NUMA locality related remark, this isn't
a request to necessarily do this in the series here.
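
The direction hinted at, as a sketch only (nothing in this series does this):

    /* Hypothetical: one heap lock per NUMA node instead of the global heap_lock,
     * each initialised with spin_lock_init() at boot. */
    static spinlock_t node_heap_lock[MAX_NUMNODES];

    /* Scrubbing would then only serialise against that node's heap, e.g.: */
    spin_lock(&node_heap_lock[node]);
    scrub_free_pages(node);
    spin_unlock(&node_heap_lock[node]);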

> @@ -977,35 +1076,54 @@ static void free_heap_pages(
>  
>          if ( (page_to_mfn(pg) & mask) )
>          {
> +            struct page_info *predecessor = pg - mask;
> +
>              /* Merge with predecessor block? */
> -            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
> -                 !page_state_is(pg-mask, free) ||
> -                 (PFN_ORDER(pg-mask) != order) ||
> -                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
> +            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
> +                 !page_state_is(predecessor, free) ||
> +                 (PFN_ORDER(predecessor) != order) ||
> +                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
>                  break;
> -            pg -= mask;
> -            page_list_del(pg, &heap(node, zone, order));
> +
> +            page_list_del(predecessor, &heap(node, zone, order));
> +
> +            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
> +                need_scrub = true;
> +                /* ... and keep predecessor's first_dirty. */
> +            else if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
> +                predecessor->u.free.first_dirty = (1U << order) +
> +                                                  pg->u.free.first_dirty;
> +
> +            pg->u.free.first_dirty = INVALID_DIRTY_IDX;

Is this really needed?

> +            pg = predecessor;
>          }
>          else
>          {
> +            struct page_info *successor = pg + mask;
> +
>              /* Merge with successor block? */
> -            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
> -                 !page_state_is(pg+mask, free) ||
> -                 (PFN_ORDER(pg+mask) != order) ||
> -                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
> +            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
> +                 !page_state_is(successor, free) ||
> +                 (PFN_ORDER(successor) != order) ||
> +                 (phys_to_nid(page_to_maddr(successor)) != node) )
>                  break;
> -            page_list_del(pg + mask, &heap(node, zone, order));
> +            page_list_del(successor, &heap(node, zone, order));
> +
> +            need_scrub |= (successor->u.free.first_dirty != INVALID_DIRTY_IDX);

|= on a boolean is slightly odd - perhaps better use if(), just like you
do on the other path?
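
For reference, the suggested spelling would be roughly:

    if ( successor->u.free.first_dirty != INVALID_DIRTY_IDX )
        need_scrub = true;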

> +            successor->u.free.first_dirty = INVALID_DIRTY_IDX;

Same here - is this really needed?

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -87,6 +87,9 @@ struct page_info
>  
>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>          struct {
> +            /* Index of the first *possibly* unscrubbed page in the buddy. */
> +#define INVALID_DIRTY_IDX -1U

This needs parentheses and would better be either ~0U or UINT_MAX.
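
For reference, either suggested form (a sketch):

    #define INVALID_DIRTY_IDX (~0U)    /* or: UINT_MAX */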

> +            unsigned int first_dirty;

On x86 this change is fine at present, albeit not optimal. Its ARM
equivalent, however, grows struct page_info in the 32-bit case,
which I don't think is wanted or needed. You really only need
MAX_ORDER+1 bits here, so I'd suggest making this a bit field
(also on x86, to at once make obvious how many bits remain
unused), and perhaps making need_tlbflush a single bit at once
(unless its address is being taken somewhere).
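
A sketch of the layout being suggested (not the posted patch; the exact field
widths and the INVALID_DIRTY_IDX encoding chosen here are assumptions):

    struct {
        /* Index of the first *possibly* unscrubbed page in the buddy. */
    #define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
        unsigned long first_dirty:MAX_ORDER + 1;
        /* Do TLBs need flushing for safety before next page use? */
        bool need_tlbflush:1;
    } free;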

Jan
Boris Ostrovsky June 9, 2017, 8:07 p.m. UTC | #2
On 06/09/2017 10:50 AM, Jan Beulich wrote:
>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
>>  static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>>  #define heap(node, zone, order) ((*_heap[node])[zone][order])
>>  
>> +static unsigned long node_need_scrub[MAX_NUMNODES];
> Just as a remark - I think it is inefficient to store per-node data
> this way (equally applies to _heap[]); this would better be put
> into per-node memory. Since we don't have (and likely also don't
> want to have) this_node() / per_node(), perhaps we could
> (ab)use per-CPU data for this.

I did think about doing this but then decided against it since I wasn't
sure it's worth the extra space.


>
>> @@ -798,11 +814,18 @@ static struct page_info *alloc_heap_pages(
>>      return NULL;
>>  
>>   found: 
>> +    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
>> +
>>      /* We may have to halve the chunk a number of times. */
>>      while ( j != order )
>>      {
>> -        PFN_ORDER(pg) = --j;
>> -        page_list_add_tail(pg, &heap(node, zone, j));
>> +        /*
>> +         * Some of the sub-chunks may be clean but we will mark them
>> +         * as dirty (if need_scrub is set) to avoid traversing the
>> +         * list here.
>> +         */
>> +        page_list_add_scrub(pg, node, zone, --j,
>> +                            need_scrub ? 0 : INVALID_DIRTY_IDX);
> I suppose this is going to be improved in subsequent patches,
> and hence the comment will be gone by the end of the series?

No, it stays the same throughout the series. I suppose I could improve
this by tracking the original buddy's first_dirty and calling
page_list_add_scrub(.., false) until we reach that value.
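
A sketch of that idea, relative to the splitting loop at "found:" (illustrative
only, not what the series does):

    unsigned int first_dirty = pg->u.free.first_dirty;

    while ( j != order )
    {
        unsigned int half = 1U << --j;

        /*
         * The lower half goes back to the heap; it is clean unless the
         * dirty range starts inside it.
         */
        page_list_add_scrub(pg, node, zone, j,
                            (first_dirty < half) ? first_dirty
                                                 : INVALID_DIRTY_IDX);

        /* Keep splitting the upper half, rebasing first_dirty onto it. */
        pg += half;
        if ( first_dirty != INVALID_DIRTY_IDX )
            first_dirty = (first_dirty >= half) ? first_dirty - half : 0;
    }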


>
>> @@ -919,9 +965,52 @@ static int reserve_offlined_page(struct page_info *head)
>>      return count;
>>  }
>>  
>> +static void scrub_free_pages(unsigned int node)
>> +{
>> +    struct page_info *pg;
>> +    unsigned int zone;
>> +
>> +    ASSERT(spin_is_locked(&heap_lock));
>> +
>> +    if ( !node_need_scrub[node] )
>> +        return;
>> +
>> +    for ( zone = 0; zone < NR_ZONES; zone++ )
>> +    {
>> +        unsigned int order = MAX_ORDER;
>> +        do {
> Blank line between declaration(s) and statement(s) please.
>
> Also, considering that this entire function runs with the heap lock
> held, I think we really need to split that one into per-node ones.
> But of course, as with the NUMA locality related remark, this isn't
> a request to necessarily do this in the series here.

Keep in mind that the last patch drops the lock when doing the actual
scrubbing, so it will get better.

>> +            unsigned int first_dirty;
> On x86 this change is fine at present, albeit not optimal. Its ARM
> equivalent, however, grows struct page_info in the 32-bit case,

It does? I am looking at include/asm-arm/mm.h and I don't see this.

> which I don't think is wanted or needed. You really only need
> MAX_ORDER+1 bits here, so I'd suggest making this a bit field

Just to make sure --- when you say "bit field" you mean masking various
bits in a word, not C language bit fields?

> (also on x86, to at once make obvious how many bits remain
> unused), and perhaps making need_tlbflush a single bit at once
> (unless its address is being taken somewhere).


accumulate_tlbflush() wants the address, but I think it can be
modified to handle a single bit.


-boris
Jan Beulich June 12, 2017, 6:50 a.m. UTC | #3
>>> On 09.06.17 at 22:07, <boris.ostrovsky@oracle.com> wrote:
> On 06/09/2017 10:50 AM, Jan Beulich wrote:
>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>> +            unsigned int first_dirty;
>> On x86 this change is fine at present, albeit not optimal. Its ARM
>> equivalent, however, grows struct page_info in the 32-bit case,
> 
> It does? I am looking at include/asm-arm/mm.h and I don't see this.

The union with field name u is 32 bits on 32-bit ARM right now.
Your change grows it to 64 bits.

>> which I don't think is wanted or needed. You really only need
>> MAX_ORDER+1 bits here, so I'd suggest making this a bit field
> 
> Just to make sure --- when you say "bit field" you mean masking various
> bits in a word, not C language bit fields?

No, actually the latter.

Jan

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e41fb4..c65d214 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@  typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,6 +680,20 @@  static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+        page_list_add_tail(pg, &heap(node, zone, order));
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -689,7 +705,7 @@  static struct page_info *alloc_heap_pages(
     unsigned long request = 1UL << order;
     struct page_info *pg;
     nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
+    bool need_scrub, need_tlbflush = 0;
     uint32_t tlbflush_timestamp = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
@@ -798,11 +814,18 @@  static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
+        /*
+         * Some of the sub-chunks may be clean but we will mark them
+         * as dirty (if need_scrub is set) to avoid traversing the
+         * list here.
+         */
+        page_list_add_scrub(pg, node, zone, --j,
+                            need_scrub ? 0 : INVALID_DIRTY_IDX);
         pg += 1 << j;
     }
 
@@ -851,11 +874,20 @@  static int reserve_offlined_page(struct page_info *head)
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
     int cur_order;
+    bool need_scrub;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    need_scrub = (head->u.free.first_dirty != INVALID_DIRTY_IDX);
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -873,6 +905,8 @@  static int reserve_offlined_page(struct page_info *head)
 
         while ( cur_order < head_order )
         {
+            unsigned int first_dirty = INVALID_DIRTY_IDX;
+
             next_order = cur_order + 1;
 
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
@@ -892,8 +926,20 @@  static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages indeed need scrubbing. */
+                if ( need_scrub )
+                {
+                    for ( i = 0; i < (1 << cur_order); i++ )
+                        if ( test_bit(_PGC_need_scrub,
+                                      &cur_head[i].count_info) )
+                        {
+                            first_dirty = i;
+                            break;
+                        }
+                }
+                page_list_add_scrub(cur_head, node, zone,
+                                    cur_order, first_dirty);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -919,9 +965,52 @@  static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1050,20 @@  static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -977,35 +1076,54 @@  static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
+                need_scrub = true;
+                /* ... and keep predecessor's first_dirty. */
+            else if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
+            pg->u.free.first_dirty = INVALID_DIRTY_IDX;
+            pg = predecessor;
         }
         else
         {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+            page_list_del(successor, &heap(node, zone, order));
+
+            need_scrub |= (successor->u.free.first_dirty != INVALID_DIRTY_IDX);
+            successor->u.free.first_dirty = INVALID_DIRTY_IDX;
         }
 
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1226,7 +1344,7 @@  unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1295,7 +1413,7 @@  static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1622,7 +1740,7 @@  void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1676,12 +1794,9 @@  void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1790,7 +1905,7 @@  struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1858,11 +1973,7 @@  void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index f6915ad..38d4fba 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -43,6 +43,9 @@  struct page_info
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+            /* Index of the first *possibly* unscrubbed page in the buddy. */
+#define INVALID_DIRTY_IDX -1U
+            unsigned int first_dirty;
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
@@ -115,6 +118,13 @@  struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 extern unsigned long xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 119d7de..e20f161 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -87,6 +87,9 @@  struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+            /* Index of the first *possibly* unscrubbed page in the buddy. */
+#define INVALID_DIRTY_IDX -1U
+            unsigned int first_dirty;
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
@@ -233,6 +236,13 @@  struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 struct spage_info
 {
        unsigned long type_info;