
[v4,3/8] mm: Scrub pages in alloc_heap_pages() if needed

Message ID 1495209040-11101-4-git-send-email-boris.ostrovsky@oracle.com (mailing list archive)
State New, archived

Commit Message

Boris Ostrovsky May 19, 2017, 3:50 p.m. UTC
When allocating pages in alloc_heap_pages(), first look for clean pages. If none
are found, retry, this time taking pages marked as unscrubbed and scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible once we stop scrubbing from free_heap_pages() and
instead do it from the idle loop.

Since not all allocations require clean pages (xenheap allocations, for example),
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.
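The two-pass flow described above can be sketched as a minimal, self-contained model (the pool layout, find_free() and alloc_page() helpers are hypothetical simplifications; only MEMF_no_scrub and scrub_one_page() correspond to names in the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MEMF_no_scrub (1U << 7)

struct page { bool need_scrub; bool free; };

/* Toy pool: every free page starts out dirty. */
static struct page pool[3] = {
    { .need_scrub = true, .free = true },
    { .need_scrub = true, .free = true },
    { .need_scrub = true, .free = true },
};

/* A page qualifies only if it is clean, unless MEMF_no_scrub is set. */
static struct page *find_free(unsigned int memflags)
{
    for (size_t i = 0; i < 3; i++)
        if (pool[i].free &&
            (!pool[i].need_scrub || (memflags & MEMF_no_scrub)))
            return &pool[i];
    return NULL;
}

static void scrub_one_page(struct page *pg) { pg->need_scrub = false; }

/* Two passes: prefer clean pages, then fall back to dirty ones and
 * scrub them on behalf of callers that asked for clean memory. */
struct page *alloc_page(unsigned int memflags)
{
    struct page *pg = find_free(memflags);
    if (!pg && !(memflags & MEMF_no_scrub))
        pg = find_free(memflags | MEMF_no_scrub);
    if (!pg)
        return NULL;
    if (pg->need_scrub && !(memflags & MEMF_no_scrub))
        scrub_one_page(pg);
    pg->free = false;
    return pg;
}
```

A caller passing MEMF_no_scrub (as alloc_xenheap_pages() does in the patch) receives a page on the first pass and skips the scrub entirely.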

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Add MEMF_no_scrub flag

 xen/common/page_alloc.c | 43 ++++++++++++++++++++++++++++++++++++-------
 xen/include/xen/mm.h    |  2 ++
 2 files changed, 38 insertions(+), 7 deletions(-)

Comments

Jan Beulich June 9, 2017, 3:22 p.m. UTC | #1
>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> @@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>  
>              /* Find smallest order which can satisfy the request. */
>              for ( j = order; j <= MAX_ORDER; j++ )
> +            {
>                  if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
> -                    return pg;
> +                {
> +                    if ( (order == 0) || use_unscrubbed ||

Why is order 0 being special cased here? If this really is intended, a
comment should be added.

> @@ -821,9 +829,16 @@ static struct page_info *alloc_heap_pages(
>      pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
>      if ( !pg )
>      {
> -        /* No suitable memory blocks. Fail the request. */
> -        spin_unlock(&heap_lock);
> -        return NULL;
> +        /* Try now getting a dirty buddy. */
> +        if ( !(memflags & MEMF_no_scrub) )
> +            pg = get_free_buddy(zone_lo, zone_hi, order,
> +                                memflags | MEMF_no_scrub, d);
> +        if ( !pg )
> +        {
> +            /* No suitable memory blocks. Fail the request. */
> +            spin_unlock(&heap_lock);
> +            return NULL;
> +        }
>      }

I'd appreciate if you avoided the re-indentation by simply
prefixing another if() to the one that's already there.

> @@ -855,10 +870,24 @@ static struct page_info *alloc_heap_pages(
>      if ( d != NULL )
>          d->last_alloc_node = node;
>  
> +    need_scrub &= !(memflags & MEMF_no_scrub);

Can't this be done right away when need_scrub is being set?

>      for ( i = 0; i < (1 << order); i++ )
>      {
>          /* Reference count must continuously be zero for free pages. */
> -        BUG_ON(pg[i].count_info != PGC_state_free);
> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub ) != PGC_state_free);

Isn't this change needed in one of the earlier patches already?
There also is a stray blank ahead of the first closing paren here.

> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +        {
> +            if ( need_scrub )
> +                scrub_one_page(&pg[i]);
> +            node_need_scrub[node]--;
> +            /*
> +             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
> +             * on buddy's head. However, since we assign pg[i].count_info
> +             * below, we can skip this.
> +             */

This comment is correct only with the current way struct page_info's
fields are unionized. In fact I think the comment is unneeded - the
buddy is being transitioned from free to allocated here, so the field
loses its meaning.

Jan
Boris Ostrovsky June 9, 2017, 8:55 p.m. UTC | #2
On 06/09/2017 11:22 AM, Jan Beulich wrote:
>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>> @@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>>  
>>              /* Find smallest order which can satisfy the request. */
>>              for ( j = order; j <= MAX_ORDER; j++ )
>> +            {
>>                  if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
>> -                    return pg;
>> +                {
>> +                    if ( (order == 0) || use_unscrubbed ||
> Why is order 0 being special cased here? If this really is intended, a
> comment should be added.

That's because for a single page it's not worth skipping a dirty buddy.
(The threshold is pretty arbitrary; it could presumably be <= 1 or even <= 2.)

I'll add a comment.
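The check being discussed might be factored out and commented as follows (a hypothetical sketch, not the committed code; buddy_ok() is an invented name, and INVALID_DIRTY_IDX is defined locally to mirror the patch):

```c
#include <stdbool.h>

#define INVALID_DIRTY_IDX (~0U)

/*
 * Decide whether a buddy found at this order is acceptable, per the
 * discussion above.
 */
static bool buddy_ok(unsigned int order, bool use_unscrubbed,
                     unsigned int first_dirty)
{
    /*
     * An order-0 allocation consumes the whole buddy, so scrubbing a
     * single dirty page is cheaper than continuing to search for a
     * fully clean buddy.
     */
    return order == 0 || use_unscrubbed ||
           first_dirty == INVALID_DIRTY_IDX;
}
```

Only a dirty buddy (first_dirty != INVALID_DIRTY_IDX) at order > 0, requested by a caller that wants clean pages, is put back on the list.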


>> @@ -855,10 +870,24 @@ static struct page_info *alloc_heap_pages(
>>      if ( d != NULL )
>>          d->last_alloc_node = node;
>>  
>> +    need_scrub &= !(memflags & MEMF_no_scrub);
> Can't this be done right away when need_scrub is being set?

No, because we use the earlier assignment to decide how we put
"sub-buddies" back to the heap (dirty or not). Here we use need_scrub to
decide whether to scrub the buddy.

This may change though with the changes that you suggested in the
comments to the first patch.
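The two uses of need_scrub described above can be condensed into a sketch (a hypothetical simplification of alloc_heap_pages(); the return value stands in for the later per-page scrub decision):

```c
#include <stdbool.h>

#define MEMF_no_scrub (1U << 7)

/* Returns whether the per-page loop should scrub. */
bool scrub_in_alloc(bool buddy_dirty, unsigned int memflags)
{
    /* 1st use: the raw dirty state decides how split-off
     * "sub-buddies" are put back on the heap. */
    bool need_scrub = buddy_dirty;

    /* ... split the buddy, re-queueing tails as dirty iff need_scrub ... */

    /* 2nd use: mask with the caller's flag; only scrub if the
     * caller actually wants clean pages. */
    need_scrub &= !(memflags & MEMF_no_scrub);
    return need_scrub;
}
```

This is why the masking cannot happen where need_scrub is first assigned: the unmasked value is still needed in between.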

>
>>      for ( i = 0; i < (1 << order); i++ )
>>      {
>>          /* Reference count must continuously be zero for free pages. */
>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub ) != PGC_state_free);
> Isn't this change needed in one of the earlier patches already?

At this patch level we are still scrubbing in free_heap_pages(), so there
is never an unscrubbed page in the allocator. The next patch switches
to scrubbing from the idle loop.

> There also is a stray blank ahead of the first closing paren here.
>
>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>> +        {
>> +            if ( need_scrub )
>> +                scrub_one_page(&pg[i]);
>> +            node_need_scrub[node]--;
>> +            /*
>> +             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
>> +             * on buddy's head. However, since we assign pg[i].count_info
>> +             * below, we can skip this.
>> +             */
> This comment is correct only with the current way struct page_info's
> fields are unionized. In fact I think the comment is unneeded - the
> buddy is being transitioned from free to allocated here, so the field
> loses its meaning.

That, actually, is exactly what I was trying to say. I can drop the
comment if you feel it is obvious why we don't need to set first_dirty.

-boris
Jan Beulich June 12, 2017, 6:54 a.m. UTC | #3
>>> On 09.06.17 at 22:55, <boris.ostrovsky@oracle.com> wrote:
> On 06/09/2017 11:22 AM, Jan Beulich wrote:
>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int 
>>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +        {
>>> +            if ( need_scrub )
>>> +                scrub_one_page(&pg[i]);
>>> +            node_need_scrub[node]--;
>>> +            /*
>>> +             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
>>> +             * on buddy's head. However, since we assign pg[i].count_info
>>> +             * below, we can skip this.
>>> +             */
>> This comment is correct only with the current way struct page_info's
>> fields are unionized. In fact I think the comment is unneeded - the
>> buddy is being transitioned from free to allocated here, so the field
>> loses its meaning.
> 
> That, actually, is exactly what I was trying to say. I can drop the
> comment if you feel it is obvious why we don't need to set first_dirty.

Well, my personal order of preference would be to (a) drop
the comment or else (b) re-word it to express the free ->
allocated transition as the reason explicitly. Others may prefer
a corrected comment over no comment at all ...

Jan

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 1e57885..b7c7426 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -703,6 +703,7 @@  static struct page_info *get_free_buddy(unsigned int zone_lo,
     nodemask_t nodemask = d ? d->node_affinity : node_online_map;
     unsigned int j, zone, nodemask_retry = 0, request = 1UL << order;
     struct page_info *pg;
+    bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
     if ( node == NUMA_NO_NODE )
     {
@@ -734,8 +735,15 @@  static struct page_info *get_free_buddy(unsigned int zone_lo,
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    return pg;
+                {
+                    if ( (order == 0) || use_unscrubbed ||
+                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                        return pg;
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -821,9 +829,16 @@  static struct page_info *alloc_heap_pages(
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
     if ( !pg )
     {
-        /* No suitable memory blocks. Fail the request. */
-        spin_unlock(&heap_lock);
-        return NULL;
+        /* Try now getting a dirty buddy. */
+        if ( !(memflags & MEMF_no_scrub) )
+            pg = get_free_buddy(zone_lo, zone_hi, order,
+                                memflags | MEMF_no_scrub, d);
+        if ( !pg )
+        {
+            /* No suitable memory blocks. Fail the request. */
+            spin_unlock(&heap_lock);
+            return NULL;
+        }
     }
 
     node = phys_to_nid(page_to_maddr(pg));
@@ -855,10 +870,24 @@  static struct page_info *alloc_heap_pages(
     if ( d != NULL )
         d->last_alloc_node = node;
 
+    need_scrub &= !(memflags & MEMF_no_scrub);
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-        BUG_ON(pg[i].count_info != PGC_state_free);
+        BUG_ON((pg[i].count_info & ~PGC_need_scrub ) != PGC_state_free);
+
+        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            if ( need_scrub )
+                scrub_one_page(&pg[i]);
+            node_need_scrub[node]--;
+            /*
+             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
+             * on buddy's head. However, since we assign pg[i].count_info
+             * below, we can skip this.
+             */
+        }
+
         pg[i].count_info = PGC_state_inuse;
 
         if ( !(memflags & MEMF_no_tlbflush) )
@@ -1737,7 +1766,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     ASSERT(!in_irq());
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-                          order, memflags, NULL);
+                          order, memflags | MEMF_no_scrub, NULL);
     if ( unlikely(pg == NULL) )
         return NULL;
 
@@ -1787,7 +1816,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( !(memflags >> _MEMF_bits) )
         memflags |= MEMF_bits(xenheap_bits);
 
-    pg = alloc_domheap_pages(NULL, order, memflags);
+    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
     if ( unlikely(pg == NULL) )
         return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 88de3c1..0d4b7c2 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -224,6 +224,8 @@  struct npfec {
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
 #define _MEMF_no_tlbflush 6
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
+#define _MEMF_no_scrub    7
+#define  MEMF_no_scrub    (1U<<_MEMF_no_scrub)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)