
[v3,5/9] mm: Do not discard already-scrubbed pages if softirqs are pending

Message ID 1492184258-3277-6-git-send-email-boris.ostrovsky@oracle.com (mailing list archive)
State New, archived

Commit Message

Boris Ostrovsky April 14, 2017, 3:37 p.m. UTC
While scrubbing from the idle loop, check for softirqs every 256 pages.
If a softirq is pending, don't scrub any further and merge the
partially-scrubbed buddy back into the heap by breaking the clean portion
into smaller power-of-2 chunks. Then repeat the same process for the
dirty part.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/page_alloc.c |   78 ++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 64 insertions(+), 14 deletions(-)
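
To make the splitting described in the commit message concrete, here is a
minimal, standalone sketch (not Xen code) of the power-of-2 chunk
decomposition the patch performs when scrubbing is preempted mid-buddy. It
uses GCC builtins in place of Xen's flsl()/ffsl(), and the 1024-page buddy
with 300 pages scrubbed before preemption is an invented example.

#include <stdio.h>

/*
 * Largest power-of-two chunk starting at @start that does not extend
 * past @end (mirrors flsl(end - start) - 1 in the patch).
 */
static unsigned int clean_chunk_order(unsigned long start, unsigned long end)
{
    /* Highest set bit of (end - start); assumes end > start. */
    return (8 * sizeof(unsigned long)) - 1 - __builtin_clzl(end - start);
}

/*
 * Largest power-of-two chunk starting at @end that does not cross the
 * next power-of-two boundary (mirrors ffsl(end) - 1 in the patch).
 */
static unsigned int dirty_chunk_order(unsigned long end)
{
    /* Lowest set bit of @end; assumes end != 0. */
    return __builtin_ctzl(end);
}

int main(void)
{
    unsigned long order = 10;          /* a 1024-page buddy */
    unsigned long num_processed = 300; /* pages scrubbed before preemption */
    unsigned long start = 0, end = num_processed;

    printf("clean chunks:");
    while ( start < end )
    {
        unsigned int c = clean_chunk_order(start, end);

        printf(" 2^%u at %lu", c, start);
        start += 1UL << c;
    }

    printf("\ndirty chunks:");
    while ( end < (1UL << order) )
    {
        unsigned int c = dirty_chunk_order(end);

        printf(" 2^%u at %lu", c, end);
        end += 1UL << c;
    }
    printf("\n");

    return 0;
}

For this example the 300 clean pages are handed to merge_and_free_buddy()
as chunks of 256 + 32 + 8 + 4, and the remaining 724 dirty pages as
4 + 16 + 64 + 128 + 512.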

Comments

Jan Beulich May 4, 2017, 3:43 p.m. UTC | #1
>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> While scrubbing from the idle loop, check for softirqs every 256 pages.
> If a softirq is pending, don't scrub any further and merge the
> partially-scrubbed buddy back into the heap by breaking the clean portion
> into smaller power-of-2 chunks. Then repeat the same process for the
> dirty part.

This is ugly, as it gets us back into the state where full merge
opportunities aren't being realized, just that the time window
may be smaller now. As hinted at before, is there no way to
flag the first page needing scrubbing alongside the head
indicating that _some_ page needs scrubbing? The pages are
all free, so if there's no other suitable storage, the head page
itself could serve as such. But instead of a flag in struct
page_info, perhaps you could store a (relatively small) integer?

Jan
Boris Ostrovsky May 4, 2017, 5:18 p.m. UTC | #2
On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> While scrubbing from the idle loop, check for softirqs every 256 pages.
>> If a softirq is pending, don't scrub any further and merge the
>> partially-scrubbed buddy back into the heap by breaking the clean portion
>> into smaller power-of-2 chunks. Then repeat the same process for the
>> dirty part.
> This is ugly, as it gets us back into the state where full merge
> opportunities aren't being realized, just that the time window
> may be smaller now. As hinted at before, is there no way to
> flag the first page needing scrubbing alongside the head
> indicating that _some_ page needs scrubbing? The pages are
> all free, so if there's no other suitable storage, the head page
> itself could serve as such. But instead of a flag in struct
> page_info, perhaps you could store a (relatively small) integer?

How will it help? Even if we know what the first dirty page is, we still
may have to drop scrubbing if an irq is pending. We simply will not have
to scan the buddy up to the first dirty page.

Or perhaps I don't understand what you are suggesting.


-boris
Jan Beulich May 5, 2017, 10:27 a.m. UTC | #3
>>> On 04.05.17 at 19:18, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>> While scrubbing from the idle loop, check for softirqs every 256 pages.
>>> If a softirq is pending, don't scrub any further and merge the
>>> partially-scrubbed buddy back into the heap by breaking the clean portion
>>> into smaller power-of-2 chunks. Then repeat the same process for the
>>> dirty part.
>> This is ugly, as it gets us back into the state where full merge
>> opportunities aren't being realized, just that the time window
>> may be smaller now. As hinted at before, is there no way to
>> flag the first page needing scrubbing alongside the head
>> indicating that _some_ page needs scrubbing? The pages are
>> all free, so if there's no other suitable storage, the head page
>> itself could serve as such. But instead of a flag in struct
>> page_info, perhaps you could store a (relatively small) integer?
> 
> How will it help? Even if we know what the first dirty page is, we still
> may have to drop scrubbing if an irq is pending. We simply will not have
> to scan the buddy up to the first dirty page.
> 
> Or perhaps I don't understand what you are suggesting.

If you get a request to stop scrubbing, you split the current
buddy into a clean and a (possibly) dirty part (if I understood
the code in this patch correctly). It's that splitting which I'd like
to see avoided, and by keeping a "first dirty" index (which
would be updated when you stop scrubbing) you could do so.

Jan
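
A rough sketch of the bookkeeping being suggested here might look like the
following. The structure, field names and helper callbacks are hypothetical
and not part of the patch; the point is only to illustrate recording a
resume point in the buddy head instead of splitting the buddy:

#include <stdbool.h>

/* Hypothetical per-buddy state kept in the head page. */
struct buddy_head {
    bool dirty_head;            /* some page in this buddy may need scrubbing */
    unsigned long first_dirty;  /* index of the first possibly-dirty page */
};

/*
 * Scrub a buddy of 2^order pages, resuming from the recorded hint and
 * stopping when a softirq is pending.  Instead of splitting the buddy,
 * move the hint forward so the next pass skips the already-clean prefix.
 */
void scrub_buddy(struct buddy_head *head, unsigned int order,
                 bool (*softirq_pending)(void),
                 void (*scrub_one)(unsigned long idx))
{
    unsigned long i;

    for ( i = head->first_dirty; i < (1UL << order); i++ )
    {
        scrub_one(i);

        /* Check for pending work every 256 pages, as the patch does. */
        if ( ((i + 1) & 255) == 0 && softirq_pending() )
        {
            i++;                /* pages below index i are now clean */
            break;
        }
    }

    head->first_dirty = i;
    head->dirty_head = (i < (1UL << order));
}

The buddy stays whole at its original order, so the merge opportunities
mentioned earlier in the thread are not lost; only the point where the
next scan resumes changes.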
Boris Ostrovsky May 5, 2017, 1:51 p.m. UTC | #4
On 05/05/2017 06:27 AM, Jan Beulich wrote:
>>>> On 04.05.17 at 19:18, <boris.ostrovsky@oracle.com> wrote:
>> On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>>> While scrubbing from the idle loop, check for softirqs every 256 pages.
>>>> If a softirq is pending, don't scrub any further and merge the
>>>> partially-scrubbed buddy back into the heap by breaking the clean portion
>>>> into smaller power-of-2 chunks. Then repeat the same process for the
>>>> dirty part.
>>> This is ugly, as it gets us back into the state where full merge
>>> opportunities aren't being realized, just that the time window
>>> may be smaller now. As hinted at before, is there no way to
>>> flag the first page needing scrubbing alongside the head
>>> indicating that _some_ page needs scrubbing? The pages are
>>> all free, so if there's no other suitable storage, the head page
>>> itself could serve as such. But instead of a flag in struct
>>> page_info, perhaps you could store a (relatively small) integer?
>> How will it help? Even if we know what the first dirty page is, we still
>> may have to drop scrubbing if an irq is pending. We simply will not have
>> to scan the buddy up to the first dirty page.
>>
>> Or perhaps I don't understand what you are suggesting.
> If you get a request to stop scrubbing, you split the current
> buddy into a clean and a (possibly) dirty part (if I understood
> the code in this patch correctly). 

Right.

> It's that splitting which I'd like
> to see avoided, and by keeping a "first dirty" index (which
> would be updated when you stop scrubbing) you could do so.


If we are to avoid splitting then we might just keep the buddy on the
"dirty end" (i.e. tail) of the order if we are interrupted in the middle.

Having "first_dirty" (which is really "maybe_first_dirty") would only
help us start the next scan faster but it won't really change the algorithm.

(Or maybe that's exactly what you are proposing.)

-boris
Jan Beulich May 5, 2017, 2:13 p.m. UTC | #5
>>> On 05.05.17 at 15:51, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2017 06:27 AM, Jan Beulich wrote:
>>>>> On 04.05.17 at 19:18, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>>>> While scrubbing from the idle loop, check for softirqs every 256 pages.
>>>>> If a softirq is pending, don't scrub any further and merge the
>>>>> partially-scrubbed buddy back into the heap by breaking the clean portion
>>>>> into smaller power-of-2 chunks. Then repeat the same process for the
>>>>> dirty part.
>>>> This is ugly, as it gets us back into the state where full merge
>>>> opportunities aren't being realized, just that the time window
>>>> may be smaller now. As hinted at before, is there no way to
>>>> flag the first page needing scrubbing alongside the head
>>>> indicating that _some_ page needs scrubbing? The pages are
>>>> all free, so if there's no other suitable storage, the head page
>>>> itself could serve as such. But instead of a flag in struct
>>>> page_info, perhaps you could store a (relatively small) integer?
>>> How will it help? Even if we know what the first dirty page is, we still
>>> may have to drop scrubbing if an irq is pending. We simply will not have
>>> to scan the buddy up to the first dirty page.
>>>
>>> Or perhaps I don't understand what you are suggesting.
>> If you get a request to stop scrubbing, you split the current
>> buddy into a clean and a (possibly) dirty part (if I understood
>> the code in this patch correctly). 
> 
> Right.
> 
>> It's that splitting which I'd like
>> to see avoided, and by keeping a "first dirty" index (which
>> would be updated when you stop scrubbing) you could do so.
> 
> 
> If we are to avoid splitting then we might just keep the buddy on the
> "dirty end" (i.e. tail) of the order if we are interrupted in the middle.
> 
> Having "first_dirty" (which is really "maybe_first_dirty") would only
> help us start the next scan faster but it won't really change the algorithm.
> 
> (Or maybe that's exactly what you are proposing.)

Yes - using such a hint means better chances of actually making
overall progress despite not splitting buddies.

Jan

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index fcd7308..0b2dff1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -996,7 +996,7 @@  static int reserve_offlined_page(struct page_info *head)
 static struct page_info *
 merge_and_free_buddy(struct page_info *pg, unsigned int node,
                      unsigned int zone, unsigned int order,
-                     bool need_scrub)
+                     bool need_scrub, bool is_frag)
 {
     ASSERT(spin_is_locked(&heap_lock));
 
@@ -1011,7 +1011,15 @@  merge_and_free_buddy(struct page_info *pg, unsigned int node,
         if ( (page_to_mfn(pg) & mask) )
             buddy = pg - mask; /* Merge with predecessor block. */
         else
-            buddy = pg + mask; /* Merge with successor block. */
+        {
+            /*
+             * Merge with successor block.
+             * Only un-fragmented buddy can be merged forward.
+             */
+            if ( is_frag )
+                break;
+            buddy = pg + mask;
+        }
 
         if ( !mfn_valid(_mfn(page_to_mfn(buddy))) ||
              !page_state_is(buddy, free) ||
@@ -1093,12 +1101,15 @@  static unsigned int node_to_scrub(bool get_node)
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
-    unsigned int zone, order;
-    unsigned long i;
+    unsigned int zone, order, scrub_order;
+    unsigned long i, num_processed, start, end;
     unsigned int cpu = smp_processor_id();
-    bool preempt = false;
+    bool preempt = false, is_frag;
     nodeid_t node;
 
+    /* Scrubbing granularity. */
+#define SCRUB_CHUNK_ORDER  8
+
     /*
      * Don't scrub while dom0 is being constructed since we may
      * fail trying to call map_domain_page() from scrub_one_page().
@@ -1123,25 +1134,64 @@  bool scrub_free_pages(void)
                 if ( !pg->u.free.dirty_head )
                     break;
 
-                for ( i = 0; i < (1UL << order); i++)
+                scrub_order = MIN(order, SCRUB_CHUNK_ORDER);
+                num_processed = 0;
+                is_frag = false;
+                while ( num_processed < (1UL << order) )
                 {
-                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    for ( i = num_processed;
+                          i < num_processed + (1UL << scrub_order); i++ )
                     {
-                        scrub_one_page(&pg[i]);
-                        pg[i].count_info &= ~PGC_need_scrub;
-                        node_need_scrub[node]--;
+                        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                        {
+                            scrub_one_page(&pg[i]);
+                            pg[i].count_info &= ~PGC_need_scrub;
+                            node_need_scrub[node]--;
+                        }
                     }
+
+                    num_processed += (1UL << scrub_order);
                     if ( softirq_pending(cpu) )
                     {
                         preempt = true;
+                        is_frag = (num_processed < (1UL << order));
                         break;
                     }
                 }
 
-                if ( i == (1UL << order) )
+                start = 0;
+                end = num_processed;
+
+                page_list_del(pg, &heap(node, zone, order));
+
+                /* Merge clean pages */
+                while ( start < end )
+                {
+                    /*
+                     * Largest power-of-two chunk starting @start,
+                     * not greater than @end
+                     */
+                    unsigned chunk_order = flsl(end - start) - 1;
+                    struct page_info *ppg = &pg[start];
+
+                    PFN_ORDER(ppg) = chunk_order;
+                    merge_and_free_buddy(ppg, node, zone, chunk_order, false, is_frag);
+                    start += (1UL << chunk_order);
+                }
+
+                /* Merge unscrubbed pages */
+                while ( end < (1UL << order) )
                 {
-                    page_list_del(pg, &heap(node, zone, order));
-                    merge_and_free_buddy(pg, node, zone, order, false);
+                    /*
+                     * Largest power-of-two chunk starting @end, not crossing
+                     * next power-of-two boundary
+                     */
+                    unsigned chunk_order = ffsl(end) - 1;
+                    struct page_info *ppg = &pg[end];
+
+                    PFN_ORDER(ppg) = chunk_order;
+                    merge_and_free_buddy(ppg, node, zone, chunk_order, true, true);
+                    end += (1UL << chunk_order);
                 }
 
                 if ( preempt || (node_need_scrub[node] == 0) )
@@ -1215,7 +1265,7 @@  static void free_heap_pages(
         node_need_scrub[node] += (1UL << order);
     }
 
-    pg = merge_and_free_buddy(pg, node, zone, order, need_scrub);
+    pg = merge_and_free_buddy(pg, node, zone, order, need_scrub, false);
 
     if ( tainted )
         reserve_offlined_page(pg);