diff mbox

[v2] mm: Don't scrub pages while holding heap lock in alloc_heap_pages()

Message ID 1504620661-26511-1-git-send-email-boris.ostrovsky@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Boris Ostrovsky Sept. 5, 2017, 2:11 p.m. UTC
Instead, preserve PGC_need_scrub bit when setting PGC_state_inuse
state while still under the lock and clear those pages later.

Note that we still need to grub the lock when clearing PGC_need_scrub
bit since count_info might be updated during MCE handling in
mark_page_offline().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
I couldn't convince myself that just marking the head as PGC_state_inuse
under the lock is safe, mostly because of how MCE handler may write the
state while the allocator is walking (lock-free) the buddy.

This patch only deals with PGC_need_scrub bit which is outside state bits.
_PGC_need_scrub is aliased to _PGC_allocated and I think is safe to use but
we could redefine it to something else (PGC_cacheattr_base?).

 xen/common/page_alloc.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

Comments

Jan Beulich Sept. 5, 2017, 2:28 p.m. UTC | #1
>>> On 05.09.17 at 16:11, <boris.ostrovsky@oracle.com> wrote:
> I couldn't convince myself that just marking the head as PGC_state_inuse
> under the lock is safe, mostly because of how MCE handler may write the
> state while the allocator is walking (lock-free) the buddy.

Good point.

> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages(
>           * guest can control its own visibility of/through the cache.
>           */
>          flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
> -
> -        if ( !(memflags & MEMF_no_scrub) )
> -            check_one_page(&pg[i]);
>      }
>  
>      spin_unlock(&heap_lock);
>  
> +    if ( first_dirty != INVALID_DIRTY_IDX ||
> +         (scrub_debug && !(memflags & MEMF_no_scrub)) )

Why does scrub_debug matter here?.

> +    {
> +        for ( i = 0; i < (1U << order); i++ )
> +        {
> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +            {
> +                if ( !(memflags & MEMF_no_scrub) )
> +                    scrub_one_page(&pg[i]);
> +
> +                dirty_cnt++;
> +
> +                spin_lock(&heap_lock);
> +                pg[i].count_info &= ~PGC_need_scrub;
> +                spin_unlock(&heap_lock);
> +            }
> +
> +            if ( !(memflags & MEMF_no_scrub) )
> +                check_one_page(&pg[i]);

Wouldn't this better be "else if", as checking a page just scrubbed
doesn't look very useful?

Jan
Boris Ostrovsky Sept. 5, 2017, 2:42 p.m. UTC | #2
>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages(
>>           * guest can control its own visibility of/through the cache.
>>           */
>>          flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
>> -
>> -        if ( !(memflags & MEMF_no_scrub) )
>> -            check_one_page(&pg[i]);
>>      }
>>  
>>      spin_unlock(&heap_lock);
>>  
>> +    if ( first_dirty != INVALID_DIRTY_IDX ||
>> +         (scrub_debug && !(memflags & MEMF_no_scrub)) )
> Why does scrub_debug matter here?.
>
>> +    {
>> +        for ( i = 0; i < (1U << order); i++ )
>> +        {
>> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>> +            {
>> +                if ( !(memflags & MEMF_no_scrub) )
>> +                    scrub_one_page(&pg[i]);
>> +
>> +                dirty_cnt++;
>> +
>> +                spin_lock(&heap_lock);
>> +                pg[i].count_info &= ~PGC_need_scrub;
>> +                spin_unlock(&heap_lock);
>> +            }
>> +
>> +            if ( !(memflags & MEMF_no_scrub) )
>> +                check_one_page(&pg[i]);
> Wouldn't this better be "else if", as checking a page just scrubbed
> doesn't look very useful?

For both of these questions --- we don't want to miss a poisoned page.
For example, if a page was poisoned but for some reason is not marked
PGC_need_scrub.

Of course, we risk a false positive if a guest wrote the page with the
same pattern.

-boris
Jan Beulich Sept. 5, 2017, 2:53 p.m. UTC | #3
>>> On 05.09.17 at 16:42, <boris.ostrovsky@oracle.com> wrote:

>>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages(
>>>           * guest can control its own visibility of/through the cache.
>>>           */
>>>          flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & 
> MEMF_no_icache_flush));
>>> -
>>> -        if ( !(memflags & MEMF_no_scrub) )
>>> -            check_one_page(&pg[i]);
>>>      }
>>>  
>>>      spin_unlock(&heap_lock);
>>>  
>>> +    if ( first_dirty != INVALID_DIRTY_IDX ||
>>> +         (scrub_debug && !(memflags & MEMF_no_scrub)) )
>> Why does scrub_debug matter here?.
>>
>>> +    {
>>> +        for ( i = 0; i < (1U << order); i++ )
>>> +        {
>>> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +            {
>>> +                if ( !(memflags & MEMF_no_scrub) )
>>> +                    scrub_one_page(&pg[i]);
>>> +
>>> +                dirty_cnt++;
>>> +
>>> +                spin_lock(&heap_lock);
>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>> +                spin_unlock(&heap_lock);
>>> +            }
>>> +
>>> +            if ( !(memflags & MEMF_no_scrub) )
>>> +                check_one_page(&pg[i]);
>> Wouldn't this better be "else if", as checking a page just scrubbed
>> doesn't look very useful?
> 
> For both of these questions --- we don't want to miss a poisoned page.
> For example, if a page was poisoned but for some reason is not marked
> PGC_need_scrub.

While I can accept this as an answer to the first question, I don't
see how it relates to the second one: When MEMF_no_scrub is
clear, the code above obviously scrubs the page in the first if()
body - what's the point of passing it to check_one_page() right
afterwards?

Jan
Boris Ostrovsky Sept. 5, 2017, 2:54 p.m. UTC | #4
On 09/05/2017 10:42 AM, Boris Ostrovsky wrote:
>>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages(
>>>           * guest can control its own visibility of/through the cache.
>>>           */
>>>          flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
>>> -
>>> -        if ( !(memflags & MEMF_no_scrub) )
>>> -            check_one_page(&pg[i]);
>>>      }
>>>  
>>>      spin_unlock(&heap_lock);
>>>  
>>> +    if ( first_dirty != INVALID_DIRTY_IDX ||
>>> +         (scrub_debug && !(memflags & MEMF_no_scrub)) )
>> Why does scrub_debug matter here?.
>>
>>> +    {
>>> +        for ( i = 0; i < (1U << order); i++ )
>>> +        {
>>> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +            {
>>> +                if ( !(memflags & MEMF_no_scrub) )
>>> +                    scrub_one_page(&pg[i]);
>>> +
>>> +                dirty_cnt++;
>>> +
>>> +                spin_lock(&heap_lock);
>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>> +                spin_unlock(&heap_lock);
>>> +            }
>>> +
>>> +            if ( !(memflags & MEMF_no_scrub) )
>>> +                check_one_page(&pg[i]);
>> Wouldn't this better be "else if", as checking a page just scrubbed
>> doesn't look very useful?
> For both of these questions --- we don't want to miss a poisoned page.
> For example, if a page was poisoned but for some reason is not marked
> PGC_need_scrub.
>
> Of course, we risk a false positive if a guest wrote the page with the
> same pattern.

Just in case I wasn't clear --- I will remove scrub_debug test and add
'else' for this reason. Even though it's a debug-only feature I think we
shouldn't do this.

-boris
Jan Beulich Sept. 5, 2017, 3:14 p.m. UTC | #5
>>> On 05.09.17 at 16:54, <boris.ostrovsky@oracle.com> wrote:
> On 09/05/2017 10:42 AM, Boris Ostrovsky wrote:
>>>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages(
>>>>           * guest can control its own visibility of/through the cache.
>>>>           */
>>>>          flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & 
> MEMF_no_icache_flush));
>>>> -
>>>> -        if ( !(memflags & MEMF_no_scrub) )
>>>> -            check_one_page(&pg[i]);
>>>>      }
>>>>  
>>>>      spin_unlock(&heap_lock);
>>>>  
>>>> +    if ( first_dirty != INVALID_DIRTY_IDX ||
>>>> +         (scrub_debug && !(memflags & MEMF_no_scrub)) )
>>> Why does scrub_debug matter here?.
>>>
>>>> +    {
>>>> +        for ( i = 0; i < (1U << order); i++ )
>>>> +        {
>>>> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>>> +            {
>>>> +                if ( !(memflags & MEMF_no_scrub) )
>>>> +                    scrub_one_page(&pg[i]);
>>>> +
>>>> +                dirty_cnt++;
>>>> +
>>>> +                spin_lock(&heap_lock);
>>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>>> +                spin_unlock(&heap_lock);
>>>> +            }
>>>> +
>>>> +            if ( !(memflags & MEMF_no_scrub) )
>>>> +                check_one_page(&pg[i]);
>>> Wouldn't this better be "else if", as checking a page just scrubbed
>>> doesn't look very useful?
>> For both of these questions --- we don't want to miss a poisoned page.
>> For example, if a page was poisoned but for some reason is not marked
>> PGC_need_scrub.
>>
>> Of course, we risk a false positive if a guest wrote the page with the
>> same pattern.
> 
> Just in case I wasn't clear --- I will remove scrub_debug test and add
> 'else' for this reason. Even though it's a debug-only feature I think we
> shouldn't do this.

Perhaps I'm dense, but it sounds to me like you agree to do a
change you think you shouldn't do.

Jan
Boris Ostrovsky Sept. 5, 2017, 3:39 p.m. UTC | #6
On 09/05/2017 11:14 AM, Jan Beulich wrote:
>>>> On 05.09.17 at 16:54, <boris.ostrovsky@oracle.com> wrote:
>> On 09/05/2017 10:42 AM, Boris Ostrovsky wrote:
>>>>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages(
>>>>>           * guest can control its own visibility of/through the cache.
>>>>>           */
>>>>>          flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & 
>> MEMF_no_icache_flush));
>>>>> -
>>>>> -        if ( !(memflags & MEMF_no_scrub) )
>>>>> -            check_one_page(&pg[i]);
>>>>>      }
>>>>>  
>>>>>      spin_unlock(&heap_lock);
>>>>>  
>>>>> +    if ( first_dirty != INVALID_DIRTY_IDX ||
>>>>> +         (scrub_debug && !(memflags & MEMF_no_scrub)) )
>>>> Why does scrub_debug matter here?.
>>>>
>>>>> +    {
>>>>> +        for ( i = 0; i < (1U << order); i++ )
>>>>> +        {
>>>>> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>>>> +            {
>>>>> +                if ( !(memflags & MEMF_no_scrub) )
>>>>> +                    scrub_one_page(&pg[i]);
>>>>> +
>>>>> +                dirty_cnt++;
>>>>> +
>>>>> +                spin_lock(&heap_lock);
>>>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>>>> +                spin_unlock(&heap_lock);
>>>>> +            }
>>>>> +
>>>>> +            if ( !(memflags & MEMF_no_scrub) )
>>>>> +                check_one_page(&pg[i]);
>>>> Wouldn't this better be "else if", as checking a page just scrubbed
>>>> doesn't look very useful?
>>> For both of these questions --- we don't want to miss a poisoned page.
>>> For example, if a page was poisoned but for some reason is not marked
>>> PGC_need_scrub.
>>>
>>> Of course, we risk a false positive if a guest wrote the page with the
>>> same pattern.
>> Just in case I wasn't clear --- I will remove scrub_debug test and add
>> 'else' for this reason. Even though it's a debug-only feature I think we
>> shouldn't do this.
> Perhaps I'm dense, but it sounds to me like you agree to do a
> change you think you shouldn't do.

What I meant was that with current code we have a chance of a false
positive and that, even though it's a debug-only case, is probably not
the way to go.

If I make the change that you suggested this false positive will not
occur (although at the cost of reduced test coverage).

-boris
diff mbox

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index dbad1e1..d368518 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -860,6 +860,7 @@  static struct page_info *alloc_heap_pages(
     struct page_info *pg;
     bool need_tlbflush = false;
     uint32_t tlbflush_timestamp = 0;
+    unsigned int dirty_cnt = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
     BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
@@ -953,14 +954,11 @@  static struct page_info *alloc_heap_pages(
         /* Reference count must continuously be zero for free pages. */
         BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
 
-        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
-        {
-            if ( !(memflags & MEMF_no_scrub) )
-                scrub_one_page(&pg[i]);
-            node_need_scrub[node]--;
-        }
+        /* PGC_need_scrub can only be set if first_dirty is valid */
+        ASSERT(first_dirty != INVALID_DIRTY_IDX || !(pg[i].count_info & PGC_need_scrub));
 
-        pg[i].count_info = PGC_state_inuse;
+        /* Preserve PGC_need_scrub so we can check it after lock is dropped. */
+        pg[i].count_info = PGC_state_inuse | (pg[i].count_info & PGC_need_scrub);
 
         if ( !(memflags & MEMF_no_tlbflush) )
             accumulate_tlbflush(&need_tlbflush, &pg[i],
@@ -974,13 +972,39 @@  static struct page_info *alloc_heap_pages(
          * guest can control its own visibility of/through the cache.
          */
         flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
-
-        if ( !(memflags & MEMF_no_scrub) )
-            check_one_page(&pg[i]);
     }
 
     spin_unlock(&heap_lock);
 
+    if ( first_dirty != INVALID_DIRTY_IDX ||
+         (scrub_debug && !(memflags & MEMF_no_scrub)) )
+    {
+        for ( i = 0; i < (1U << order); i++ )
+        {
+            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+            {
+                if ( !(memflags & MEMF_no_scrub) )
+                    scrub_one_page(&pg[i]);
+
+                dirty_cnt++;
+
+                spin_lock(&heap_lock);
+                pg[i].count_info &= ~PGC_need_scrub;
+                spin_unlock(&heap_lock);
+            }
+
+            if ( !(memflags & MEMF_no_scrub) )
+                check_one_page(&pg[i]);
+        }
+
+        if ( dirty_cnt )
+        {
+            spin_lock(&heap_lock);
+            node_need_scrub[node] -= dirty_cnt;
+            spin_unlock(&heap_lock);
+        }
+    }
+
     if ( need_tlbflush )
         filtered_flush_tlb_mask(tlbflush_timestamp);