Message ID | 1504620661-26511-1-git-send-email-boris.ostrovsky@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 05.09.17 at 16:11, <boris.ostrovsky@oracle.com> wrote: > I couldn't convince myself that just marking the head as PGC_state_inuse > under the lock is safe, mostly because of how MCE handler may write the > state while the allocator is walking (lock-free) the buddy. Good point. > @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( > * guest can control its own visibility of/through the cache. > */ > flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush)); > - > - if ( !(memflags & MEMF_no_scrub) ) > - check_one_page(&pg[i]); > } > > spin_unlock(&heap_lock); > > + if ( first_dirty != INVALID_DIRTY_IDX || > + (scrub_debug && !(memflags & MEMF_no_scrub)) ) Why does scrub_debug matter here?. > + { > + for ( i = 0; i < (1U << order); i++ ) > + { > + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) > + { > + if ( !(memflags & MEMF_no_scrub) ) > + scrub_one_page(&pg[i]); > + > + dirty_cnt++; > + > + spin_lock(&heap_lock); > + pg[i].count_info &= ~PGC_need_scrub; > + spin_unlock(&heap_lock); > + } > + > + if ( !(memflags & MEMF_no_scrub) ) > + check_one_page(&pg[i]); Wouldn't this better be "else if", as checking a page just scrubbed doesn't look very useful? Jan
>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( >> * guest can control its own visibility of/through the cache. >> */ >> flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush)); >> - >> - if ( !(memflags & MEMF_no_scrub) ) >> - check_one_page(&pg[i]); >> } >> >> spin_unlock(&heap_lock); >> >> + if ( first_dirty != INVALID_DIRTY_IDX || >> + (scrub_debug && !(memflags & MEMF_no_scrub)) ) > Why does scrub_debug matter here?. > >> + { >> + for ( i = 0; i < (1U << order); i++ ) >> + { >> + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) >> + { >> + if ( !(memflags & MEMF_no_scrub) ) >> + scrub_one_page(&pg[i]); >> + >> + dirty_cnt++; >> + >> + spin_lock(&heap_lock); >> + pg[i].count_info &= ~PGC_need_scrub; >> + spin_unlock(&heap_lock); >> + } >> + >> + if ( !(memflags & MEMF_no_scrub) ) >> + check_one_page(&pg[i]); > Wouldn't this better be "else if", as checking a page just scrubbed > doesn't look very useful? For both of these questions --- we don't want to miss a poisoned page. For example, if a page was poisoned but for some reason is not marked PGC_need_scrub. Of course, we risk a false positive if a guest wrote the page with the same pattern. -boris
>>> On 05.09.17 at 16:42, <boris.ostrovsky@oracle.com> wrote: >>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( >>> * guest can control its own visibility of/through the cache. >>> */ >>> flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & > MEMF_no_icache_flush)); >>> - >>> - if ( !(memflags & MEMF_no_scrub) ) >>> - check_one_page(&pg[i]); >>> } >>> >>> spin_unlock(&heap_lock); >>> >>> + if ( first_dirty != INVALID_DIRTY_IDX || >>> + (scrub_debug && !(memflags & MEMF_no_scrub)) ) >> Why does scrub_debug matter here?. >> >>> + { >>> + for ( i = 0; i < (1U << order); i++ ) >>> + { >>> + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) >>> + { >>> + if ( !(memflags & MEMF_no_scrub) ) >>> + scrub_one_page(&pg[i]); >>> + >>> + dirty_cnt++; >>> + >>> + spin_lock(&heap_lock); >>> + pg[i].count_info &= ~PGC_need_scrub; >>> + spin_unlock(&heap_lock); >>> + } >>> + >>> + if ( !(memflags & MEMF_no_scrub) ) >>> + check_one_page(&pg[i]); >> Wouldn't this better be "else if", as checking a page just scrubbed >> doesn't look very useful? > > For both of these questions --- we don't want to miss a poisoned page. > For example, if a page was poisoned but for some reason is not marked > PGC_need_scrub. While I can accept this as an answer to the first question, I don't see how it relates to the second one: When MEMF_no_scrub is clear, the code above obviously scrubs the page in the first if() body - what's the point of passing it to check_one_page() right afterwards? Jan
On 09/05/2017 10:42 AM, Boris Ostrovsky wrote: >>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( >>> * guest can control its own visibility of/through the cache. >>> */ >>> flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush)); >>> - >>> - if ( !(memflags & MEMF_no_scrub) ) >>> - check_one_page(&pg[i]); >>> } >>> >>> spin_unlock(&heap_lock); >>> >>> + if ( first_dirty != INVALID_DIRTY_IDX || >>> + (scrub_debug && !(memflags & MEMF_no_scrub)) ) >> Why does scrub_debug matter here?. >> >>> + { >>> + for ( i = 0; i < (1U << order); i++ ) >>> + { >>> + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) >>> + { >>> + if ( !(memflags & MEMF_no_scrub) ) >>> + scrub_one_page(&pg[i]); >>> + >>> + dirty_cnt++; >>> + >>> + spin_lock(&heap_lock); >>> + pg[i].count_info &= ~PGC_need_scrub; >>> + spin_unlock(&heap_lock); >>> + } >>> + >>> + if ( !(memflags & MEMF_no_scrub) ) >>> + check_one_page(&pg[i]); >> Wouldn't this better be "else if", as checking a page just scrubbed >> doesn't look very useful? > For both of these questions --- we don't want to miss a poisoned page. > For example, if a page was poisoned but for some reason is not marked > PGC_need_scrub. > > Of course, we risk a false positive if a guest wrote the page with the > same pattern. Just in case I wasn't clear --- I will remove scrub_debug test and add 'else' for this reason. Even though it's a debug-only feature I think we shouldn't do this. -boris
>>> On 05.09.17 at 16:54, <boris.ostrovsky@oracle.com> wrote: > On 09/05/2017 10:42 AM, Boris Ostrovsky wrote: >>>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( >>>> * guest can control its own visibility of/through the cache. >>>> */ >>>> flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & > MEMF_no_icache_flush)); >>>> - >>>> - if ( !(memflags & MEMF_no_scrub) ) >>>> - check_one_page(&pg[i]); >>>> } >>>> >>>> spin_unlock(&heap_lock); >>>> >>>> + if ( first_dirty != INVALID_DIRTY_IDX || >>>> + (scrub_debug && !(memflags & MEMF_no_scrub)) ) >>> Why does scrub_debug matter here?. >>> >>>> + { >>>> + for ( i = 0; i < (1U << order); i++ ) >>>> + { >>>> + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) >>>> + { >>>> + if ( !(memflags & MEMF_no_scrub) ) >>>> + scrub_one_page(&pg[i]); >>>> + >>>> + dirty_cnt++; >>>> + >>>> + spin_lock(&heap_lock); >>>> + pg[i].count_info &= ~PGC_need_scrub; >>>> + spin_unlock(&heap_lock); >>>> + } >>>> + >>>> + if ( !(memflags & MEMF_no_scrub) ) >>>> + check_one_page(&pg[i]); >>> Wouldn't this better be "else if", as checking a page just scrubbed >>> doesn't look very useful? >> For both of these questions --- we don't want to miss a poisoned page. >> For example, if a page was poisoned but for some reason is not marked >> PGC_need_scrub. >> >> Of course, we risk a false positive if a guest wrote the page with the >> same pattern. > > Just in case I wasn't clear --- I will remove scrub_debug test and add > 'else' for this reason. Even though it's a debug-only feature I think we > shouldn't do this. Perhaps I'm dense, but it sounds to me like you agree to do a change you think you shouldn't do. Jan
On 09/05/2017 11:14 AM, Jan Beulich wrote: >>>> On 05.09.17 at 16:54, <boris.ostrovsky@oracle.com> wrote: >> On 09/05/2017 10:42 AM, Boris Ostrovsky wrote: >>>>> @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( >>>>> * guest can control its own visibility of/through the cache. >>>>> */ >>>>> flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & >> MEMF_no_icache_flush)); >>>>> - >>>>> - if ( !(memflags & MEMF_no_scrub) ) >>>>> - check_one_page(&pg[i]); >>>>> } >>>>> >>>>> spin_unlock(&heap_lock); >>>>> >>>>> + if ( first_dirty != INVALID_DIRTY_IDX || >>>>> + (scrub_debug && !(memflags & MEMF_no_scrub)) ) >>>> Why does scrub_debug matter here?. >>>> >>>>> + { >>>>> + for ( i = 0; i < (1U << order); i++ ) >>>>> + { >>>>> + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) >>>>> + { >>>>> + if ( !(memflags & MEMF_no_scrub) ) >>>>> + scrub_one_page(&pg[i]); >>>>> + >>>>> + dirty_cnt++; >>>>> + >>>>> + spin_lock(&heap_lock); >>>>> + pg[i].count_info &= ~PGC_need_scrub; >>>>> + spin_unlock(&heap_lock); >>>>> + } >>>>> + >>>>> + if ( !(memflags & MEMF_no_scrub) ) >>>>> + check_one_page(&pg[i]); >>>> Wouldn't this better be "else if", as checking a page just scrubbed >>>> doesn't look very useful? >>> For both of these questions --- we don't want to miss a poisoned page. >>> For example, if a page was poisoned but for some reason is not marked >>> PGC_need_scrub. >>> >>> Of course, we risk a false positive if a guest wrote the page with the >>> same pattern. >> Just in case I wasn't clear --- I will remove scrub_debug test and add >> 'else' for this reason. Even though it's a debug-only feature I think we >> shouldn't do this. > Perhaps I'm dense, but it sounds to me like you agree to do a > change you think you shouldn't do. What I meant was that with current code we have a chance of a false positive and that, even though it's a debug-only case, is probably not the way to go. If I make the change that you suggested this false positive will not occur (although at the cost of reduced test coverage). -boris
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index dbad1e1..d368518 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -860,6 +860,7 @@ static struct page_info *alloc_heap_pages( struct page_info *pg; bool need_tlbflush = false; uint32_t tlbflush_timestamp = 0; + unsigned int dirty_cnt = 0; /* Make sure there are enough bits in memflags for nodeID. */ BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t))); @@ -953,14 +954,11 @@ static struct page_info *alloc_heap_pages( /* Reference count must continuously be zero for free pages. */ BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free); - if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) - { - if ( !(memflags & MEMF_no_scrub) ) - scrub_one_page(&pg[i]); - node_need_scrub[node]--; - } + /* PGC_need_scrub can only be set if first_dirty is valid */ + ASSERT(first_dirty != INVALID_DIRTY_IDX || !(pg[i].count_info & PGC_need_scrub)); - pg[i].count_info = PGC_state_inuse; + /* Preserve PGC_need_scrub so we can check it after lock is dropped. */ + pg[i].count_info = PGC_state_inuse | (pg[i].count_info & PGC_need_scrub); if ( !(memflags & MEMF_no_tlbflush) ) accumulate_tlbflush(&need_tlbflush, &pg[i], @@ -974,13 +972,39 @@ static struct page_info *alloc_heap_pages( * guest can control its own visibility of/through the cache. */ flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush)); - - if ( !(memflags & MEMF_no_scrub) ) - check_one_page(&pg[i]); } spin_unlock(&heap_lock); + if ( first_dirty != INVALID_DIRTY_IDX || + (scrub_debug && !(memflags & MEMF_no_scrub)) ) + { + for ( i = 0; i < (1U << order); i++ ) + { + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) + { + if ( !(memflags & MEMF_no_scrub) ) + scrub_one_page(&pg[i]); + + dirty_cnt++; + + spin_lock(&heap_lock); + pg[i].count_info &= ~PGC_need_scrub; + spin_unlock(&heap_lock); + } + + if ( !(memflags & MEMF_no_scrub) ) + check_one_page(&pg[i]); + } + + if ( dirty_cnt ) + { + spin_lock(&heap_lock); + node_need_scrub[node] -= dirty_cnt; + spin_unlock(&heap_lock); + } + } + if ( need_tlbflush ) filtered_flush_tlb_mask(tlbflush_timestamp);
Instead, preserve PGC_need_scrub bit when setting PGC_state_inuse state while still under the lock and clear those pages later. Note that we still need to grub the lock when clearing PGC_need_scrub bit since count_info might be updated during MCE handling in mark_page_offline(). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- I couldn't convince myself that just marking the head as PGC_state_inuse under the lock is safe, mostly because of how MCE handler may write the state while the allocator is walking (lock-free) the buddy. This patch only deals with PGC_need_scrub bit which is outside state bits. _PGC_need_scrub is aliased to _PGC_allocated and I think is safe to use but we could redefine it to something else (PGC_cacheattr_base?). xen/common/page_alloc.c | 44 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 34 insertions(+), 10 deletions(-)