Message ID | 20190412194712.1602-1-tamas@tklengyel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2,1/2] x86/mem_sharing: reorder when pages are unlocked and released | expand |
On 4/12/19 8:47 PM, Tamas K Lengyel wrote: > Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced > grabbing extra references for pages that drop references tied to PGC_allocated. > However, the way these extra references were grabbed were incorrect, resulting > in both share_pages and unshare_pages failing. Why? Is cd not the owner of the page? > There actually is no need to > grab extra references, only a reordering was needed for when the existing > references are being released. This is in accordance to the XSA-242 > recommendation of not calling _put_page_type while also holding the page_lock > for that page. > > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> > Cc: Jan Beulich <jbeulich@suse.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > Cc: George Dunlap <george.dunlap@eu.citrix.com> > Cc: Wei Liu <wei.liu2@citrix.com> > Cc: Roger Pau Monne <roger.pau@citrix.com> > --- > v2: Add assert that put_count is at least 1 > --- > xen/arch/x86/mm/mem_sharing.c | 32 +++++++++----------------------- > 1 file changed, 9 insertions(+), 23 deletions(-) > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > index 5ac9d8f54c..708037d91e 100644 > --- a/xen/arch/x86/mm/mem_sharing.c > +++ b/xen/arch/x86/mm/mem_sharing.c > @@ -900,6 +900,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, > p2m_type_t smfn_type, cmfn_type; > struct two_gfns tg; > struct rmap_iterator ri; > + unsigned long put_count = 0; > > get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn, > cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg); > @@ -964,15 +965,6 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, > goto err_out; > } > > - /* Acquire an extra reference, for the freeing below to be safe. */ > - if ( !get_page(cpage, cd) ) > - { > - ret = -EOVERFLOW; > - mem_sharing_page_unlock(secondpg); > - mem_sharing_page_unlock(firstpg); > - goto err_out; > - } Why does this fail? Is cd not the owner of the page? > - > /* Merge the lists together */ > rmap_seed_iterator(cpage, &ri); > while ( (gfn = rmap_iterate(cpage, &ri)) != NULL) > @@ -984,7 +976,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, > * Don't change the type of rmap for the client page. */ > rmap_del(gfn, cpage, 0); > rmap_add(gfn, spage); > - put_page_and_type(cpage); > + put_count++; > d = get_domain_by_id(gfn->domain); > BUG_ON(!d); > BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn)); > @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, > /* Free the client page */ > if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) > put_page(cpage); > - put_page(cpage); > + > + ASSERT(put_count); > + while ( put_count-- ) > + put_page_and_type(cpage); I don't think this is in the spirit of the fix in 0502e0adae2; the idea there seems to have been to make sure that cpage belongs to the right domain, and that the ownership doesn't change. If such a race / mistake were possible before that change, such a race / mistake would be possible after this change, wouldn't it? -George
On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote: > > On 4/12/19 8:47 PM, Tamas K Lengyel wrote: > > Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced > > grabbing extra references for pages that drop references tied to PGC_allocated. > > However, the way these extra references were grabbed were incorrect, resulting > > in both share_pages and unshare_pages failing. > > Why? Is cd not the owner of the page? It's not, dom_cow is. > > @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, > > /* Free the client page */ > > if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) > > put_page(cpage); > > - put_page(cpage); > > + > > + ASSERT(put_count); > > + while ( put_count-- ) > > + put_page_and_type(cpage); > > I don't think this is in the spirit of the fix in 0502e0adae2; the idea > there seems to have been to make sure that cpage belongs to the right > domain, and that the ownership doesn't change. If such a race / mistake > were possible before that change, such a race / mistake would be > possible after this change, wouldn't it? I don't have an answer to your question. AFAIU this is just to ensure that the page isn't dropped before test_and_clear_bit is used on the page. Tamas
> On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote: > > On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote: >> >> On 4/12/19 8:47 PM, Tamas K Lengyel wrote: >>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced >>> grabbing extra references for pages that drop references tied to PGC_allocated. >>> However, the way these extra references were grabbed were incorrect, resulting >>> in both share_pages and unshare_pages failing. >> >> Why? Is cd not the owner of the page? > > It's not, dom_cow is. > >>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, >>> /* Free the client page */ >>> if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) >>> put_page(cpage); >>> - put_page(cpage); >>> + >>> + ASSERT(put_count); >>> + while ( put_count-- ) >>> + put_page_and_type(cpage); >> >> I don't think this is in the spirit of the fix in 0502e0adae2; the idea >> there seems to have been to make sure that cpage belongs to the right >> domain, and that the ownership doesn't change. If such a race / mistake >> were possible before that change, such a race / mistake would be >> possible after this change, wouldn't it? > > I don't have an answer to your question. AFAIU this is just to ensure > that the page isn't dropped before test_and_clear_bit is used on the > page. Right, but why would that be a bad thing? I think it’s to avoid races like this: P1: Gets pointer to page A, owned by domain M P2: Gets pointer to page A, owned by domain M P2: test_and_clear_bit(PGC_allocated) succeeds P2: put_page() -> page freed -> Page re-allocated to domain N P1: test_and_clear_bit(PGC_allocated) succeeds P1: put_page() -> page freed ## Now P1 incorrectly “frees” a page owned by domain N. If both P1 and P2 call get_page(A, M) first, and drop that reference afterwards, this race can’t happen. Jan, you’re the author of that patch — is that not the case? Could a similar race to the above occur in this code? It looks like it could have before 0502e0adae2 for sure, since all references and locks were dropped. It’s not clear to me whether mem_sharing_page_lock() will protect against ownership changes. *If* that assertion you added is always true, that there is always at least one reference in the rmap that will be held; *and* those references are always “held” on behalf of dom_cow, then I think your version will probably be safe (as the reference in the rmap will prevent it from changing ownership until the final put_page_and_type()). But if #2 ever false for any reason (even due to a bug in Xen), then this will open up a potential security issue (since ASSERT()s don’t trigger in production code). A couple of options: * Make it a BUG_ON(); that would change a potential privilege escalation into a DoS * Only clear PGC_allocated if put_count > 0. This changes a potential privilege escalation into a slow memory leak * If put_count == 0, grab an extra reference and drop it afterwards (with an ASSERT_UNREACHABLE() to help catch errors early). This avoids any XSA, but makes the code a bit uglier. * Leave the code as it is; just change the get_page() to grabbing dom_cow rather than cd. This avoids any XSA, but adds an extra get / put page which will be pure overhead in the common case. Thoughts? -George
On Mon, Apr 15, 2019 at 11:28 AM George Dunlap <George.Dunlap@citrix.com> wrote: > > > > > On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote: > > > > On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote: > >> > >> On 4/12/19 8:47 PM, Tamas K Lengyel wrote: > >>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced > >>> grabbing extra references for pages that drop references tied to PGC_allocated. > >>> However, the way these extra references were grabbed were incorrect, resulting > >>> in both share_pages and unshare_pages failing. > >> > >> Why? Is cd not the owner of the page? > > > > It's not, dom_cow is. > > > >>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, > >>> /* Free the client page */ > >>> if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) > >>> put_page(cpage); > >>> - put_page(cpage); > >>> + > >>> + ASSERT(put_count); > >>> + while ( put_count-- ) > >>> + put_page_and_type(cpage); > >> > >> I don't think this is in the spirit of the fix in 0502e0adae2; the idea > >> there seems to have been to make sure that cpage belongs to the right > >> domain, and that the ownership doesn't change. If such a race / mistake > >> were possible before that change, such a race / mistake would be > >> possible after this change, wouldn't it? > > > > I don't have an answer to your question. AFAIU this is just to ensure > > that the page isn't dropped before test_and_clear_bit is used on the > > page. > > Right, but why would that be a bad thing? > > I think it’s to avoid races like this: > > P1: Gets pointer to page A, owned by domain M > P2: Gets pointer to page A, owned by domain M > P2: test_and_clear_bit(PGC_allocated) succeeds > P2: put_page() > -> page freed > -> Page re-allocated to domain N > P1: test_and_clear_bit(PGC_allocated) succeeds > P1: put_page() > -> page freed ## > > Now P1 incorrectly “frees” a page owned by domain N. > > If both P1 and P2 call get_page(A, M) first, and drop that reference afterwards, this race can’t happen. > > Jan, you’re the author of that patch — is that not the case? > > Could a similar race to the above occur in this code? It looks like it could have before 0502e0adae2 for sure, since all references and locks were dropped. It’s not clear to me whether mem_sharing_page_lock() will protect against ownership changes. > > *If* that assertion you added is always true, that there is always at least one reference in the rmap that will be held; *and* those references are always “held” on behalf of dom_cow, then I think your version will probably be safe (as the reference in the rmap will prevent it from changing ownership until the final put_page_and_type()). But if #2 ever false for any reason (even due to a bug in Xen), then this will open up a potential security issue (since ASSERT()s don’t trigger in production code). > > A couple of options: > * Make it a BUG_ON(); that would change a potential privilege escalation into a DoS I'm fine with changing that ASSERT into a BUG_ON. > * Only clear PGC_allocated if put_count > 0. This changes a potential privilege escalation into a slow memory leak > * If put_count == 0, grab an extra reference and drop it afterwards (with an ASSERT_UNREACHABLE() to help catch errors early). This avoids any XSA, but makes the code a bit uglier. > * Leave the code as it is; just change the get_page() to grabbing dom_cow rather than cd. This avoids any XSA, but adds an extra get / put page which will be pure overhead in the common case. Simply changing to get_page with dom_cow is not enough, I ran into issues where the system still deadlocks due to the ordering of put_page_and_type being called before mem_sharing_page_unlock. So the ordering still has to be changed. Tamas
>>> On 15.04.19 at 19:28, <George.Dunlap@citrix.com> wrote: >> On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote: >> On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote: >>> On 4/12/19 8:47 PM, Tamas K Lengyel wrote: >>>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced >>>> grabbing extra references for pages that drop references tied to PGC_allocated. >>>> However, the way these extra references were grabbed were incorrect, resulting >>>> in both share_pages and unshare_pages failing. >>> >>> Why? Is cd not the owner of the page? >> >> It's not, dom_cow is. >> >>>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, >>>> /* Free the client page */ >>>> if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) >>>> put_page(cpage); >>>> - put_page(cpage); >>>> + >>>> + ASSERT(put_count); >>>> + while ( put_count-- ) >>>> + put_page_and_type(cpage); >>> >>> I don't think this is in the spirit of the fix in 0502e0adae2; the idea >>> there seems to have been to make sure that cpage belongs to the right >>> domain, and that the ownership doesn't change. If such a race / mistake >>> were possible before that change, such a race / mistake would be >>> possible after this change, wouldn't it? >> >> I don't have an answer to your question. AFAIU this is just to ensure >> that the page isn't dropped before test_and_clear_bit is used on the >> page. > > Right, but why would that be a bad thing? > > I think it’s to avoid races like this: > > P1: Gets pointer to page A, owned by domain M > P2: Gets pointer to page A, owned by domain M > P2: test_and_clear_bit(PGC_allocated) succeeds > P2: put_page() > -> page freed > -> Page re-allocated to domain N > P1: test_and_clear_bit(PGC_allocated) succeeds > P1: put_page() > -> page freed ## > > Now P1 incorrectly “frees” a page owned by domain N. > > If both P1 and P2 call get_page(A, M) first, and drop that reference > afterwards, this race can’t happen. > > Jan, you’re the author of that patch — is that not the case? Yes indeed, that's why I did add the extra get_page() (not realizing the owning domain is the wrong one). Jan
>>> On 15.04.19 at 19:51, <tamas@tklengyel.com> wrote: > Simply changing to get_page with dom_cow is not enough, I ran into > issues where the system still deadlocks due to the ordering of > put_page_and_type being called before mem_sharing_page_unlock. So the > ordering still has to be changed. Dead locks because of the ordering of releases (or locks or alike), not acquires? That looks suspicious to me, i.e. suggests yet another problem elsewhere. Jan
On Thu, Apr 25, 2019 at 5:14 AM Jan Beulich <JBeulich@suse.com> wrote: > > >>> On 15.04.19 at 19:51, <tamas@tklengyel.com> wrote: > > Simply changing to get_page with dom_cow is not enough, I ran into > > issues where the system still deadlocks due to the ordering of > > put_page_and_type being called before mem_sharing_page_unlock. So the > > ordering still has to be changed. > > Dead locks because of the ordering of releases (or locks or alike), > not acquires? That looks suspicious to me, i.e. suggests yet > another problem elsewhere. FWIW page_lock has this warning: * We must never call _put_page_type() while holding a page_lock() for * that page; doing so may cause a deadlock under the right * conditions. That said, I'm working on getting rid of using page_lock altogether because it just over-complicates things for no good reason. Should be able to send the new patches today. Tamas
>>> On 25.04.19 at 15:33, <tamas@tklengyel.com> wrote: > On Thu, Apr 25, 2019 at 5:14 AM Jan Beulich <JBeulich@suse.com> wrote: >> >> >>> On 15.04.19 at 19:51, <tamas@tklengyel.com> wrote: >> > Simply changing to get_page with dom_cow is not enough, I ran into >> > issues where the system still deadlocks due to the ordering of >> > put_page_and_type being called before mem_sharing_page_unlock. So the >> > ordering still has to be changed. >> >> Dead locks because of the ordering of releases (or locks or alike), >> not acquires? That looks suspicious to me, i.e. suggests yet >> another problem elsewhere. > > FWIW page_lock has this warning: > > * We must never call _put_page_type() while holding a page_lock() for > * that page; doing so may cause a deadlock under the right > * conditions. Oh, of course. I once again forgot that mem_sharing_page_unlock() calls page_unlock(). > That said, I'm working on getting rid of using page_lock altogether > because it just over-complicates things for no good reason. Should be > able to send the new patches today. Oh, that's good news. Jan
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 5ac9d8f54c..708037d91e 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -900,6 +900,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, p2m_type_t smfn_type, cmfn_type; struct two_gfns tg; struct rmap_iterator ri; + unsigned long put_count = 0; get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn, cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg); @@ -964,15 +965,6 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, goto err_out; } - /* Acquire an extra reference, for the freeing below to be safe. */ - if ( !get_page(cpage, cd) ) - { - ret = -EOVERFLOW; - mem_sharing_page_unlock(secondpg); - mem_sharing_page_unlock(firstpg); - goto err_out; - } - /* Merge the lists together */ rmap_seed_iterator(cpage, &ri); while ( (gfn = rmap_iterate(cpage, &ri)) != NULL) @@ -984,7 +976,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, * Don't change the type of rmap for the client page. */ rmap_del(gfn, cpage, 0); rmap_add(gfn, spage); - put_page_and_type(cpage); + put_count++; d = get_domain_by_id(gfn->domain); BUG_ON(!d); BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn)); @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh, /* Free the client page */ if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) put_page(cpage); - put_page(cpage); + + ASSERT(put_count); + while ( put_count-- ) + put_page_and_type(cpage); /* We managed to free a domain page. */ atomic_dec(&nr_shared_mfns); @@ -1167,20 +1162,11 @@ int __mem_sharing_unshare_page(struct domain *d, { if ( !last_gfn ) mem_sharing_gfn_destroy(page, d, gfn_info); - put_page_and_type(page); mem_sharing_page_unlock(page); - if ( last_gfn ) - { - if ( !get_page(page, d) ) - { - put_gfn(d, gfn); - domain_crash(d); - return -EOVERFLOW; - } - if ( test_and_clear_bit(_PGC_allocated, &page->count_info) ) - put_page(page); + if ( last_gfn && + test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); - } + put_page_and_type(page); put_gfn(d, gfn); return 0;
Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced grabbing extra references for pages that drop references tied to PGC_allocated. However, the way these extra references were grabbed were incorrect, resulting in both share_pages and unshare_pages failing. There actually is no need to grab extra references, only a reordering was needed for when the existing references are being released. This is in accordance to the XSA-242 recommendation of not calling _put_page_type while also holding the page_lock for that page. Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Roger Pau Monne <roger.pau@citrix.com> --- v2: Add assert that put_count is at least 1 --- xen/arch/x86/mm/mem_sharing.c | 32 +++++++++----------------------- 1 file changed, 9 insertions(+), 23 deletions(-)