diff mbox series

[v2,1/2] x86/mem_sharing: reorder when pages are unlocked and released

Message ID 20190412194712.1602-1-tamas@tklengyel.com (mailing list archive)
State Superseded
Headers show
Series [v2,1/2] x86/mem_sharing: reorder when pages are unlocked and released | expand

Commit Message

Tamas K Lengyel April 12, 2019, 7:47 p.m. UTC
Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
grabbing extra references for pages that drop references tied to PGC_allocated.
However, the way these extra references were grabbed were incorrect, resulting
in both share_pages and unshare_pages failing. There actually is no need to
grab extra references, only a reordering was needed for when the existing
references are being released. This is in accordance to the XSA-242
recommendation of not calling _put_page_type while also holding the page_lock
for that page.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Roger Pau Monne <roger.pau@citrix.com>
---
v2: Add assert that put_count is at least 1
---
 xen/arch/x86/mm/mem_sharing.c | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

Comments

George Dunlap April 15, 2019, 3:21 p.m. UTC | #1
On 4/12/19 8:47 PM, Tamas K Lengyel wrote:
> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
> grabbing extra references for pages that drop references tied to PGC_allocated.
> However, the way these extra references were grabbed were incorrect, resulting
> in both share_pages and unshare_pages failing.

Why?  Is cd not the owner of the page?

> There actually is no need to
> grab extra references, only a reordering was needed for when the existing
> references are being released. This is in accordance to the XSA-242
> recommendation of not calling _put_page_type while also holding the page_lock
> for that page.
> 
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Roger Pau Monne <roger.pau@citrix.com>
> ---
> v2: Add assert that put_count is at least 1
> ---
>  xen/arch/x86/mm/mem_sharing.c | 32 +++++++++-----------------------
>  1 file changed, 9 insertions(+), 23 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 5ac9d8f54c..708037d91e 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -900,6 +900,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>      p2m_type_t smfn_type, cmfn_type;
>      struct two_gfns tg;
>      struct rmap_iterator ri;
> +    unsigned long put_count = 0;
>  
>      get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
>                   cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
> @@ -964,15 +965,6 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>          goto err_out;
>      }
>  
> -    /* Acquire an extra reference, for the freeing below to be safe. */
> -    if ( !get_page(cpage, cd) )
> -    {
> -        ret = -EOVERFLOW;
> -        mem_sharing_page_unlock(secondpg);
> -        mem_sharing_page_unlock(firstpg);
> -        goto err_out;
> -    }

Why does this fail?  Is cd not the owner of the page?

> -
>      /* Merge the lists together */
>      rmap_seed_iterator(cpage, &ri);
>      while ( (gfn = rmap_iterate(cpage, &ri)) != NULL)
> @@ -984,7 +976,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>           * Don't change the type of rmap for the client page. */
>          rmap_del(gfn, cpage, 0);
>          rmap_add(gfn, spage);
> -        put_page_and_type(cpage);
> +        put_count++;
>          d = get_domain_by_id(gfn->domain);
>          BUG_ON(!d);
>          BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn));
> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>      /* Free the client page */
>      if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
>          put_page(cpage);
> -    put_page(cpage);
> +
> +    ASSERT(put_count);
> +    while ( put_count-- )
> +        put_page_and_type(cpage);

I don't think this is in the spirit of the fix in 0502e0adae2; the idea
there seems to have been to make sure that cpage belongs to the right
domain, and that the ownership doesn't change.  If such a race / mistake
were possible before that change, such a race / mistake would be
possible after this change, wouldn't it?

 -George
Tamas K Lengyel April 15, 2019, 3:31 p.m. UTC | #2
On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 4/12/19 8:47 PM, Tamas K Lengyel wrote:
> > Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
> > grabbing extra references for pages that drop references tied to PGC_allocated.
> > However, the way these extra references were grabbed were incorrect, resulting
> > in both share_pages and unshare_pages failing.
>
> Why?  Is cd not the owner of the page?

It's not, dom_cow is.

> > @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
> >      /* Free the client page */
> >      if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
> >          put_page(cpage);
> > -    put_page(cpage);
> > +
> > +    ASSERT(put_count);
> > +    while ( put_count-- )
> > +        put_page_and_type(cpage);
>
> I don't think this is in the spirit of the fix in 0502e0adae2; the idea
> there seems to have been to make sure that cpage belongs to the right
> domain, and that the ownership doesn't change.  If such a race / mistake
> were possible before that change, such a race / mistake would be
> possible after this change, wouldn't it?

I don't have an answer to your question. AFAIU this is just to ensure
that the page isn't dropped before test_and_clear_bit is used on the
page.

Tamas
George Dunlap April 15, 2019, 5:28 p.m. UTC | #3
> On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
> 
> On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote:
>> 
>> On 4/12/19 8:47 PM, Tamas K Lengyel wrote:
>>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
>>> grabbing extra references for pages that drop references tied to PGC_allocated.
>>> However, the way these extra references were grabbed were incorrect, resulting
>>> in both share_pages and unshare_pages failing.
>> 
>> Why?  Is cd not the owner of the page?
> 
> It's not, dom_cow is.
> 
>>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>>>     /* Free the client page */
>>>     if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
>>>         put_page(cpage);
>>> -    put_page(cpage);
>>> +
>>> +    ASSERT(put_count);
>>> +    while ( put_count-- )
>>> +        put_page_and_type(cpage);
>> 
>> I don't think this is in the spirit of the fix in 0502e0adae2; the idea
>> there seems to have been to make sure that cpage belongs to the right
>> domain, and that the ownership doesn't change.  If such a race / mistake
>> were possible before that change, such a race / mistake would be
>> possible after this change, wouldn't it?
> 
> I don't have an answer to your question. AFAIU this is just to ensure
> that the page isn't dropped before test_and_clear_bit is used on the
> page.

Right, but why would that be a bad thing?

I think it’s to avoid races like this:

P1: Gets pointer to page A, owned by domain M
P2: Gets pointer to page A, owned by domain M
P2: test_and_clear_bit(PGC_allocated) succeeds
P2: put_page()
  -> page freed
  -> Page re-allocated to domain N
P1: test_and_clear_bit(PGC_allocated) succeeds
P1: put_page() 
  -> page freed ##

Now P1 incorrectly “frees” a page owned by domain N.

If both P1 and P2 call get_page(A, M) first, and drop that reference afterwards, this race can’t happen.

Jan, you’re the author of that patch — is that not the case?

Could a similar race to the above occur in this code?  It looks like it could have before 0502e0adae2 for sure, since all references and locks were dropped.  It’s not clear to me whether mem_sharing_page_lock() will protect against ownership changes.

*If* that assertion you added is always true, that there is always at least one reference in the rmap that will be held; *and* those references are always “held” on behalf of dom_cow, then I think your version will probably be safe (as the reference in the rmap will prevent it from changing ownership until the final put_page_and_type()).  But if #2 ever false for any reason (even due to a bug in Xen), then this will open up a potential security issue (since ASSERT()s don’t trigger in production code).

A couple of options:
* Make it a BUG_ON(); that would change a potential privilege escalation into a DoS
* Only clear PGC_allocated if put_count > 0.  This changes a potential privilege escalation into a slow memory leak
* If put_count == 0, grab an extra reference and drop it afterwards (with an ASSERT_UNREACHABLE() to help catch errors early).  This avoids any XSA, but makes the code a bit uglier.
* Leave the code as it is; just change the get_page() to grabbing dom_cow rather than cd.  This avoids any XSA, but adds an extra get / put page which will be pure overhead in the common case.

Thoughts?

 -George
Tamas K Lengyel April 15, 2019, 5:51 p.m. UTC | #4
On Mon, Apr 15, 2019 at 11:28 AM George Dunlap <George.Dunlap@citrix.com> wrote:
>
>
>
> > On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
> >
> > On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote:
> >>
> >> On 4/12/19 8:47 PM, Tamas K Lengyel wrote:
> >>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
> >>> grabbing extra references for pages that drop references tied to PGC_allocated.
> >>> However, the way these extra references were grabbed were incorrect, resulting
> >>> in both share_pages and unshare_pages failing.
> >>
> >> Why?  Is cd not the owner of the page?
> >
> > It's not, dom_cow is.
> >
> >>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
> >>>     /* Free the client page */
> >>>     if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
> >>>         put_page(cpage);
> >>> -    put_page(cpage);
> >>> +
> >>> +    ASSERT(put_count);
> >>> +    while ( put_count-- )
> >>> +        put_page_and_type(cpage);
> >>
> >> I don't think this is in the spirit of the fix in 0502e0adae2; the idea
> >> there seems to have been to make sure that cpage belongs to the right
> >> domain, and that the ownership doesn't change.  If such a race / mistake
> >> were possible before that change, such a race / mistake would be
> >> possible after this change, wouldn't it?
> >
> > I don't have an answer to your question. AFAIU this is just to ensure
> > that the page isn't dropped before test_and_clear_bit is used on the
> > page.
>
> Right, but why would that be a bad thing?
>
> I think it’s to avoid races like this:
>
> P1: Gets pointer to page A, owned by domain M
> P2: Gets pointer to page A, owned by domain M
> P2: test_and_clear_bit(PGC_allocated) succeeds
> P2: put_page()
>   -> page freed
>   -> Page re-allocated to domain N
> P1: test_and_clear_bit(PGC_allocated) succeeds
> P1: put_page()
>   -> page freed ##
>
> Now P1 incorrectly “frees” a page owned by domain N.
>
> If both P1 and P2 call get_page(A, M) first, and drop that reference afterwards, this race can’t happen.
>
> Jan, you’re the author of that patch — is that not the case?
>
> Could a similar race to the above occur in this code?  It looks like it could have before 0502e0adae2 for sure, since all references and locks were dropped.  It’s not clear to me whether mem_sharing_page_lock() will protect against ownership changes.
>
> *If* that assertion you added is always true, that there is always at least one reference in the rmap that will be held; *and* those references are always “held” on behalf of dom_cow, then I think your version will probably be safe (as the reference in the rmap will prevent it from changing ownership until the final put_page_and_type()).  But if #2 ever false for any reason (even due to a bug in Xen), then this will open up a potential security issue (since ASSERT()s don’t trigger in production code).
>
> A couple of options:
> * Make it a BUG_ON(); that would change a potential privilege escalation into a DoS

I'm fine with changing that ASSERT into a BUG_ON.

> * Only clear PGC_allocated if put_count > 0.  This changes a potential privilege escalation into a slow memory leak
> * If put_count == 0, grab an extra reference and drop it afterwards (with an ASSERT_UNREACHABLE() to help catch errors early).  This avoids any XSA, but makes the code a bit uglier.
> * Leave the code as it is; just change the get_page() to grabbing dom_cow rather than cd.  This avoids any XSA, but adds an extra get / put page which will be pure overhead in the common case.

Simply changing to get_page with dom_cow is not enough, I ran into
issues where the system still deadlocks due to the ordering of
put_page_and_type being called before mem_sharing_page_unlock. So the
ordering still has to be changed.

Tamas
Jan Beulich April 25, 2019, 11:11 a.m. UTC | #5
>>> On 15.04.19 at 19:28, <George.Dunlap@citrix.com> wrote:
>> On Apr 15, 2019, at 4:31 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>> On Mon, Apr 15, 2019 at 9:21 AM George Dunlap <george.dunlap@citrix.com> wrote:
>>> On 4/12/19 8:47 PM, Tamas K Lengyel wrote:
>>>> Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
>>>> grabbing extra references for pages that drop references tied to PGC_allocated.
>>>> However, the way these extra references were grabbed were incorrect, resulting
>>>> in both share_pages and unshare_pages failing.
>>> 
>>> Why?  Is cd not the owner of the page?
>> 
>> It's not, dom_cow is.
>> 
>>>> @@ -1002,7 +994,10 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>>>>     /* Free the client page */
>>>>     if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
>>>>         put_page(cpage);
>>>> -    put_page(cpage);
>>>> +
>>>> +    ASSERT(put_count);
>>>> +    while ( put_count-- )
>>>> +        put_page_and_type(cpage);
>>> 
>>> I don't think this is in the spirit of the fix in 0502e0adae2; the idea
>>> there seems to have been to make sure that cpage belongs to the right
>>> domain, and that the ownership doesn't change.  If such a race / mistake
>>> were possible before that change, such a race / mistake would be
>>> possible after this change, wouldn't it?
>> 
>> I don't have an answer to your question. AFAIU this is just to ensure
>> that the page isn't dropped before test_and_clear_bit is used on the
>> page.
> 
> Right, but why would that be a bad thing?
> 
> I think it’s to avoid races like this:
> 
> P1: Gets pointer to page A, owned by domain M
> P2: Gets pointer to page A, owned by domain M
> P2: test_and_clear_bit(PGC_allocated) succeeds
> P2: put_page()
>   -> page freed
>   -> Page re-allocated to domain N
> P1: test_and_clear_bit(PGC_allocated) succeeds
> P1: put_page() 
>   -> page freed ##
> 
> Now P1 incorrectly “frees” a page owned by domain N.
> 
> If both P1 and P2 call get_page(A, M) first, and drop that reference 
> afterwards, this race can’t happen.
> 
> Jan, you’re the author of that patch — is that not the case?

Yes indeed, that's why I did add the extra get_page() (not
realizing the owning domain is the wrong one).

Jan
Jan Beulich April 25, 2019, 11:14 a.m. UTC | #6
>>> On 15.04.19 at 19:51, <tamas@tklengyel.com> wrote:
> Simply changing to get_page with dom_cow is not enough, I ran into
> issues where the system still deadlocks due to the ordering of
> put_page_and_type being called before mem_sharing_page_unlock. So the
> ordering still has to be changed.

Dead locks because of the ordering of releases (or locks or alike),
not acquires? That looks suspicious to me, i.e. suggests yet
another problem elsewhere.

Jan
Tamas K Lengyel April 25, 2019, 1:33 p.m. UTC | #7
On Thu, Apr 25, 2019 at 5:14 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 15.04.19 at 19:51, <tamas@tklengyel.com> wrote:
> > Simply changing to get_page with dom_cow is not enough, I ran into
> > issues where the system still deadlocks due to the ordering of
> > put_page_and_type being called before mem_sharing_page_unlock. So the
> > ordering still has to be changed.
>
> Dead locks because of the ordering of releases (or locks or alike),
> not acquires? That looks suspicious to me, i.e. suggests yet
> another problem elsewhere.

FWIW page_lock has this warning:

 * We must never call _put_page_type() while holding a page_lock() for
 * that page; doing so may cause a deadlock under the right
 * conditions.

That said, I'm working on getting rid of using page_lock altogether
because it just over-complicates things for no good reason. Should be
able to send the new patches today.

Tamas
Jan Beulich April 25, 2019, 2:03 p.m. UTC | #8
>>> On 25.04.19 at 15:33, <tamas@tklengyel.com> wrote:
> On Thu, Apr 25, 2019 at 5:14 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 15.04.19 at 19:51, <tamas@tklengyel.com> wrote:
>> > Simply changing to get_page with dom_cow is not enough, I ran into
>> > issues where the system still deadlocks due to the ordering of
>> > put_page_and_type being called before mem_sharing_page_unlock. So the
>> > ordering still has to be changed.
>>
>> Dead locks because of the ordering of releases (or locks or alike),
>> not acquires? That looks suspicious to me, i.e. suggests yet
>> another problem elsewhere.
> 
> FWIW page_lock has this warning:
> 
>  * We must never call _put_page_type() while holding a page_lock() for
>  * that page; doing so may cause a deadlock under the right
>  * conditions.

Oh, of course. I once again forgot that
mem_sharing_page_unlock() calls page_unlock().

> That said, I'm working on getting rid of using page_lock altogether
> because it just over-complicates things for no good reason. Should be
> able to send the new patches today.

Oh, that's good news.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5ac9d8f54c..708037d91e 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -900,6 +900,7 @@  static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     p2m_type_t smfn_type, cmfn_type;
     struct two_gfns tg;
     struct rmap_iterator ri;
+    unsigned long put_count = 0;
 
     get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
                  cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
@@ -964,15 +965,6 @@  static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
         goto err_out;
     }
 
-    /* Acquire an extra reference, for the freeing below to be safe. */
-    if ( !get_page(cpage, cd) )
-    {
-        ret = -EOVERFLOW;
-        mem_sharing_page_unlock(secondpg);
-        mem_sharing_page_unlock(firstpg);
-        goto err_out;
-    }
-
     /* Merge the lists together */
     rmap_seed_iterator(cpage, &ri);
     while ( (gfn = rmap_iterate(cpage, &ri)) != NULL)
@@ -984,7 +976,7 @@  static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
          * Don't change the type of rmap for the client page. */
         rmap_del(gfn, cpage, 0);
         rmap_add(gfn, spage);
-        put_page_and_type(cpage);
+        put_count++;
         d = get_domain_by_id(gfn->domain);
         BUG_ON(!d);
         BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn));
@@ -1002,7 +994,10 @@  static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     /* Free the client page */
     if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
         put_page(cpage);
-    put_page(cpage);
+
+    ASSERT(put_count);
+    while ( put_count-- )
+        put_page_and_type(cpage);
 
     /* We managed to free a domain page. */
     atomic_dec(&nr_shared_mfns);
@@ -1167,20 +1162,11 @@  int __mem_sharing_unshare_page(struct domain *d,
     {
         if ( !last_gfn )
             mem_sharing_gfn_destroy(page, d, gfn_info);
-        put_page_and_type(page);
         mem_sharing_page_unlock(page);
-        if ( last_gfn )
-        {
-            if ( !get_page(page, d) )
-            {
-                put_gfn(d, gfn);
-                domain_crash(d);
-                return -EOVERFLOW;
-            }
-            if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
-                put_page(page);
+        if ( last_gfn &&
+            test_and_clear_bit(_PGC_allocated, &page->count_info) )
             put_page(page);
-        }
+        put_page_and_type(page);
         put_gfn(d, gfn);
 
         return 0;