diff mbox series

fix BUG in gnttab_unpopulate_status_frames()

Message ID 20190730164401.34097-1-paul.durrant@citrix.com (mailing list archive)
State New, archived
Headers show
Series fix BUG in gnttab_unpopulate_status_frames() | expand

Commit Message

Paul Durrant July 30, 2019, 4:44 p.m. UTC
Since commit ec83f825627 "mm.h: add helper function to test-and-clear
_PGC_allocated" (and subsequent fix-up 44a887d021d "mm.h: fix BUG_ON()
condition in put_page_alloc_ref()") setting grant table version from 2
back to 1 has been vulnerable to hitting the BUG_ON in put_page_alloc_ref()
during gnttab_unpopulate_status_frames() because that function does not
acquire a local page reference.

This patch fixes the problem by first acquiring a local page reference on a
status frame (which should always succeed and so a failure results in a
domain_crash()) before attempting to 'unassign' it from the guest by
dropping the allocation reference. The local reference can then be dropped.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wl@xen.org>
---
 xen/common/grant_table.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jan Beulich Aug. 2, 2019, 2:44 p.m. UTC | #1
On 30.07.2019 18:44, Paul Durrant wrote:
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -1682,6 +1682,14 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>           struct page_info *pg = virt_to_page(gt->status[i]);
>           gfn_t gfn = gnttab_get_frame_gfn(gt, true, i);
>   
> +        if ( !get_page(pg, d) )
> +        {
> +            gprintk(XENLOG_ERR,
> +                    "Could not get a reference to status frame %u\n", i);
> +            domain_crash(d);
> +            return -EINVAL;
> +        }
> +
>           /*
>            * For translated domains, recovering from failure after partial
>            * changes were made is more complicated than it seems worth
> @@ -1708,6 +1716,7 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>   
>           BUG_ON(page_get_owner(pg) != d);
>           put_page_alloc_ref(pg);
> +        put_page(pg);
>   
>           if ( pg->count_info & ~PGC_xen_heap )
>           {
> 

I dislike this approach, and not chosing the alternative of excluding
xenheap pages in the check in put_page_alloc_ref() (as I had recommended
elsewhere) should at least be discussed in the description. It is the
very nature of xenheap pages that they won't get freed, and hence don't
need this extra ref to be held for clearing PGC_allocated.

Jan
George Dunlap Aug. 2, 2019, 3:28 p.m. UTC | #2
On 8/2/19 3:44 PM, Jan Beulich wrote:
> On 30.07.2019 18:44, Paul Durrant wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -1682,6 +1682,14 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>>           struct page_info *pg = virt_to_page(gt->status[i]);
>>           gfn_t gfn = gnttab_get_frame_gfn(gt, true, i);
>>   
>> +        if ( !get_page(pg, d) )
>> +        {
>> +            gprintk(XENLOG_ERR,
>> +                    "Could not get a reference to status frame %u\n", i);
>> +            domain_crash(d);
>> +            return -EINVAL;
>> +        }
>> +
>>           /*
>>            * For translated domains, recovering from failure after partial
>>            * changes were made is more complicated than it seems worth
>> @@ -1708,6 +1716,7 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
>>   
>>           BUG_ON(page_get_owner(pg) != d);
>>           put_page_alloc_ref(pg);
>> +        put_page(pg);
>>   
>>           if ( pg->count_info & ~PGC_xen_heap )
>>           {
>>
> 
> I dislike this approach, and not chosing the alternative of excluding
> xenheap pages in the check in put_page_alloc_ref() (as I had recommended
> elsewhere) should at least be discussed in the description. It is the
> very nature of xenheap pages that they won't get freed, and hence don't
> need this extra ref to be held for clearing PGC_allocated.

Also, it looks like there are other places where the BUG_ON() may /
should be firing:  namely, vmx_free_vlapic_mapping() and
unshare_xenoprof_page_with_guest().  Teaching put_page_alloc_ref() that
dropping PGC_allocated when PGC_xen_heap is set is safe would fix all
three at once.

Possibly more importantly, suppose that the first time
gnttab_unpopulate_status_frames() comes around, gt->status[1] is still
mapped by the guest.  Then gt->status[0] will have its refcount reduced
to 0 (but not freed), but gt->status[1] will be restored to its previous
state.  If the guest unmaps gt->status[1] and
gnttab_unpopulate_status_frames() is called again, then the
get_page(gt->status[0]) will fail (since its refcount is 0), causing the
guest to be crashed instead.

Not terrible for such a wonkily-behaving guest; but I think I'd rather
go with the "special-case xenheap pages" option.

 -George
Paul Durrant Aug. 12, 2019, 10:38 a.m. UTC | #3
> -----Original Message-----
> From: George Dunlap <george.dunlap@citrix.com>
> Sent: 02 August 2019 16:28
> To: Jan Beulich <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Julien Grall <julien.grall@arm.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH] fix BUG in gnttab_unpopulate_status_frames()
> 
> On 8/2/19 3:44 PM, Jan Beulich wrote:
> > On 30.07.2019 18:44, Paul Durrant wrote:
> >> --- a/xen/common/grant_table.c
> >> +++ b/xen/common/grant_table.c
> >> @@ -1682,6 +1682,14 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
> >>           struct page_info *pg = virt_to_page(gt->status[i]);
> >>           gfn_t gfn = gnttab_get_frame_gfn(gt, true, i);
> >>
> >> +        if ( !get_page(pg, d) )
> >> +        {
> >> +            gprintk(XENLOG_ERR,
> >> +                    "Could not get a reference to status frame %u\n", i);
> >> +            domain_crash(d);
> >> +            return -EINVAL;
> >> +        }
> >> +
> >>           /*
> >>            * For translated domains, recovering from failure after partial
> >>            * changes were made is more complicated than it seems worth
> >> @@ -1708,6 +1716,7 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
> >>
> >>           BUG_ON(page_get_owner(pg) != d);
> >>           put_page_alloc_ref(pg);
> >> +        put_page(pg);
> >>
> >>           if ( pg->count_info & ~PGC_xen_heap )
> >>           {
> >>
> >
> > I dislike this approach, and not chosing the alternative of excluding
> > xenheap pages in the check in put_page_alloc_ref() (as I had recommended
> > elsewhere) should at least be discussed in the description. It is the
> > very nature of xenheap pages that they won't get freed, and hence don't
> > need this extra ref to be held for clearing PGC_allocated.
> 
> Also, it looks like there are other places where the BUG_ON() may /
> should be firing:  namely, vmx_free_vlapic_mapping() and
> unshare_xenoprof_page_with_guest().  Teaching put_page_alloc_ref() that
> dropping PGC_allocated when PGC_xen_heap is set is safe would fix all
> three at once.
> 
> Possibly more importantly, suppose that the first time
> gnttab_unpopulate_status_frames() comes around, gt->status[1] is still
> mapped by the guest.  Then gt->status[0] will have its refcount reduced
> to 0 (but not freed), but gt->status[1] will be restored to its previous
> state.  If the guest unmaps gt->status[1] and
> gnttab_unpopulate_status_frames() is called again, then the
> get_page(gt->status[0]) will fail (since its refcount is 0), causing the
> guest to be crashed instead.
> 
> Not terrible for such a wonkily-behaving guest; but I think I'd rather
> go with the "special-case xenheap pages" option.

Ok, I see you've committed a patch to that effect while I was away.

  Paul

> 
>  -George
George Dunlap Aug. 12, 2019, 11:03 a.m. UTC | #4
On 8/12/19 11:38 AM, Paul Durrant wrote:
>> -----Original Message-----
>> From: George Dunlap <george.dunlap@citrix.com>
>> Not terrible for such a wonkily-behaving guest; but I think I'd rather
>> go with the "special-case xenheap pages" option.
> 
> Ok, I see you've committed a patch to that effect while I was away.

Yes; Andy wanted to get tests passing, and asked me to submit a change
along the lines of what I was proposing.

 -George
diff mbox series

Patch

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 97695a221a..b9ca388051 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1682,6 +1682,14 @@  gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
         struct page_info *pg = virt_to_page(gt->status[i]);
         gfn_t gfn = gnttab_get_frame_gfn(gt, true, i);
 
+        if ( !get_page(pg, d) )
+        {
+            gprintk(XENLOG_ERR,
+                    "Could not get a reference to status frame %u\n", i);
+            domain_crash(d);
+            return -EINVAL;
+        }
+
         /*
          * For translated domains, recovering from failure after partial
          * changes were made is more complicated than it seems worth
@@ -1708,6 +1716,7 @@  gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
 
         BUG_ON(page_get_owner(pg) != d);
         put_page_alloc_ref(pg);
+        put_page(pg);
 
         if ( pg->count_info & ~PGC_xen_heap )
         {