
Ubuntu 16.04.1 LTS kernel 4.4.0-57 over-allocation and xen-access fail

Message ID 661a3921-e18f-8236-dde3-93d9327a657a@bitdefender.com (mailing list archive)
State New, archived

Commit Message

Razvan Cojocaru Jan. 10, 2017, 9:06 a.m. UTC
On 01/09/2017 02:54 PM, Andrew Cooper wrote:
> On 09/01/17 11:36, Razvan Cojocaru wrote:
>> Hello,
>>
>> We've come across a weird phenomenon: an Ubuntu 16.04.1 LTS HVM guest
>> running kernel 4.4.0 installed via XenCenter in XenServer Dundee seems
>> to eat up all the RAM it can:
>>
>> (XEN) [  394.379760] d1v1 Over-allocation for domain 1: 524545 > 524544
>>
>> This leads to a problem with xen-access, specifically libxc which does
>> this in xc_vm_event_enable() (this is Xen 4.6):
>>
>> ring_page = xc_map_foreign_batch(xch, domain_id, PROT_READ | PROT_WRITE,
>>                                  &mmap_pfn, 1);
>>
>> if ( mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
>> {
>>     /* Map failed, populate ring page */
>>     rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
>>                                                &ring_pfn);
>>     if ( rc1 != 0 )
>>     {
>>         PERROR("Failed to populate ring pfn\n");
>>         goto out;
>>     }
>>
>> The first time everything works fine, xen-access can map the ring page.
>> But most of the time the second time fails in the
>> xc_domain_populate_physmap_exact() call, and again this is dumped in the
>> Xen log (once for each failed attempt):
>>
>> (XEN) [  395.952188] d0v3 Over-allocation for domain 1: 524545 > 524544
> 
> Thinking further about this, what happens if you avoid removing the page
> on exit?
> 
> The first populate succeeds, and if you leave the page populated, the
> second time you come around the loop, it should not be of type XTAB, and
> the map should succeed.

Sorry for the late reply, had to put out another fire yesterday.

I've taken your recommendation to roughly mean this:

         vm_event_ring_unlock(ved);

but this unfortunately still fails to map the page the second time. Do
you mean to simply no longer munmap() the ring page from libxc / the
client application?


Thanks,
Razvan

Comments

Andrew Cooper Jan. 10, 2017, 2:13 p.m. UTC | #1
On 10/01/17 09:06, Razvan Cojocaru wrote:
> On 01/09/2017 02:54 PM, Andrew Cooper wrote:
>> On 09/01/17 11:36, Razvan Cojocaru wrote:
>>> Hello,
>>>
>>> We've come across a weird phenomenon: an Ubuntu 16.04.1 LTS HVM guest
>>> running kernel 4.4.0 installed via XenCenter in XenServer Dundee seems
>>> to eat up all the RAM it can:
>>>
>>> (XEN) [  394.379760] d1v1 Over-allocation for domain 1: 524545 > 524544
>>>
>>> This leads to a problem with xen-access, specifically libxc which does
>>> this in xc_vm_event_enable() (this is Xen 4.6):
>>>
>>> ring_page = xc_map_foreign_batch(xch, domain_id, PROT_READ | PROT_WRITE,
>>>                                  &mmap_pfn, 1);
>>>
>>> if ( mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
>>> {
>>>     /* Map failed, populate ring page */
>>>     rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
>>>                                                &ring_pfn);
>>>     if ( rc1 != 0 )
>>>     {
>>>         PERROR("Failed to populate ring pfn\n");
>>>         goto out;
>>>     }
>>>
>>> The first time everything works fine, xen-access can map the ring page.
>>> But most of the time the second time fails in the
>>> xc_domain_populate_physmap_exact() call, and again this is dumped in the
>>> Xen log (once for each failed attempt):
>>>
>>> (XEN) [  395.952188] d0v3 Over-allocation for domain 1: 524545 > 524544
>> Thinking further about this, what happens if you avoid removing the page
>> on exit?
>>
>> The first populate succeeds, and if you leave the page populated, the
>> second time you come around the loop, it should not be of type XTAB, and
>> the map should succeed.
> Sorry for the late reply, had to put out another fire yesterday.
>
> I've taken your recommendation to roughly mean this:
>
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index ba9690a..805564b 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -100,8 +100,11 @@ static int vm_event_enable(
>      return 0;
>
>   err:
> +    /*
>      destroy_ring_for_helper(&ved->ring_page,
>                              ved->ring_pg_struct);
> +    */
> +    ved->ring_page = NULL;
>      vm_event_ring_unlock(ved);
>
>      return rc;
> @@ -229,9 +232,12 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain *ved)
>              }
>          }
>
> +        /*
>          destroy_ring_for_helper(&ved->ring_page,
>                                  ved->ring_pg_struct);
> +       */
>
> +        ved->ring_page = NULL;
>          vm_event_cleanup_domain(d);
>
>          vm_event_ring_unlock(ved);
>
> but this unfortunately still fails to map the page the second time. Do
> you mean to simply no longer munmap() the ring page from libxc / the
> client application?

Neither.

First of all, I notice that this is probably buggy:

    ring_pfn = pfn;
    mmap_pfn = pfn;
    rc1 = xc_get_pfn_type_batch(xch, domain_id, 1, &mmap_pfn);
    if ( rc1 || mmap_pfn & XEN_DOMCTL_PFINFO_XTAB )
    {
        /* Page not in the physmap, try to populate it */
        rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0,
                                              &ring_pfn);
        if ( rc1 != 0 )
        {
            PERROR("Failed to populate ring pfn\n");
            goto out;
        }
    }

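A failure of xc_get_pfn_type_batch() does not suggest that population
might work; only an XTAB result does, so the error case should bail out
instead of falling through to the populate call.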
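
The distinction can be sketched as follows. This is a minimal, self-contained
example that assumes hypothetical stand-ins (fake_get_pfn_type(),
fake_populate(), PFINFO_XTAB and struct fake_domain) in place of the real
libxc calls, so that it compiles without Xen. It only illustrates treating a
failed type query as an error and reserving the populate call for the XTAB
case; it is not the actual libxc fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the libxc calls quoted above; not the real API. */
#define PFINFO_XTAB 0x1u /* illustrative flag, not the real XEN_DOMCTL_PFINFO_XTAB value */

enum ring_status { RING_OK = 0, RING_ERR_QUERY = -1, RING_ERR_POPULATE = -2 };

struct fake_domain {
    int query_fails;    /* make the type query itself fail */
    int in_physmap;     /* is the ring pfn currently populated? */
    int populate_fails; /* make populate fail (e.g. over-allocation) */
    int populate_calls; /* how many times populate was attempted */
};

/* Stand-in for the pfn type query: reports XTAB if the pfn is not populated. */
static int fake_get_pfn_type(struct fake_domain *d, unsigned int *type)
{
    if ( d->query_fails )
        return -1;
    *type = d->in_physmap ? 0 : PFINFO_XTAB;
    return 0;
}

/* Stand-in for the populate call. */
static int fake_populate(struct fake_domain *d)
{
    d->populate_calls++;
    if ( d->populate_fails )
        return -1;
    d->in_physmap = 1;
    return 0;
}

/* Only an XTAB answer leads to populate; a failed query is reported as-is. */
static enum ring_status prepare_ring_pfn(struct fake_domain *d)
{
    unsigned int type = 0;

    assert(d != NULL);

    if ( fake_get_pfn_type(d, &type) != 0 )
    {
        fprintf(stderr, "Failed to get ring pfn type\n");
        return RING_ERR_QUERY;
    }

    if ( type & PFINFO_XTAB )
    {
        /* Page not in the physmap, try to populate it */
        if ( fake_populate(d) != 0 )
        {
            fprintf(stderr, "Failed to populate ring pfn\n");
            return RING_ERR_POPULATE;
        }
    }

    return RING_OK;
}
```

With this shape, a domain whose query fails returns RING_ERR_QUERY without
populate ever being attempted, while an unpopulated pfn is populated exactly
once.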


What I meant was taking out this call:

    /* Remove the ring_pfn from the guest's physmap */
    rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0,
                                               &ring_pfn);
    if ( rc1 != 0 )
        PERROR("Failed to remove ring page from guest physmap");

That is, leave the frame in the guest physmap.  The issue is
fundamentally that after this frame has been taken out, something kicks
the VM into realising it has an extra frame of balloonable space, which
it clearly compensates for.
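
The accounting behind this can be illustrated with a toy model. The sketch
below is not Xen code: struct toy_domain, toy_populate(), toy_release() and
toy_session() are hypothetical names, and it assumes that the guest starts
with exactly one page of headroom and immediately claims any page that is
freed. Under those assumptions it reproduces the "524545 > 524544" failure on
the second session when the ring page is removed on exit, and shows that
leaving the page populated avoids it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-domain page accounting; not Xen code. */
struct toy_domain {
    unsigned long tot_pages; /* pages currently allocated to the domain */
    unsigned long max_pages; /* allocation limit for the domain */
};

/* Allocate one page; fails when tot_pages + 1 would exceed max_pages. */
static bool toy_populate(struct toy_domain *d)
{
    if ( d->tot_pages + 1 > d->max_pages )
        return false; /* corresponds to the "Over-allocation" message */
    d->tot_pages++;
    return true;
}

/* Give one page back to the allocator. */
static void toy_release(struct toy_domain *d)
{
    assert(d->tot_pages > 0);
    d->tot_pages--;
}

/*
 * One monitoring session: populate the ring page if it is not already
 * present, then optionally remove it again on exit.
 */
static bool toy_session(struct toy_domain *d, bool *ring_populated,
                        bool remove_on_exit)
{
    if ( !*ring_populated )
    {
        if ( !toy_populate(d) )
            return false;
        *ring_populated = true;
    }

    /* ... ring page is mapped and used here ... */

    if ( remove_on_exit )
    {
        toy_release(d);
        *ring_populated = false;
        /* Assumed guest behaviour: it claims the freed headroom at once. */
        (void)toy_populate(d);
    }

    return true;
}
```

Starting from tot_pages = 524543 and max_pages = 524544, two sessions with
remove_on_exit set fail on the second populate, whereas two sessions without
it both succeed because the second one finds the ring page still in place.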

You can work around the added attack surface by marking the frame RO in
EPT; neither Xen's nor dom0's mappings are translated via EPT, so they
can still make updates, but the guest won't be able to write to it.

I should say that this is all a gross hack, and is in desperate need of
a proper API to make rings entirely outside of the gfn space, but this
hack should work for now.

~Andrew

Patch

diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index ba9690a..805564b 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -100,8 +100,11 @@  static int vm_event_enable(
     return 0;

  err:
+    /*
     destroy_ring_for_helper(&ved->ring_page,
                             ved->ring_pg_struct);
+    */
+    ved->ring_page = NULL;
     vm_event_ring_unlock(ved);

     return rc;
@@ -229,9 +232,12 @@  static int vm_event_disable(struct domain *d, struct vm_event_domain *ved)
             }
         }

+        /*
         destroy_ring_for_helper(&ved->ring_page,
                                 ved->ring_pg_struct);
+       */

+        ved->ring_page = NULL;
         vm_event_cleanup_domain(d);