Message ID | 25413f94-8183-e497-5d56-20ef794f5a41@bitdefender.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/01/17 15:02, Razvan Cojocaru wrote: > On 01/10/2017 04:13 PM, Andrew Cooper wrote: >> On 10/01/17 09:06, Razvan Cojocaru wrote: >>> On 01/09/2017 02:54 PM, Andrew Cooper wrote: >>>> On 09/01/17 11:36, Razvan Cojocaru wrote: >>>>> Hello, >>>>> >>>>> We've come across a weird phenomenon: an Ubuntu 16.04.1 LTS HVM guest >>>>> running kernel 4.4.0 installed via XenCenter in XenServer Dundee seems >>>>> to eat up all the RAM it can: >>>>> >>>>> (XEN) [ 394.379760] d1v1 Over-allocation for domain 1: 524545 > 524544 >>>>> >>>>> This leads to a problem with xen-access, specifically libxc which does >>>>> this in xc_vm_event_enable() (this is Xen 4.6): >>>>> >>>>> ring_page = xc_map_foreign_batch(xch, domain_id, PROT_READ | PROT_WRITE, >>>>> &mmap_pfn, 1); >>>>> >>>>> if ( mmap_pfn & XEN_DOMCTL_PFINFO_XTAB ) >>>>> { >>>>> /* Map failed, populate ring page */ >>>>> rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0, >>>>> &ring_pfn); >>>>> if ( rc1 != 0 ) >>>>> { >>>>> PERROR("Failed to populate ring pfn\n"); >>>>> goto out; >>>>> } >>>>> >>>>> The first time everything works fine, xen-access can map the ring page. >>>>> But most of the time the second time fails in the >>>>> xc_domain_populate_physmap_exact() call, and again this is dumped in the >>>>> Xen log (once for each failed attempt): >>>>> >>>>> (XEN) [ 395.952188] d0v3 Over-allocation for domain 1: 524545 > 524544 >>>> Thinking further about this, what happens if you avoid removing the page >>>> on exit? >>>> >>>> The first populate succeeds, and if you leave the page populated, the >>>> second time you come around the loop, it should not be of type XTAB, and >>>> the map should succeed. >>> Sorry for the late reply, had to put out another fire yesterday. >>> >>> I've taken your recommendation to roughly mean this: >>> >>> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c >>> index ba9690a..805564b 100644 >>> --- a/xen/common/vm_event.c >>> +++ b/xen/common/vm_event.c >>> @@ -100,8 +100,11 @@ static int vm_event_enable( >>> return 0; >>> >>> err: >>> + /* >>> destroy_ring_for_helper(&ved->ring_page, >>> ved->ring_pg_struct); >>> + */ >>> + ved->ring_page = NULL; >>> vm_event_ring_unlock(ved); >>> >>> return rc; >>> @@ -229,9 +232,12 @@ static int vm_event_disable(struct domain *d, >>> struct vm_event_domain *ved) >>> } >>> } >>> >>> + /* >>> destroy_ring_for_helper(&ved->ring_page, >>> ved->ring_pg_struct); >>> + */ >>> >>> + ved->ring_page = NULL; >>> vm_event_cleanup_domain(d); >>> >>> vm_event_ring_unlock(ved); >>> >>> but this unfortunately still fails to map the page the second time. Do >>> you mean to simply no longer munmap() the ring page from libxc / the >>> client application? >> Neither. >> >> First of all, I notice that this is probably buggy: >> >> ring_pfn = pfn; >> mmap_pfn = pfn; >> rc1 = xc_get_pfn_type_batch(xch, domain_id, 1, &mmap_pfn); >> if ( rc1 || mmap_pfn & XEN_DOMCTL_PFINFO_XTAB ) >> { >> /* Page not in the physmap, try to populate it */ >> rc1 = xc_domain_populate_physmap_exact(xch, domain_id, 1, 0, 0, >> &ring_pfn); >> if ( rc1 != 0 ) >> { >> PERROR("Failed to populate ring pfn\n"); >> goto out; >> } >> } >> >> A failure of xc_get_pfn_type_batch() is not a suggestion that population >> might work. >> >> >> What I meant was taking out this call: >> >> /* Remove the ring_pfn from the guest's physmap */ >> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, >> &ring_pfn); >> if ( rc1 != 0 ) >> PERROR("Failed to remove ring page from guest physmap"); >> >> To leave the frame in the guest physmap. The issue is fundamentally >> that after this frame has been taken out, something kicks the VM to >> realise it has an extra frame of balloonable space, which it clearly >> compensates for. >> >> You can work around the added attack surface by marking it RO in EPT; >> neither Xen's nor dom0's mappings are translated via EPT, so they can >> still make updates, but the guest won't be able to write to it. >> >> I should say that this is all a gross hack, and is in desperate need of >> a proper API to make rings entirely outside of the gfn space, but this >> hack should work for now. > Thanks! So far, it seems to work like a charm like this: Great. > > diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c > index 2fef96a..5dd00a6 100644 > --- a/tools/libxc/xc_vm_event.c > +++ b/tools/libxc/xc_vm_event.c > @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, domid_t > domain_id, int param, > } > > /* Remove the ring_pfn from the guest's physmap */ > + /* > rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, > &ring_pfn); > if ( rc1 != 0 ) > PERROR("Failed to remove ring page from guest physmap"); > + */ > + > + if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, mmap_pfn, 1) ) > + { > + PERROR("Could not set ring page read-only\n"); > + goto out; > + } > > out: > saved_errno = errno; > > Should I send this as a patch for mainline as well? Probably a good idea, although I would include a code comment explaining what is going on, because this is subtle if you don't know the context. ~Andrew
>>> What I meant was taking out this call: >>> >>> /* Remove the ring_pfn from the guest's physmap */ >>> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, >>> &ring_pfn); >>> if ( rc1 != 0 ) >>> PERROR("Failed to remove ring page from guest physmap"); >>> >>> To leave the frame in the guest physmap. The issue is fundamentally >>> that after this frame has been taken out, something kicks the VM to >>> realise it has an extra frame of balloonable space, which it clearly >>> compensates for. >>> >>> You can work around the added attack surface by marking it RO in EPT; >>> neither Xen's nor dom0's mappings are translated via EPT, so they can >>> still make updates, but the guest won't be able to write to it. >>> >>> I should say that this is all a gross hack, and is in desperate need of >>> a proper API to make rings entirely outside of the gfn space, but this >>> hack should work for now. >> Thanks! So far, it seems to work like a charm like this: > > Great. > >> >> diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c >> index 2fef96a..5dd00a6 100644 >> --- a/tools/libxc/xc_vm_event.c >> +++ b/tools/libxc/xc_vm_event.c >> @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, domid_t >> domain_id, int param, >> } >> >> /* Remove the ring_pfn from the guest's physmap */ >> + /* >> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, >> &ring_pfn); >> if ( rc1 != 0 ) >> PERROR("Failed to remove ring page from guest physmap"); >> + */ >> + >> + if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, mmap_pfn, 1) ) >> + { >> + PERROR("Could not set ring page read-only\n"); >> + goto out; >> + } >> >> out: >> saved_errno = errno; >> >> Should I send this as a patch for mainline as well? > > Probably a good idea, although I would include a code comment explaining > what is going on, because this is subtle if you don't know the context. Will do, I'll send a patch out as soon as we've done a few more rounds of testing. Thanks again, Razvan
On Tue, Jan 10, 2017 at 8:35 AM, Razvan Cojocaru <rcojocaru@bitdefender.com> wrote: > >>> What I meant was taking out this call: > >>> > >>> /* Remove the ring_pfn from the guest's physmap */ > >>> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, > >>> &ring_pfn); > >>> if ( rc1 != 0 ) > >>> PERROR("Failed to remove ring page from guest physmap"); > >>> > >>> To leave the frame in the guest physmap. The issue is fundamentally > >>> that after this frame has been taken out, something kicks the VM to > >>> realise it has an extra frame of balloonable space, which it clearly > >>> compensates for. > >>> > >>> You can work around the added attack surface by marking it RO in EPT; > >>> neither Xen's nor dom0's mappings are translated via EPT, so they can > >>> still make updates, but the guest won't be able to write to it. > >>> > >>> I should say that this is all a gross hack, and is in desperate need of > >>> a proper API to make rings entirely outside of the gfn space, but this > >>> hack should work for now. > >> Thanks! So far, it seems to work like a charm like this: > > > > Great. > > > >> > >> diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c > >> index 2fef96a..5dd00a6 100644 > >> --- a/tools/libxc/xc_vm_event.c > >> +++ b/tools/libxc/xc_vm_event.c > >> @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, > domid_t > >> domain_id, int param, > >> } > >> > >> /* Remove the ring_pfn from the guest's physmap */ > >> + /* > >> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, > >> &ring_pfn); > >> if ( rc1 != 0 ) > >> PERROR("Failed to remove ring page from guest physmap"); > >> + */ > >> + > >> + if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, mmap_pfn, > 1) ) > >> + { > >> + PERROR("Could not set ring page read-only\n"); > >> + goto out; > >> + } > >> > >> out: > >> saved_errno = errno; > >> > >> Should I send this as a patch for mainline as well? > > > > Probably a good idea, although I would include a code comment explaining > > what is going on, because this is subtle if you don't know the context. > > Will do, I'll send a patch out as soon as we've done a few more rounds > of testing. > (replying to all): I'm not in favor of this patch mainly because it is not stealthy. A malicious kernel could easily track what events are being sent on the ring. With DRAKVUF I could work around this using altp2m pfn-remapping, but for other tools this is can be a serious information leak. Tamas
On 01/10/2017 06:29 PM, Tamas K Lengyel wrote: > > > On Tue, Jan 10, 2017 at 8:35 AM, Razvan Cojocaru > <rcojocaru@bitdefender.com <mailto:rcojocaru@bitdefender.com>> wrote: > > >>> What I meant was taking out this call: > >>> > >>> /* Remove the ring_pfn from the guest's physmap */ > >>> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, > >>> &ring_pfn); > >>> if ( rc1 != 0 ) > >>> PERROR("Failed to remove ring page from guest physmap"); > >>> > >>> To leave the frame in the guest physmap. The issue is fundamentally > >>> that after this frame has been taken out, something kicks the VM to > >>> realise it has an extra frame of balloonable space, which it clearly > >>> compensates for. > >>> > >>> You can work around the added attack surface by marking it RO in > EPT; > >>> neither Xen's nor dom0's mappings are translated via EPT, so > they can > >>> still make updates, but the guest won't be able to write to it. > >>> > >>> I should say that this is all a gross hack, and is in desperate > need of > >>> a proper API to make rings entirely outside of the gfn space, > but this > >>> hack should work for now. > >> Thanks! So far, it seems to work like a charm like this: > > > > Great. > > > >> > >> diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c > >> index 2fef96a..5dd00a6 100644 > >> --- a/tools/libxc/xc_vm_event.c > >> +++ b/tools/libxc/xc_vm_event.c > >> @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, > domid_t > >> domain_id, int param, > >> } > >> > >> /* Remove the ring_pfn from the guest's physmap */ > >> + /* > >> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, > >> &ring_pfn); > >> if ( rc1 != 0 ) > >> PERROR("Failed to remove ring page from guest physmap"); > >> + */ > >> + > >> + if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, > mmap_pfn, 1) ) > >> + { > >> + PERROR("Could not set ring page read-only\n"); > >> + goto out; > >> + } > >> > >> out: > >> saved_errno = errno; > >> > >> Should I send this as a patch for mainline as well? > > > > Probably a good idea, although I would include a code comment > explaining > > what is going on, because this is subtle if you don't know the > context. > > Will do, I'll send a patch out as soon as we've done a few more rounds > of testing. > > > (replying to all): I'm not in favor of this patch mainly because it is > not stealthy. A malicious kernel could easily track what events are > being sent on the ring. With DRAKVUF I could work around this using > altp2m pfn-remapping, but for other tools this is can be a serious > information leak. I understand your point, however the alternative is potential lack of availability to monitor which is arguably a more severe problem. _Any_ guest could choose to do what this Ubuntu 16.04 guest does, and then connecting to the guest via vm_event can only be done once. Thanks, Razvan
On Tue, Jan 10, 2017 at 9:34 AM, Razvan Cojocaru <rcojocaru@bitdefender.com> wrote: > On 01/10/2017 06:29 PM, Tamas K Lengyel wrote: > > > > > > On Tue, Jan 10, 2017 at 8:35 AM, Razvan Cojocaru > > <rcojocaru@bitdefender.com <mailto:rcojocaru@bitdefender.com>> wrote: > > > > >>> What I meant was taking out this call: > > >>> > > >>> /* Remove the ring_pfn from the guest's physmap */ > > >>> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, > 1, 0, > > >>> &ring_pfn); > > >>> if ( rc1 != 0 ) > > >>> PERROR("Failed to remove ring page from guest physmap"); > > >>> > > >>> To leave the frame in the guest physmap. The issue is > fundamentally > > >>> that after this frame has been taken out, something kicks the VM > to > > >>> realise it has an extra frame of balloonable space, which it > clearly > > >>> compensates for. > > >>> > > >>> You can work around the added attack surface by marking it RO in > > EPT; > > >>> neither Xen's nor dom0's mappings are translated via EPT, so > > they can > > >>> still make updates, but the guest won't be able to write to it. > > >>> > > >>> I should say that this is all a gross hack, and is in desperate > > need of > > >>> a proper API to make rings entirely outside of the gfn space, > > but this > > >>> hack should work for now. > > >> Thanks! So far, it seems to work like a charm like this: > > > > > > Great. > > > > > >> > > >> diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c > > >> index 2fef96a..5dd00a6 100644 > > >> --- a/tools/libxc/xc_vm_event.c > > >> +++ b/tools/libxc/xc_vm_event.c > > >> @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, > > domid_t > > >> domain_id, int param, > > >> } > > >> > > >> /* Remove the ring_pfn from the guest's physmap */ > > >> + /* > > >> rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, > 1, 0, > > >> &ring_pfn); > > >> if ( rc1 != 0 ) > > >> PERROR("Failed to remove ring page from guest physmap"); > > >> + */ > > >> + > > >> + if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, > > mmap_pfn, 1) ) > > >> + { > > >> + PERROR("Could not set ring page read-only\n"); > > >> + goto out; > > >> + } > > >> > > >> out: > > >> saved_errno = errno; > > >> > > >> Should I send this as a patch for mainline as well? > > > > > > Probably a good idea, although I would include a code comment > > explaining > > > what is going on, because this is subtle if you don't know the > > context. > > > > Will do, I'll send a patch out as soon as we've done a few more > rounds > > of testing. > > > > > > (replying to all): I'm not in favor of this patch mainly because it is > > not stealthy. A malicious kernel could easily track what events are > > being sent on the ring. With DRAKVUF I could work around this using > > altp2m pfn-remapping, but for other tools this is can be a serious > > information leak. > > I understand your point, however the alternative is potential lack of > availability to monitor which is arguably a more severe problem. _Any_ > guest could choose to do what this Ubuntu 16.04 guest does, and then > connecting to the guest via vm_event can only be done once. > IMHO in that case you should implement your own internal version of this function to fall back to instead of forcing all tools to go down this path. The requirements to do that in your own tool are accessible so there is no need to push that into libxc. Tamas
On 01/10/2017 06:40 PM, Tamas K Lengyel wrote: > > (replying to all): I'm not in favor of this patch mainly because it is > > not stealthy. A malicious kernel could easily track what events are > > being sent on the ring. With DRAKVUF I could work around this using > > altp2m pfn-remapping, but for other tools this is can be a serious > > information leak. > > I understand your point, however the alternative is potential lack of > availability to monitor which is arguably a more severe problem. _Any_ > guest could choose to do what this Ubuntu 16.04 guest does, and then > connecting to the guest via vm_event can only be done once. > > > IMHO in that case you should implement your own internal version of this > function to fall back to instead of forcing all tools to go down this > path. The requirements to do that in your own tool are accessible so > there is no need to push that into libxc. Obviously I won't send a patch you're against as co-maintainer of vm_event, however looking at my version of xc_vm_event_enable() (from Xen 4.6), it's calling xc_vm_event_control() which is not publicly accesible - so going my own way would in this case still require libxc modifications. We could also add another parameter to xc_vm_event_enable(), for example "bool remove_page_from_guest_physmap" to control this behaviour, which would change the API - or leave the current API alone and add xc_vm_event_control_remap() or something to that effect. But in any case I don't see how any of this can be achieved with zero libxc modifications. Thanks, Razvan
diff --git a/tools/libxc/xc_vm_event.c b/tools/libxc/xc_vm_event.c index 2fef96a..5dd00a6 100644 --- a/tools/libxc/xc_vm_event.c +++ b/tools/libxc/xc_vm_event.c @@ -130,9 +130,17 @@ void *xc_vm_event_enable(xc_interface *xch, domid_t domain_id, int param, } /* Remove the ring_pfn from the guest's physmap */ + /* rc1 = xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, &ring_pfn); if ( rc1 != 0 ) PERROR("Failed to remove ring page from guest physmap"); + */ + + if ( xc_set_mem_access(xch, domain_id, XENMEM_access_r, mmap_pfn, 1) ) + { + PERROR("Could not set ring page read-only\n"); + goto out; + }