
xen/x86: domain: Free all the pages associated to struct domain

Message ID: 20200120143142.19820-1-julien@xen.org
State: New, archived
Series: xen/x86: domain: Free all the pages associated to struct domain

Commit Message

Julien Grall Jan. 20, 2020, 2:31 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

The structure domain may be bigger than a page size when lock profiling
is enabled. However, the function free_domain_struct will only free the
first page.

This is not a security issue because struct domain can only be bigger
than a page size for lock profiling. The feature can only be selected
in DEBUG and EXPERT mode.

Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jan Beulich Jan. 20, 2020, 2:40 p.m. UTC | #1
On 20.01.2020 15:31, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The structure domain may be bigger than a page size when lock profiling
> is enabled. However, the function free_domain_struct will only free the
> first page.
> 
> This is not a security issue because struct domain can only be bigger
> than a page size for lock profiling. The feature can only be selected
> in DEBUG and EXPERT mode.
> 
> Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
> Reported-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné Jan. 20, 2020, 2:50 p.m. UTC | #2
On Mon, Jan 20, 2020 at 02:31:42PM +0000, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The structure domain may be bigger than a page size when lock profiling
> is enabled. However, the function free_domain_struct will only free the
> first page.
> 
> This is not a security issue because struct domain can only be bigger
> than a page size for lock profiling. The feature can only be selected
> in DEBUG and EXPERT mode.
> 
> Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
> Reported-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
Andrew Cooper Jan. 22, 2020, 12:52 p.m. UTC | #3
On 20/01/2020 14:31, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> The structure domain may be bigger than a page size when lock profiling
> is enabled. However, the function free_domain_struct will only free the
> first page.
>
> This is not a security issue because struct domain can only be bigger
> than a page size for lock profiling. The feature can only be selected
> in DEBUG and EXPERT mode.
>
> Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
> Reported-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> ---
>  xen/arch/x86/domain.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 28fefa1f81..a5380b9bab 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -344,7 +344,7 @@ struct domain *alloc_domain_struct(void)
>  
>  void free_domain_struct(struct domain *d)
>  {
> -    free_xenheap_page(d);
> +    free_xenheap_pages(d, get_order_from_bytes(sizeof(*d)));

:(

I'm entirely certain I raised this during the review of the original patch.

I'd much rather see the original patch reverted.  The current size of
struct domain with lockprofile enabled is 3200 bytes.

~Andrew
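
For context, the revert Andrew suggests would restore the single-page
allocation, where the fit can be enforced at build time rather than
assumed. A minimal sketch of that shape (simplified; the exact upstream
code differs, e.g. in the memflags used to restrict where struct domain
may live):

    struct domain *alloc_domain_struct(void)
    {
        struct domain *d;

        /* With a single-page allocation, the fit is a compile-time fact. */
        BUILD_BUG_ON(sizeof(*d) > PAGE_SIZE);

        d = alloc_xenheap_pages(0, 0); /* order 0: exactly one page */
        if ( d != NULL )
            clear_page(d);

        return d;
    }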
Julien Grall Jan. 22, 2020, 1:13 p.m. UTC | #4
Hi Andrew,

On 22/01/2020 12:52, Andrew Cooper wrote:
> On 20/01/2020 14:31, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The structure domain may be bigger than a page size when lock profiling
>> is enabled. However, the function free_domain_struct will only free the
>> first page.
>>
>> This is not a security issue because struct domain can only be bigger
>> than a page size for lock profiling. The feature can only be selected
>> in DEBUG and EXPERT mode.
>>
>> Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
>> Reported-by: David Woodhouse <dwmw@amazon.co.uk>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> ---
>>   xen/arch/x86/domain.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 28fefa1f81..a5380b9bab 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -344,7 +344,7 @@ struct domain *alloc_domain_struct(void)
>>   
>>   void free_domain_struct(struct domain *d)
>>   {
>> -    free_xenheap_page(d);
>> +    free_xenheap_pages(d, get_order_from_bytes(sizeof(*d)));
> 
> :(
> 
> I'm entirely certain I raised this during the review of the original patch.
> 
> I'd much rather see the original patch reverted.  The current size of
> struct domain with lockprofile enabled is 3200 bytes.

Let me have a look first to see when/why struct domain is less than 4K 
with lockprofile.

Cheers,
Andrew Cooper Jan. 22, 2020, 1:15 p.m. UTC | #5
On 22/01/2020 13:13, Julien Grall wrote:
> Hi Andrew,
>
> On 22/01/2020 12:52, Andrew Cooper wrote:
>> On 20/01/2020 14:31, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The structure domain may be bigger than a page size when lock profiling
>>> is enabled. However, the function free_domain_struct will only free
>>> the
>>> first page.
>>>
>>> This is not a security issue because struct domain can only be bigger
>>> than a page size for lock profiling. The feature can only be selected
>>> in DEBUG and EXPERT mode.
>>>
>>> Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
>>> Reported-by: David Woodhouse <dwmw@amazon.co.uk>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>> ---
>>>   xen/arch/x86/domain.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>> index 28fefa1f81..a5380b9bab 100644
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -344,7 +344,7 @@ struct domain *alloc_domain_struct(void)
>>>     void free_domain_struct(struct domain *d)
>>>   {
>>> -    free_xenheap_page(d);
>>> +    free_xenheap_pages(d, get_order_from_bytes(sizeof(*d)));
>>
>> :(
>>
>> I'm entirely certain I raised this during the review of the original
>> patch.
>>
>> I'd much rather see the original patch reverted.  The current size of
>> struct domain with lockprofile enabled is 3200 bytes.
>
> Let me have a look first to see when/why struct domain is less than 4K
> with lockprofile.

In the intervening time, Juergen has totally rewritten lock profiling,
and the virtual timer infrastructure has been taken out-of-line.

It's probably the latter which is the dominating factor.

~Andrew
Woodhouse, David Jan. 22, 2020, 1:50 p.m. UTC | #6
On Wed, 2020-01-22 at 13:15 +0000, Andrew Cooper wrote:
> > > I'd much rather see the original patch reverted.  The current size of
> > > struct domain with lockprofile enabled is 3200 bytes.
> > 
> > Let me have a look first to see when/why struct domain is less than 4K
> > with lockprofile.
> 
> In the intervening time, Juergen has totally rewritten lock profiling,
> and the virtual timer infrastructure has been taken out-of-line.
> 
> It's probably the latter which is the dominating factor.

OK, so if we revert 8916fcf4577, is it reasonable for me to then assume
that 'struct domain' will always fit within a page, and declare that
live update shall not work to a Xen where that isn't true?

I have a nasty trick in mind...

During live update, we need to do a pass over the live update data in
early boot in order to work out which pages to reserve. That part has
to be done early, while the boot allocator is active. It works by
setting the PGC_allocated bit in the page_info of the reserved pages,
so that init_heap_pages() knows not to include them. I've posted that
part already.
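
A hypothetical sketch of that reservation pass as described (the
iterator lu_for_each_preserved_mfn() is invented here; the real series
is on-list, not upstream):

    static void __init lu_reserve_preserved_pages(void)
    {
        mfn_t mfn;

        /* Walk the live update data handed over by the previous Xen. */
        lu_for_each_preserved_mfn ( mfn )
        {
            struct page_info *pg = mfn_to_page(mfn);

            /* Mark the page so init_heap_pages() will not hand it out. */
            pg->count_info |= PGC_allocated;
        }
    }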

Also during live update, we need to consume the actual domain state
that was passed over from the previous Xen, and fill in the owner (and
refcount etc.) in the same page_info structures, before those pages are
in a truly consistent state.

Right now, we need the latter to happen *after* the boot allocator is
done and we're able to allocate from the heap... because we need to be
able to allocate the domain structures, and we don't want to have to ensure
that there's enough space in the LU reserved bootmem for that many
domain structures.

Hence the nasty trick: What if we allocate the new struct domain on
*top* of the old one, since we happen to know that page *wasn't* used
by the previous Xen for anything that needs to be preserved. The
structure itself isn't an ABI and never can be, and it will have to be
repopulated from the live migration data, of course — but if we can at
least make the assumption that it'll *fit*, then perhaps we can manage
to do both of the above with only one pass over all the domain pages.

This will have a direct effect on the customer-observed pause time for
a live update. So it's kind of icky, but also *very* tempting...
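
A purely illustrative sketch of that trick, with invented names
(lu_old_domain_mfn() stands in for a lookup into the live update stream
for the MFN that held the previous Xen's struct domain; none of this is
upstream):

    struct domain *lu_alloc_domain_struct(domid_t domid)
    {
        /* Reuse the page that held the previous Xen's struct domain. */
        struct domain *d = mfn_to_virt(mfn_x(lu_old_domain_mfn(domid)));

        /* Only sound if the structure is known to fit the old allocation. */
        BUILD_BUG_ON(sizeof(*d) > PAGE_SIZE);

        /* The layout is not ABI: wipe it and repopulate from the LU data. */
        memset(d, 0, sizeof(*d));

        return d;
    }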



Julien Grall Jan. 22, 2020, 2:23 p.m. UTC | #7
Hi David,

On 22/01/2020 13:50, Woodhouse, David wrote:
> On Wed, 2020-01-22 at 13:15 +0000, Andrew Cooper wrote:
>>>> I'd much rather see the original patch reverted.  The current size of
>>>> struct domain with lockprofile enabled is 3200 bytes.
>>>
>>> Let me have a look first to see when/why struct domain is less than 4K
>>> with lockprofile.
>>
>> In the intervening time, Juergen has totally rewritten lock profiling,
>> and the virtual timer infrastructure has been taken out-of-line.
>>
>> Its probably the latter which is the dominating factor.
> 
> OK, so if we revert 8916fcf4577, is it reasonable for me to then assume
> that 'struct domain' will always fit within a page, and declare that
> live update shall not work to a Xen where that isn't true?
While it is nice to have struct domain small, I would rather not bake 
the size assumption into live update.

What's more, on Arm we have quite often been close to the limit, so I 
don't want to limit my choices.

> 
> I have a nasty trick in mind...
> 
> During live update, we need to do a pass over the live update data in
> early boot in order to work out which pages to reserve. That part has
> to be done early, while the boot allocator is active. It works by
> setting the PGC_allocated bit in the page_info of the reserved pages,
> so that init_heap_pages() knows not to include them. I've posted that
> part already.
> 
> Also during live update, we need to consume the actual domain state
> that was passed over from the previous Xen, and fill in the owner (and
> refcount etc.) in the same page_info structures, before those pages are
> in a truly consistent state.
> 
> Right now, we need the latter to happen *after* the boot allocator is
> done and we're able to allocate from the heap... because we need to be
> able to allocate the domain structures, and we don't want to have to ensure
> that there's enough space in the LU reserved bootmem for that many
> domain structures.

As you pointed out above, struct domain itself is a page (i.e. 4KB on 
x86). Say you have 256 domains; this would only use 1MB, which is not 
too bad to account for.

But struct domain has a lot of out-of-line allocations. If you are not 
planning to make struct domain part of the ABI, then you would still 
need quite a few allocations even in your case.

Of course, you could just half-initialize struct domain. However, I 
would be cautious with this solution, as it would make bugs around 
struct page more difficult to chase down. Imagine the domain pointer 
being accessed earlier than expected.

> Hence the nasty trick: What if we allocate the new struct domain on
> *top* of the old one, since we happen to know that page *wasn't* used
> by the previous Xen for anything that needs to be preserved. The
> structure itself isn't an ABI and never can be, and it will have to be
> repopulated from the live migration data, of course — but if we can at
> least make the assumption that it'll *fit*, then perhaps we can manage
> to do both of the above with only one pass over all the domain pages.
> 
> This will have a direct effect on the customer-observed pause time for
> a live update. So it's kind of icky, but also *very* tempting...

May I recommend getting some numbers with the multiple-pass, "nicer" 
code first? Based on those, we can decide what sort of hackery we need 
to lower the pause time.

Cheers,

Patch

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 28fefa1f81..a5380b9bab 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -344,7 +344,7 @@  struct domain *alloc_domain_struct(void)
 
 void free_domain_struct(struct domain *d)
 {
-    free_xenheap_page(d);
+    free_xenheap_pages(d, get_order_from_bytes(sizeof(*d)));
 }
 
 struct vcpu *alloc_vcpu_struct(const struct domain *d)
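
The fix works because free_xenheap_pages() must be given the same order
that the allocation side used, and alloc_domain_struct() computes it as
get_order_from_bytes(sizeof(*d)). A sketch of that helper's semantics
(simplified from Xen's headers): the smallest order such that
(PAGE_SIZE << order) >= size.

    static unsigned int order_from_bytes_sketch(unsigned long size)
    {
        unsigned int order;

        size = (size - 1) >> PAGE_SHIFT;
        for ( order = 0; size; order++ )
            size >>= 1;

        return order;
    }

    /* While struct domain fits in one page this yields 0; once lock
     * profiling pushes it past a page it yields 1, so the alloc and
     * free sides now agree on the order. */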