diff mbox series

[3/3] xen/arm: fix mask calculation in init_pdx

Message ID 1556916614-21512-3-git-send-email-sstabellini@kernel.org (mailing list archive)
State Superseded
Headers show
Series [1/3] xen/arm: fix nr_pdxs calculation | expand

Commit Message

Stefano Stabellini May 3, 2019, 8:50 p.m. UTC
The mask calculation in init_pdx is wrong when the first bank starts at
address 0x0. The reason is that pdx_init_mask will do '0 - 1' causing an
underflow. As a result, the mask becomes 0xffffffffffffffff which is the
biggest possible mask and ends up causing a significant memory waste in
the frametable size computation.

For instance, on platforms that have a low memory bank and a high memory
bank, the frametable will end up covering all the holes in between.

The purpose of the mask is to be passed as a parameter to
pfn_pdx_hole_setup, which based on the mask parameter caculates
pfn_pdx_hole_shift, pfn_pdx_bottom_mask, etc. which are actually the
important masks for frametable initialization later on.

pfn_pdx_hole_setup never compresses addresses below MAX_ORDER bits (1GB
on ARM). Thus, it is safe to initialize mask passing 1ULL << (MAX_ORDER
+ PAGE_SHIFT) as start address to pdx_init_mask.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: JBeulich@suse.com
---
 xen/arch/arm/setup.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Jan Beulich May 6, 2019, 9:06 a.m. UTC | #1
>>> On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -481,10 +481,15 @@ static paddr_t __init next_module(paddr_t s, paddr_t *end)
>  static void __init init_pdx(void)
>  {
>      paddr_t bank_start, bank_size, bank_end;
> -
> -    u64 mask = pdx_init_mask(bootinfo.mem.bank[0].start);
> +    u64 mask;
>      int bank;
>  
> +    /*
> +     * We always map the first 1<<MAX_ORDER of RAM, hence, they are left

"... pages of RAM." ?

> +     * uncompressed.
> +     */
> +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));

PAGE_SIZE << MAX_ORDER?

I wonder whether pdx_init_mask() itself wouldn't better apply this
(lower) cap.

Jan
Julien Grall May 6, 2019, 3:26 p.m. UTC | #2
Hi Jan,

On 5/6/19 10:06 AM, Jan Beulich wrote:
>>>> On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
>> --- a/xen/arch/arm/setup.c
>> +++ b/xen/arch/arm/setup.c
>> @@ -481,10 +481,15 @@ static paddr_t __init next_module(paddr_t s, paddr_t *end)
>>   static void __init init_pdx(void)
>>   {
>>       paddr_t bank_start, bank_size, bank_end;
>> -
>> -    u64 mask = pdx_init_mask(bootinfo.mem.bank[0].start);
>> +    u64 mask;
>>       int bank;
>>   
>> +    /*
>> +     * We always map the first 1<<MAX_ORDER of RAM, hence, they are left
> 
> "... pages of RAM." ?
> 
>> +     * uncompressed.
>> +     */
>> +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
> 
> PAGE_SIZE << MAX_ORDER?

Hmmm, I am not entirely convince this will give the correct value 
because PAGE_SIZE is defined as (_AC(1, L) << PAGE_SHIFT.

> 
> I wonder whether pdx_init_mask() itself wouldn't better apply this
> (lower) cap

Do you mean always returning (PAGE_SIZE << MAX_ORDER) or capping the 
init mask?

Note that the later will not produce the same behavior as this patch.

Cheers,
Jan Beulich May 7, 2019, 7:40 a.m. UTC | #3
>>> On 06.05.19 at 17:26, <julien.grall@arm.com> wrote:
> On 5/6/19 10:06 AM, Jan Beulich wrote:
>>>>> On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
>>> --- a/xen/arch/arm/setup.c
>>> +++ b/xen/arch/arm/setup.c
>>> @@ -481,10 +481,15 @@ static paddr_t __init next_module(paddr_t s, paddr_t *end)
>>>   static void __init init_pdx(void)
>>>   {
>>>       paddr_t bank_start, bank_size, bank_end;
>>> -
>>> -    u64 mask = pdx_init_mask(bootinfo.mem.bank[0].start);
>>> +    u64 mask;
>>>       int bank;
>>>   
>>> +    /*
>>> +     * We always map the first 1<<MAX_ORDER of RAM, hence, they are left
>> 
>> "... pages of RAM." ?
>> 
>>> +     * uncompressed.
>>> +     */
>>> +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
>> 
>> PAGE_SIZE << MAX_ORDER?
> 
> Hmmm, I am not entirely convince this will give the correct value 
> because PAGE_SIZE is defined as (_AC(1, L) << PAGE_SHIFT.

Oh, indeed, for an abstract 32-bit arch this would de-generate, due
to MAX_ORDER being 20. Nevertheless I think the expression used
invites for "cleaning up" (making the same mistake I've made), and
since this is in Arm-specific code (where MAX_ORDER is 18) I think it
would still be better to use the suggested alternative.

>> I wonder whether pdx_init_mask() itself wouldn't better apply this
>> (lower) cap
> 
> Do you mean always returning (PAGE_SIZE << MAX_ORDER) or capping the 
> init mask?
> 
> Note that the later will not produce the same behavior as this patch.

Since I'm not sure I understand what you mean with "capping the
init mask", I'm also uncertain what alternative behavior you're
thinking of. What I'm suggesting is

u64 __init pdx_init_mask(u64 base_addr)
{
    return fill_mask(max(base_addr, (uint64_t)PAGE_SIZE << MAX_ORDER) - 1);
}

Jan
Julien Grall May 7, 2019, 8:59 a.m. UTC | #4
Hi Jan,

On 5/7/19 8:40 AM, Jan Beulich wrote:
>>>> On 06.05.19 at 17:26, <julien.grall@arm.com> wrote:
>> On 5/6/19 10:06 AM, Jan Beulich wrote:
>>>>>> On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
>>>> --- a/xen/arch/arm/setup.c
>>>> +++ b/xen/arch/arm/setup.c
>>>> @@ -481,10 +481,15 @@ static paddr_t __init next_module(paddr_t s, paddr_t *end)
>>>>    static void __init init_pdx(void)
>>>>    {
>>>>        paddr_t bank_start, bank_size, bank_end;
>>>> -
>>>> -    u64 mask = pdx_init_mask(bootinfo.mem.bank[0].start);
>>>> +    u64 mask;
>>>>        int bank;
>>>>    
>>>> +    /*
>>>> +     * We always map the first 1<<MAX_ORDER of RAM, hence, they are left
>>>
>>> "... pages of RAM." ?
>>>
>>>> +     * uncompressed.
>>>> +     */
>>>> +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
>>>
>>> PAGE_SIZE << MAX_ORDER?
>>
>> Hmmm, I am not entirely convince this will give the correct value
>> because PAGE_SIZE is defined as (_AC(1, L) << PAGE_SHIFT.
> 
> Oh, indeed, for an abstract 32-bit arch this would de-generate, due
> to MAX_ORDER being 20. Nevertheless I think the expression used
> invites for "cleaning up" (making the same mistake I've made), and
> since this is in Arm-specific code (where MAX_ORDER is 18) I think it
> would still be better to use the suggested alternative.

The comment on top of PAGE_SIZE in asm-x86/page.h leads to think that 
PAGE_SIZE should use signed quantities. So I am not sure whether it is 
safe to switch to UL here.

If we keep PAGE_SIZE as signed quantities, then I would rather not used 
your suggestion because this may introduce buggy code if MAX_ORDER is 
ever updated on Arm.

> 
>>> I wonder whether pdx_init_mask() itself wouldn't better apply this
>>> (lower) cap
>>
>> Do you mean always returning (PAGE_SIZE << MAX_ORDER) or capping the
>> init mask?
>>
>> Note that the later will not produce the same behavior as this patch.
> 
> Since I'm not sure I understand what you mean with "capping the
> init mask", I'm also uncertain what alternative behavior you're
> thinking of. What I'm suggesting is
> 
> u64 __init pdx_init_mask(u64 base_addr)
> {
>      return fill_mask(max(base_addr, (uint64_t)PAGE_SIZE << MAX_ORDER) - 1);
> }

As I pointed out in the original thread, there are a couple of issues 
with this suggestion:
	1) banks are not ordered on Arm, so the pdx mask may get initialized 
with a higher bank address preventing the PDX compression to work
	2) the PDX will not be able to compress any bits between 0 and the MSB 
1' in the base_addr. In other word, for a base address 0x200000000 
(8GB), the initial mask will be  0x1ffffffff. I am aware of some 
platforms with some interesting first bank base address (i.e above 4GB).

2) is not overly critical, but I think 1) should be addressed.

At least on Arm, I don't see any real value to initialize the PDX mask 
with a base address. I would be more keen to implement pdx_init_mask() as:

return fill_mask(((uint64_t)PAGE_SIZE << MAX_ORDER - 1);

Cheers,
Jan Beulich May 7, 2019, 9:35 a.m. UTC | #5
>>> On 07.05.19 at 10:59, <julien.grall@arm.com> wrote:
> On 5/7/19 8:40 AM, Jan Beulich wrote:
>>>>> On 06.05.19 at 17:26, <julien.grall@arm.com> wrote:
>>> On 5/6/19 10:06 AM, Jan Beulich wrote:
>>>>>>> On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
>>>>> +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
>>>>
>>>> PAGE_SIZE << MAX_ORDER?
>>>
>>> Hmmm, I am not entirely convince this will give the correct value
>>> because PAGE_SIZE is defined as (_AC(1, L) << PAGE_SHIFT.
>> 
>> Oh, indeed, for an abstract 32-bit arch this would de-generate, due
>> to MAX_ORDER being 20. Nevertheless I think the expression used
>> invites for "cleaning up" (making the same mistake I've made), and
>> since this is in Arm-specific code (where MAX_ORDER is 18) I think it
>> would still be better to use the suggested alternative.
> 
> The comment on top of PAGE_SIZE in asm-x86/page.h leads to think that 
> PAGE_SIZE should use signed quantities. So I am not sure whether it is 
> safe to switch to UL here.

It's not (at least when keeping past x86-32 in the picture): Extending
to unsigned long long works differently when the type is "unsigned long".
This matters when using things like ~(PAGE_SIZE - 1).

> If we keep PAGE_SIZE as signed quantities, then I would rather not used 
> your suggestion because this may introduce buggy code if MAX_ORDER is 
> ever updated on Arm.

A BUILD_BUG_ON() could help prevent this.

>>>> I wonder whether pdx_init_mask() itself wouldn't better apply this
>>>> (lower) cap
>>>
>>> Do you mean always returning (PAGE_SIZE << MAX_ORDER) or capping the
>>> init mask?
>>>
>>> Note that the later will not produce the same behavior as this patch.
>> 
>> Since I'm not sure I understand what you mean with "capping the
>> init mask", I'm also uncertain what alternative behavior you're
>> thinking of. What I'm suggesting is
>> 
>> u64 __init pdx_init_mask(u64 base_addr)
>> {
>>      return fill_mask(max(base_addr, (uint64_t)PAGE_SIZE << MAX_ORDER) - 1);
>> }
> 
> As I pointed out in the original thread, there are a couple of issues 
> with this suggestion:
> 	1) banks are not ordered on Arm, so the pdx mask may get initialized 
> with a higher bank address preventing the PDX compression to work

This is orthogonal to my suggestion here. It's up to Arm code to
call the function with the lowest bank's base address instead of
the first one.

> 	2) the PDX will not be able to compress any bits between 0 and the MSB 
> 1' in the base_addr. In other word, for a base address 0x200000000 
> (8GB), the initial mask will be  0x1ffffffff. I am aware of some 
> platforms with some interesting first bank base address (i.e above 4GB).

Well, we'd been there before: More "interesting" layouts may
indeed require adjustments to the logic. The particular case
we've been talking about was there not being _any_ RAM
below a certain boundary.

> 2) is not overly critical, but I think 1) should be addressed.
> 
> At least on Arm, I don't see any real value to initialize the PDX mask 
> with a base address. I would be more keen to implement pdx_init_mask() as:
> 
> return fill_mask(((uint64_t)PAGE_SIZE << MAX_ORDER - 1);

But (besides the missing closing parenthese) that's not what x86 wants.

Jan
Julien Grall May 7, 2019, 1:20 p.m. UTC | #6
Hi,

On 5/7/19 10:35 AM, Jan Beulich wrote:
>>>> On 07.05.19 at 10:59, <julien.grall@arm.com> wrote:
>> On 5/7/19 8:40 AM, Jan Beulich wrote:
>>>>>> On 06.05.19 at 17:26, <julien.grall@arm.com> wrote:
>>>> On 5/6/19 10:06 AM, Jan Beulich wrote:
>>>>>>>> On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
>>>>>> +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
>>>>>
>>>>> PAGE_SIZE << MAX_ORDER?
>>>>
>>>> Hmmm, I am not entirely convince this will give the correct value
>>>> because PAGE_SIZE is defined as (_AC(1, L) << PAGE_SHIFT.
>>>
>>> Oh, indeed, for an abstract 32-bit arch this would de-generate, due
>>> to MAX_ORDER being 20. Nevertheless I think the expression used
>>> invites for "cleaning up" (making the same mistake I've made), and
>>> since this is in Arm-specific code (where MAX_ORDER is 18) I think it
>>> would still be better to use the suggested alternative.
>>
>> The comment on top of PAGE_SIZE in asm-x86/page.h leads to think that
>> PAGE_SIZE should use signed quantities. So I am not sure whether it is
>> safe to switch to UL here.
> 
> It's not (at least when keeping past x86-32 in the picture): Extending
> to unsigned long long works differently when the type is "unsigned long".
> This matters when using things like ~(PAGE_SIZE - 1).
> 
>> If we keep PAGE_SIZE as signed quantities, then I would rather not used
>> your suggestion because this may introduce buggy code if MAX_ORDER is
>> ever updated on Arm.
> 
> A BUILD_BUG_ON() could help prevent this.

Good point.

> 
>>>>> I wonder whether pdx_init_mask() itself wouldn't better apply this
>>>>> (lower) cap
>>>>
>>>> Do you mean always returning (PAGE_SIZE << MAX_ORDER) or capping the
>>>> init mask?
>>>>
>>>> Note that the later will not produce the same behavior as this patch.
>>>
>>> Since I'm not sure I understand what you mean with "capping the
>>> init mask", I'm also uncertain what alternative behavior you're
>>> thinking of. What I'm suggesting is
>>>
>>> u64 __init pdx_init_mask(u64 base_addr)
>>> {
>>>       return fill_mask(max(base_addr, (uint64_t)PAGE_SIZE << MAX_ORDER) - 1);
>>> }
>>
>> As I pointed out in the original thread, there are a couple of issues
>> with this suggestion:
>> 	1) banks are not ordered on Arm, so the pdx mask may get initialized
>> with a higher bank address preventing the PDX compression to work
> 
> This is orthogonal to my suggestion here. It's up to Arm code to
> call the function with the lowest bank's base address instead of
> the first one. >
>> 	2) the PDX will not be able to compress any bits between 0 and the MSB
>> 1' in the base_addr. In other word, for a base address 0x200000000
>> (8GB), the initial mask will be  0x1ffffffff. I am aware of some
>> platforms with some interesting first bank base address (i.e above 4GB).
> 
> Well, we'd been there before: More "interesting" layouts may
> indeed require adjustments to the logic. The particular case
> we've been talking about was there not being _any_ RAM
> below a certain boundary.
Yes this is unrelated to the case Stefano is trying to fix, however 
Stefano & I have also been discussing of other potential issues with PDX.

I would rather try to address the most important/concerning one at the 
same time. Stefano's patch is actually fixing all the known issues with 
PDX on Arm.

>> 2) is not overly critical, but I think 1) should be addressed.
>>
>> At least on Arm, I don't see any real value to initialize the PDX mask
>> with a base address. I would be more keen to implement pdx_init_mask() as:
>>
>> return fill_mask(((uint64_t)PAGE_SIZE << MAX_ORDER - 1);
> 
> But (besides the missing closing parenthese) that's not what x86 wants.

Could you remind me why?

Cheers,
Jan Beulich May 7, 2019, 1:57 p.m. UTC | #7
>>> On 07.05.19 at 15:20, <julien.grall@arm.com> wrote:
> On 5/7/19 10:35 AM, Jan Beulich wrote:
>>>>> On 07.05.19 at 10:59, <julien.grall@arm.com> wrote:
>>> At least on Arm, I don't see any real value to initialize the PDX mask
>>> with a base address. I would be more keen to implement pdx_init_mask() as:
>>>
>>> return fill_mask(((uint64_t)PAGE_SIZE << MAX_ORDER - 1);
>> 
>> But (besides the missing closing parenthese) that's not what x86 wants.
> 
> Could you remind me why?

Because we don't want to compress on the low 32 bits, for there
being "interesting" things below 4Gb.

Jan
Stefano Stabellini May 8, 2019, 10:31 p.m. UTC | #8
On Tue, 7 May 2019, Julien Grall wrote:
> Hi,
> 
> On 5/7/19 10:35 AM, Jan Beulich wrote:
> > > > > On 07.05.19 at 10:59, <julien.grall@arm.com> wrote:
> > > On 5/7/19 8:40 AM, Jan Beulich wrote:
> > > > > > > On 06.05.19 at 17:26, <julien.grall@arm.com> wrote:
> > > > > On 5/6/19 10:06 AM, Jan Beulich wrote:
> > > > > > > > > On 03.05.19 at 22:50, <sstabellini@kernel.org> wrote:
> > > > > > > +    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
> > > > > > 
> > > > > > PAGE_SIZE << MAX_ORDER?
> > > > > 
> > > > > Hmmm, I am not entirely convince this will give the correct value
> > > > > because PAGE_SIZE is defined as (_AC(1, L) << PAGE_SHIFT.
> > > > 
> > > > Oh, indeed, for an abstract 32-bit arch this would de-generate, due
> > > > to MAX_ORDER being 20. Nevertheless I think the expression used
> > > > invites for "cleaning up" (making the same mistake I've made), and
> > > > since this is in Arm-specific code (where MAX_ORDER is 18) I think it
> > > > would still be better to use the suggested alternative.
> > > 
> > > The comment on top of PAGE_SIZE in asm-x86/page.h leads to think that
> > > PAGE_SIZE should use signed quantities. So I am not sure whether it is
> > > safe to switch to UL here.
> > 
> > It's not (at least when keeping past x86-32 in the picture): Extending
> > to unsigned long long works differently when the type is "unsigned long".
> > This matters when using things like ~(PAGE_SIZE - 1).
> > 
> > > If we keep PAGE_SIZE as signed quantities, then I would rather not used
> > > your suggestion because this may introduce buggy code if MAX_ORDER is
> > > ever updated on Arm.
> > 
> > A BUILD_BUG_ON() could help prevent this.
> 
> Good point.

Fair enough, but I would rather keep the original version because I don't see
how PAGE_SIZE << MAX_ORDER could be an improvement. It is also more
understandable I think.


> > > > > > I wonder whether pdx_init_mask() itself wouldn't better apply this
> > > > > > (lower) cap
> > > > > 
> > > > > Do you mean always returning (PAGE_SIZE << MAX_ORDER) or capping the
> > > > > init mask?
> > > > > 
> > > > > Note that the later will not produce the same behavior as this patch.
> > > > 
> > > > Since I'm not sure I understand what you mean with "capping the
> > > > init mask", I'm also uncertain what alternative behavior you're
> > > > thinking of. What I'm suggesting is
> > > > 
> > > > u64 __init pdx_init_mask(u64 base_addr)
> > > > {
> > > >       return fill_mask(max(base_addr, (uint64_t)PAGE_SIZE << MAX_ORDER)
> > > > - 1);
> > > > }
> > > 
> > > As I pointed out in the original thread, there are a couple of issues
> > > with this suggestion:
> > > 	1) banks are not ordered on Arm, so the pdx mask may get initialized
> > > with a higher bank address preventing the PDX compression to work
> > 
> > This is orthogonal to my suggestion here. It's up to Arm code to
> > call the function with the lowest bank's base address instead of
> > the first one. >
> > > 	2) the PDX will not be able to compress any bits between 0 and the MSB
> > > 1' in the base_addr. In other word, for a base address 0x200000000
> > > (8GB), the initial mask will be  0x1ffffffff. I am aware of some
> > > platforms with some interesting first bank base address (i.e above 4GB).
> > 
> > Well, we'd been there before: More "interesting" layouts may
> > indeed require adjustments to the logic. The particular case
> > we've been talking about was there not being _any_ RAM
> > below a certain boundary.
> Yes this is unrelated to the case Stefano is trying to fix, however Stefano &
> I have also been discussing of other potential issues with PDX.
> 
> I would rather try to address the most important/concerning one at the same
> time. Stefano's patch is actually fixing all the known issues with PDX on Arm.
> 
> > > 2) is not overly critical, but I think 1) should be addressed.
> > > 
> > > At least on Arm, I don't see any real value to initialize the PDX mask
> > > with a base address. I would be more keen to implement pdx_init_mask() as:
> > > 
> > > return fill_mask(((uint64_t)PAGE_SIZE << MAX_ORDER - 1);
> > 
> > But (besides the missing closing parenthese) that's not what x86 wants.

It is not a problem to move the change into pdx_init_mask, and it could
make sense to do so. From init_pdx we'll just pass 0x0 to get the
behavior of the current patch.

I'll send an update
diff mbox series

Patch

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index ccb0f18..22f20bb 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -481,10 +481,15 @@  static paddr_t __init next_module(paddr_t s, paddr_t *end)
 static void __init init_pdx(void)
 {
     paddr_t bank_start, bank_size, bank_end;
-
-    u64 mask = pdx_init_mask(bootinfo.mem.bank[0].start);
+    u64 mask;
     int bank;
 
+    /*
+     * We always map the first 1<<MAX_ORDER of RAM, hence, they are left
+     * uncompressed.
+     */
+    mask = pdx_init_mask(1ULL << (MAX_ORDER + PAGE_SHIFT));
+
     for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ )
     {
         bank_start = bootinfo.mem.bank[bank].start;