[PATCH for-4.13] xen/arm: mm: Allow generic xen page-tables helpers to be called early

Message ID 20190917160202.16770-1-julien.grall@arm.com
State New
Series
  • [PATCH for-4.13] xen/arm: mm: Allow generic xen page-tables helpers to be called early

Commit Message

Julien Grall Sept. 17, 2019, 4:02 p.m. UTC
The current implementations of xen_{map, unmap}_table() expect
{map, unmap}_domain_page() to be usable. Those helpers are used to
map/unmap page tables while updating Xen page-tables.

Since commit 022387ee1a "xen/arm: mm: Don't open-code Xen PT update in
{set, clear}_fixmap()", setup_fixmap() will make use of the helpers
mentioned above. When booting Xen using GRUB, setup_fixmap() may be used
before map_domain_page() can be called. This will result in a data abort:

(XEN) Data Abort Trap. Syndrome=0x5
(XEN) CPU0: Unexpected Trap: Data Abort

[...]

(XEN) Xen call trace:
(XEN)    [<000000000025ab6c>] mm.c#xen_pt_update+0x2b4/0x59c (PC)
(XEN)    [<000000000025ab20>] mm.c#xen_pt_update+0x268/0x59c (LR)
(XEN)    [<000000000025ae70>] set_fixmap+0x1c/0x2c
(XEN)    [<00000000002a9c98>] copy_from_paddr+0x7c/0xdc
(XEN)    [<00000000002a4ae0>] has_xsm_magic+0x18/0x34
(XEN)    [<00000000002a5b5c>] bootfdt.c#early_scan_node+0x398/0x560
(XEN)    [<00000000002a5de0>] device_tree_for_each_node+0xbc/0x144
(XEN)    [<00000000002a5ed4>] boot_fdt_info+0x6c/0x260
(XEN)    [<00000000002ac0d0>] start_xen+0x108/0xc74
(XEN)    [<000000000020044c>] arm64/head.o#paging+0x60/0x88

During early boot, the page tables are either statically allocated in
the Xen binary or allocated via alloc_boot_pages().

Statically allocated page-tables will already be mapped as part of the
Xen binary, so we can easily find their virtual address.

For dynamically allocated page-tables, we need to rely on
map_domain_page() being functional.

For arm32, the call will be usable well before pages can be dynamically
allocated (see setup_pagetables()). For arm64, the call will be usable
after setup_xenheap_mappings().

In both cases, memory is given to the boot allocator afterwards, so we
can rely on map_domain_page() for mapping page tables allocated
dynamically.

The helpers xen_{map, unmap}_table() are now updated to take into
account the case where page-tables are part of the Xen binary.

Fixes: 022387ee1a ('xen/arm: mm: Don't open-code Xen PT update in {set, clear}_fixmap()')
Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/mm.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Stefano Stabellini Sept. 19, 2019, 11:37 p.m. UTC | #1
On Tue, 17 Sep 2019, Julien Grall wrote:
> [...]
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index e1cdeaaf2f..da6303a8fd 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -950,11 +950,31 @@ static int create_xen_table(lpae_t *entry)
>  
>  static lpae_t *xen_map_table(mfn_t mfn)
>  {
> +    /*
> +     * We may need to map the page table before map_domain_page() is
> +     * usable. The requirement here is that it must be usable as soon as
> +     * page-tables are allocated dynamically via alloc_boot_pages().
> +     */
> +    if ( system_state == SYS_STATE_early_boot )
> +    {
> +        vaddr_t va = mfn_to_maddr(mfn) - phys_offset;
> +
> +        if ( is_kernel(va) )
> +            return (lpae_t *)va;

Is it intended to continue if it is not a Xen text page? Shouldn't we
BUG() or WARN()?


> +    }
> +
>      return map_domain_page(mfn);
>  }
>  
>  static void xen_unmap_table(const lpae_t *table)
>  {
> +    /*
> +     * During early boot, xen_map_table() will not use map_domain_page()
> +     * for page-tables residing in the Xen binary. So skip the unmap part.
> +     */
> +    if ( system_state == SYS_STATE_early_boot && is_kernel(table) )
> +        return;
> +
>      unmap_domain_page(table);
>  }
Julien Grall Sept. 20, 2019, 9:44 a.m. UTC | #2
Hi,

On 20/09/2019 00:37, Stefano Stabellini wrote:
> On Tue, 17 Sep 2019, Julien Grall wrote:
>> [...]
>
> Is it intended to continue if it is not a xen text page? Shouldn't we
> BUG() or WARN?
Yes, I wrote the rationale in the commit message and summarized it in the
code comment a few lines above. For convenience, I have pasted the commit
message again here:

"During early boot, the page tables are either statically allocated in
the Xen binary or allocated via alloc_boot_pages().

Statically allocated page-tables will already be mapped as part of the
Xen binary, so we can easily find their virtual address.

For dynamically allocated page-tables, we need to rely on
map_domain_page() being functional.

For arm32, the call will be usable well before pages can be dynamically
allocated (see setup_pagetables()). For arm64, the call will be usable
after setup_xenheap_mappings().

In both cases, memory is given to the boot allocator afterwards, so we
can rely on map_domain_page() for mapping page tables allocated
dynamically."

Cheers,
Stefano Stabellini Sept. 20, 2019, 3:16 p.m. UTC | #3
On Fri, 20 Sep 2019, Julien Grall wrote:
> On 20/09/2019 00:37, Stefano Stabellini wrote:
> > On Tue, 17 Sep 2019, Julien Grall wrote:
> > > [...]
> > Is it intended to continue if it is not a xen text page? Shouldn't we
> > BUG() or WARN?
> Yes, I wrote the rationale in the commit message and a summary in a few lines
> above. For convenience, I pasted the commit message again here:
 
The commit message explains what you are doing but I am still missing
something.

Why are we continuing if system_state == SYS_STATE_early_boot and
!is_kernel(va)?

The commit message explains that if system_state == SYS_STATE_early_boot
pagetable pages are static, right? Only once dynamic allocations are
possible does it make sense to use map_domain_page(), and dynamic
allocations become possible roughly when system_state switches to
SYS_STATE_boot.
Julien Grall Sept. 20, 2019, 3:26 p.m. UTC | #4
Hi,

On 20/09/2019 16:16, Stefano Stabellini wrote:
> On Fri, 20 Sep 2019, Julien Grall wrote:
>> On 20/09/2019 00:37, Stefano Stabellini wrote:
>>> On Tue, 17 Sep 2019, Julien Grall wrote:
>>>> [...]
>>>
>>> Is it intended to continue if it is not a xen text page? Shouldn't we
>>> BUG() or WARN?
>> Yes, I wrote the rationale in the commit message and a summary in a few lines
>> above. For convenience, I pasted the commit message again here:
>   
> The commit message explains what you are doing but I am still missing
> something.
> 
> Why are we continuing if system_state == SYS_STATE_early_boot and
> !is_kernel(va)?
> 
> The commit message explains that if system_state == SYS_STATE_early_boot
> pagetable pages are static, right? 
That's not correct. Below is an excerpt of the commit message:

"During early boot, the page tables are either statically allocated in
the Xen binary or allocated via alloc_boot_pages()."

An example of dynamic allocation happening when system_state == 
SYS_STATE_early_boot is in setup_xenheap_mappings(). alloc_boot_pages() will be 
used to allocate intermediate page-tables as the runtime allocator is not yet ready.

> Only once dynamic allocations are possible does it make sense to use
> map_domain_page(), and dynamic allocations become possible roughly when
> system_state switches to SYS_STATE_boot.

That's not correct. alloc_boot_pages() is actually here to allow dynamic 
allocation before the memory subsystem (and therefore the runtime allocator) is 
initialized.

Half of the commit message actually explains when dynamic allocation can
be used. I am not entirely sure what is unclear in it, so please suggest
a different commit message.

Cheers,
Julien Grall Sept. 25, 2019, 3:23 p.m. UTC | #5
Hi,

I am CCing Juergen to mark this as a blocker for 4.13. Without this patch, Xen 
cannot be booted using GRUB.

Cheers,

On 17/09/2019 17:02, Julien Grall wrote:
> [...]
Juergen Gross Sept. 25, 2019, 3:27 p.m. UTC | #6
On 25.09.19 17:23, Julien Grall wrote:
> Hi,
> 
> I am CCing Juergen to mark this as a blocker for 4.13. Without this 
> patch, Xen cannot be booted using GRUB.
> 
> Cheers,
> 
> On 17/09/2019 17:02, Julien Grall wrote:
>> [...]

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen
Julien Grall Oct. 2, 2019, 5:49 p.m. UTC | #7
Hi,

Gentle ping.

Cheers,

On 9/20/19 4:26 PM, Julien Grall wrote:
> [...]
Stefano Stabellini Oct. 3, 2019, 1:02 a.m. UTC | #8
On Fri, 20 Sep 2019, Julien Grall wrote:
> Hi,
> 
> On 20/09/2019 16:16, Stefano Stabellini wrote:
> > [...]
> > > > >      static lpae_t *xen_map_table(mfn_t mfn)
> > > > >    {
> > > > > +    /*
> > > > > +     * We may require to map the page table before map_domain_page()
> > > > > is
> > > > > +     * useable. The requirements here is it must be useable as soon
> > > > > as
> > > > > +     * page-tables are allocated dynamically via alloc_boot_pages().
> > > > > +     */
> > > > > +    if ( system_state == SYS_STATE_early_boot )
> > > > > +    {
> > > > > +        vaddr_t va = mfn_to_maddr(mfn) - phys_offset;
> > > > > +
> > > > > +        if ( is_kernel(va) )
> > > > > +            return (lpae_t *)va;
> > > > 
> > > > Is it intended to continue if it is not a xen text page? Shouldn't we
> > > > BUG() or WARN?
> > > Yes, I wrote the rationale in the commit message and a summary in a few
> > > lines
> > > above. For convenience, I pasted the commit message again here:
> >   The commit message explains what you are doing but I am still missing
> > something.
> > 
> > Why are we continuing if system_state == SYS_STATE_early_boot and
> > !is_kernel(va)?
> > 
> > The commit message explains that if system_state == SYS_STATE_early_boot
> > pagetable pages are static, right? 
> That's not correct. Below an excerpt of the commit message:
> 
> "During early boot, the page tables are either statically allocated in
> Xen binary or allocated via alloc_boot_pages()."
> 
> An example of dynamic allocation happening when system_state ==
> SYS_STATE_early_boot is in setup_xenheap_mappings(). alloc_boot_pages() will
> be used to allocate intermediate page-tables as the runtime allocator is not
> yet ready.
> 
> > Only after dynamic allocation are
> > possible it makes sense to use map_domain_page, and dynamic allocations
> > are possible roughly when system_state switched to SYS_STATE_boot.
> 
> That's not correct. alloc_boot_pages() is actually here to allow dynamic
> allocation before the memory subsystem (and therefore the runtime allocator)
> is initialized.

Let me change the question then: is the system_state ==
SYS_STATE_early_boot check strictly necessary? It looks like it is not:
the patch would work even if it was just:


diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 9e0fdc39f9..eee7d080c0 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -963,11 +963,19 @@ static int create_xen_table(lpae_t *entry)
 
 static lpae_t *xen_map_table(mfn_t mfn)
 {
+    vaddr_t va = mfn_to_maddr(mfn) - phys_offset;
+
+    if ( is_kernel(va) )
+        return (lpae_t *)va;
+
     return map_domain_page(mfn);
 }
 
 static void xen_unmap_table(const lpae_t *table)
 {
+    if ( is_kernel(table) )
+        return;
+
     unmap_domain_page(table);
 }
 

Is that right? Note that I am not asking you to change the patch, I am
only trying to make sure I am understanding all the implications.
Julien Grall Oct. 7, 2019, 8:35 p.m. UTC | #9
Hi,

On 03/10/2019 02:02, Stefano Stabellini wrote:
> On Fri, 20 Sep 2019, Julien Grall wrote:
>> That's not correct. alloc_boot_pages() is actually here to allow dynamic
>> allocation before the memory subsystem (and therefore the runtime allocator)
>> is initialized.
> 
> Let me change the question then: is the system_state ==
> SYS_STATE_early_boot check strictly necessary? It looks like it is not:
> the patch would work even if it was just:

I had a few thoughts about it. On Arm32, this only really works for 
32-bit machine addresses (they can go up to 40 bits). I haven't fully 
investigated what could go wrong, but it would be best to keep it 
only for early boot.

Also, I don't really want to rely on this "workaround" after boot. Maybe 
we would want to keep them unmapped in the future.

Cheers,
Stefano Stabellini Oct. 8, 2019, 12:18 a.m. UTC | #10
On Mon, 7 Oct 2019, Julien Grall wrote:
> Hi,
> 
> On 03/10/2019 02:02, Stefano Stabellini wrote:
> > On Fri, 20 Sep 2019, Julien Grall wrote:
> >> That's not correct. alloc_boot_pages() is actually here to allow dynamic
> >> allocation before the memory subsystem (and therefore the runtime allocator)
> >> is initialized.
> > 
> > Let me change the question then: is the system_state ==
> > SYS_STATE_early_boot check strictly necessary? It looks like it is not:
> > the patch would work even if it was just:
> 
> I had a few thoughts about it. On Arm32, this only really works for 
> 32-bits machine address (it can go up to 40-bits). I haven't really 
> fully investigated what could go wrong, but it would be best to keep it 
> only for early boot.
> 
> Also, I don't really want to rely on this "workaround" after boot. Maybe 
> we would want to keep them unmapped in the future.

Yes, no problems, we agree on that. I am not asking in regards to the
check system_state == SYS_STATE_early_boot with the goal of asking you
to get rid of it. I am fine with keeping the check. (Maybe we want to add
an `unlikely()' around the check.)

I am trying to understand whether the code actually relies on
system_state == SYS_STATE_early_boot, and, if so, why. The goal is to
make sure that if there are some limitations that they are documented,
or just to double-check that there are no limitations.

In regards to your comment about only working for 32-bit addresses on
Arm32, you have a point. At least we should be careful with the mfn to
vaddr conversion because mfn_to_maddr returns a paddr_t which is 64-bit
and vaddr_t is 32-bit. I imagine that theoretically, even with
system_state == SYS_STATE_early_boot, it could get truncated with the
wrong combination of mfn and phys_offset.

If nothing else, maybe we should add a truncation check for safety?
Something like the following but that ideally would be applicable to
arm64 too without having to add an #ifdef:

    paddr_t pa = mfn_to_maddr(mfn) - phys_offset;

    if ( pa < _end && is_kernel((vaddr_t)pa) )
        return (lpae_t *)(vaddr_t)pa;
Julien Grall Oct. 8, 2019, 2:02 p.m. UTC | #11
(+ Juergen)

Hi Stefano,

On 10/8/19 1:18 AM, Stefano Stabellini wrote:
> On Mon, 7 Oct 2019, Julien Grall wrote:
>> Hi,
>>
>> On 03/10/2019 02:02, Stefano Stabellini wrote:
>>> On Fri, 20 Sep 2019, Julien Grall wrote:
>>>> That's not correct. alloc_boot_pages() is actually here to allow dynamic
>>>> allocation before the memory subsystem (and therefore the runtime allocator)
>>>> is initialized.
>>>
>>> Let me change the question then: is the system_state ==
>>> SYS_STATE_early_boot check strictly necessary? It looks like it is not:
>>> the patch would work even if it was just:
>>
>> I had a few thoughts about it. On Arm32, this only really works for
>> 32-bits machine address (it can go up to 40-bits). I haven't really
>> fully investigated what could go wrong, but it would be best to keep it
>> only for early boot.
>>
>> Also, I don't really want to rely on this "workaround" after boot. Maybe
>> we would want to keep them unmapped in the future.
> 
> Yes, no problems, we agree on that. I am not asking in regards to the
> check system_state == SYS_STATE_early_boot with the goal of asking you
> to get rid of it. I am fine with keeping the check. (Maybe we want to add
> an `unlikely()' around the check.)
> 
> I am trying to understand whether the code actually relies on
> system_state == SYS_STATE_early_boot, and, if so, why. The goal is to
> make sure that if there are some limitations that they are documented,
> or just to double-check that there are no limitations.

The check is not strictly necessary.

> 
> In regards to your comment about only working for 32-bit addresses on
> Arm32, you have a point. At least we should be careful with the mfn to
> vaddr conversion because mfn_to_maddr returns a paddr_t which is 64-bit
> and vaddr_t is 32-bit. I imagine that theoretically, even with
> system_state == SYS_STATE_early_boot, it could get truncated with the
> wrong combination of mfn and phys_offset.
> 
> If nothing else, maybe we should add a truncation check for safety?

Except that phys_offset is not defined correctly, so your check below 
will break some setups :(. Let's take the following example:

    Xen is loaded at PA 0x100000

The boot offset is computed using 32-bit arithmetic (see head.S):
     PA - VA = 0x100000 - 0x200000
             = 0xfff00000

This value is passed to C code as an unsigned long, but we then store 
it in a uint64_t/paddr_t (see phys_offset, which is set in 
setup_pagetables()). Because this is an unsigned-to-unsigned 
conversion, the "sign bit" will not be propagated.

This means that phys_offset will be equal to 0xfff00000 and not 
0xfffffffffff00000!

Therefore, if we try to convert 0x100000 (where Xen has been loaded) back 
to its VA, the resulting value will be 0xffffffff00200000 rather than the 
expected 0x200000.

Looking at the code, I think pte_of_xenaddr() has the exact same problem. :(

I guess nobody has tried to load Xen that low in memory on Arm32? But it is 
definitely going to be an issue with the memory rework I have in mind.

I have some other important work to finish for Xen 4.13, so I am 
thinking of deferring the problem mentioned above until after Xen 4.13. 
The GRUB issue would still need to be fixed, though. Any opinions?

Note that this is one more reason to limit the use of "MA - 
phys_offset", so the mess is confined to boot code.

> Something like the following but that ideally would be applicable to
> arm64 too without having to add an #ifdef:
> 
>      paddr_t pa = mfn_to_maddr(mfn) - phys_offset;
> 
>      if ( pa < _end && is_kernel((vaddr_t)pa) )
>          return (lpae_t *)(vaddr_t)pa;

Cheers,
Stefano Stabellini Oct. 8, 2019, 10:24 p.m. UTC | #12
On Tue, 8 Oct 2019, Julien Grall wrote:
> On 10/8/19 1:18 AM, Stefano Stabellini wrote:
> > On Mon, 7 Oct 2019, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 03/10/2019 02:02, Stefano Stabellini wrote:
> > > > On Fri, 20 Sep 2019, Julien Grall wrote:
> > > > > That's not correct. alloc_boot_pages() is actually here to allow
> > > > > dynamic
> > > > > allocation before the memory subsystem (and therefore the runtime
> > > > > allocator)
> > > > > is initialized.
> > > > 
> > > > Let me change the question then: is the system_state ==
> > > > SYS_STATE_early_boot check strictly necessary? It looks like it is not:
> > > > the patch would work even if it was just:
> > > 
> > > I had a few thoughts about it. On Arm32, this only really works for
> > > 32-bits machine address (it can go up to 40-bits). I haven't really
> > > fully investigated what could go wrong, but it would be best to keep it
> > > only for early boot.
> > > 
> > > Also, I don't really want to rely on this "workaround" after boot. Maybe
> > > we would want to keep them unmapped in the future.
> > 
> > Yes, no problems, we agree on that. I am not asking in regards to the
> > check system_state == SYS_STATE_early_boot with the goal of asking you
> > to get rid of it. I am fine with keeping the check. (Maybe we want to add
> > an `unlikely()' around the check.)
> > 
> > I am trying to understand whether the code actually relies on
> > system_state == SYS_STATE_early_boot, and, if so, why. The goal is to
> > make sure that if there are some limitations that they are documented,
> > or just to double-check that there are no limitations.
> 
> The check is not strictly necessary.
> 
> > 
> > In regards to your comment about only working for 32-bit addresses on
> > Arm32, you have a point. At least we should be careful with the mfn to
> > vaddr conversion because mfn_to_maddr returns a paddr_t which is 64-bit
> > and vaddr_t is 32-bit. I imagine that theoretically, even with
> > system_state == SYS_STATE_early_boot, it could get truncated with the
> > wrong combination of mfn and phys_offset.
> > 
> > If nothing else, maybe we should add a truncation check for safety?
> 
> Except that phys_offset is not defined correctly, so your check below will
> break some setup :(. Let's take the following example:
> 
>    Xen is loaded at PA 0x100000
> 
> The boot offset is computed using 32-bit address (see head.S):
>     PA - VA = 0x100000 - 0x200000
>             = 0xfff00000
> 
> This value will be passed to C code as an unsigned long. But then we will
> store it in a uint64_t/paddr_t (see phys_offset which is set in
> setup_page_tables). Because this is a conversion from unsigned to unsigned,
> the "sign bit" will not be propagated.
>
> This means that phys_offset will be equal to 0xfff00000 and not
> 0xfffffffffff00000!
> 
> Therefore if we try to convert 0x100000 (where Xen has been loaded) back to
> its VA, the resulting value will be 0xffffffff00200100.
>
> Looking at the code, I think pte_of_xenaddr() has also the exact problem. :(

One way to fix it would be to change phys_offset to register_t (or just
declare it as unsigned long on arm32 and unsigned long long on arm64):

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index be23acfe26..c7ead39654 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -170,7 +170,7 @@ static DEFINE_PAGE_TABLE(xen_xenmap);
 /* Non-boot CPUs use this to find the correct pagetables. */
 uint64_t init_ttbr;
 
-static paddr_t phys_offset;
+static register_t phys_offset;
 
 /* Limits of the Xen heap */
 mfn_t xenheap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;

It would work OK for virtual-to-physical conversions. Physical-to-virtual
is more challenging because physical addresses can go up to 40 bits.
Fortunately, it doesn't look like we are using phys_offset for many
phys-to-virt conversions. The only examples are the one in this patch
and dump_hyp_walk.


> I guess nobody tried to load Xen that low in memory on Arm32? But that's going
> to be definitely an issues with the memory rework I have in mind.
> 
> I have some other important work to finish for Xen 4.13. So I am thinking to
> defer the problem I mention above for post Xen 4.13. Although, the GRUB issues
> would still need to be fix. Any opinions?

I think for Xen 4.13 I would like to get in your patch to fix GRUB
booting, with a little bit more attention to integer issues. Something
like the following:

    paddr_t pa_end = _end + phys_offset;
    paddr_t pa_start = _start + phys_offset;
    paddr_t pa = mfn_to_maddr(mfn);

    if ( pa < pa_end && pa >= pa_start )
        return (lpae_t *)(vaddr_t)(pa - pa_start + XEN_VIRT_ADDR);

I think it should work if phys_offset is register_t. Or we could have a
check on pa >= 4G, something like:

    paddr_t pa = (register_t)mfn_to_maddr(mfn) - phys_offset;

    if ( mfn_to_maddr(mfn) <= UINT32_MAX &&
         pa < _end &&
         is_kernel((vaddr_t)pa) )
        return (lpae_t *)(vaddr_t)pa;


> Note that this is also more reasons to limit the use of "MA - phys_offset". So
> the mess is kept to boot code.
> 
> > Something like the following but that ideally would be applicable to
> > arm64 too without having to add an #ifdef:
> > 
> >      paddr_t pa = mfn_to_maddr(mfn) - phys_offset;
> > 
> >      if ( pa < _end && is_kernel((vaddr_t)pa) )
> >          return (lpae_t *)(vaddr_t)pa;
Julien Grall Oct. 10, 2019, 4:56 p.m. UTC | #13
Hi Stefano,

On 08/10/2019 23:24, Stefano Stabellini wrote:
> On Tue, 8 Oct 2019, Julien Grall wrote:
>> On 10/8/19 1:18 AM, Stefano Stabellini wrote:
>>> On Mon, 7 Oct 2019, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 03/10/2019 02:02, Stefano Stabellini wrote:
>>>>> On Fri, 20 Sep 2019, Julien Grall wrote:
>>>>>> That's not correct. alloc_boot_pages() is actually here to allow
>>>>>> dynamic
>>>>>> allocation before the memory subsystem (and therefore the runtime
>>>>>> allocator)
>>>>>> is initialized.
>>>>>
>>>>> Let me change the question then: is the system_state ==
>>>>> SYS_STATE_early_boot check strictly necessary? It looks like it is not:
>>>>> the patch would work even if it was just:
>>>>
>>>> I had a few thoughts about it. On Arm32, this only really works for
>>>> 32-bits machine address (it can go up to 40-bits). I haven't really
>>>> fully investigated what could go wrong, but it would be best to keep it
>>>> only for early boot.
>>>>
>>>> Also, I don't really want to rely on this "workaround" after boot. Maybe
>>>> we would want to keep them unmapped in the future.
>>>
>>> Yes, no problems, we agree on that. I am not asking in regards to the
>>> check system_state == SYS_STATE_early_boot with the goal of asking you
>>> to get rid of it. I am fine with keeping the check. (Maybe we want to add
>>> an `unlikely()' around the check.)
>>>
>>> I am trying to understand whether the code actually relies on
>>> system_state == SYS_STATE_early_boot, and, if so, why. The goal is to
>>> make sure that if there are some limitations that they are documented,
>>> or just to double-check that there are no limitations.
>>
>> The check is not strictly necessary.
>>
>>>
>>> In regards to your comment about only working for 32-bit addresses on
>>> Arm32, you have a point. At least we should be careful with the mfn to
>>> vaddr conversion because mfn_to_maddr returns a paddr_t which is 64-bit
>>> and vaddr_t is 32-bit. I imagine that theoretically, even with
>>> system_state == SYS_STATE_early_boot, it could get truncated with the
>>> wrong combination of mfn and phys_offset.
>>>
>>> If nothing else, maybe we should add a truncation check for safety?
>>
>> Except that phys_offset is not defined correctly, so your check below will
>> break some setup :(. Let's take the following example:
>>
>>     Xen is loaded at PA 0x100000
>>
>> The boot offset is computed using 32-bit address (see head.S):
>>      PA - VA = 0x100000 - 0x200000
>>              = 0xfff00000
>>
>> This value will be passed to C code as an unsigned long. But then we will
>> store it in a uint64_t/paddr_t (see phys_offset which is set in
>> setup_page_tables). Because this is a conversion from unsigned to unsigned,
>> the "sign bit" will not be propagated.
>>
>> This means that phys_offset will be equal to 0xfff00000 and not
>> 0xfffffffffff00000!
>>
>> Therefore if we try to convert 0x100000 (where Xen has been loaded) back to
>> its VA, the resulting value will be 0xffffffff00200100.
>>
>> Looking at the code, I think pte_of_xenaddr() has also the exact problem. :(
> 
> One way to fix it would be to change phys_offset to register_t (or just
> declare it as unsigned long on arm32 and unsigned long long on arm64):

unsigned long is 32-bit (resp. 64-bit) on Arm32 (resp. Arm64). This is what we 
already rely on for boot_phys_offset (see setup_pagetables()). So I am not sure 
why phys_offset needs to be defined differently.

An alternative is to use vaddr_t.

> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index be23acfe26..c7ead39654 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -170,7 +170,7 @@ static DEFINE_PAGE_TABLE(xen_xenmap);
>   /* Non-boot CPUs use this to find the correct pagetables. */
>   uint64_t init_ttbr;
>   
> -static paddr_t phys_offset;
> +static register_t phys_offset;
>   
>   /* Limits of the Xen heap */
>   mfn_t xenheap_mfn_start __read_mostly = INVALID_MFN_INITIALIZER;
> 
> It would work OK for virtual to physical conversions. 

This is because the addition will be done using vaddr_t (aka unsigned long) and 
then promoted to 64-bit, right?

However, I am not too happy with making this assumption. I would much prefer to 
keep phys_offset a paddr_t and make sure the sign bit is propagated. That would 
cater for users doing the conversion using either 64-bit or 32-bit arithmetic.

> Physical to
> virtual is more challenging because physical could go up to 40bits.
> Fortunately, it doesn't look like we are using phys_offset to do many
> phys-to-virt conversions. The only example is the one in this patch,
> and dump_hyp_walk.
> 
> 
>> I guess nobody tried to load Xen that low in memory on Arm32? But that's going
>> to be definitely an issues with the memory rework I have in mind.
>>
>> I have some other important work to finish for Xen 4.13. So I am thinking to
>> defer the problem I mention above for post Xen 4.13. Although, the GRUB issues
>> would still need to be fix. Any opinions?
> 
> I think for Xen 4.13 I would like to get in your patch to fix GRUB
> booting, with a little bit more attention to integer issues. Something
> like the following:
> 
>      paddr_t pa_end = _end + phys_offset;
>      paddr_t pa_start = _start + phys_offset;

This could be a good idea. Thinking a bit more, we don't need to use phys_offset 
here. We could use hardware translation directly (via virt_to_maddr).

So we would keep phys_offset as it is today for Xen 4.13. For the next release, 
we can queue a patch to drop it completely. Let me have a think and try to come 
up with something.

>      paddr_t pa = mfn_to_maddr(mfn);
> 
>      if ( pa < pa_end && pa >= pa_start )
>          return (lpae_t *)(vaddr_t)(pa - pa_start + XEN_VIRT_ADDR);
> 
> I think it should work if phys_offset is register_t. Or we could have a
> check on pa >= 4G, something like:
> 
>      paddr_t pa = (register_t)mfn_to_maddr(mfn) - phys_offset;
> 
>      if ( mfn_to_maddr(mfn) <= UINT32_MAX &&
>           pa < _end &&
>           is_kernel((vaddr_t)pa) )
>          return (lpae_t *)(vaddr_t)pa;

There is nothing preventing Xen from being loaded above 4GB on Arm64. For 
instance, on AMD Seattle, Xen is loaded at 512GB.

> 
> 
>> Note that this is also more reasons to limit the use of "MA - phys_offset". So
>> the mess is kept to boot code.
>>
>>> Something like the following but that ideally would be applicable to
>>> arm64 too without having to add an #ifdef:
>>>
>>>       paddr_t pa = mfn_to_maddr(mfn) - phys_offset;
>>>
>>>       if ( pa < _end && is_kernel((vaddr_t)pa) )
>>>           return (lpae_t *)(vaddr_t)pa;

Cheers,
Stefano Stabellini Oct. 11, 2019, 12:32 a.m. UTC | #14
On Thu, 10 Oct 2019, Julien Grall wrote:
> On 08/10/2019 23:24, Stefano Stabellini wrote:
> > On Tue, 8 Oct 2019, Julien Grall wrote:
> > > On 10/8/19 1:18 AM, Stefano Stabellini wrote:
> > > > On Mon, 7 Oct 2019, Julien Grall wrote:
> > > > > Hi,
> > > > > 
> > > > > On 03/10/2019 02:02, Stefano Stabellini wrote:
> > > > > > On Fri, 20 Sep 2019, Julien Grall wrote:
> > > > > > > That's not correct. alloc_boot_pages() is actually here to allow
> > > > > > > dynamic
> > > > > > > allocation before the memory subsystem (and therefore the runtime
> > > > > > > allocator)
> > > > > > > is initialized.
> > > > > > 
> > > > > > Let me change the question then: is the system_state ==
> > > > > > SYS_STATE_early_boot check strictly necessary? It looks like it is
> > > > > > not:
> > > > > > the patch would work even if it was just:
> > > > > 
> > > > > I had a few thoughts about it. On Arm32, this only really works for
> > > > > 32-bits machine address (it can go up to 40-bits). I haven't really
> > > > > fully investigated what could go wrong, but it would be best to keep
> > > > > it
> > > > > only for early boot.
> > > > > 
> > > > > Also, I don't really want to rely on this "workaround" after boot.
> > > > > Maybe
> > > > > we would want to keep them unmapped in the future.
> > > > 
> > > > Yes, no problems, we agree on that. I am not asking in regards to the
> > > > check system_state == SYS_STATE_early_boot with the goal of asking you
> > > > to get rid of it. I am fine with keeping the check. (Maybe we want to
> > > > add
> > > > an `unlikely()' around the check.)
> > > > 
> > > > I am trying to understand whether the code actually relies on
> > > > system_state == SYS_STATE_early_boot, and, if so, why. The goal is to
> > > > make sure that if there are some limitations that they are documented,
> > > > or just to double-check that there are no limitations.
> > > 
> > > The check is not strictly necessary.
> > > 
> > > > 
> > > > In regards to your comment about only working for 32-bit addresses on
> > > > Arm32, you have a point. At least we should be careful with the mfn to
> > > > vaddr conversion because mfn_to_maddr returns a paddr_t which is 64-bit
> > > > and vaddr_t is 32-bit. I imagine that theoretically, even with
> > > > system_state == SYS_STATE_early_boot, it could get truncated with the
> > > > wrong combination of mfn and phys_offset.
> > > > 
> > > > If nothing else, maybe we should add a truncation check for safety?
> > > 
> > > Except that phys_offset is not defined correctly, so your check below will
> > > break some setup :(. Let's take the following example:
> > > 
> > >     Xen is loaded at PA 0x100000
> > > 
> > > The boot offset is computed using 32-bit address (see head.S):
> > >      PA - VA = 0x100000 - 0x200000
> > >              = 0xfff00000
> > > 
> > > This value will be passed to C code as an unsigned long. But then we will
> > > store it in a uint64_t/paddr_t (see phys_offset which is set in
> > > setup_page_tables). Because this is a conversion from unsigned to
> > > unsigned,
> > > the "sign bit" will not be propagated.
> > > 
> > > This means that phys_offset will be equal to 0xfff00000 and not
> > > 0xfffffffffff00000!
> > > 
> > > Therefore if we try to convert 0x100000 (where Xen has been loaded) back
> > > to
> > > its VA, the resulting value will be 0xffffffff00200100.
> > > 
> > > Looking at the code, I think pte_of_xenaddr() has also the exact problem.
> > > :(
> > 
> > One way to fix it would be to change phys_offset to register_t (or just
> > declare it as unsigned long on arm32 and unsigned long long on arm64):
> 
> sizeof (unsigned long) = 32 (resp. 64) on Arm32 (resp. arm64). This is what we
> already rely on for boot_phys_offset (see setup_pagetables). So I am not sure
> why phys_offset needs to be defined differently.
> 
> An alternative is to use vaddr_t.

Yes, I meant like vaddr_t or just "unsigned long" like boot_phys_offset.
Even with your latest patch
(https://marc.info/?l=xen-devel&m=157073004830894) which I like as a way
to solve the original GRUB booting issue, it looks like we also need to
change phys_offset to unsigned long to fix other arm32 problems.

Are you going to send the patch for that too?
Xia, Hongyan Oct. 11, 2019, 9:53 a.m. UTC | #15
Not commenting on the patch, but I had exactly the same problem when
removing the direct map on x86. map_domain_page() has to be usable
without the direct map and even before alloc_boot_pages() can be used (so
that I can map the bootmem_regions_list, although I have made it
statically allocated now).

The direct map teardown series in the end introduces a persistent
mapping infrastructure for mapping page tables, which occupies a few
fixmap entries and avoids set_fixmap(), instead modifying the entries
directly to avoid recursive invocations. The result is that
map_xen_pagetable() and map_domain_page() are usable from the very
beginning of start_xen().

Hongyan

On Tue, 2019-09-17 at 17:02 +0100, Julien Grall wrote:
> The current implementations of xen_{map, unmap}_table() expect
> {map, unmap}_domain_page() to be usable. Those helpers are used to
> map/unmap page tables while update Xen page-tables.
> 
> Since commit 022387ee1a "xen/arm: mm: Don't open-code Xen PT update
> in
> {set, clear}_fixmap()", setup_fixmap() will make use of the helpers
> mentioned above. When booting Xen using GRUB, setup_fixmap() may be
> used
> before map_domain_page() can be called. This will result to data
> abort:
> 
> (XEN) Data Abort Trap. Syndrome=0x5
> (XEN) CPU0: Unexpected Trap: Data Abort
> 
> [...]
> 
> (XEN) Xen call trace:
> (XEN)    [<000000000025ab6c>] mm.c#xen_pt_update+0x2b4/0x59c (PC)
> (XEN)    [<000000000025ab20>] mm.c#xen_pt_update+0x268/0x59c (LR)
> (XEN)    [<000000000025ae70>] set_fixmap+0x1c/0x2c
> (XEN)    [<00000000002a9c98>] copy_from_paddr+0x7c/0xdc
> (XEN)    [<00000000002a4ae0>] has_xsm_magic+0x18/0x34
> (XEN)    [<00000000002a5b5c>] bootfdt.c#early_scan_node+0x398/0x560
> (XEN)    [<00000000002a5de0>] device_tree_for_each_node+0xbc/0x144
> (XEN)    [<00000000002a5ed4>] boot_fdt_info+0x6c/0x260
> (XEN)    [<00000000002ac0d0>] start_xen+0x108/0xc74
> (XEN)    [<000000000020044c>] arm64/head.o#paging+0x60/0x88
> 
> During early boot, the page tables are either statically allocated in
> Xen binary or allocated via alloc_boot_pages().
> 
> For statically allocated page-tables, they will already be mapped as
> part of Xen binary. So we can easily find the virtual address.
> 
> For dynamically allocated page-tables, we need to rely
> map_domain_page() to be functionally working.
> 
> For arm32, the call will be usable much before page can be
> dynamically
> allocated (see setup_pagetables()). For arm64, the call will be
> usable
> after setup_xenheap_mappings().
> 
> In both cases, memory are given to the boot allocator afterwards. So
> we
> can rely on map_domain_page() for mapping page tables allocated
> dynamically.
> 
> The helpers xen_{map, unmap}_table() are now updated to take into
> account the case where page-tables are part of Xen binary.
> 
> Fixes: 022387ee1a ('xen/arm: mm: Don't open-code Xen PT update in
> {set, clear}_fixmap()')
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> ---
>  xen/arch/arm/mm.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index e1cdeaaf2f..da6303a8fd 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -950,11 +950,31 @@ static int create_xen_table(lpae_t *entry)
>  
>  static lpae_t *xen_map_table(mfn_t mfn)
>  {
> +    /*
> +     * We may require to map the page table before map_domain_page()
> is
> +     * useable. The requirements here is it must be useable as soon
> as
> +     * page-tables are allocated dynamically via alloc_boot_pages().
> +     */
> +    if ( system_state == SYS_STATE_early_boot )
> +    {
> +        vaddr_t va = mfn_to_maddr(mfn) - phys_offset;
> +
> +        if ( is_kernel(va) )
> +            return (lpae_t *)va;
> +    }
> +
>      return map_domain_page(mfn);
>  }
>  
>  static void xen_unmap_table(const lpae_t *table)
>  {
> +    /*
> +     * During early boot, xen_map_table() will not use
> map_domain_page()
> +     * for page-tables residing in Xen binary. So skip the unmap
> part.
> +     */
> +    if ( system_state == SYS_STATE_early_boot && is_kernel(table) )
> +        return;
> +
>      unmap_domain_page(table);
>  }
>
Julien Grall Oct. 11, 2019, 10:14 a.m. UTC | #16
Hi,

On 11/10/2019 01:32, Stefano Stabellini wrote:
> On Thu, 10 Oct 2019, Julien Grall wrote:
>> On 08/10/2019 23:24, Stefano Stabellini wrote:
>>> On Tue, 8 Oct 2019, Julien Grall wrote:
>>>> On 10/8/19 1:18 AM, Stefano Stabellini wrote:
>>>>> On Mon, 7 Oct 2019, Julien Grall wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 03/10/2019 02:02, Stefano Stabellini wrote:
>>>>>>> On Fri, 20 Sep 2019, Julien Grall wrote:
>>>>>>>> That's not correct. alloc_boot_pages() is actually here to allow
>>>>>>>> dynamic
>>>>>>>> allocation before the memory subsystem (and therefore the runtime
>>>>>>>> allocator)
>>>>>>>> is initialized.
>>>>>>>
>>>>>>> Let me change the question then: is the system_state ==
>>>>>>> SYS_STATE_early_boot check strictly necessary? It looks like it is
>>>>>>> not:
>>>>>>> the patch would work even if it was just:
>>>>>>
>>>>>> I had a few thoughts about it. On Arm32, this only really works for
>>>>>> 32-bits machine address (it can go up to 40-bits). I haven't really
>>>>>> fully investigated what could go wrong, but it would be best to keep
>>>>>> it
>>>>>> only for early boot.
>>>>>>
>>>>>> Also, I don't really want to rely on this "workaround" after boot.
>>>>>> Maybe
>>>>>> we would want to keep them unmapped in the future.
>>>>>
>>>>> Yes, no problems, we agree on that. I am not asking in regards to the
>>>>> check system_state == SYS_STATE_early_boot with the goal of asking you
>>>>> to get rid of it. I am fine with keeping the check. (Maybe we want to
>>>>> add
>>>>> an `unlikely()' around the check.)
>>>>>
>>>>> I am trying to understand whether the code actually relies on
>>>>> system_state == SYS_STATE_early_boot, and, if so, why. The goal is to
>>>>> make sure that if there are some limitations that they are documented,
>>>>> or just to double-check that there are no limitations.
>>>>
>>>> The check is not strictly necessary.
>>>>
>>>>>
>>>>> In regards to your comment about only working for 32-bit addresses on
>>>>> Arm32, you have a point. At least we should be careful with the mfn to
>>>>> vaddr conversion because mfn_to_maddr returns a paddr_t which is 64-bit
>>>>> and vaddr_t is 32-bit. I imagine that theoretically, even with
>>>>> system_state == SYS_STATE_early_boot, it could get truncated with the
>>>>> wrong combination of mfn and phys_offset.
>>>>>
>>>>> If nothing else, maybe we should add a truncation check for safety?
>>>>
>>>> Except that phys_offset is not defined correctly, so your check below will
>>>> break some setup :(. Let's take the following example:
>>>>
>>>>      Xen is loaded at PA 0x100000
>>>>
>>>> The boot offset is computed using 32-bit address (see head.S):
>>>>       PA - VA = 0x100000 - 0x200000
>>>>               = 0xfff00000
>>>>
>>>> This value will be passed to C code as an unsigned long. But then we will
>>>> store it in a uint64_t/paddr_t (see phys_offset which is set in
>>>> setup_page_tables). Because this is a conversion from unsigned to
>>>> unsigned,
>>>> the "sign bit" will not be propagated.
>>>>
>>>> This means that phys_offset will be equal to 0xfff00000 and not
>>>> 0xfffffffffff00000!
>>>>
>>>> Therefore if we try to convert 0x100000 (where Xen has been loaded) back
>>>> to
>>>> its VA, the resulting value will be 0xffffffff00200100.
>>>>
>>>> Looking at the code, I think pte_of_xenaddr() has also the exact problem.
>>>> :(
>>>
>>> One way to fix it would be to change phys_offset to register_t (or just
>>> declare it as unsigned long on arm32 and unsigned long long on arm64):
>>
>> sizeof (unsigned long) = 32 (resp. 64) on Arm32 (resp. arm64). This is what we
>> already rely on for boot_phys_offset (see setup_pagetables). So I am not sure
>> why phys_offset needs to be defined differently.
>>
>> An alternative is to use vaddr_t.
> 
> Yes, I meant like vaddr_t or just "unsigned long" like boot_phys_offset.
> Even with your latest patch
> (https://marc.info/?l=xen-devel&m=157073004830894) which I like as a way
> to solve the original GRUB booting issue, it looks like we also need to
> change phys_offset to unsigned long to fix other arm32 problems.
> 
> Are you going to send the patch for that too?

I am looking at dropping phys_offset completely.

Regarding Xen 4.13, the issue would only happen if you place Xen below 2MB on 
Arm32. Yet, I believe this works fine because of a side effect (the MFN can 
only be 32-bit).

This is not pretty, but I don't view it as critical to fix for Xen 4.13. So I 
am thinking of deferring this until after 4.13.

Cheers,
Stefano Stabellini Oct. 11, 2019, 5:11 p.m. UTC | #17
On Fri, 11 Oct 2019, Julien Grall wrote:
> On 11/10/2019 01:32, Stefano Stabellini wrote:
> > On Thu, 10 Oct 2019, Julien Grall wrote:
> > > On 08/10/2019 23:24, Stefano Stabellini wrote:
> > > > On Tue, 8 Oct 2019, Julien Grall wrote:
> > > > > On 10/8/19 1:18 AM, Stefano Stabellini wrote:
> > > > > > On Mon, 7 Oct 2019, Julien Grall wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On 03/10/2019 02:02, Stefano Stabellini wrote:
> > > > > > > > On Fri, 20 Sep 2019, Julien Grall wrote:
> > > > > > > > > That's not correct. alloc_boot_pages() is actually here to
> > > > > > > > > allow
> > > > > > > > > dynamic
> > > > > > > > > allocation before the memory subsystem (and therefore the
> > > > > > > > > runtime
> > > > > > > > > allocator)
> > > > > > > > > is initialized.
> > > > > > > > 
> > > > > > > > Let me change the question then: is the system_state ==
> > > > > > > > SYS_STATE_early_boot check strictly necessary? It looks like it
> > > > > > > > is
> > > > > > > > not:
> > > > > > > > the patch would work even if it was just:
> > > > > > > 
> > > > > > > I had a few thoughts about it. On Arm32, this only really works
> > > > > > > for
> > > > > > > 32-bits machine address (it can go up to 40-bits). I haven't
> > > > > > > really
> > > > > > > fully investigated what could go wrong, but it would be best to
> > > > > > > keep
> > > > > > > it
> > > > > > > only for early boot.
> > > > > > > 
> > > > > > > Also, I don't really want to rely on this "workaround" after boot.
> > > > > > > Maybe
> > > > > > > we would want to keep them unmapped in the future.
> > > > > > 
> > > > > > Yes, no problems, we agree on that. I am not asking in regards to
> > > > > > the
> > > > > > check system_state == SYS_STATE_early_boot with the goal of asking
> > > > > > you
> > > > > > to get rid of it. I am fine with keeping the check. (Maybe we want
> > > > > > to
> > > > > > add
> > > > > > an `unlikely()' around the check.)
> > > > > > 
> > > > > > I am trying to understand whether the code actually relies on
> > > > > > system_state == SYS_STATE_early_boot, and, if so, why. The goal is
> > > > > > to
> > > > > > make sure that if there are some limitations that they are
> > > > > > documented,
> > > > > > or just to double-check that there are no limitations.
> > > > > 
> > > > > The check is not strictly necessary.
> > > > > 
> > > > > > 
> > > > > > In regards to your comment about only working for 32-bit addresses
> > > > > > on
> > > > > > Arm32, you have a point. At least we should be careful with the mfn
> > > > > > to
> > > > > > vaddr conversion because mfn_to_maddr returns a paddr_t which is
> > > > > > 64-bit
> > > > > > and vaddr_t is 32-bit. I imagine that theoretically, even with
> > > > > > system_state == SYS_STATE_early_boot, it could get truncated with
> > > > > > the
> > > > > > wrong combination of mfn and phys_offset.
> > > > > > 
> > > > > > If nothing else, maybe we should add a truncation check for safety?
> > > > > 
> > > > > Except that phys_offset is not defined correctly, so your check below
> > > > > will
> > > > > break some setup :(. Let's take the following example:
> > > > > 
> > > > >      Xen is loaded at PA 0x100000
> > > > > 
> > > > > The boot offset is computed using 32-bit address (see head.S):
> > > > >       PA - VA = 0x100000 - 0x200000
> > > > >               = 0xfff00000
> > > > > 
> > > > > This value will be passed to C code as an unsigned long. But then we
> > > > > will
> > > > > store it in a uint64_t/paddr_t (see phys_offset which is set in
> > > > > setup_page_tables). Because this is a conversion from unsigned to
> > > > > unsigned,
> > > > > the "sign bit" will not be propagated.
> > > > > 
> > > > > This means that phys_offset will be equal to 0xfff00000 and not
> > > > > 0xfffffffffff00000!
> > > > > 
> > > > > Therefore if we try to convert 0x100000 (where Xen has been loaded)
> > > > > back
> > > > > to
> > > > > its VA, the resulting value will be 0xffffffff00200100.
> > > > > 
> > > > > Looking at the code, I think pte_of_xenaddr() has also the exact
> > > > > problem.
> > > > > :(
> > > > 
> > > > One way to fix it would be to change phys_offset to register_t (or just
> > > > declare it as unsigned long on arm32 and unsigned long long on arm64):
> > > 
> > > sizeof (unsigned long) = 32 (resp. 64) on Arm32 (resp. arm64). This is
> > > what we
> > > already rely on for boot_phys_offset (see setup_pagetables). So I am not
> > > sure
> > > why phys_offset needs to be defined differently.
> > > 
> > > An alternative is to use vaddr_t.
> > 
> > Yes, I meant like vaddr_t or just "unsigned long" like boot_phys_offset.
> > Even with your latest patch
> > (https://marc.info/?l=xen-devel&m=157073004830894) which I like as a way
> > to solve the original GRUB booting issue, it looks like we also need to
> > change phys_offset to unsigned long to fix other arm32 problems.
> > 
> > Are you going to send the patch for that too?
> 
> I am looking at dropping phys_offset completely.
> 
> Regarding Xen 4.13, the issue would only happen if you place Xen below 2MB on
> Arm32. Yet, I believe this works fine because of side effect (MFN can only be
> 32-bit).
> 
> This is not pretty, but I don't view this as critical to fix for Xen 4.13. So
> I am thinking to defer this post 4.13.

If the issue doesn't trigger on 4.13, then I agree we shouldn't try to
fix it (badly) at this stage.

But we do have a "minus phys_offset" operation in dump_hyp_walk, so I don't
follow why we wouldn't see a problem if Xen is loaded at 1MB on arm32.

Xen pa: 0x100000
Xen va: 0x200000
phys_offset: 0xfff00000

test_va: 0x202000
test_va - phys_offset = 0xffffffff00300200. But it should be 0x102000.

Given that the problem is in a BUG_ON maybe we could remove it? Or just
rework the BUG_ON?
Julien Grall Oct. 11, 2019, 5:51 p.m. UTC | #18
Hi Stefano,

On 10/11/19 6:11 PM, Stefano Stabellini wrote:
>> This is not pretty, but I don't view this as critical to fix for Xen 4.13. So
>> I am thinking to defer this post 4.13.
> 
> If the issue doesn't trigger on 4.13, then I agree we shouldn't try to
> fix it (badly) at this stage.
> 
> But we do have a "minus phys_offset" operation in dump_hyp_walk, I don't
> follow why we wouldn't see a problem if Xen is loaded at 1MB on arm32.

I haven't said the issue cannot be triggered. I pointed out I don't view 
this as critical.

One of the reasons is that I bet nobody has ever run Xen below 1MB on Arm32. 
Otherwise, they would have seen an error...

In other words, I am not planning to fix it for Xen 4.13. However, if someone 
wants to spend time on it (and has a platform to test it), then patches are 
welcome.

> 
> Xen pa: 0x100000
> Xen va: 0x200000
> phys_offset: 0xfff00000
> 
> test_va: 0x202000
> test_va - phys_offset = 0xffffffff00300200. But it should be 0x102000. >
> Given that the problem is in a BUG_ON maybe we could remove it? Or just
> rework the BUG_ON?

For arm32, dump_hyp_walk() is only called when the AT instruction fails 
to translate a virtual address. You are already doing something wrong if 
you hit it, so you will panic in either case. The only difference is you 
don't get the page-table dumped.

Cheers,
Stefano Stabellini Oct. 11, 2019, 7:01 p.m. UTC | #19
On Fri, 11 Oct 2019, Julien Grall wrote:
> Hi Stefano,
> 
> On 10/11/19 6:11 PM, Stefano Stabellini wrote:
> > > This is not pretty, but I don't view this as critical to fix for Xen 4.13.
> > > So
> > > I am thinking to defer this post 4.13.
> > 
> > If the issue doesn't trigger on 4.13, then I agree we shouldn't try to
> > fix it (badly) at this stage.
> > 
> > But we do have a "minus phys_offset" operation in dump_hyp_walk, I don't
> > follow why we wouldn't see a problem if Xen is loaded at 1MB on arm32.
> 
> I haven't said the issue cannot be triggered. I pointed out I don't view this
> as critical.
> 
> One of the reason is that I bet nobody ever run Xen below 1MB on Arm32.
> Otherwise they would have seen an error...
> 
> In other words, I am not going to plan to fix it for Xen 4.13. However, if
> someone wants to spend time on it (and have platform to test it), then patch
> are welcomed.

If the issue can be triggered then I think we have a choice between
fixing it (you don't necessarily have to be the one to do it) or
documenting that Xen needs to be loaded above XEN_VIRT_ADDR on arm32.
But see below.


> > Xen pa: 0x100000
> > Xen va: 0x200000
> > phys_offset: 0xfff00000
> > 
> > test_va: 0x202000
> > test_va - phys_offset = 0xffffffff00300200. But it should be 0x102000. >
> > Given that the problem is in a BUG_ON maybe we could remove it? Or just
> > rework the BUG_ON?
> 
> For arm32, dump_hyp_walk() is only called when the AT instruction fails to
> translate a physical address. You are already doing something wrong if you
> hit, so you will panic in either case. The only difference is you don't get
> the page-table dumped.

Why don't we change the BUG_ON like the following:

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index a6637ce347..b7883cfbab 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -284,10 +284,7 @@ void dump_hyp_walk(vaddr_t addr)
            "on CPU%d via TTBR 0x%016"PRIx64"\n",
            addr, smp_processor_id(), ttbr);
 
-    if ( smp_processor_id() == 0 )
-        BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != pgtable );
-    else
-        BUG_ON( virt_to_maddr(pgtable) != ttbr );
+    BUG_ON( virt_to_maddr(pgtable) != ttbr );
     dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
 }
 
We probably don't need the CPU0 special case?
Julien Grall Oct. 11, 2019, 7:14 p.m. UTC | #20
On 10/11/19 8:01 PM, Stefano Stabellini wrote:
> On Fri, 11 Oct 2019, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 10/11/19 6:11 PM, Stefano Stabellini wrote:
>>>> This is not pretty, but I don't view this as critical to fix for Xen 4.13.
>>>> So
>>>> I am thinking to defer this post 4.13.
>>>
>>> If the issue doesn't trigger on 4.13, then I agree we shouldn't try to
>>> fix it (badly) at this stage.
>>>
>>> But we do have a "minus phys_offset" operation in dump_hyp_walk, I don't
>>> follow why we wouldn't see a problem if Xen is loaded at 1MB on arm32.
>>
>> I haven't said the issue cannot be triggered. I pointed out I don't view this
>> as critical.
>>
>> One of the reason is that I bet nobody ever run Xen below 1MB on Arm32.
>> Otherwise they would have seen an error...
>>
>> In other words, I am not going to plan to fix it for Xen 4.13. However, if
>> someone wants to spend time on it (and have platform to test it), then patch
>> are welcomed.
> 
> If the issue can be triggered then I think we have a choice between
> fixing it (you don't necessarily have to be the one to do it) or
> documenting that Xen needs to be loaded above XEN_VIRT_ADDR on arm32.

As I said before on a similar topic, if we document this, we also need 
to document that our Xen boot code is definitely not compliant with the 
Arm Arm, and you are at risk of not being able to boot ;).

>>> Xen pa: 0x100000
>>> Xen va: 0x200000
>>> phys_offset: 0xfff00000
>>>
>>> test_va: 0x202000
>>> test_va - phys_offset = 0xffffffff00300200. But it should be 0x102000. >
>>> Given that the problem is in a BUG_ON maybe we could remove it? Or just
>>> rework the BUG_ON?
>>
>> For arm32, dump_hyp_walk() is only called when the AT instruction fails to
>> translate a physical address. You are already doing something wrong if you
>> hit, so you will panic in either case. The only difference is you don't get
>> the page-table dumped.
> 
> Why don't we change the BUG_ON like the following:
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index a6637ce347..b7883cfbab 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -284,10 +284,7 @@ void dump_hyp_walk(vaddr_t addr)
>              "on CPU%d via TTBR 0x%016"PRIx64"\n",
>              addr, smp_processor_id(), ttbr);
>   
> -    if ( smp_processor_id() == 0 )
> -        BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != pgtable );
> -    else
> -        BUG_ON( virt_to_maddr(pgtable) != ttbr );
> +    BUG_ON( virt_to_maddr(pgtable) != ttbr );
>       dump_pt_walk(ttbr, addr, HYP_PT_ROOT_LEVEL, 1);
>   }
>   
> We probably don't need the CPU0 special case?

We don't need it. Patches are welcome.

Cheers,
Julien Grall Oct. 16, 2019, 12:46 p.m. UTC | #21
Hi Hongyan,

On 11/10/2019 10:53, Xia, Hongyan wrote:
> Not commenting on the patch, but I had exactly the same problem when
> removing the direct map in x86. map_domain_page has to be usable
> without the direct map and even before alloc_boot_pages can be used (so
> that I can map the bootmem_regions_list, although I have made it
> statically allocated now).
> 
> The direct map teardown series in the end introduces a persistent
> mapping infrastructure to map page tables which occupies a few fixmap
> entries, and avoid set_fixmap but instead modify the entries directly
> to avoid invocation loops. The result is that map_xen_pagetable and
> map_domain_page is usable from the very beginning of start_xen.

Would you mind pointing to the patch adding the implementation on x86?

On arm32, we are not using a direct map. So we are using a per-cpu mapping (see 
map_domain_page() implementation in arch/arm/mm.c).

On arm64, we are using the direct map so far. But we could use any of the two 
solutions if needed.

Cheers,
Xia, Hongyan Oct. 16, 2019, 1:16 p.m. UTC | #22
Hi Julien,

Sure. You should be able to find it on directnonmap-v2.3 branch at
https://xenbits.xen.org/git-http/people/hx242/xen.git.

Commit: a4fef31b99388524d3f7748967c5d04a924cb7e3
    x86: add Persistent Map (PMAP) infrastructure

One thing to note is that the PMAP structure is really low-performance
because we do not want to burn too many fixmap entries. It should no
longer be in use once we have bootstrapped the other mapping
infrastructures. Its only purpose is to be able to map pages very
early on.

Hongyan

On Wed, 2019-10-16 at 13:46 +0100, Julien Grall wrote:
> Hi Hongyan,
> 
> On 11/10/2019 10:53, Xia, Hongyan wrote:
> > Not commenting on the patch, but I had exactly the same problem
> > when
> > removing the direct map in x86. map_domain_page has to be usable
> > without the direct map and even before alloc_boot_pages can be used
> > (so
> > that I can map the bootmem_regions_list, although I have made it
> > statically allocated now).
> > 
> > The direct map teardown series in the end introduces a persistent
> > mapping infrastructure to map page tables which occupies a few
> > fixmap
> > entries, and avoid set_fixmap but instead modify the entries
> > directly
> > to avoid invocation loops. The result is that map_xen_pagetable and
> > map_domain_page is usable from the very beginning of start_xen.
> 
> Would you mind to point to the patch adding the implementation on
> x86?
> 
> On arm32, we are not using a direct map. So we are using a per-cpu
> mapping (see 
> map_domain_page() implementation in arch/arm/mm.c).
> 
> On arm64, we are using the direct map so far. But we could use any of
> the two 
> solutions if needed.
> 
> Cheers,
>

Patch
diff mbox series

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index e1cdeaaf2f..da6303a8fd 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -950,11 +950,31 @@  static int create_xen_table(lpae_t *entry)
 
 static lpae_t *xen_map_table(mfn_t mfn)
 {
+    /*
+     * We may need to map the page table before map_domain_page() is
+     * usable. The requirement here is that it must be usable as soon as
+     * page-tables are allocated dynamically via alloc_boot_pages().
+     */
+    if ( system_state == SYS_STATE_early_boot )
+    {
+        vaddr_t va = mfn_to_maddr(mfn) - phys_offset;
+
+        if ( is_kernel(va) )
+            return (lpae_t *)va;
+    }
+
     return map_domain_page(mfn);
 }
 
 static void xen_unmap_table(const lpae_t *table)
 {
+    /*
+     * During early boot, xen_map_table() will not use map_domain_page()
+     * for page-tables residing in the Xen binary. So skip the unmap part.
+     */
+    if ( system_state == SYS_STATE_early_boot && is_kernel(table) )
+        return;
+
     unmap_domain_page(table);
 }