xen: arm: zero EL2 pagetable pages before use

Message ID 1457647206-9436-1-git-send-email-shankerd@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Shanker Donthineni March 10, 2016, 10 p.m. UTC
From: Vikram Sethi <vikrams@codeaurora.org>

arch/arm/mm.c has 2 uses of alloc_boot_pages which are used for
pagetables, but the allocated pages are not zeroed. This can cause
crashes on CPUs with aggressive prefetching when they find 'valid'
entries in the page tables that are really uninitialized.
Memset the allocated pages before use.

Change-Id: I517ca45ca240766dfbf1d6884c044c377babab7d
Signed-off-by: Vikram Sethi <vikrams@codeaurora.org>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
---
 xen/arch/arm/mm.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jan Beulich March 11, 2016, 11:29 a.m. UTC | #1
>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:

First of all - please correctly Cc maintainers (there were two recent
changes for ARM).

> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -730,6 +730,7 @@ void __init setup_xenheap_mappings(unsigned long base_mfn,
>          else
>          {
>              unsigned long first_mfn = alloc_boot_pages(1, 1);
> +            memset(mfn_to_virt(first_mfn), 0, PAGE_SIZE);

If I was maintainer of this code, I would demand use of clear_page()
and ask for insertion of the missing blank line here (separating
declaration and statements).

> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>      nr_second = frametable_size >> SECOND_SHIFT;
>      second_base = alloc_boot_pages(nr_second, 1);
>      second = mfn_to_virt(second_base);
> +    memset(second, 0, nr_second * PAGE_SIZE);
>      for ( i = 0; i < nr_second; i++ )
>      {
>          pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);

Along those lines here - use clear_page(), presumably by moving it
into the loop.

Jan
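A minimal sketch of the per-page clearing suggested above, written as standalone C rather than Xen code. The names sketch_clear_page() and sketch_clear_pages() and the 4K page size are assumptions made for illustration; inside Xen the real clear_page() would be called on each page from within the existing loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 4K pages, as used by the ARM code under discussion. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for clear_page(): zero exactly one page. */
static void sketch_clear_page(void *page)
{
    memset(page, 0, SKETCH_PAGE_SIZE);
}

/*
 * Zero nr_pages contiguous pages one page at a time, mirroring a
 * clear_page() call placed inside the per-page loop instead of a
 * single memset() over the whole allocation.
 */
static void sketch_clear_pages(void *base, size_t nr_pages)
{
    unsigned char *p = base;
    size_t i;

    assert(base != NULL);

    for ( i = 0; i < nr_pages; i++ )
        sketch_clear_page(p + i * SKETCH_PAGE_SIZE);
}
```

The result is the same as one memset() over nr_pages * PAGE_SIZE bytes; the difference is only that each page goes through the page-clearing helper.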
Andrew Cooper March 11, 2016, 12:56 p.m. UTC | #2
On 11/03/16 11:29, Jan Beulich wrote:
>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
> First of all - please correctly Cc maintainers (there were two recent
> changes for ARM).
>
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -730,6 +730,7 @@ void __init setup_xenheap_mappings(unsigned long base_mfn,
>>          else
>>          {
>>              unsigned long first_mfn = alloc_boot_pages(1, 1);
>> +            memset(mfn_to_virt(first_mfn), 0, PAGE_SIZE);
> If I was maintainer of this code, I would demand use of clear_page()
> and ask for insertion of the missing blank line here (separating
> declaration and statements).
>
>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>      nr_second = frametable_size >> SECOND_SHIFT;
>>      second_base = alloc_boot_pages(nr_second, 1);
>>      second = mfn_to_virt(second_base);
>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>      for ( i = 0; i < nr_second; i++ )
>>      {
>>          pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
> Along those lines here - use clear_page(), presumably by moving it
> into the loop.

This need only initialise the entries which are not filled by the loop,
which will only be the rounding size up to the next 2M or 32M boundary.

Most of the content of 'second' is explicitly initialised, so zeroing it
all first is redundant.

~Andrew
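As a rough illustration of the alternative described above (zeroing only the entries the loop does not write), here is a standalone C sketch. The function name sketch_zero_unused() and the flat array of 64-bit entries are assumptions for illustration, not the layout used in mm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero only the tail of a table: entries [nr_used, nr_entries).
 * Entries below nr_used are assumed to be written explicitly by the
 * caller, so they are left untouched here.
 */
static void sketch_zero_unused(uint64_t *table, size_t nr_entries,
                               size_t nr_used)
{
    assert(table != NULL);
    assert(nr_used <= nr_entries);

    memset(&table[nr_used], 0, (nr_entries - nr_used) * sizeof(*table));
}
```

Whether this is sufficient depends on the point raised later in the thread: it only helps if the explicitly written entries cannot be observed before they are filled in.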
Jan Beulich March 11, 2016, 1:13 p.m. UTC | #3
>>> On 11.03.16 at 13:56, <andrew.cooper3@citrix.com> wrote:
> On 11/03/16 11:29, Jan Beulich wrote:
>>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
>>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>>      nr_second = frametable_size >> SECOND_SHIFT;
>>>      second_base = alloc_boot_pages(nr_second, 1);
>>>      second = mfn_to_virt(second_base);
>>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>>      for ( i = 0; i < nr_second; i++ )
>>>      {
>>>          pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>> Along those lines here - use clear_page(), presumably by moving it
>> into the loop.
> 
> This need only initialise the entries which are not filled by the loop,
> which will only be the rounding size up to the next 2M or 32M boundary.
> 
> Most of the content of 'second' is explicitly initialised, so zeroing it
> all first is redundant.

Well, I certainly don't know all the details of how this works on
ARM, but the way I remember the original problem description
(sent a few days ago) the problem was with bogus translations
being visible transiently. Of course it all depends on whether the
page tables that are being modified here are live ones, which
I simply don't know.

Jan
Andrew Cooper March 11, 2016, 1:24 p.m. UTC | #4
On 11/03/16 13:13, Jan Beulich wrote:
>>>> On 11.03.16 at 13:56, <andrew.cooper3@citrix.com> wrote:
>> On 11/03/16 11:29, Jan Beulich wrote:
>>>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
>>>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>>>      nr_second = frametable_size >> SECOND_SHIFT;
>>>>      second_base = alloc_boot_pages(nr_second, 1);
>>>>      second = mfn_to_virt(second_base);
>>>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>>>      for ( i = 0; i < nr_second; i++ )
>>>>      {
>>>>          pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>>> Along those lines here - use clear_page(), presumably by moving it
>>> into the loop.
>> This need only initialise the entries which are not filled by the loop,
>> which will only be the rounding size up to the next 2M or 32M boundary.
>>
>> Most of the content of 'second' is explicitly initialised, so zeroing it
>> all first is redundant.
> Well, I certainly don't know all the details of how this works on
> ARM, but the way I remember the original problem description
> (sent a few days ago) the problem was with bogus translations
> to be visible transiently. Of course all depends on whether the
> page tables that are being modified here are live ones, which
> I simply don't know.

Looking at the code here, second is hooked into the live pagetables
immediately after the loop.  Therefore, bogus translations will only be
present for the untouched PTEs which make up the alignment space.

However, it is probably best to defer to the ARM maintainers.

~Andrew
Julien Grall March 12, 2016, 2:32 p.m. UTC | #5
Hi,

On 11/03/2016 18:29, Jan Beulich wrote:
>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -730,6 +730,7 @@ void __init setup_xenheap_mappings(unsigned long base_mfn,
>>           else
>>           {
>>               unsigned long first_mfn = alloc_boot_pages(1, 1);
>> +            memset(mfn_to_virt(first_mfn), 0, PAGE_SIZE);
>
> If I was maintainer of this code, I would demand use of clear_page()
> and ask for insertion of the missing blank line here (separating
> declaration and statements).

+1

>
>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>       nr_second = frametable_size >> SECOND_SHIFT;
>>       second_base = alloc_boot_pages(nr_second, 1);
>>       second = mfn_to_virt(second_base);
>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>       for ( i = 0; i < nr_second; i++ )
>>       {
>>           pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>
> Along those lines here - use clear_page(), presumably by moving it
> into the loop.

+1

Regards,
Julien Grall March 12, 2016, 4:03 p.m. UTC | #6
Hi,

On 11/03/2016 20:24, Andrew Cooper wrote:
> On 11/03/16 13:13, Jan Beulich wrote:
>>>>> On 11.03.16 at 13:56, <andrew.cooper3@citrix.com> wrote:
>>> On 11/03/16 11:29, Jan Beulich wrote:
>>>>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
>>>>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>>>>       nr_second = frametable_size >> SECOND_SHIFT;
>>>>>       second_base = alloc_boot_pages(nr_second, 1);
>>>>>       second = mfn_to_virt(second_base);
>>>>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>>>>       for ( i = 0; i < nr_second; i++ )
>>>>>       {
>>>>>           pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>>>> Along those lines here - use clear_page(), presumably by moving it
>>>> into the loop.
>>> This need only initialise the entries which are not filled by the loop,
>>> which will only be the rounding size up to the next 2M or 32M boundary.
>>>
>>> Most of the content of 'second' is explicitly initialised, so zeroing it
>>> all first is redundant.
>> Well, I certainly don't know all the details of how this works on
>> ARM, but the way I remember the original problem description
>> (sent a few days ago) the problem was with bogus translations
>> to be visible transiently. Of course all depends on whether the
>> page tables that are being modified here are live ones, which
>> I simply don't know.
>
> Looking at the code here, second is hooked into the live pagetables
> immediately after the loop.  Therefore, bogus translations will only be
> present for the untouched PTEs which make up the alignment space.

The frame table size is always aligned to 2MB/32MB. However, the frame 
table may not use all the entries in a level 2 page table (which cover 
1GB of memory). Those unused entries will be unknown if we don't clear them.

Keeping them unknown is not a problem as long as nobody is trying to 
access the underlying virtual address.

In the case of setup_frametable_mappings, Xen is still running with a 
single processor and the frame_table is not accessed until after 
create_mappings is called. The function should nuke all the TLBs at the 
end, so it looks to me like zeroing the entries will hide the real 
problem.

Nonetheless, I would invalidate all the entries in the table to avoid 
polluting the TLBs with bogus entries and get a better crash.

Regards,
Julien Grall March 14, 2016, 7:37 a.m. UTC | #7
Hi Shanker,

On 11/03/2016 05:00, Shanker Donthineni wrote:
> From: Vikram Sethi <vikrams@codeaurora.org>
>
> arch/arm/mm.c has 2 uses of alloc_boot_pages which are used for
> pagetables, but the allocated pages are not zeroed. This can cause
> crashes on CPUs with aggressive prefetching when they find 'valid'
> entries in the page tables but which are really uninitialized.
> Memset the allocated pages before use.

I first thought the problem was related to the break-before-make sequence 
mandated by the ARM architecture (see D4-1732 in ARM DDI 0487A.i) when the 
page tables are modified in a certain way, but neither the frame table nor 
the xen heap is used before the TLBs are nuked.

I would like to see more details in the commit message about the crash 
and why (based on the spec) clearing the page is the right solution.

Note that I think clearing the page is good to avoid polluting the TLBs 
with bogus entries and to get a better crash log.

> Change-Id: I517ca45ca240766dfbf1d6884c044c377babab7d

What is this line for?

> Signed-off-by: Vikram Sethi <vikrams@codeaurora.org>
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> ---
>   xen/arch/arm/mm.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 81f9e2e..215ec93 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -730,6 +730,7 @@ void __init setup_xenheap_mappings(unsigned long base_mfn,
>           else
>           {
>               unsigned long first_mfn = alloc_boot_pages(1, 1);
> +            memset(mfn_to_virt(first_mfn), 0, PAGE_SIZE);

You can move "first = mfn_to_virt(first_mfn)" earlier and re-use first here.

>               pte = mfn_to_xen_entry(first_mfn, WRITEALLOC);
>               pte.pt.table = 1;
>               write_pte(p, pte);
> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>       nr_second = frametable_size >> SECOND_SHIFT;
>       second_base = alloc_boot_pages(nr_second, 1);
>       second = mfn_to_virt(second_base);
> +    memset(second, 0, nr_second * PAGE_SIZE);
>       for ( i = 0; i < nr_second; i++ )
>       {
>           pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>

Regards,
Shanker Donthineni March 14, 2016, 5:18 p.m. UTC | #8
Hi Julien,


On 03/12/2016 10:03 AM, Julien Grall wrote:
> Hi,
>
> On 11/03/2016 20:24, Andrew Cooper wrote:
>> On 11/03/16 13:13, Jan Beulich wrote:
>>>>>> On 11.03.16 at 13:56, <andrew.cooper3@citrix.com> wrote:
>>>> On 11/03/16 11:29, Jan Beulich wrote:
>>>>>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
>>>>>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>>>>>       nr_second = frametable_size >> SECOND_SHIFT;
>>>>>>       second_base = alloc_boot_pages(nr_second, 1);
>>>>>>       second = mfn_to_virt(second_base);
>>>>>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>>>>>       for ( i = 0; i < nr_second; i++ )
>>>>>>       {
>>>>>>           pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>>>>> Along those lines here - use clear_page(), presumably by moving it
>>>>> into the loop.
>>>> This need only initialise the entries which are not filled by the loop,
>>>> which will only be the rounding size up to the next 2M or 32M boundary.
>>>>
>>>> Most of the content of 'second' is explicitly initialised, so zeroing it
>>>> all first is redundant.
>>> Well, I certainly don't know all the details of how this works on
>>> ARM, but the way I remember the original problem description
>>> (sent a few days ago) the problem was with bogus translations
>>> to be visible transiently. Of course all depends on whether the
>>> page tables that are being modified here are live ones, which
>>> I simply don't know.
>>
>> Looking at the code here, second is hooked into the live pagetables
>> immediately after the loop.  Therefore, bogus translations will only be
>> present for the untouched PTEs which make up the alignment space.
>
> The frame table size is always aligned to 2MB/32MB. However, the frame table may not use all the entries in a level 2 page table (which cover 1GB of memory). Those unused entries will be unknown if we don't clear them.
>
> Keeping them unknown is not a problem as long as nobody is trying to access the underlying virtual address.
>

I don't agree that keeping a garbage value in a PTE is not a problem. The ARMv8
Architecture allows the CPU to perform speculative data/instruction read accesses
from memory (type normal) as long as the PTE valid bit is set.

CPU prefetch logic might access garbage PTEs and cause a system panic if the VA-PA
translation happens to yield a physical address that is not addressable by the system bus.


> In the case of setup_frametable_mappings, Xen is still running with a single processor and the frame_table is not accessed until after create_mappings is called. The function should nuke all the TLBs at the end, so it looks to me like zeroing the entries will hide the real problem.
>

Not true: zeroing the PTE entries fixes the asynchronous aborts and SError exceptions
caused by garbage PTEs.

> Nonetheless, I would invalidate all the entries in the table to avoid polluting the TLBs with bogus entries and get a better crash.
>
> Regards,
>
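To make the concern about garbage entries concrete, here is a standalone C sketch that counts how many entries in a table would look valid to the hardware walker. In the ARMv8 long-descriptor format, bit 0 of a descriptor is the valid bit; the function name sketch_count_valid() and the flat array view of the table are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of an ARMv8 long-descriptor entry marks it as valid. */
#define SKETCH_PTE_VALID 0x1ull

/*
 * Count the entries whose valid bit is set. On an uninitialized page
 * any such entry is a bogus translation that speculative accesses
 * could follow; on a zeroed page the count is always 0.
 */
static size_t sketch_count_valid(const uint64_t *table, size_t nr_entries)
{
    size_t i, count = 0;

    assert(table != NULL);

    for ( i = 0; i < nr_entries; i++ )
        if ( table[i] & SKETCH_PTE_VALID )
            count++;

    return count;
}
```

Any odd 64-bit value left behind in the page counts as valid here, which is why leftover data in boot-allocated memory can plausibly produce such entries.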
Julien Grall March 15, 2016, 5:37 p.m. UTC | #9
On 14/03/16 17:18, Shanker Donthineni wrote:
> Hi Julien,

Hi Shanker,

>
> On 03/12/2016 10:03 AM, Julien Grall wrote:
>> Hi,
>>
>> On 11/03/2016 20:24, Andrew Cooper wrote:
>>> On 11/03/16 13:13, Jan Beulich wrote:
>>>>>>> On 11.03.16 at 13:56, <andrew.cooper3@citrix.com> wrote:
>>>>> On 11/03/16 11:29, Jan Beulich wrote:
>>>>>>>>> On 10.03.16 at 23:00, <shankerd@codeaurora.org> wrote:
>>>>>>> @@ -771,6 +772,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
>>>>>>>        nr_second = frametable_size >> SECOND_SHIFT;
>>>>>>>        second_base = alloc_boot_pages(nr_second, 1);
>>>>>>>        second = mfn_to_virt(second_base);
>>>>>>> +    memset(second, 0, nr_second * PAGE_SIZE);
>>>>>>>        for ( i = 0; i < nr_second; i++ )
>>>>>>>        {
>>>>>>>            pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>>>>>> Along those lines here - use clear_page(), presumably by moving it
>>>>>> into the loop.
>>>>> This need only initialise the entries which are not filled by the loop,
>>>>> which will only be the rounding size up to the next 2M or 32M boundary.
>>>>>
>>>>> Most of the content of 'second' is explicitly initialised, so zeroing it
>>>>> all first is redundant.
>>>> Well, I certainly don't know all the details of how this works on
>>>> ARM, but the way I remember the original problem description
>>>> (sent a few days ago) the problem was with bogus translations
>>>> to be visible transiently. Of course all depends on whether the
>>>> page tables that are being modified here are live ones, which
>>>> I simply don't know.
>>>
>>> Looking at the code here, second is hooked into the live pagetables
>>> immediately after the loop.  Therefore, bogus translations will only be
>>> present for the untouched PTEs which make up the alignment space.
>>
>> The frame table size is always aligned to 2MB/32MB. However, the frame table may not use all the entries in a level 2 page table (which cover 1GB of memory). Those unused entries will be unknown if we don't clear them.
>>
>> Keeping them unknown is not a problem as long as nobody is trying to access the underlying virtual address.
>>
>
> I don't agree that keeping a garbage value in a PTE is not a problem. The ARMv8
> Architecture allows the CPU to perform speculative data/instruction read accesses
> from memory (type normal) as long as the PTE valid bit is set.

When you quote the spec, can you give the section/version? It helps the 
reviewers to find more details.

More generally, having the section/version in the commit message is 
useful for future reference.

> CPU prefetch logic might access garbage PTEs and cause a system panic if the VA-PA
> translation happens to yield a physical address that is not addressable by the system bus.

It would have been nice to see that kind of detail in the commit message.

Regards,
Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 81f9e2e..215ec93 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -730,6 +730,7 @@  void __init setup_xenheap_mappings(unsigned long base_mfn,
         else
         {
             unsigned long first_mfn = alloc_boot_pages(1, 1);
+            memset(mfn_to_virt(first_mfn), 0, PAGE_SIZE);
             pte = mfn_to_xen_entry(first_mfn, WRITEALLOC);
             pte.pt.table = 1;
             write_pte(p, pte);
@@ -771,6 +772,7 @@  void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
     nr_second = frametable_size >> SECOND_SHIFT;
     second_base = alloc_boot_pages(nr_second, 1);
     second = mfn_to_virt(second_base);
+    memset(second, 0, nr_second * PAGE_SIZE);
     for ( i = 0; i < nr_second; i++ )
     {
         pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);