[v2,3/3] riscv: Fix crash when flushing executable ioremap regions

Message ID 8a555b0b0934f0ba134de92f6cf9db8b1744316c.1581767384.git.jan.kiszka@web.de (mailing list archive)
State New, archived
Series riscv: mem= support, ioremap exec fix

Commit Message

Jan Kiszka Feb. 15, 2020, 11:49 a.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

Those are not backed by page structs, and pte_page is returning an
invalid pointer.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/riscv/mm/cacheflush.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
2.16.4

Comments

Alexandre Ghiti Feb. 16, 2020, 2:41 p.m. UTC | #1
Hi Jan,

On 2/15/20 6:49 AM, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Those are not backed by page structs, and pte_page is returning an
> invalid pointer.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>   arch/riscv/mm/cacheflush.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 8930ab7278e6..9ee2c1a387cc 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>   {
>   	struct page *page = pte_page(pte);
> 
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!pfn_valid(pte_pfn(pte)) ||
> +	    !test_and_set_bit(PG_dcache_clean, &page->flags))
>   		flush_icache_all();
>   }
>   #endif /* CONFIG_MMU */
> --
> 2.16.4
> 
> 

When did you encounter such a situation, i.e. executable code that is 
not backed by struct page?

Riscv uses the generic implementation of ioremap, and the way 
_PAGE_IOREMAP is defined does not allow mapping an executable memory 
region using ioremap, so I'm interested to understand how we end up in 
flush_icache_pte for an executable region not backed by any struct page.

Thanks,

Alex
Jan Kiszka Feb. 16, 2020, 4:05 p.m. UTC | #2
On 16.02.20 15:41, Alex Ghiti wrote:
> Hi Jan,
>
> On 2/15/20 6:49 AM, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> Those are not backed by page structs, and pte_page is returning an
>> invalid pointer.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> =2D--
>>   arch/riscv/mm/cacheflush.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> index 8930ab7278e6..9ee2c1a387cc 100644
>> =2D-- a/arch/riscv/mm/cacheflush.c
>> +++ b/arch/riscv/mm/cacheflush.c
>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>>   {
>>       struct page *page =3D pte_page(pte);
>>
>> -    if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>> +    if (!pfn_valid(pte_pfn(pte)) ||
>> +        !test_and_set_bit(PG_dcache_clean, &page->flags))
>>           flush_icache_all();
>>   }
>>   #endif /* CONFIG_MMU */
>> =2D-
>> 2.16.4
>>
>>
>
> When did you encounter such a situation, i.e. executable code that is
> not backed by struct page?
>
> Riscv uses the generic implementation of ioremap, and the way
> _PAGE_IOREMAP is defined does not allow mapping an executable memory
> region using ioremap, so I'm interested to understand how we end up in
> flush_icache_pte for an executable region not backed by any struct page.

You can create executable mappings of memory that Linux does not
initially consider as RAM via ioremap_prot or ioremap_page_range. We are
using that in Jailhouse to load the hypervisor code into reserved memory
that is ioremapped for the purpose. Works fine on x86, arm and arm64.

Jan
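
To illustrate why such a mapping trips up flush_icache_pte, here is a minimal userspace sketch, assuming a flat memmap that covers only system RAM. It is not kernel code: struct page is reduced to one field, and RAM_BASE_PFN, RAM_NR_PAGES and every model_ function are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these definitions are the kernel's own. */
struct page {
	unsigned long flags;
};

#define RAM_BASE_PFN 0x80000UL /* hypothetical first PFN of system RAM */
#define RAM_NR_PAGES 16UL      /* hypothetical number of RAM pages */

/* One struct page per RAM page, as in a flat memmap. */
static struct page mem_map[RAM_NR_PAGES];

/* Model of pfn_valid(): true only if a struct page exists for this PFN. */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn >= RAM_BASE_PFN && pfn < RAM_BASE_PFN + RAM_NR_PAGES;
}

/*
 * Model of pfn_to_page(). With a flat memmap the real helper is plain
 * pointer arithmetic and would hand back an out-of-bounds pointer for a
 * PFN outside RAM; the model returns NULL instead so the failure is
 * observable without undefined behaviour.
 */
static struct page *model_pfn_to_page(unsigned long pfn)
{
	if (!model_pfn_valid(pfn))
		return NULL;
	return &mem_map[pfn - RAM_BASE_PFN];
}

/* Exercises both cases: a PFN inside RAM and a reserved one outside it. */
static void model_demo(void)
{
	assert(model_pfn_to_page(RAM_BASE_PFN + 3) == &mem_map[3]);
	assert(model_pfn_to_page(0x90000UL) == NULL);
}
```

In this model, a reserved region that was ioremapped corresponds to a PFN such as 0x90000: there is no array entry for it, so any access to page->flags through the computed pointer would touch memory that is not a struct page.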
Alexandre Ghiti Feb. 16, 2020, 7:56 p.m. UTC | #3
On 2/16/20 11:05 AM, Jan Kiszka wrote:
> You can create executable mappings of memory that Linux does not
> initially consider as RAM via ioremap_prot or ioremap_page_range. We are
> using that in Jailhouse to load the hypervisor code into reserved memory
> that is ioremapped for the purpose. Works fine on x86, arm and arm64.
> 
> Jan

Ok thanks, I had missed this API.

Regarding your patch, I find it weird to do anything if the pfn is 
invalid: we could have garbage in the pte pointing to an invalid region, 
for example (I admit that the effect of flushing the icache would not be 
catastrophic in that situation).

I'm not saying I will come up with a better solution, but I'll take a 
deeper look tomorrow.

Alex
Alexandre Ghiti Feb. 20, 2020, 5:49 a.m. UTC | #4
Hi Jan,

On 2/16/20 2:56 PM, Alex Ghiti wrote:
> Ok thanks, I had missed this API.
> 
> Regarding your patch, I find it weird to do anything if the pfn is 
> invalid: we could have garbage in the pte pointing to an invalid region, 
> for example (I admit that the effect of flushing the icache would not be 
> catastrophic in that situation).
> 
> I'm not saying I will come up with a better solution, but I'll take a 
> deeper look tomorrow.
> 
> Alex
> 

I took a look at the Jailhouse driver. After loading the hypervisor into 
the ioremapped region, it explicitly ensures icache/dcache consistency 
by calling flush_icache_range here:

https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505

There seems to be an implicit (?) rule that in-kernel code 
modification must handle icache/dcache consistency itself:

In the arm64 set_pte_at definition, icache/dcache are not synced when 
the pte is a kernel mapping:

https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271

In mips, they do the same:

https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137

So, funnily enough, I'd do the opposite of what you have done, the mips way:

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 8930ab7278e6..c90c8bb49109 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte)
  {
         struct page *page = pte_page(pte);

+       if (unlikely(!pfn_valid(pte_pfn(pte))))
+               return;
+
         if (!test_and_set_bit(PG_dcache_clean, &page->flags))
                 flush_icache_all();
  }

What do you think?

Alex
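
The difference between the two proposals can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel implementation: a NULL page pointer stands in for a PTE whose pfn fails pfn_valid, the bit helper is a non-atomic stand-in for test_and_set_bit, and MODEL_PG_DCACHE_CLEAN plus all model_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names prefixed model_ are hypothetical. */
struct page {
	unsigned long flags;
};

#define MODEL_PG_DCACHE_CLEAN 0 /* hypothetical bit position */

static int model_flush_count; /* number of simulated full icache flushes */

static void model_flush_icache_all(void)
{
	model_flush_count++;
}

/* Non-atomic stand-in for test_and_set_bit(): sets the bit, returns its old value. */
static bool model_test_and_set_bit(int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << nr;
	bool old = (*addr & mask) != 0;

	*addr |= mask;
	return old;
}

/* v2 patch behaviour: no struct page (NULL here) means "always flush". */
static void model_flush_icache_pte_v2(struct page *page)
{
	if (!page ||
	    !model_test_and_set_bit(MODEL_PG_DCACHE_CLEAN, &page->flags))
		model_flush_icache_all();
}

/* Alternative behaviour: no struct page means "leave it to the caller". */
static void model_flush_icache_pte_alt(struct page *page)
{
	if (!page)
		return;

	if (!model_test_and_set_bit(MODEL_PG_DCACHE_CLEAN, &page->flags))
		model_flush_icache_all();
}

/* Shows that only the v2 variant flushes for a PTE without a struct page. */
static void model_demo(void)
{
	model_flush_count = 0;
	model_flush_icache_pte_v2(NULL);
	assert(model_flush_count == 1); /* v2 flushed */
	model_flush_icache_pte_alt(NULL);
	assert(model_flush_count == 1); /* the alternative did not */
}
```

For a page that does have a struct page, both variants behave identically: the first call flushes and marks the page clean, later calls do nothing. They differ only for unbacked PTEs, where the alternative relies on the caller (for example via flush_icache_range, as the Jailhouse driver does) to keep the caches consistent.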
Jan Kiszka Feb. 20, 2020, 6:38 a.m. UTC | #5
On 20.02.20 06:49, Alex Ghiti wrote:
> Hi Jan,
>
> On 2/16/20 2:56 PM, Alex Ghiti wrote:
>> On 2/16/20 11:05 AM, Jan Kiszka wrote:
>>> On 16.02.20 15:41, Alex Ghiti wrote:
>>>> Hi Jan,
>>>>
>>>> On 2/15/20 6:49 AM, Jan Kiszka wrote:
>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> Those are not backed by page structs, and pte_page is returning an
>>>>> invalid pointer.
>>>>>
>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>> =2D--
>>>>>   arch/riscv/mm/cacheflush.c | 3 ++-
>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>>>>> index 8930ab7278e6..9ee2c1a387cc 100644
>>>>> =2D-- a/arch/riscv/mm/cacheflush.c
>>>>> +++ b/arch/riscv/mm/cacheflush.c
>>>>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>>>>>   {
>>>>>       struct page *page =3D pte_page(pte);
>>>>>
>>>>> -    if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>> +    if (!pfn_valid(pte_pfn(pte)) ||
>>>>> +        !test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>>           flush_icache_all();
>>>>>   }
>>>>>   #endif /* CONFIG_MMU */
>>>>> =2D-
>>>>> 2.16.4
>>>>>
>>>>>
>>>>
>>>> When did you encounter such a situation ? i.e. executable code that is
>>>> not backed by struct page ?
>>>>
>>>> Riscv uses the generic implementation of ioremap and the way
>>>> _PAGE_IOREMAP is defined does not allow to map executable memory region
>>>> using ioremap, so I'm interested to understand how we end up in
>>>> flush_icache_pte for an executable region not backed by any struct
>>>> page.
>>>
>>> You can create executable mappings of memory that Linux does not
>>> initially consider as RAM via ioremap_prot or ioremap_page_range. We are
>>> using that in Jailhouse to load the hypervisor code into reserved memory
>>> that is ioremapped for the purpose. Works fine on x86, arm and arm64.
>>>
>>> Jan
>>
>> Ok thanks, I had missed this API.
>>
>> Regarding your patch, I find it weird to do anything if the pfn is
>> invalid, we could have garbage in pte pointing to an invalid region
>> for example (I admit that the effect of flushing the icache would not
>> be catastrophic in that situation).
>>
>> I'm not saying I will come with a better solution but I'll take a
>> deeper look tomorrow.
>>
>> Alex
>>
>
> I took a look at the Jailhouse driver. After loading the hypervisor into
> the ioremapped region, it explicitly ensures icache/dcache consistency
> by calling flush_icache_range here:
>
> https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505
>

Yeah, the arm64 port needed this.

> There seems to be an implicit (?) rule that in-kernel code
> modification must handle icache/dcache consistency itself:
>
> In the arm64 set_pte_at definition, icache/dcache are not synced when
> the pte is a kernel mapping:
>
> https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271
>
>
> In mips, they do the same:
>
> https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137
>
> So, funnily enough, I'd do the opposite of what you have done, the mips way:
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 8930ab7278e6..c90c8bb49109 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte)
>   {
>          struct page *page = pte_page(pte);
>
> +       if (unlikely(!pfn_valid(pte_pfn(pte))))
> +               return;
> +
>          if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>                  flush_icache_all();
>   }
>
> What do you think?
>

I wouldn't mind doing it as above. I suspect that became the common,
simple pattern because no one expected a use case like Jailhouse's.
But I'm far from being an expert in kernel mm topics.

Jan
Patch

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 8930ab7278e6..9ee2c1a387cc 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -84,7 +84,8 @@  void flush_icache_pte(pte_t pte)
 {
 	struct page *page = pte_page(pte);

-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (!pfn_valid(pte_pfn(pte)) ||
+	    !test_and_set_bit(PG_dcache_clean, &page->flags))
 		flush_icache_all();
 }
 #endif /* CONFIG_MMU */