Message ID | 8a555b0b0934f0ba134de92f6cf9db8b1744316c.1581767384.git.jan.kiszka@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | riscv: mem= support, ioremap exec fix | expand |
Hi Jan, On 2/15/20 6:49 AM, Jan Kiszka wrote: > From: Jan Kiszka <jan.kiszka@siemens.com> > > Those are not backed by page structs, and pte_page is returning an > invalid pointer. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > =2D-- > arch/riscv/mm/cacheflush.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c > index 8930ab7278e6..9ee2c1a387cc 100644 > =2D-- a/arch/riscv/mm/cacheflush.c > +++ b/arch/riscv/mm/cacheflush.c > @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte) > { > struct page *page =3D pte_page(pte); > > - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) > + if (!pfn_valid(pte_pfn(pte)) || > + !test_and_set_bit(PG_dcache_clean, &page->flags)) > flush_icache_all(); > } > #endif /* CONFIG_MMU */ > =2D- > 2.16.4 > > When did you encounter such a situation ? i.e. executable code that is not backed by struct page ? Riscv uses the generic implementation of ioremap and the way _PAGE_IOREMAP is defined does not allow to map executable memory region using ioremap, so I'm interested to understand how we end up in flush_icache_pte for an executable region not backed by any struct page. Thanks, Alex
On 16.02.20 15:41, Alex Ghiti wrote: > Hi Jan, > > On 2/15/20 6:49 AM, Jan Kiszka wrote: >> From: Jan Kiszka <jan.kiszka@siemens.com> >> >> Those are not backed by page structs, and pte_page is returning an >> invalid pointer. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >> =2D-- >> arch/riscv/mm/cacheflush.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c >> index 8930ab7278e6..9ee2c1a387cc 100644 >> =2D-- a/arch/riscv/mm/cacheflush.c >> +++ b/arch/riscv/mm/cacheflush.c >> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte) >> { >> struct page *page =3D pte_page(pte); >> >> - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) >> + if (!pfn_valid(pte_pfn(pte)) || >> + !test_and_set_bit(PG_dcache_clean, &page->flags)) >> flush_icache_all(); >> } >> #endif /* CONFIG_MMU */ >> =2D- >> 2.16.4 >> >> > > When did you encounter such a situation ? i.e. executable code that is > not backed by struct page ? > > Riscv uses the generic implementation of ioremap and the way > _PAGE_IOREMAP is defined does not allow to map executable memory region > using ioremap, so I'm interested to understand how we end up in > flush_icache_pte for an executable region not backed by any struct page. You can create executable mappings of memory that Linux does not initially consider as RAM via ioremap_prot or ioremap_page_range. We are using that in Jailhouse to load the hypervisor code into reserved memory that is ioremapped for the purpose. Works fine on x86, arm and arm64. Jan
On 2/16/20 11:05 AM, Jan Kiszka wrote: > On 16.02.20 15:41, Alex Ghiti wrote: >> Hi Jan, >> >> On 2/15/20 6:49 AM, Jan Kiszka wrote: >>> From: Jan Kiszka <jan.kiszka@siemens.com> >>> >>> Those are not backed by page structs, and pte_page is returning an >>> invalid pointer. >>> >>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>> =2D-- >>> arch/riscv/mm/cacheflush.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c >>> index 8930ab7278e6..9ee2c1a387cc 100644 >>> =2D-- a/arch/riscv/mm/cacheflush.c >>> +++ b/arch/riscv/mm/cacheflush.c >>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte) >>> { >>> struct page *page =3D pte_page(pte); >>> >>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) >>> + if (!pfn_valid(pte_pfn(pte)) || >>> + !test_and_set_bit(PG_dcache_clean, &page->flags)) >>> flush_icache_all(); >>> } >>> #endif /* CONFIG_MMU */ >>> =2D- >>> 2.16.4 >>> >>> >> >> When did you encounter such a situation ? i.e. executable code that is >> not backed by struct page ? >> >> Riscv uses the generic implementation of ioremap and the way >> _PAGE_IOREMAP is defined does not allow to map executable memory region >> using ioremap, so I'm interested to understand how we end up in >> flush_icache_pte for an executable region not backed by any struct page. > > You can create executable mappings of memory that Linux does not > initially consider as RAM via ioremap_prot or ioremap_page_range. We are > using that in Jailhouse to load the hypervisor code into reserved memory > that is ioremapped for the purpose. Works fine on x86, arm and arm64. > > Jan Ok thanks, I had missed this API. Regarding your patch, I find it weird to do anything if the pfn is invalid, we could have garbage in pte pointing to an invalid region for example (I admit that the effect of flushing the icache would not be catastrophic in that situation). I'm not saying I will come with a better solution but I'll take a deeper look tomorrow. Alex
Hi Jan, On 2/16/20 2:56 PM, Alex Ghiti wrote: > On 2/16/20 11:05 AM, Jan Kiszka wrote: >> On 16.02.20 15:41, Alex Ghiti wrote: >>> Hi Jan, >>> >>> On 2/15/20 6:49 AM, Jan Kiszka wrote: >>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>> >>>> Those are not backed by page structs, and pte_page is returning an >>>> invalid pointer. >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>> =2D-- >>>> arch/riscv/mm/cacheflush.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c >>>> index 8930ab7278e6..9ee2c1a387cc 100644 >>>> =2D-- a/arch/riscv/mm/cacheflush.c >>>> +++ b/arch/riscv/mm/cacheflush.c >>>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte) >>>> { >>>> struct page *page =3D pte_page(pte); >>>> >>>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) >>>> + if (!pfn_valid(pte_pfn(pte)) || >>>> + !test_and_set_bit(PG_dcache_clean, &page->flags)) >>>> flush_icache_all(); >>>> } >>>> #endif /* CONFIG_MMU */ >>>> =2D- >>>> 2.16.4 >>>> >>>> >>> >>> When did you encounter such a situation ? i.e. executable code that is >>> not backed by struct page ? >>> >>> Riscv uses the generic implementation of ioremap and the way >>> _PAGE_IOREMAP is defined does not allow to map executable memory region >>> using ioremap, so I'm interested to understand how we end up in >>> flush_icache_pte for an executable region not backed by any struct page. >> >> You can create executable mappings of memory that Linux does not >> initially consider as RAM via ioremap_prot or ioremap_page_range. We are >> using that in Jailhouse to load the hypervisor code into reserved memory >> that is ioremapped for the purpose. Works fine on x86, arm and arm64. >> >> Jan > > Ok thanks, I had missed this API. > > Regarding your patch, I find it weird to do anything if the pfn is > invalid, we could have garbage in pte pointing to an invalid region for > example (I admit that the effect of flushing the icache would not be > catastrophic in that situation). > > I'm not saying I will come with a better solution but I'll take a deeper > look tomorrow. > > Alex > I took a look at the Jailhouse driver. After loading the hypervisor into the ioremapped region, it explicitly ensures icache/dcache consistency by calling flush_icache_range here: https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505 There seems to be an implicit (?) rule that states that in-kernel code modification must handle icache/dcache consistency: In arm64 set_pte_at definition, they do not sync icache/dcache when the pte is kernel: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271 In mips, they do the same: https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137 So funnily, I'd do the contrary of what you have done, the mips way: diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c index 8930ab7278e6..c90c8bb49109 100644 --- a/arch/riscv/mm/cacheflush.c +++ b/arch/riscv/mm/cacheflush.c @@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte) { struct page *page = pte_page(pte); + if (unlikely(!pfn_valid(pte_pfn(pte)))) + return; + if (!test_and_set_bit(PG_dcache_clean, &page->flags)) flush_icache_all(); } What do you think ? Alex
On 20.02.20 06:49, Alex Ghiti wrote: > Hi Jan, > > On 2/16/20 2:56 PM, Alex Ghiti wrote: >> On 2/16/20 11:05 AM, Jan Kiszka wrote: >>> On 16.02.20 15:41, Alex Ghiti wrote: >>>> Hi Jan, >>>> >>>> On 2/15/20 6:49 AM, Jan Kiszka wrote: >>>>> From: Jan Kiszka <jan.kiszka@siemens.com> >>>>> >>>>> Those are not backed by page structs, and pte_page is returning an >>>>> invalid pointer. >>>>> >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>> =2D-- >>>>> arch/riscv/mm/cacheflush.c | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c >>>>> index 8930ab7278e6..9ee2c1a387cc 100644 >>>>> =2D-- a/arch/riscv/mm/cacheflush.c >>>>> +++ b/arch/riscv/mm/cacheflush.c >>>>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte) >>>>> { >>>>> struct page *page =3D pte_page(pte); >>>>> >>>>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) >>>>> + if (!pfn_valid(pte_pfn(pte)) || >>>>> + !test_and_set_bit(PG_dcache_clean, &page->flags)) >>>>> flush_icache_all(); >>>>> } >>>>> #endif /* CONFIG_MMU */ >>>>> =2D- >>>>> 2.16.4 >>>>> >>>>> >>>> >>>> When did you encounter such a situation ? i.e. executable code that is >>>> not backed by struct page ? >>>> >>>> Riscv uses the generic implementation of ioremap and the way >>>> _PAGE_IOREMAP is defined does not allow to map executable memory region >>>> using ioremap, so I'm interested to understand how we end up in >>>> flush_icache_pte for an executable region not backed by any struct >>>> page. >>> >>> You can create executable mappings of memory that Linux does not >>> initially consider as RAM via ioremap_prot or ioremap_page_range. We are >>> using that in Jailhouse to load the hypervisor code into reserved memory >>> that is ioremapped for the purpose. Works fine on x86, arm and arm64. >>> >>> Jan >> >> Ok thanks, I had missed this API. >> >> Regarding your patch, I find it weird to do anything if the pfn is >> invalid, we could have garbage in pte pointing to an invalid region >> for example (I admit that the effect of flushing the icache would not >> be catastrophic in that situation). >> >> I'm not saying I will come with a better solution but I'll take a >> deeper look tomorrow. >> >> Alex >> > > I took a look at the Jailhouse driver. After loading the hypervisor into > the ioremapped region, it explicitly ensures icache/dcache consistency > by calling flush_icache_range here: > > https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505 > Yeah, the arm64 port needed this. > There seems to be an implicit (?) rule that states that in-kernel code > modification must handle icache/dcache consistency: > > In arm64 set_pte_at definition, they do not sync icache/dcache when the > pte is kernel: > > https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271 > > > In mips, they do the same: > > https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137 > > So funnily, I'd do the contrary of what you have done, the mips way: > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c > index 8930ab7278e6..c90c8bb49109 100644 > --- a/arch/riscv/mm/cacheflush.c > +++ b/arch/riscv/mm/cacheflush.c > @@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte) > { > struct page *page = pte_page(pte); > > + if (unlikely(!pfn_valid(pte_pfn(pte)))) > + return; > + > if (!test_and_set_bit(PG_dcache_clean, &page->flags)) > flush_icache_all(); > } > > What do you think ? > I wouldn't mind doing it like above. I suspect that became the common simple pattern because no one expected a use case like with Jailhouse. But I'm by far not an expert in mm topics in the kernel. Jan
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c index 8930ab7278e6..9ee2c1a387cc 100644 --- a/arch/riscv/mm/cacheflush.c +++ b/arch/riscv/mm/cacheflush.c @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte) { struct page *page = pte_page(pte); - if (!test_and_set_bit(PG_dcache_clean, &page->flags)) + if (!pfn_valid(pte_pfn(pte)) || + !test_and_set_bit(PG_dcache_clean, &page->flags)) flush_icache_all(); } #endif /* CONFIG_MMU */