
[v2] ARM64: Kernel managed pages are only flushed

Message ID 1394018716-17075-1-git-send-email-Bharat.Bhushan@freescale.com (mailing list archive)
State New, archived

Commit Message

Bharat Bhushan March 5, 2014, 11:25 a.m. UTC
The kernel can only access pages that map to managed memory,
so flush only valid kernel pages.

I observed a kernel crash when directly assigning a device using VFIO,
and found that it was caused by accessing an invalid page.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
---
v1->v2
 Get the pfn using pte_pfn() in pfn_valid().

 arch/arm64/mm/flush.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

Comments

Will Deacon March 5, 2014, 4:12 p.m. UTC | #1
On Wed, Mar 05, 2014 at 11:25:16AM +0000, Bharat Bhushan wrote:
> The kernel can only access pages that map to managed memory,
> so flush only valid kernel pages.
> 
> I observed a kernel crash when directly assigning a device using VFIO,
> and found that it was caused by accessing an invalid page.
> 
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> ---
> v1->v2
>  Get the pfn using pte_pfn() in pfn_valid().
> 
>  arch/arm64/mm/flush.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index e4193e3..319826a 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -72,7 +72,18 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  
>  void __sync_icache_dcache(pte_t pte, unsigned long addr)
>  {
> -	struct page *page = pte_page(pte);
> +	struct page *page;
> +
> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> +	/*
> +	 * We can only access pages that the kernel maps
> +	 * as memory. Bail out for unmapped ones.
> +	 */
> +	if (!pfn_valid(pte_pfn(pte)))
> +		return;
> +
> +#endif
> +	page = pte_page(pte);

How do you get into this function without a valid, userspace, executable pte?

I suspect you've got changes elsewhere and are calling this function in a
context where it's not supposed to be called.

Will
Bharat Bhushan March 5, 2014, 4:27 p.m. UTC | #2
> -----Original Message-----
> From: Will Deacon [mailto:will.deacon@arm.com]
> Sent: Wednesday, March 05, 2014 9:43 PM
> To: Bhushan Bharat-R65777
> Cc: Catalin Marinas; linux-arm-kernel@lists.infradead.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH v2] ARM64: Kernel managed pages are only flushed
> 
> On Wed, Mar 05, 2014 at 11:25:16AM +0000, Bharat Bhushan wrote:
> > The kernel can only access pages that map to managed memory,
> > so flush only valid kernel pages.
> >
> > I observed a kernel crash when directly assigning a device using VFIO
> > and found that it was caused by accessing an invalid page.
> >
> > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> > ---
> > v1->v2
> >  Get the pfn using pte_pfn() in pfn_valid().
> >
> >  arch/arm64/mm/flush.c |   13 ++++++++++++-
> >  1 files changed, 12 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index
> > e4193e3..319826a 100644
> > --- a/arch/arm64/mm/flush.c
> > +++ b/arch/arm64/mm/flush.c
> > @@ -72,7 +72,18 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
> >
> >  void __sync_icache_dcache(pte_t pte, unsigned long addr)
> >  {
> > -	struct page *page = pte_page(pte);
> > +	struct page *page;
> > +
> > +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
> > +	/*
> > +	 * We can only access pages that the kernel maps
> > +	 * as memory. Bail out for unmapped ones.
> > +	 */
> > +	if (!pfn_valid(pte_pfn(pte)))
> > +		return;
> > +
> > +#endif
> > +	page = pte_page(pte);
> 
> How do you get into this function without a valid, userspace, executable pte?
> 
> I suspect you've got changes elsewhere and are calling this function in a
> context where it's not supposed to be called.

Below I will describe the context in which this function is called:

When we directly assign a bus device to user space using VFIO (we have a different Freescale-specific bus device, but we can take a PCI device for discussion since the logic applies equally to PCI, I think), userspace needs to mmap() the PCI BAR offset; this PCI BAR is not kernel-visible memory. The VFIO kernel mmap() code then calls remap_pfn_range() to map the requested address, and remap_pfn_range() internally calls this function.
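
To make this path concrete, here is a small userspace model of the check the patch adds. The RAM range, page size, and pfn values below are invented purely for illustration; the real pfn_valid() consults the kernel's memory map (memblock/memmap) rather than a hard-coded range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4K pages, as in the arm64 default config */

/* Hypothetical RAM bank, for illustration only: pfns inside it stand
 * for kernel-managed memory; anything outside stands for device memory
 * such as a PCI BAR mapped via remap_pfn_range(). */
static const uint64_t ram_start = 0x80000000ULL;
static const uint64_t ram_end   = 0xc0000000ULL;

/* Stand-in for pfn_valid(): true only when the pfn is backed by a
 * struct page, i.e. it belongs to memory the kernel manages. */
static bool model_pfn_valid(uint64_t pfn)
{
	uint64_t phys = pfn << PAGE_SHIFT;
	return phys >= ram_start && phys < ram_end;
}

/* Stand-in for the patched __sync_icache_dcache() entry check: bail
 * out before pte_page() can fabricate a bogus struct page pointer for
 * an unmanaged pfn. */
static bool would_flush(uint64_t pfn)
{
	if (!model_pfn_valid(pfn))
		return false;	/* the patch returns early here */
	return true;		/* safe to call pte_page() and flush */
}
```

Under this model, a pfn inside the RAM bank is flushed normally, while a BAR pfn takes the early return instead of crashing.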

Thanks
-Bharat

> 
> Will
>
Laura Abbott March 5, 2014, 8:03 p.m. UTC | #3
On 3/5/2014 8:27 AM, Bharat.Bhushan@freescale.com wrote:
>
>
>> -----Original Message-----
>> From: Will Deacon [mailto:will.deacon@arm.com]
>> Sent: Wednesday, March 05, 2014 9:43 PM
>> To: Bhushan Bharat-R65777
>> Cc: Catalin Marinas; linux-arm-kernel@lists.infradead.org; Bhushan Bharat-R65777
>> Subject: Re: [PATCH v2] ARM64: Kernel managed pages are only flushed
>>
>> On Wed, Mar 05, 2014 at 11:25:16AM +0000, Bharat Bhushan wrote:
>>> The kernel can only access pages that map to managed memory,
>>> so flush only valid kernel pages.
>>>
>>> I observed a kernel crash when directly assigning a device using VFIO
>>> and found that it was caused by accessing an invalid page.
>>>
>>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>> ---
>>> v1->v2
>>>   Get the pfn using pte_pfn() in pfn_valid().
>>>
>>>   arch/arm64/mm/flush.c |   13 ++++++++++++-
>>>   1 files changed, 12 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index
>>> e4193e3..319826a 100644
>>> --- a/arch/arm64/mm/flush.c
>>> +++ b/arch/arm64/mm/flush.c
>>> @@ -72,7 +72,18 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>>>
>>>   void __sync_icache_dcache(pte_t pte, unsigned long addr)
>>>   {
>>> -	struct page *page = pte_page(pte);
>>> +	struct page *page;
>>> +
>>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>> +	/*
>>> +	 * We can only access pages that the kernel maps
>>> +	 * as memory. Bail out for unmapped ones.
>>> +	 */
>>> +	if (!pfn_valid(pte_pfn(pte)))
>>> +		return;
>>> +
>>> +#endif
>>> +	page = pte_page(pte);
>>
>> How do you get into this function without a valid, userspace, executable pte?
>>
>> I suspect you've got changes elsewhere and are calling this function in a
>> context where it's not supposed to be called.
>
> Below I will describe the context in which this function is called:
>
> When we directly assign a bus device to user space using VFIO (we have a
> different Freescale-specific bus device, but we can take a PCI device for
> discussion since the logic applies equally to PCI, I think), userspace
> needs to mmap() the PCI BAR offset; this PCI BAR is not kernel-visible
> memory. The VFIO kernel mmap() code then calls remap_pfn_range() to map
> the requested address, and remap_pfn_range() internally calls this
> function.
>

As someone who likes calling functions in contexts where they aren't
supposed to be called, I took a look at this because I was curious.

I can confirm the same problem trying to mmap arbitrary I/O address space
with remap_pfn_range(). We should only be hitting this if the pte is
marked as exec per set_pte_at(). With my test case, even mmapping with only
PROT_READ and PROT_WRITE was setting PROT_EXEC as well, which was
triggering the bug. This seems to be because the READ_IMPLIES_EXEC
personality was set, which was derived from

#define elf_read_implies_exec(ex,stk)   (stk != EXSTACK_DISABLE_X)

and none of the binaries I'm generating seem to be setting the stack
execute bit either way (all are EXSTACK_DEFAULT).

It's not obvious what the best solution is here.

Thanks,
Laura
Will Deacon March 6, 2014, 4:18 p.m. UTC | #4
Hi Laura,

On Wed, Mar 05, 2014 at 08:03:58PM +0000, Laura Abbott wrote:
> On 3/5/2014 8:27 AM, Bharat.Bhushan@freescale.com wrote:
> >> On Wed, Mar 05, 2014 at 11:25:16AM +0000, Bharat Bhushan wrote:
> >>> The kernel can only access pages that map to managed memory,
> >>> so flush only valid kernel pages.
> >>>
> >> How do you get into this function without a valid, userspace, executable pte?
> >>
> >> I suspect you've got changes elsewhere and are calling this function in a
> >> context where it's not supposed to be called.
> >
> > Below I will describe the context in which this function is called:
> >
> > When we directly assign a bus device to user space using VFIO (we have a
> > different Freescale-specific bus device, but we can take a PCI device for
> > discussion since the logic applies equally to PCI, I think), userspace
> > needs to mmap() the PCI BAR offset; this PCI BAR is not kernel-visible
> > memory. The VFIO kernel mmap() code then calls remap_pfn_range() to map
> > the requested address, and remap_pfn_range() internally calls this
> > function.
> >
> 
> As someone who likes calling functions in contexts where they aren't
> supposed to be called, I took a look at this because I was curious.

Somebody should hide your keyboard. Stephen?

> I can confirm the same problem trying to mmap arbitrary I/O address space
> with remap_pfn_range(). We should only be hitting this if the pte is
> marked as exec per set_pte_at(). With my test case, even mmapping with only
> PROT_READ and PROT_WRITE was setting PROT_EXEC as well, which was
> triggering the bug. This seems to be because the READ_IMPLIES_EXEC
> personality was set, which was derived from
> 
> #define elf_read_implies_exec(ex,stk)   (stk != EXSTACK_DISABLE_X)
> 
> and none of the binaries I'm generating seem to be setting the stack
> execute bit either way (all are EXSTACK_DEFAULT).
> 
> It's not obvious what the best solution is here.

It would be nice if something like phys_mem_access_prot was used by the
callers, since this is used by the /dev/mem driver to make sure that the
pgprot is sane for the underlying pfn. In the absence of that, I guess we
could add the pfn_valid check (we have it already on arch/arm/) but if that
means we end up with executable devices, we're still entering a world of
looks-like-my-instruction-fetcher-just-acked-an-irq style pain.

Will

Patch

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index e4193e3..319826a 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -72,7 +72,18 @@  void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 
 void __sync_icache_dcache(pte_t pte, unsigned long addr)
 {
-	struct page *page = pte_page(pte);
+	struct page *page;
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+	/*
+	 * We can only access pages that the kernel maps
+	 * as memory. Bail out for unmapped ones.
+	 */
+	if (!pfn_valid(pte_pfn(pte)))
+		return;
+
+#endif
+	page = pte_page(pte);
 
 	/* no flushing needed for anonymous pages */
 	if (!page_mapping(page))