mm: avoid setting up anonymous pages into file mapping

Message ID 1435932447-84377-1-git-send-email-kirill.shutemov@linux.intel.com
State New

Commit Message

Kirill A. Shutemov July 3, 2015, 2:07 p.m. UTC
While reading the page fault handler code, I noticed that under the right
circumstances the kernel would map anonymous pages into file mappings:
if the VMA doesn't have vm_ops->fault() and the VMA wasn't fully
populated on ->mmap(), the kernel would handle a page fault on an
unpopulated pte with do_anonymous_page().

There's a chance that it was done intentionally, but I don't see a good
justification for this. We just hide bugs in broken drivers.

Let's change the page fault handler to use do_anonymous_page() only on
anonymous VMAs (->vm_ops == NULL).

For file mappings without vm_ops->fault(), a page fault on a pte_none()
entry would lead to SIGBUS.
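
The change in dispatch for a pte_none() entry can be sketched as a small
userspace model. This is an illustration only, not the kernel code: the
model_* types, the fault_path enum, the helper functions and the three
sample VMAs are hypothetical names made up for this example, and the real
handlers take many more arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Which handler a fault on a pte_none() entry would end up in (model). */
enum fault_path { PATH_ANONYMOUS, PATH_FAULT, PATH_SIGBUS };

/* Hypothetical stand-ins for vm_operations_struct and vm_area_struct. */
struct model_vm_ops {
	int (*fault)(void);	/* simplified; not the real ->fault() signature */
};

struct model_vma {
	const struct model_vm_ops *vm_ops;
};

/* Old logic: a VMA with vm_ops but no ->fault() falls through to anon. */
enum fault_path pte_none_path_old(const struct model_vma *vma)
{
	assert(vma != NULL);
	if (vma->vm_ops) {
		if (vma->vm_ops->fault)
			return PATH_FAULT;
	}
	return PATH_ANONYMOUS;
}

/* New logic: only vm_ops == NULL is anonymous; a missing ->fault() is SIGBUS. */
enum fault_path pte_none_path_new(const struct model_vma *vma)
{
	assert(vma != NULL);
	if (!vma->vm_ops)
		return PATH_ANONYMOUS;
	if (!vma->vm_ops->fault)
		return PATH_SIGBUS;
	return PATH_FAULT;
}

/* Sample VMAs for the three cases discussed in the commit message. */
static int dummy_fault(void) { return 0; }

const struct model_vm_ops ops_without_fault = { .fault = NULL };
const struct model_vm_ops ops_with_fault = { .fault = dummy_fault };

const struct model_vma anon_vma_example = { .vm_ops = NULL };
const struct model_vma file_vma_no_fault = { .vm_ops = &ops_without_fault };
const struct model_vma file_vma_with_fault = { .vm_ops = &ops_with_fault };
```

In this model only file_vma_no_fault changes behaviour: the old helper
sends it down the anonymous path, the new one reports SIGBUS. The other
two cases are handled the same way before and after.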

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Boaz Harrosh July 5, 2015, 3:15 p.m. UTC | #1
On 07/03/2015 05:07 PM, Kirill A. Shutemov wrote:
> Reading page fault handler code I've noticed that under right
> circumstances kernel would map anonymous pages into file mappings:
> if the VMA doesn't have vm_ops->fault() and the VMA wasn't fully
> populated on ->mmap(), kernel would handle page fault to not populated
> pte with do_anonymous_page().
> 
> There's chance that it was done intentionally, but I don't see good
> justification for this. We just hide bugs in broken drivers.
> 

Have you done a preliminary audit for these broken drivers? If they actually
exist in-tree then this patch is a regression for them.

We need to look for vm_ops without a .fault = initializer. Perhaps define
a map_annonimous() for those to revert to the old behavior, if any
actually exist.

> Let's change page fault handler to use do_anonymous_page() only on
> anonymous VMA (->vm_ops == NULL).
> 
> For file mappings without vm_ops->fault() page fault on pte_none() entry
> would lead to SIGBUS.
> 

Again that could mean a theoretical regression for some in-tree driver,
do you know of any such driver?

Thanks
Boaz

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 8a2fc9945b46..f3ee782059e3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3115,6 +3115,9 @@ static int do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>  
>  	pte_unmap(page_table);
> +
> +	if (unlikely(!vma->vm_ops->fault))
> +		return VM_FAULT_SIGBUS;
>  	if (!(flags & FAULT_FLAG_WRITE))
>  		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
>  				orig_pte);
> @@ -3260,13 +3263,13 @@ static int handle_pte_fault(struct mm_struct *mm,
>  	barrier();
>  	if (!pte_present(entry)) {
>  		if (pte_none(entry)) {
> -			if (vma->vm_ops) {
> -				if (likely(vma->vm_ops->fault))
> -					return do_fault(mm, vma, address, pte,
> -							pmd, flags, entry);
> +			if (!vma->vm_ops) {
> +				return do_anonymous_page(mm, vma, address, pte,
> +						pmd, flags);
> +			} else {
> +				return do_fault(mm, vma, address, pte, pmd,
> +						flags, entry);
>  			}
> -			return do_anonymous_page(mm, vma, address,
> -						 pte, pmd, flags);
>  		}
>  		return do_swap_page(mm, vma, address,
>  					pte, pmd, flags, entry);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Kirill A. Shutemov July 5, 2015, 3:44 p.m. UTC | #2
On Sun, Jul 05, 2015 at 06:15:20PM +0300, Boaz Harrosh wrote:
> On 07/03/2015 05:07 PM, Kirill A. Shutemov wrote:
> > Reading page fault handler code I've noticed that under right
> > circumstances kernel would map anonymous pages into file mappings:
> > if the VMA doesn't have vm_ops->fault() and the VMA wasn't fully
> > populated on ->mmap(), kernel would handle page fault to not populated
> > pte with do_anonymous_page().
> > 
> > There's chance that it was done intentionally, but I don't see good
> > justification for this. We just hide bugs in broken drivers.
> > 
> 
> Have you done a preliminary audit for these broken drivers? If they actually
> exist in-tree then this patch is a regression for them.

No, I didn't check drivers.

On the other hand, if such a driver exists, it has a security issue. If
you're able to set up the zero page in a file mapping, you can make it
writable, with security implications.

> We need to look for vm_ops without an .fault = . Perhaps define a
> map_annonimous() for those to revert to the old behavior, if any
> actually exist.

No. Drivers should be fixed properly.

> > Let's change page fault handler to use do_anonymous_page() only on
> > anonymous VMA (->vm_ops == NULL).
> > 
> > For file mappings without vm_ops->fault() page fault on pte_none() entry
> > would lead to SIGBUS.
> > 
> 
> Again that could mean a theoretical regression for some in-tree driver,
> do you know of any such driver?

I did very little testing with the patch: boot kvm with Fedora and run
trinity there for a while. More testing is required.

Boaz Harrosh July 5, 2015, 4:38 p.m. UTC | #3
On 07/05/2015 06:44 PM, Kirill A. Shutemov wrote:
>> Again that could mean a theoretical regression for some in-tree driver,
>> do you know of any such driver?
> 
> I did very little testing with the patch: boot kvm with Fedora and run
> trinity there for a while. More testing is required.
> 

It seems more likely to be a bug in some obscure real HW driver than
in anything virtualized.

Let me run a quick search and see if I can see any obvious candidates
for this ...

<arch/x86/kernel/vsyscall_64.c>
static struct vm_operations_struct gate_vma_ops = {
	.name = gate_vma_name,
};

Perhaps it was done for this one
</arch/x86/kernel/vsyscall_64.c>

<arch/x86/mm/mpx.c>
static struct vm_operations_struct mpx_vma_ops = {
	.name = mpx_mapping_name,
};

Or this

</arch/x86/mm/mpx.c>

<more>
static const struct vm_operations_struct pci_mmap_ops = {

static const struct vm_operations_struct mmap_mem_ops = {

...
</more>

I was looking in-tree for any vm_operations_struct declaration without a
.fault member. There are the ones above and a slew of HW drivers that only
have .open and .close, so those might populate at open time and never
actually fault.

Please have a quick look, I did not. I agree about the possible security badness.

Thanks
Boaz
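
The audit criterion described above (a vm_operations_struct with no .fault
member) can be sketched in userspace as follows. This is a rough model
under stated assumptions, not a tool that scans the tree: struct
vma_ops_model, the helper functions and the three sample tables are
hypothetical and only mimic the shape of the declarations quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for vm_operations_struct. */
struct vma_ops_model {
	void (*open)(void);
	void (*close)(void);
	int (*fault)(void);
	const char *(*name)(void);
};

/* Returns 1 if the table exists but has no ->fault(), else 0. */
int ops_lacks_fault(const struct vma_ops_model *ops)
{
	return ops != NULL && ops->fault == NULL;
}

/* Counts how many of the n tables would hit the new SIGBUS path. */
size_t count_ops_lacking_fault(const struct vma_ops_model *const *tbl,
			       size_t n)
{
	size_t i, count = 0;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		count += (size_t)ops_lacks_fault(tbl[i]);
	return count;
}

/* Dummy callbacks so the sample tables have something to point at. */
static void dummy_open(void) { }
static void dummy_close(void) { }
static int dummy_fault(void) { return 0; }
static const char *dummy_name(void) { return "[model]"; }

/* Shaped like a table that sets only .name. */
const struct vma_ops_model name_only_ops = {
	.name = dummy_name,
};

/* Shaped like a HW driver table with only .open and .close. */
const struct vma_ops_model open_close_only_ops = {
	.open = dummy_open,
	.close = dummy_close,
};

/* A table that does provide .fault and is unaffected by the patch. */
const struct vma_ops_model with_fault_ops = {
	.open = dummy_open,
	.close = dummy_close,
	.fault = dummy_fault,
};

const struct vma_ops_model *const sample_ops[] = {
	&name_only_ops, &open_close_only_ops, &with_fault_ops,
};
```

Whether a table flagged this way is actually a problem still depends on
whether its mapping is fully populated at ->mmap() time, which the model
does not capture.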


Patch

diff --git a/mm/memory.c b/mm/memory.c
index 8a2fc9945b46..f3ee782059e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3115,6 +3115,9 @@  static int do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
 	pte_unmap(page_table);
+
+	if (unlikely(!vma->vm_ops->fault))
+		return VM_FAULT_SIGBUS;
 	if (!(flags & FAULT_FLAG_WRITE))
 		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
 				orig_pte);
@@ -3260,13 +3263,13 @@  static int handle_pte_fault(struct mm_struct *mm,
 	barrier();
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
-			if (vma->vm_ops) {
-				if (likely(vma->vm_ops->fault))
-					return do_fault(mm, vma, address, pte,
-							pmd, flags, entry);
+			if (!vma->vm_ops) {
+				return do_anonymous_page(mm, vma, address, pte,
+						pmd, flags);
+			} else {
+				return do_fault(mm, vma, address, pte, pmd,
+						flags, entry);
 			}
-			return do_anonymous_page(mm, vma, address,
-						 pte, pmd, flags);
 		}
 		return do_swap_page(mm, vma, address,
 					pte, pmd, flags, entry);