
[2/3] ARM: cacheflush: don't bother rounding to nearest vma

Message ID 1364235486-17738-3-git-send-email-will.deacon@arm.com (mailing list archive)
State New, archived

Commit Message

Will Deacon March 25, 2013, 6:18 p.m. UTC
do_cache_op finds the lowest VMA contained in the specified address
range and rounds the range to cover only the mapped addresses.

Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from
flush_cache_user_range") the VMA is not used for anything else in this
code and seeing as the low-level cache flushing routines return -EFAULT
if the address is not valid, there is no need for this range truncation.

This patch removes the VMA handling code from the cache flushing syscall.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/traps.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

Comments

Catalin Marinas March 27, 2013, 11:09 a.m. UTC | #1
On Mon, Mar 25, 2013 at 06:18:05PM +0000, Will Deacon wrote:
> do_cache_op finds the lowest VMA contained in the specified address
> range and rounds the range to cover only the mapped addresses.
> 
> Since commit 4542b6a0fa6b ("ARM: 7365/1: drop unused parameter from
> flush_cache_user_range") the VMA is not used for anything else in this
> code and seeing as the low-level cache flushing routines return -EFAULT
> if the address is not valid, there is no need for this range truncation.
> 
> This patch removes the VMA handling code from the cache flushing syscall.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/kernel/traps.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 1c08911..da5e268 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -509,25 +509,10 @@ static int bad_syscall(int n, struct pt_regs *regs)
>  static inline int
>  do_cache_op(unsigned long start, unsigned long end, int flags)
>  {
> -	struct mm_struct *mm = current->active_mm;
> -	struct vm_area_struct *vma;
> -
>  	if (end < start || flags)
>  		return -EINVAL;
>  
> -	down_read(&mm->mmap_sem);
> -	vma = find_vma(mm, start);
> -	if (vma && vma->vm_start < end) {
> -		if (start < vma->vm_start)
> -			start = vma->vm_start;
> -		if (end > vma->vm_end)
> -			end = vma->vm_end;
> -
> -		up_read(&mm->mmap_sem);
> -		return flush_cache_user_range(start, end);
> -	}
> -	up_read(&mm->mmap_sem);
> -	return -EINVAL;
> +	return flush_cache_user_range(start, end);

While this would work, it introduces the possibility of a DoS where an
application passes a bigger valid range (kernel linear mapping) and the
kernel code would not be preempted (CONFIG_PREEMPT disabled). IIRC,
that's why Russell rejected such a patch a while back.
Will Deacon March 27, 2013, 12:15 p.m. UTC | #2
Hello,

On Wed, Mar 27, 2013 at 11:09:38AM +0000, Catalin Marinas wrote:
> On Mon, Mar 25, 2013 at 06:18:05PM +0000, Will Deacon wrote:
> > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> > index 1c08911..da5e268 100644
> > --- a/arch/arm/kernel/traps.c
> > +++ b/arch/arm/kernel/traps.c
> > @@ -509,25 +509,10 @@ static int bad_syscall(int n, struct pt_regs *regs)
> >  static inline int
> >  do_cache_op(unsigned long start, unsigned long end, int flags)
> >  {
> > -	struct mm_struct *mm = current->active_mm;
> > -	struct vm_area_struct *vma;
> > -
> >  	if (end < start || flags)
> >  		return -EINVAL;
> >  
> > -	down_read(&mm->mmap_sem);
> > -	vma = find_vma(mm, start);
> > -	if (vma && vma->vm_start < end) {
> > -		if (start < vma->vm_start)
> > -			start = vma->vm_start;
> > -		if (end > vma->vm_end)
> > -			end = vma->vm_end;
> > -
> > -		up_read(&mm->mmap_sem);
> > -		return flush_cache_user_range(start, end);
> > -	}
> > -	up_read(&mm->mmap_sem);
> > -	return -EINVAL;
> > +	return flush_cache_user_range(start, end);
> 
> While this would work, it introduces the possibility of a DoS where an
> application passes a bigger valid range (kernel linear mapping) and the
> kernel code would not be preempted (CONFIG_PREEMPT disabled). IIRC,
> that's why Russell rejected such a patch a while back.

Hmm, I'm not sure I buy that argument. Firstly, you can't just pass a kernel
linear mapping address -- we'll fault straight away because it's not a
userspace address. Secondly, what's to stop an application from mmaping a
large area into a single VMA and giving rise to the same situation? Finally,
interrupts are enabled during this operation, so I don't understand how you
can trigger a DoS, irrespective of the preempt configuration.

Is there an old thread I can refer to with more details about this? It may
be that some of the assumptions there no longer hold with subsequent changes
to the fault handling on this path.

Will
Catalin Marinas March 27, 2013, 12:21 p.m. UTC | #3
On Wed, Mar 27, 2013 at 12:15:12PM +0000, Will Deacon wrote:
> On Wed, Mar 27, 2013 at 11:09:38AM +0000, Catalin Marinas wrote:
> > On Mon, Mar 25, 2013 at 06:18:05PM +0000, Will Deacon wrote:
> > > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> > > index 1c08911..da5e268 100644
> > > --- a/arch/arm/kernel/traps.c
> > > +++ b/arch/arm/kernel/traps.c
> > > @@ -509,25 +509,10 @@ static int bad_syscall(int n, struct pt_regs *regs)
> > >  static inline int
> > >  do_cache_op(unsigned long start, unsigned long end, int flags)
> > >  {
> > > -	struct mm_struct *mm = current->active_mm;
> > > -	struct vm_area_struct *vma;
> > > -
> > >  	if (end < start || flags)
> > >  		return -EINVAL;
> > >  
> > > -	down_read(&mm->mmap_sem);
> > > -	vma = find_vma(mm, start);
> > > -	if (vma && vma->vm_start < end) {
> > > -		if (start < vma->vm_start)
> > > -			start = vma->vm_start;
> > > -		if (end > vma->vm_end)
> > > -			end = vma->vm_end;
> > > -
> > > -		up_read(&mm->mmap_sem);
> > > -		return flush_cache_user_range(start, end);
> > > -	}
> > > -	up_read(&mm->mmap_sem);
> > > -	return -EINVAL;
> > > +	return flush_cache_user_range(start, end);
> > 
> > While this would work, it introduces the possibility of a DoS where an
> > application passes a bigger valid range (kernel linear mapping) and the
> > kernel code would not be preempted (CONFIG_PREEMPT disabled). IIRC,
> > that's why Russell rejected such a patch a while back.
> 
> Hmm, I'm not sure I buy that argument. Firstly, you can't just pass a kernel
> linear mapping address -- we'll fault straight away because it's not a
> userspace address.

Fault where?

> Secondly, what's to stop an application from mmaping a large area into
> a single VMA and giving rise to the same situation? Finally,
> interrupts are enabled during this operation, so I don't understand
> how you can trigger a DoS, irrespective of the preempt configuration.

You can prevent context switching to other threads. But I agree, with a
large vma (which is already faulted in), you can get similar behaviour.

> Is there an old thread I can refer to with more details about this? It may
> be that some of the assumptions there no longer hold with subsequent changes
> to the fault handling on this path.

You could search the list for "do_cache_op"; I don't have a link at hand.
Will Deacon March 27, 2013, 12:43 p.m. UTC | #4
On Wed, Mar 27, 2013 at 12:21:59PM +0000, Catalin Marinas wrote:
> On Wed, Mar 27, 2013 at 12:15:12PM +0000, Will Deacon wrote:
> > On Wed, Mar 27, 2013 at 11:09:38AM +0000, Catalin Marinas wrote:
> > > While this would work, it introduces the possibility of a DoS where an
> > > application passes a bigger valid range (kernel linear mapping) and the
> > > kernel code would not be preempted (CONFIG_PREEMPT disabled). IIRC,
> > > that's why Russell rejected such a patch a while back.
> > 
> > Hmm, I'm not sure I buy that argument. Firstly, you can't just pass a kernel
> > linear mapping address -- we'll fault straight away because it's not a
> > userspace address.
> 
> Fault where?

I was expecting something like an access_ok check, but you're right, we
don't have one (and I guess that's not strictly needed given that flushing
isn't destructive). I still find it a bit scary that we allow userspace to
pass kernel addresses through though -- especially if there's something like
a DMA or CPU suspend operation running on another core.

> > Secondly, what's to stop an application from mmaping a large area into
> > a single VMA and giving rise to the same situation? Finally,
> > interrupts are enabled during this operation, so I don't understand
> > how you can trigger a DoS, irrespective of the preempt configuration.
> 
> You can prevent context switching to other threads. But I agree, with a
> large vma (which is already faulted in), you can get similar behaviour.

So the easy fix is to split the range up into chunks and call cond_resched
after processing each one.

Will
Catalin Marinas March 27, 2013, 1:08 p.m. UTC | #5
On Wed, Mar 27, 2013 at 12:43:52PM +0000, Will Deacon wrote:
> On Wed, Mar 27, 2013 at 12:21:59PM +0000, Catalin Marinas wrote:
> > On Wed, Mar 27, 2013 at 12:15:12PM +0000, Will Deacon wrote:
> > > On Wed, Mar 27, 2013 at 11:09:38AM +0000, Catalin Marinas wrote:
> > > > While this would work, it introduces the possibility of a DoS where an
> > > > application passes a bigger valid range (kernel linear mapping) and the
> > > > kernel code would not be preempted (CONFIG_PREEMPT disabled). IIRC,
> > > > that's why Russell rejected such a patch a while back.
> > > 
> > > Hmm, I'm not sure I buy that argument. Firstly, you can't just pass a kernel
> > > linear mapping address -- we'll fault straight away because it's not a
> > > userspace address.
> > 
> > Fault where?
> 
> I was expecting something like an access_ok check, but you're right, we
> don't have one (and I guess that's not strictly needed given that flushing
> isn't destructive). I still find it a bit scary that we allow userspace to
> pass kernel addresses through though -- especially if there's something like
> a DMA or CPU suspend operation running on another core.

Currently we don't allow kernel addresses since we require a valid vma.
If we drop the vma search, we should add an access_ok.

> > > Secondly, what's to stop an application from mmaping a large area into
> > > a single VMA and giving rise to the same situation? Finally,
> > > interrupts are enabled during this operation, so I don't understand
> > > how you can trigger a DoS, irrespective of the preempt configuration.
> > 
> > You can prevent context switching to other threads. But I agree, with a
> > large vma (which is already faulted in), you can get similar behaviour.
> 
> So the easy fix is to split the range up into chunks and call cond_resched
> after processing each one.

This would work.

Patch

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1c08911..da5e268 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -509,25 +509,10 @@ static int bad_syscall(int n, struct pt_regs *regs)
 static inline int
 do_cache_op(unsigned long start, unsigned long end, int flags)
 {
-	struct mm_struct *mm = current->active_mm;
-	struct vm_area_struct *vma;
-
 	if (end < start || flags)
 		return -EINVAL;
 
-	down_read(&mm->mmap_sem);
-	vma = find_vma(mm, start);
-	if (vma && vma->vm_start < end) {
-		if (start < vma->vm_start)
-			start = vma->vm_start;
-		if (end > vma->vm_end)
-			end = vma->vm_end;
-
-		up_read(&mm->mmap_sem);
-		return flush_cache_user_range(start, end);
-	}
-	up_read(&mm->mmap_sem);
-	return -EINVAL;
+	return flush_cache_user_range(start, end);
 }
 
 /*