diff mbox series

[RFC] ARM: enable irq in translation/section permission fault handlers

Message ID 20190215200533.ypfrdekg7j4ucu6a@linutronix.de (mailing list archive)
State RFC
Headers show
Series [RFC] ARM: enable irq in translation/section permission fault handlers | expand

Commit Message

Sebastian Andrzej Siewior Feb. 15, 2019, 8:05 p.m. UTC
From: "Yadi.hu" <yadi.hu@windriver.com>

Accessing a kernel address in user space causes a SIGSEGV which is sent
via
-> do_DataAbort
   -> do_sect_fault || do_translation_fault
      -> do_bad_area
         -> __do_user_fault
            -> force_sig_fault
Since commit

  02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers")

that path is carried out with disabled interrupts. Page/alignment fault
do enable interrupts but data abort has been left out.

On -RT the siglock is a sleeping spinlock and requires interrupts to be
enabled in order to acquire it.

Enable interrupts in the DataAbort handler if the parent context had
interrupts enabled. Move harden_branch_predictor() before interrupts are
enabled.

Reported-by: <Bernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
[bigeasy: rewrote patch description, reordered patch]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
I though that the "interrupt enable part" has already been posted and
then Bernd complained about a warning from harden_branch_predictor() on
-RT so here it is.

 arch/arm/mm/fault.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Bernd Edlinger Feb. 15, 2019, 9:57 p.m. UTC | #1
On 2/15/19 9:05 PM, Sebastian Andrzej Siewior wrote:
> From: "Yadi.hu" <yadi.hu@windriver.com>
> 
> Accessing a kernel address in user space causes a SIGSEGV which is sent
> via
> -> do_DataAbort
>    -> do_sect_fault || do_translation_fault
>       -> do_bad_area
>          -> __do_user_fault
>             -> force_sig_fault
> Since commit
> 
>   02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers")
> 
> that path is carried out with disabled interrupts. Page/alignment fault
> do enable interrupts but data abort has been left out.
> 
> On -RT the siglock is a sleeping spinlock and requires interrupts to be
> enabled in order to acquire it.
> 
> Enable interrupts in the DataAbort handler if the parent context had
> interrupts enabled. Move harden_branch_predictor() before interrupts are
> enabled.
> 
> Reported-by: <Bernd Edlinger <bernd.edlinger@hotmail.de>
> Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
> [bigeasy: rewrote patch description, reordered patch]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> I though that the "interrupt enable part" has already been posted and
> then Bernd complained about a warning from harden_branch_predictor() on
> -RT so here it is.
> 
>  arch/arm/mm/fault.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 58f69fa07df95..da82967865836 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  		unsigned int fsr, unsigned int sig, int code,
>  		struct pt_regs *regs)
>  {
> -	if (addr > TASK_SIZE)
> -		harden_branch_predictor();
>  
>  #ifdef CONFIG_DEBUG_USER
>  	if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
> @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  	struct task_struct *tsk = current;
>  	struct mm_struct *mm = tsk->active_mm;
>  
> +	if (addr > TASK_SIZE && user_mode(regs))
> +		harden_branch_predictor();

This is somehow inconsisten with do_translation_fault, where
we have this:

	if (addr < TASK_SIZE)
		return do_page_fault(addr, fsr, regs);

> +
> +	if (interrupts_enabled(regs))
> +		local_irq_enable();
>  	/*
>  	 * If we are in kernel mode at this point, we
>  	 * have no context to handle this fault with.
> 

I have seen three different failure modes, pleas see the first 3 calls stacks
here: https://marc.info/?l=linux-rt-users&m=155016888714927&w=2

I am concerned about this fist issue, because it removes the branch
predictor hardening after the do_page_fault has executed:

do_DataAbort->do_page_fault(addr>TASK_SIZE)->__do_user_fault

This is reachable because do_page_fault is not only called from
do_translation_fault but also from here: arch/arm/mm/fsr-2level.c
and here: arch/arm/mm/fsr-3level.c
those are callable with addr > TASK_SIZE

And the following code path does enable the hard irqs before do_bad_area:
do_DataAbort->do_sect_fault->do_bad_area->__do_user_fault

So this function, would need to be rewritten:

do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
	if (interrupts_enabled(regs))
		local_irq_enable();

	do_bad_area(addr, fsr, regs);
	return 0;
}



BTW, the complete linux tree I am testing right now is here:
https://github.com/bernd-edlinger/linux-rt
in the branch: patches-rel-v4.14-rt


Thanks
Bernd.
Russell King (Oracle) Feb. 16, 2019, 11:33 a.m. UTC | #2
On Fri, Feb 15, 2019 at 09:05:33PM +0100, Sebastian Andrzej Siewior wrote:
> From: "Yadi.hu" <yadi.hu@windriver.com>
> 
> Accessing a kernel address in user space causes a SIGSEGV which is sent
> via
> -> do_DataAbort
>    -> do_sect_fault || do_translation_fault
>       -> do_bad_area
>          -> __do_user_fault
>             -> force_sig_fault
> Since commit
> 
>   02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers")
> 
> that path is carried out with disabled interrupts. Page/alignment fault
> do enable interrupts but data abort has been left out.
> 
> On -RT the siglock is a sleeping spinlock and requires interrupts to be
> enabled in order to acquire it.
> 
> Enable interrupts in the DataAbort handler if the parent context had
> interrupts enabled. Move harden_branch_predictor() before interrupts are
> enabled.

One very very big NAK.

You haven't given any reasoning for moving harden_branch_predictor().

Moving it in this way, you are re-opening systems up for branch
predictor attacks, since there is now a path through the data abort
handler where userspace can attempt to access kernel space, resulting
in a fault being generated, but the branch predictor hardening is no
longer called.

> Reported-by: <Bernd Edlinger <bernd.edlinger@hotmail.de>
> Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
> [bigeasy: rewrote patch description, reordered patch]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> I though that the "interrupt enable part" has already been posted and
> then Bernd complained about a warning from harden_branch_predictor() on
> -RT so here it is.
> 
>  arch/arm/mm/fault.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 58f69fa07df95..da82967865836 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  		unsigned int fsr, unsigned int sig, int code,
>  		struct pt_regs *regs)
>  {
> -	if (addr > TASK_SIZE)
> -		harden_branch_predictor();
>  
>  #ifdef CONFIG_DEBUG_USER
>  	if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
> @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  	struct task_struct *tsk = current;
>  	struct mm_struct *mm = tsk->active_mm;
>  
> +	if (addr > TASK_SIZE && user_mode(regs))
> +		harden_branch_predictor();
> +
> +	if (interrupts_enabled(regs))
> +		local_irq_enable();
>  	/*
>  	 * If we are in kernel mode at this point, we
>  	 * have no context to handle this fault with.
> -- 
> 2.20.1
>
Sebastian Andrzej Siewior Feb. 20, 2019, 10:50 a.m. UTC | #3
On 2019-02-15 21:57:56 [+0000], Bernd Edlinger wrote:
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 58f69fa07df95..da82967865836 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -161,8 +161,6 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr,
> >  		unsigned int fsr, unsigned int sig, int code,
> >  		struct pt_regs *regs)
> >  {
> > -	if (addr > TASK_SIZE)
> > -		harden_branch_predictor();
> >  
> >  #ifdef CONFIG_DEBUG_USER
> >  	if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
> > @@ -191,6 +189,11 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> >  	struct task_struct *tsk = current;
> >  	struct mm_struct *mm = tsk->active_mm;
> >  
> > +	if (addr > TASK_SIZE && user_mode(regs))
> > +		harden_branch_predictor();
> 
> This is somehow inconsisten with do_translation_fault, where
> we have this:
> 
> 	if (addr < TASK_SIZE)
> 		return do_page_fault(addr, fsr, regs);

yes but harden_branch_predictor() is only invoked for addr > TASK_SIZE.
What do I miss?

> > +
> > +	if (interrupts_enabled(regs))
> > +		local_irq_enable();
> >  	/*
> >  	 * If we are in kernel mode at this point, we
> >  	 * have no context to handle this fault with.
> > 
> 
> I have seen three different failure modes, pleas see the first 3 calls stacks
> here: https://marc.info/?l=linux-rt-users&m=155016888714927&w=2

yes, but this is only with the one patch in RT. So you should not see
this without the RT patch.

> I am concerned about this fist issue, because it removes the branch
> predictor hardening after the do_page_fault has executed:
> 
> do_DataAbort->do_page_fault(addr>TASK_SIZE)->__do_user_fault
>
> This is reachable because do_page_fault is not only called from
> do_translation_fault but also from here: arch/arm/mm/fsr-2level.c
> and here: arch/arm/mm/fsr-3level.c
> those are callable with addr > TASK_SIZE

okay. So 0xbffffff0 without LPAE would be left out. I wasn't ware of
that. And this indeed it hits the warning.

> And the following code path does enable the hard irqs before do_bad_area:
> do_DataAbort->do_sect_fault->do_bad_area->__do_user_fault
> 
> So this function, would need to be rewritten:
> 
> do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> {
> 	if (interrupts_enabled(regs))
> 		local_irq_enable();
> 
> 	do_bad_area(addr, fsr, regs);
> 	return 0;
> }

We would need to move the branch predictor piece before enabling
interrupts.

> 
> Thanks
> Bernd.

Sebastian
Sebastian Andrzej Siewior Feb. 20, 2019, 11 a.m. UTC | #4
On 2019-02-16 11:33:38 [+0000], Russell King - ARM Linux admin wrote:
> On Fri, Feb 15, 2019 at 09:05:33PM +0100, Sebastian Andrzej Siewior wrote:
> > From: "Yadi.hu" <yadi.hu@windriver.com>
> > 
> > Accessing a kernel address in user space causes a SIGSEGV which is sent
> > via
> > -> do_DataAbort
> >    -> do_sect_fault || do_translation_fault
> >       -> do_bad_area
> >          -> __do_user_fault
> >             -> force_sig_fault
> > Since commit
> > 
> >   02fe2845d6a83 ("ARM: entry: avoid enabling interrupts in prefetch/data abort handlers")
> > 
> > that path is carried out with disabled interrupts. Page/alignment fault
> > do enable interrupts but data abort has been left out.
> > 
> > On -RT the siglock is a sleeping spinlock and requires interrupts to be
> > enabled in order to acquire it.
> > 
> > Enable interrupts in the DataAbort handler if the parent context had
> > interrupts enabled. Move harden_branch_predictor() before interrupts are
> > enabled.
> 
> One very very big NAK.
> 
> You haven't given any reasoning for moving harden_branch_predictor().

if do_bad_area() opens interrupts then harden_branch_predictor() will
complain via smp_processor_id(). So it looked obvious to move
harden_branch_predictor() before interrupts are enabled.

> Moving it in this way, you are re-opening systems up for branch
> predictor attacks, since there is now a path through the data abort
> handler where userspace can attempt to access kernel space, resulting
> in a fault being generated, but the branch predictor hardening is no
> longer called.

Bernd Edlinger explained the missing piece to me. With
PAGE_OFFSET=0xC0000000 and without LPAE a R/W of 0xbffffff0 will ends up
with:

| ~/mem-tc 0xbffffff0 1
|Write 0xbffffff0
|BUG: using smp_processor_id() in preemptible [00000000] code: mem-tc/706
|caller is __do_user_fault.constprop.2+0x78/0xa4
|CPU: 3 PID: 706 Comm: mem-tc Tainted: G        W         5.0.0-rc6-00001-g625f719a23bda-dirty #35
|Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
|[<c01123d4>] (unwind_backtrace) from [<c010cc58>] (show_stack+0x10/0x14)
|[<c010cc58>] (show_stack) from [<c0a40988>] (dump_stack+0x78/0x8c)
|[<c0a40988>] (dump_stack) from [<c045427c>] (check_preemption_disabled+0x100/0x11c)
|[<c045427c>] (check_preemption_disabled) from [<c01164e4>] (__do_user_fault.constprop.2+0x78/0xa4)
|[<c01164e4>] (__do_user_fault.constprop.2) from [<c01168fc>] (do_page_fault+0x378/0x72c)
|[<c01168fc>] (do_page_fault) from [<c0116e24>] (do_DataAbort+0x3c/0xa8)
|[<c0116e24>] (do_DataAbort) from [<c0101d9c>] (__dabt_usr+0x3c/0x40)
|Exception stack(0xec899fb0 to 0xec899ff8)
|9fa0:                                     00000000 00000000 00000001 12345678
|9fc0: 00000000 004f7000 bffffff0 bffffff0 bedc9df4 bedc9c7c 004f7000 00000000
|9fe0: 00000000 bedc9c78 004e6559 004e6562 60010030 ffffffff
|Segmentation fault

So should harden_branch_predictor() be invoked before interrupts are enabled?

Sebastian
Bernd Edlinger Feb. 21, 2019, 8:06 a.m. UTC | #5
On 2/20/19 12:00 PM, Sebastian Andrzej Siewior wrote:
> 
> Bernd Edlinger explained the missing piece to me. With
> PAGE_OFFSET=0xC0000000 and without LPAE a R/W of 0xbffffff0 will ends up
> with:
> 

I am unable to reproduce on my target, wheter do_page_fault is directly taken,
or do_translation_fault is taken depends on whether the page table root directory
permission bits deny the access or the second level page table deny the access?

But if there is a way to get into the code path do_DataAbort->do_page_fault with
addr > TASK_SIZE, then I don't see why that works without the RT patch,
since the interrupts are enabled here:

do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
        struct task_struct *tsk;
        struct mm_struct *mm;
        int sig, code;
        vm_fault_t fault;
        unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;

        if (notify_page_fault(regs, fsr))
                return 0;

        tsk = current;
        mm  = tsk->mm;

        /* Enable interrupts if they were enabled in the parent context. */
        if (interrupts_enabled(regs))
                local_irq_enable();

Is preemption disabled at this point, without the RT patch?

I still don't quite see why is this no issue without the RT patch.
Can someone explain that?


Thanks
Bernd.
Bernd Edlinger Feb. 21, 2019, 9:31 a.m. UTC | #6
> But if there is a way to get into the code path do_DataAbort->do_page_fault with
> addr > TASK_SIZE, then I don't see why that works without the RT patch,
> since the interrupts are enabled here:
>
> do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> {
>         struct task_struct *tsk;
>         struct mm_struct *mm;
>         int sig, code;
>         vm_fault_t fault;
>         unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> 
>         if (notify_page_fault(regs, fsr))
>                 return 0;
> 
>         tsk = current;
>         mm  = tsk->mm;
> 
>         /* Enable interrupts if they were enabled in the parent context. */
>         if (interrupts_enabled(regs))
>                 local_irq_enable();
> 
> Is preemption disabled at this point, without the RT patch?
> 
> I still don't quite see why is this no issue without the RT patch.
> Can someone explain that?

Never mind, now I found out how to prove that it is not a RT issue:

$ cat t2.c 
int main() {
  *((char*)0xFFFFFFFF) = 1;
}
$ gcc t2.c -o t2
$ ./t2 
BUG: using smp_processor_id() in preemptible [00000000] code: t2/832
caller is debug_smp_processor_id+0x18/0x24
CPU: 1 PID: 832 Comm: t2 Not tainted 4.18.0-00005-g7523d7f #1
Hardware name: Altera SOCFPGA
[<80112e68>] (unwind_backtrace) from [<8010da28>] (show_stack+0x20/0x24)
[<8010da28>] (show_stack) from [<806bcaec>] (dump_stack+0x78/0x94)
[<806bcaec>] (dump_stack) from [<8043f2d0>] (check_preemption_disabled+0xe4/0x120)
[<8043f2d0>] (check_preemption_disabled) from [<8043f324>] (debug_smp_processor_id+0x18/0x24)
[<8043f324>] (debug_smp_processor_id) from [<80116ba0>] (__do_user_fault+0x48/0x13c)
[<80116ba0>] (__do_user_fault) from [<8011700c>] (do_page_fault+0x2f4/0x328)
[<8011700c>] (do_page_fault) from [<80117200>] (do_DataAbort+0x58/0x100)
[<80117200>] (do_DataAbort) from [<80102460>] (__dabt_usr+0x40/0x60)
Exception stack(0xbf197fb0 to 0xbf197ff8)
7fa0:                                     00000001 7e963d94 00000001 ffffffff
7fc0: 000103f8 00000000 000102e0 00000000 00000000 00000000 76fa9000 7e963c3c
7fe0: 00000000 7e963c3c 76e47bbc 000103e0 60060010 ffffffff
Segmentation fault


What exactly is the reason why we should not apply this patch in the non-RT kernel as well:

https://marc.info/?l=linux-rt-users&m=154997565610392&w=2



Thanks
Bernd.
Sebastian Andrzej Siewior Feb. 21, 2019, 9:57 a.m. UTC | #7
On 2019-02-21 09:31:53 [+0000], Bernd Edlinger wrote:
> Never mind, now I found out how to prove that it is not a RT issue:
> 
> $ cat t2.c 
> int main() {
>   *((char*)0xFFFFFFFF) = 1;
> }

I've sent another example to in this thread to rmk.

> $ gcc t2.c -o t2
> $ ./t2 
> BUG: using smp_processor_id() in preemptible [00000000] code: t2/832
> caller is debug_smp_processor_id+0x18/0x24
> CPU: 1 PID: 832 Comm: t2 Not tainted 4.18.0-00005-g7523d7f #1
> Hardware name: Altera SOCFPGA
> [<80112e68>] (unwind_backtrace) from [<8010da28>] (show_stack+0x20/0x24)
> [<8010da28>] (show_stack) from [<806bcaec>] (dump_stack+0x78/0x94)
> [<806bcaec>] (dump_stack) from [<8043f2d0>] (check_preemption_disabled+0xe4/0x120)
> [<8043f2d0>] (check_preemption_disabled) from [<8043f324>] (debug_smp_processor_id+0x18/0x24)
> [<8043f324>] (debug_smp_processor_id) from [<80116ba0>] (__do_user_fault+0x48/0x13c)
> [<80116ba0>] (__do_user_fault) from [<8011700c>] (do_page_fault+0x2f4/0x328)
> [<8011700c>] (do_page_fault) from [<80117200>] (do_DataAbort+0x58/0x100)
> [<80117200>] (do_DataAbort) from [<80102460>] (__dabt_usr+0x40/0x60)
> Exception stack(0xbf197fb0 to 0xbf197ff8)
> 7fa0:                                     00000001 7e963d94 00000001 ffffffff
> 7fc0: 000103f8 00000000 000102e0 00000000 00000000 00000000 76fa9000 7e963c3c
> 7fe0: 00000000 7e963c3c 76e47bbc 000103e0 60060010 ffffffff
> Segmentation fault
> 
> 
> What exactly is the reason why we should not apply this patch in the non-RT kernel as well:
> 
> https://marc.info/?l=linux-rt-users&m=154997565610392&w=2

I'm not sure that this is the right thing to do. I think you should
invoke harden_branch_predictor() before the interrupts are enabled so it
is invoked on the same CPU that triggered the exception.

> Thanks
> Bernd.

Sebastian
Bernd Edlinger Feb. 21, 2019, 2:03 p.m. UTC | #8
On 2/21/19 10:57 AM, Sebastian Andrzej Siewior wrote:
> On 2019-02-21 09:31:53 [+0000], Bernd Edlinger wrote:
>>
>> What exactly is the reason why we should not apply this patch in the non-RT kernel as well:
>>
>> https://marc.info/?l=linux-rt-users&m=154997565610392&w=2
> 
> I'm not sure that this is the right thing to do. I think you should
> invoke harden_branch_predictor() before the interrupts are enabled so it
> is invoked on the same CPU that triggered the exception.
> 

When is there any secret information in the branch predictor cache ?

When do_DataAbort calls do_translation_fault?

Or after do_page_fault has executed, and is about to raise a sig-fault?

Doesn't do_page_fault have access to information that an attacker
would like to know?


Bernd.
diff mbox series

Patch

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 58f69fa07df95..da82967865836 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -161,8 +161,6 @@  __do_user_fault(struct task_struct *tsk, unsigned long addr,
 		unsigned int fsr, unsigned int sig, int code,
 		struct pt_regs *regs)
 {
-	if (addr > TASK_SIZE)
-		harden_branch_predictor();
 
 #ifdef CONFIG_DEBUG_USER
 	if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
@@ -191,6 +189,11 @@  void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->active_mm;
 
+	if (addr > TASK_SIZE && user_mode(regs))
+		harden_branch_predictor();
+
+	if (interrupts_enabled(regs))
+		local_irq_enable();
 	/*
 	 * If we are in kernel mode at this point, we
 	 * have no context to handle this fault with.