diff mbox

[v3,5/7] arm64: Handle faults caused by inadvertent user access with PAN enabled

Message ID 1473788797-10879-6-git-send-email-catalin.marinas@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Catalin Marinas Sept. 13, 2016, 5:46 p.m. UTC
When TTBR0_EL1 is set to the reserved page, an erroneous kernel access
to user space would generate a translation fault. This patch adds the
checks for the software-set PSR_PAN_BIT to emulate a permission fault
and report it accordingly.

This patch also updates the description of the synchronous external
aborts on translation table walks.

Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/mm/fault.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

Comments

Mark Rutland Sept. 16, 2016, 11:33 a.m. UTC | #1
On Tue, Sep 13, 2016 at 06:46:35PM +0100, Catalin Marinas wrote:
> When TTBR0_EL1 is set to the reserved page, an erroneous kernel access
> to user space would generate a translation fault. This patch adds the
> checks for the software-set PSR_PAN_BIT to emulate a permission fault
> and report it accordingly.

Why do we need to treat this case as a permission fault? It looks like a
fault that wasn't a deliberate uaccess (which thus doesn't have a fixup
handler) should have do_page_fault call __do_kernel_fault, where we
should die().

Is this just for more consistent reporting of erroneous accesses to user
memory? Or does this fix some other issue?

> This patch also updates the description of the synchronous external
> aborts on translation table walks.

This sounds fine, though it might be better as a separate/preparatory
patch.

> -static inline bool is_permission_fault(unsigned int esr)
> +static inline bool is_permission_fault(unsigned int esr, struct pt_regs *regs)
>  {
>  	unsigned int ec       = ESR_ELx_EC(esr);
>  	unsigned int fsc_type = esr & ESR_ELx_FSC_TYPE;
>  
> -	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM) ||
> -	       (ec == ESR_ELx_EC_IABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
> +	if (ec != ESR_ELx_EC_DABT_CUR && ec != ESR_ELx_EC_IABT_CUR)
> +		return false;
> +
> +	if (system_uses_ttbr0_pan())
> +		return fsc_type == ESR_ELx_FSC_FAULT &&
> +			(regs->pstate & PSR_PAN_BIT);
> +	else
> +		return fsc_type == ESR_ELx_FSC_PERM;
>  }


Since the entry code will clear the PAN bit in the SPSR when we see the
reserved ASID, faults in EFI runtime services will still be correctly
reported, though that's somewhat subtle (and it took me a while to
convince myself that was the case).

Also, I think that faults elsewhere may be misreported, e.g. in cold
entry to kernel paths (idle, hotplug) where we have a global mapping in
TTBR0, and it looks like we're using the reserved ASID.

I'm not immediately sure how to distinguish those cases.

Thanks,
Mark.
Catalin Marinas Sept. 16, 2016, 3:55 p.m. UTC | #2
On Fri, Sep 16, 2016 at 12:33:25PM +0100, Mark Rutland wrote:
> On Tue, Sep 13, 2016 at 06:46:35PM +0100, Catalin Marinas wrote:
> > When TTBR0_EL1 is set to the reserved page, an erroneous kernel access
> > to user space would generate a translation fault. This patch adds the
> > checks for the software-set PSR_PAN_BIT to emulate a permission fault
> > and report it accordingly.
> 
> Why do we need to treat this case as a permission fault? It looks like a
> fault that wasn't a deliberate uaccess (which thus doesn't have a fixup
> handler) should have do_page_fault call __do_kernel_fault, where we
> should die().

We don't always check the exception table for a dedicated uaccess. The
only cases where the exception table is checked is when
down_read_trylock(mmap_sem) fails or CONFIG_DEBUG_VM is enabled. This is
a slight optimisation of the fast path of the page fault handling. So,
without is_permission_fault() (which much quicker than
search_exception_tables()) the kernel would keep restarting the same
instruction.

> > -static inline bool is_permission_fault(unsigned int esr)
> > +static inline bool is_permission_fault(unsigned int esr, struct pt_regs *regs)
> >  {
> >  	unsigned int ec       = ESR_ELx_EC(esr);
> >  	unsigned int fsc_type = esr & ESR_ELx_FSC_TYPE;
> >  
> > -	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM) ||
> > -	       (ec == ESR_ELx_EC_IABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
> > +	if (ec != ESR_ELx_EC_DABT_CUR && ec != ESR_ELx_EC_IABT_CUR)
> > +		return false;
> > +
> > +	if (system_uses_ttbr0_pan())
> > +		return fsc_type == ESR_ELx_FSC_FAULT &&
> > +			(regs->pstate & PSR_PAN_BIT);
> > +	else
> > +		return fsc_type == ESR_ELx_FSC_PERM;
> >  }
> 
> Since the entry code will clear the PAN bit in the SPSR when we see the
> reserved ASID,

It actually sets the PAN bit in regs->pstate when it sees the reserved
ASID.

> faults in EFI runtime services will still be correctly reported,
> though that's somewhat subtle (and it took me a while to convince
> myself that was the case).

The EFI runtime services use a non-reserved ASID, so a fault taken while
executing EFI services would clear the PAN bit in regs->pstate. Such
faults would not trigger is_permission_fault() == true above.

> Also, I think that faults elsewhere may be misreported, e.g. in cold
> entry to kernel paths (idle, hotplug) where we have a global mapping in
> TTBR0, and it looks like we're using the reserved ASID.
> 
> I'm not immediately sure how to distinguish those cases.

We don't normally expect faults on those paths. If we get one, they are
usually fatal, so the kernel still dies but with a slightly misleading
message.

We could improve this I think if we can distinguished between
reserved_ttbr0 after swapper (set on exception entry) and the reserved
TTBR0_EL1 pointing to empty_zero_page (and not merging Ard's patch). Is
it worth having such distinction? I'm not convinced, the only thing you
probably save is not printing "Accessing user space memory outside
uaccess.h routines" during fault (there may be other ways to mark these
special cases maybe by checking the current thread_info but it needs
more thinking).
diff mbox

Patch

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 05d2bd776c69..2be46e6794e1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -268,13 +268,19 @@  out:
 	return fault;
 }
 
-static inline bool is_permission_fault(unsigned int esr)
+static inline bool is_permission_fault(unsigned int esr, struct pt_regs *regs)
 {
 	unsigned int ec       = ESR_ELx_EC(esr);
 	unsigned int fsc_type = esr & ESR_ELx_FSC_TYPE;
 
-	return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM) ||
-	       (ec == ESR_ELx_EC_IABT_CUR && fsc_type == ESR_ELx_FSC_PERM);
+	if (ec != ESR_ELx_EC_DABT_CUR && ec != ESR_ELx_EC_IABT_CUR)
+		return false;
+
+	if (system_uses_ttbr0_pan())
+		return fsc_type == ESR_ELx_FSC_FAULT &&
+			(regs->pstate & PSR_PAN_BIT);
+	else
+		return fsc_type == ESR_ELx_FSC_PERM;
 }
 
 static bool is_el0_instruction_abort(unsigned int esr)
@@ -314,7 +320,7 @@  static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		mm_flags |= FAULT_FLAG_WRITE;
 	}
 
-	if (is_permission_fault(esr) && (addr < USER_DS)) {
+	if (addr < USER_DS && is_permission_fault(esr, regs)) {
 		/* regs->orig_addr_limit may be 0 if we entered from EL0 */
 		if (regs->orig_addr_limit == KERNEL_DS)
 			die("Accessing user space memory with fs=KERNEL_DS", regs, esr);
@@ -506,10 +512,10 @@  static const struct fault_info {
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
 	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 26"			},