
[21/35] arm64: mte: Add in-kernel tag fault handler

Message ID f173aacd755e4644485c551198549ac52d1eb650.1597425745.git.andreyknvl@google.com (mailing list archive)
State New, archived
Series kasan: add hardware tag-based mode for arm64

Commit Message

Andrey Konovalov Aug. 14, 2020, 5:27 p.m. UTC
From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add the implementation of the in-kernel fault handler.

When a tag fault happens on a kernel address:
* a warning is logged,
* the faulting instruction is skipped,
* execution continues.

When a tag fault happens on a user address:
* the kernel executes do_bad_area() and panics.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 arch/arm64/mm/fault.c | 50 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

Comments

Catalin Marinas Aug. 27, 2020, 9:54 a.m. UTC | #1
On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 5e832b3387f1..c62c8ba85c0e 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -33,6 +33,7 @@
>  #include <asm/debug-monitors.h>
>  #include <asm/esr.h>
>  #include <asm/kprobes.h>
> +#include <asm/mte.h>
>  #include <asm/processor.h>
>  #include <asm/sysreg.h>
>  #include <asm/system_misc.h>
> @@ -222,6 +223,20 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
>  	return 1;
>  }
>  
> +static bool is_el1_mte_sync_tag_check_fault(unsigned int esr)
> +{
> +	unsigned int ec = ESR_ELx_EC(esr);
> +	unsigned int fsc = esr & ESR_ELx_FSC;
> +
> +	if (ec != ESR_ELx_EC_DABT_CUR)
> +		return false;
> +
> +	if (fsc == ESR_ELx_FSC_MTE)
> +		return true;
> +
> +	return false;
> +}
> +
>  static bool is_el1_instruction_abort(unsigned int esr)
>  {
>  	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
> @@ -294,6 +309,18 @@ static void die_kernel_fault(const char *msg, unsigned long addr,
>  	do_exit(SIGKILL);
>  }
>  
> +static void report_tag_fault(unsigned long addr, unsigned int esr,
> +			     struct pt_regs *regs)
> +{
> +	bool is_write = ((esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT) != 0;
> +
> +	pr_alert("Memory Tagging Extension Fault in %pS\n", (void *)regs->pc);
> +	pr_alert("  %s at address %lx\n", is_write ? "Write" : "Read", addr);
> +	pr_alert("  Pointer tag: [%02x], memory tag: [%02x]\n",
> +			mte_get_ptr_tag(addr),
> +			mte_get_mem_tag((void *)addr));
> +}
> +
>  static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>  			      struct pt_regs *regs)
>  {
> @@ -317,12 +344,16 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>  			msg = "execute from non-executable memory";
>  		else
>  			msg = "read from unreadable memory";
> +	} else if (is_el1_mte_sync_tag_check_fault(esr)) {
> +		report_tag_fault(addr, esr, regs);
> +		msg = "memory tagging extension fault";

IIUC, that's dead code. See my comment below on do_tag_check_fault().

>  	} else if (addr < PAGE_SIZE) {
>  		msg = "NULL pointer dereference";
>  	} else {
>  		msg = "paging request";
>  	}
>  
> +

Unnecessary empty line.

>  	die_kernel_fault(msg, addr, esr, regs);
>  }
>  
> @@ -658,10 +689,27 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	return 0;
>  }
>  
> +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> +			   struct pt_regs *regs)
> +{
> +	report_tag_fault(addr, esr, regs);
> +
> +	/* Skip over the faulting instruction and continue: */
> +	arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);

Ooooh, do we expect the kernel to still behave correctly after this? I
thought the recovery means disabling tag checking altogether and
restarting the instruction rather than skipping over it. We only skip if
we emulated it.

> +
> +	return 0;
> +}
> +
> +
>  static int do_tag_check_fault(unsigned long addr, unsigned int esr,
>  			      struct pt_regs *regs)
>  {
> -	do_bad_area(addr, esr, regs);
> +	/* The tag check fault (TCF) is per TTBR */
> +	if (is_ttbr0_addr(addr))
> +		do_bad_area(addr, esr, regs);
> +	else
> +		do_tag_recovery(addr, esr, regs);

So we never invoke __do_kernel_fault() for a synchronous tag check in
the kernel. What's with all the is_el1_mte_sync_tag_check_fault() check
above?
Vincenzo Frascino Aug. 27, 2020, 10:44 a.m. UTC | #2
On 8/27/20 10:54 AM, Catalin Marinas wrote:
> On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 5e832b3387f1..c62c8ba85c0e 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -33,6 +33,7 @@
>>  #include <asm/debug-monitors.h>
>>  #include <asm/esr.h>
>>  #include <asm/kprobes.h>
>> +#include <asm/mte.h>
>>  #include <asm/processor.h>
>>  #include <asm/sysreg.h>
>>  #include <asm/system_misc.h>
>> @@ -222,6 +223,20 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
>>  	return 1;
>>  }
>>  
>> +static bool is_el1_mte_sync_tag_check_fault(unsigned int esr)
>> +{
>> +	unsigned int ec = ESR_ELx_EC(esr);
>> +	unsigned int fsc = esr & ESR_ELx_FSC;
>> +
>> +	if (ec != ESR_ELx_EC_DABT_CUR)
>> +		return false;
>> +
>> +	if (fsc == ESR_ELx_FSC_MTE)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  static bool is_el1_instruction_abort(unsigned int esr)
>>  {
>>  	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
>> @@ -294,6 +309,18 @@ static void die_kernel_fault(const char *msg, unsigned long addr,
>>  	do_exit(SIGKILL);
>>  }
>>  
>> +static void report_tag_fault(unsigned long addr, unsigned int esr,
>> +			     struct pt_regs *regs)
>> +{
>> +	bool is_write = ((esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT) != 0;
>> +
>> +	pr_alert("Memory Tagging Extension Fault in %pS\n", (void *)regs->pc);
>> +	pr_alert("  %s at address %lx\n", is_write ? "Write" : "Read", addr);
>> +	pr_alert("  Pointer tag: [%02x], memory tag: [%02x]\n",
>> +			mte_get_ptr_tag(addr),
>> +			mte_get_mem_tag((void *)addr));
>> +}
>> +
>>  static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>>  			      struct pt_regs *regs)
>>  {
>> @@ -317,12 +344,16 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>>  			msg = "execute from non-executable memory";
>>  		else
>>  			msg = "read from unreadable memory";
>> +	} else if (is_el1_mte_sync_tag_check_fault(esr)) {
>> +		report_tag_fault(addr, esr, regs);
>> +		msg = "memory tagging extension fault";
> 
> IIUC, that's dead code. See my comment below on do_tag_check_fault().
>

That's correct. This was useful with the "panic_on_mte_fault" kernel command line
parameter. Since that has now been replaced by a similar KASAN feature, this code
can be safely removed.

>>  	} else if (addr < PAGE_SIZE) {
>>  		msg = "NULL pointer dereference";
>>  	} else {
>>  		msg = "paging request";
>>  	}
>>  
>> +
> 
> Unnecessary empty line.
> 

Agree.

>>  	die_kernel_fault(msg, addr, esr, regs);
>>  }
>>  
>> @@ -658,10 +689,27 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>  	return 0;
>>  }
>>  
>> +static int do_tag_recovery(unsigned long addr, unsigned int esr,
>> +			   struct pt_regs *regs)
>> +{
>> +	report_tag_fault(addr, esr, regs);
>> +
>> +	/* Skip over the faulting instruction and continue: */
>> +	arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> 
> Ooooh, do we expect the kernel to still behave correctly after this? I
> thought the recovery means disabling tag checking altogether and
> restarting the instruction rather than skipping over it. We only skip if
> we emulated it.
> 

I tried to dig into it, but I am not sure why we need this either.

>> +
>> +	return 0;
>> +}
>> +
>> +
>>  static int do_tag_check_fault(unsigned long addr, unsigned int esr,
>>  			      struct pt_regs *regs)
>>  {
>> -	do_bad_area(addr, esr, regs);
>> +	/* The tag check fault (TCF) is per TTBR */
>> +	if (is_ttbr0_addr(addr))
>> +		do_bad_area(addr, esr, regs);
>> +	else
>> +		do_tag_recovery(addr, esr, regs);
> 
> So we never invoke __do_kernel_fault() for a synchronous tag check in
> the kernel. What's with all the is_el1_mte_sync_tag_check_fault() check
> above?
> 

That's correct. This made sense with "panic_on_mte_fault", but since that feature
has been replaced, is_el1_mte_sync_tag_check_fault() is no longer useful.
Andrey Konovalov Aug. 27, 2020, 12:31 p.m. UTC | #3
On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 5e832b3387f1..c62c8ba85c0e 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -33,6 +33,7 @@
> >  #include <asm/debug-monitors.h>
> >  #include <asm/esr.h>
> >  #include <asm/kprobes.h>
> > +#include <asm/mte.h>
> >  #include <asm/processor.h>
> >  #include <asm/sysreg.h>
> >  #include <asm/system_misc.h>
> > @@ -222,6 +223,20 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
> >       return 1;
> >  }
> >
> > +static bool is_el1_mte_sync_tag_check_fault(unsigned int esr)
> > +{
> > +     unsigned int ec = ESR_ELx_EC(esr);
> > +     unsigned int fsc = esr & ESR_ELx_FSC;
> > +
> > +     if (ec != ESR_ELx_EC_DABT_CUR)
> > +             return false;
> > +
> > +     if (fsc == ESR_ELx_FSC_MTE)
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> >  static bool is_el1_instruction_abort(unsigned int esr)
> >  {
> >       return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
> > @@ -294,6 +309,18 @@ static void die_kernel_fault(const char *msg, unsigned long addr,
> >       do_exit(SIGKILL);
> >  }
> >
> > +static void report_tag_fault(unsigned long addr, unsigned int esr,
> > +                          struct pt_regs *regs)
> > +{
> > +     bool is_write = ((esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT) != 0;
> > +
> > +     pr_alert("Memory Tagging Extension Fault in %pS\n", (void *)regs->pc);
> > +     pr_alert("  %s at address %lx\n", is_write ? "Write" : "Read", addr);
> > +     pr_alert("  Pointer tag: [%02x], memory tag: [%02x]\n",
> > +                     mte_get_ptr_tag(addr),
> > +                     mte_get_mem_tag((void *)addr));
> > +}
> > +
> >  static void __do_kernel_fault(unsigned long addr, unsigned int esr,
> >                             struct pt_regs *regs)
> >  {
> > @@ -317,12 +344,16 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
> >                       msg = "execute from non-executable memory";
> >               else
> >                       msg = "read from unreadable memory";
> > +     } else if (is_el1_mte_sync_tag_check_fault(esr)) {
> > +             report_tag_fault(addr, esr, regs);
> > +             msg = "memory tagging extension fault";
>
> IIUC, that's dead code. See my comment below on do_tag_check_fault().
>
> >       } else if (addr < PAGE_SIZE) {
> >               msg = "NULL pointer dereference";
> >       } else {
> >               msg = "paging request";
> >       }
> >
> > +
>
> Unnecessary empty line.
>
> >       die_kernel_fault(msg, addr, esr, regs);
> >  }
> >
> > @@ -658,10 +689,27 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> >       return 0;
> >  }
> >
> > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > +                        struct pt_regs *regs)
> > +{
> > +     report_tag_fault(addr, esr, regs);
> > +
> > +     /* Skip over the faulting instruction and continue: */
> > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
>
> Ooooh, do we expect the kernel to still behave correctly after this? I
> thought the recovery means disabling tag checking altogether and
> restarting the instruction rather than skipping over it.

The intention is to be able to catch multiple MTE faults without
panicking or disabling MTE when executing KASAN tests (those do
multiple bad accesses one after another). We do
arm64_skip_faulting_instruction() for software tag-based KASAN too,
it's not ideal, but works for testing purposes.

Can we disable MTE, reexecute the instruction, and then reenable MTE,
or something like that?

When running in-kernel MTE in production, we'll either panic or
disable MTE after the first fault. This was controlled by the
panic_on_mte_fault option Vincenzo initially had.

> We only skip if we emulated it.

I'm not sure I understand this part, what do you mean by emulating?

>
> > +
> > +     return 0;
> > +}
> > +
> > +
> >  static int do_tag_check_fault(unsigned long addr, unsigned int esr,
> >                             struct pt_regs *regs)
> >  {
> > -     do_bad_area(addr, esr, regs);
> > +     /* The tag check fault (TCF) is per TTBR */
> > +     if (is_ttbr0_addr(addr))
> > +             do_bad_area(addr, esr, regs);
> > +     else
> > +             do_tag_recovery(addr, esr, regs);
>
> So we never invoke __do_kernel_fault() for a synchronous tag check in
> the kernel. What's with all the is_el1_mte_sync_tag_check_fault() check
> above?
>
> --
> Catalin
>
Catalin Marinas Aug. 27, 2020, 1:10 p.m. UTC | #4
On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > +                        struct pt_regs *regs)
> > > +{
> > > +     report_tag_fault(addr, esr, regs);
> > > +
> > > +     /* Skip over the faulting instruction and continue: */
> > > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> >
> > Ooooh, do we expect the kernel to still behave correctly after this? I
> > thought the recovery means disabling tag checking altogether and
> > restarting the instruction rather than skipping over it.
> 
> The intention is to be able to catch multiple MTE faults without
> panicking or disabling MTE when executing KASAN tests (those do
> multiple bad accesses one after another).

The problem is that for MTE synchronous tag check faults, the access has
not happened, so you basically introduce memory corruption by skipping
the access.

> We do arm64_skip_faulting_instruction() for software tag-based KASAN
> too, it's not ideal, but works for testing purposes.

IIUC, KASAN only skips over the brk instruction which doesn't have any
other side-effects. Has the actual memory access taken place when it
hits the brk?

> Can we disable MTE, reexecute the instruction, and then reenable MTE,
> or something like that?

If you want to preserve the MTE enabled, you could single-step the
instruction or execute it out of line, though it's a bit more convoluted
(we have a similar mechanism for kprobes/uprobes).

Another option would be to attempt to set the matching tag in memory,
under the assumption that it is writable (if it's not, maybe it's fine
to panic). Not sure how this interacts with the slub allocator since,
presumably, the logical tag in the pointer is wrong rather than the
allocation one.

Yet another option would be to change the tag in the register and
re-execute but this may confuse the compiler.
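
For illustration, a rough sketch of the second option above (re-tag the faulting
granule and let the instruction restart); it assumes an mte_set_mem_tag_range()
helper and the MTE_GRANULE_SIZE definition from elsewhere in the MTE/KASAN
series, and is not code from this patch:

static int do_tag_recovery_retag(unsigned long addr, unsigned int esr,
				 struct pt_regs *regs)
{
	report_tag_fault(addr, esr, regs);

	/*
	 * Illustrative only: stamp the pointer's tag onto the faulting
	 * 16-byte granule, assuming the page is writable, then return
	 * without skipping so the instruction re-executes and the access
	 * now passes the tag check.
	 */
	mte_set_mem_tag_range((void *)ALIGN_DOWN(addr, MTE_GRANULE_SIZE),
			      MTE_GRANULE_SIZE, mte_get_ptr_tag(addr));

	return 0;
}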

> When running in-kernel MTE in production, we'll either panic or
> disable MTE after the first fault. This was controlled by the
> panic_on_mte_fault option Vincenzo initially had.

I prefer to disable MTE, print something and continue, but no panic.

> > We only skip if we emulated it.
> 
> I'm not sure I understand this part, what do you mean by emulating?

Executing it out of line or other form of instruction emulation (see
arch/arm64/kernel/probes/simulate-insn.c) so that the access actually
takes place. But you can single-step or experiment with some of the
other tricks above.
Andrey Konovalov Aug. 27, 2020, 1:34 p.m. UTC | #5
On Thu, Aug 27, 2020 at 3:10 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> > On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> > <catalin.marinas@arm.com> wrote:
> > > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > > +                        struct pt_regs *regs)
> > > > +{
> > > > +     report_tag_fault(addr, esr, regs);
> > > > +
> > > > +     /* Skip over the faulting instruction and continue: */
> > > > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > >
> > > Ooooh, do we expect the kernel to still behave correctly after this? I
> > > thought the recovery means disabling tag checking altogether and
> > > restarting the instruction rather than skipping over it.
> >
> > The intention is to be able to catch multiple MTE faults without
> > panicking or disabling MTE when executing KASAN tests (those do
> > multiple bad accesses one after another).
>
> The problem is that for MTE synchronous tag check faults, the access has
> not happened, so you basically introduce memory corruption by skipping
> the access.

Yes, you're right.

> > We do arm64_skip_faulting_instruction() for software tag-based KASAN
> > too, it's not ideal, but works for testing purposes.
>
> IIUC, KASAN only skips over the brk instruction which doesn't have any
> other side-effects.

Oh, yes, indeed. For some reason I confused myself thinking that we
also skip the access for software KASAN.

> Has the actual memory access taken place when it
> hits the brk?

IIRC, no, but it will be executed right after we skip the brk.

> > Can we disable MTE, reexecute the instruction, and then reenable MTE,
> > or something like that?
>
> If you want to preserve the MTE enabled, you could single-step the
> instruction or execute it out of line, though it's a bit more convoluted
> (we have a similar mechanism for kprobes/uprobes).
>
> Another option would be to attempt to set the matching tag in memory,
> under the assumption that it is writable (if it's not, maybe it's fine
> to panic). Not sure how this interacts with the slub allocator since,
> presumably, the logical tag in the pointer is wrong rather than the
> allocation one.
>
> Yet another option would be to change the tag in the register and
> re-execute but this may confuse the compiler.

Which one of these would be simpler to implement?

Perhaps we could somehow only skip faulting instructions that happen
in the KASAN test module?.. Decoding stack trace would be an option,
but that's a bit weird.
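
For illustration, one way the "only in the KASAN test module" check could look;
the module name "test_kasan" and the use of __module_address() here are
assumptions, not something proposed in the series:

static bool tag_fault_in_kasan_test(struct pt_regs *regs)
{
	struct module *mod;
	bool ret = false;

	/* __module_address() requires preemption to be disabled. */
	preempt_disable();
	mod = __module_address(instruction_pointer(regs));
	/* "test_kasan" is the assumed name of the KASAN test module. */
	if (mod && !strcmp(mod->name, "test_kasan"))
		ret = true;
	preempt_enable();

	return ret;
}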

Overall, this feature is not essential, but will make testing simpler.

> > When running in-kernel MTE in production, we'll either panic or
> > disable MTE after the first fault. This was controlled by the
> > panic_on_mte_fault option Vincenzo initially had.
>
> I prefer to disable MTE, print something and continue, but no panic.

OK, we can do this.

> > > We only skip if we emulated it.
> >
> > I'm not sure I understand this part, what do you mean by emulating?
>
> Executing it out of line or other form of instruction emulation (see
> arch/arm64/kernel/probes/simulate-insn.c) so that the access actually
> takes place. But you can single-step or experiment with some of the
> other tricks above.
>
> --
> Catalin
Catalin Marinas Aug. 27, 2020, 2:56 p.m. UTC | #6
On Thu, Aug 27, 2020 at 03:34:42PM +0200, Andrey Konovalov wrote:
> On Thu, Aug 27, 2020 at 3:10 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> > > On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> > > <catalin.marinas@arm.com> wrote:
> > > > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > > > +                        struct pt_regs *regs)
> > > > > +{
> > > > > +     report_tag_fault(addr, esr, regs);
> > > > > +
> > > > > +     /* Skip over the faulting instruction and continue: */
> > > > > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > > >
> > > > Ooooh, do we expect the kernel to still behave correctly after this? I
> > > > thought the recovery means disabling tag checking altogether and
> > > > restarting the instruction rather than skipping over it.
[...]
> > > Can we disable MTE, reexecute the instruction, and then reenable MTE,
> > > or something like that?
> >
> > If you want to preserve the MTE enabled, you could single-step the
> > instruction or execute it out of line, though it's a bit more convoluted
> > (we have a similar mechanism for kprobes/uprobes).
> >
> > Another option would be to attempt to set the matching tag in memory,
> > under the assumption that it is writable (if it's not, maybe it's fine
> > to panic). Not sure how this interacts with the slub allocator since,
> > presumably, the logical tag in the pointer is wrong rather than the
> > allocation one.
> >
> > Yet another option would be to change the tag in the register and
> > re-execute but this may confuse the compiler.
> 
> Which one of these would be simpler to implement?

Either 2 or 3 would be simpler (re-tag the memory location or the
pointer) with the caveats I mentioned. Also, does the slab allocator
need to touch the memory on free with a tagged pointer? Otherwise slab
may hit an MTE fault itself.

> Perhaps we could somehow only skip faulting instructions that happen
> in the KASAN test module?.. Decoding stack trace would be an option,
> but that's a bit weird.

If you want to restrict this to the KASAN tests, just add some
MTE-specific accessors with a fixup entry similar to get_user/put_user.
__do_kernel_fault() (if actually called) will invoke the fixup code
which skips the access and returns an error. This way KASAN tests can
actually verify that tag checking works, I'd find this a lot more
useful.
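
For illustration, a minimal sketch of such an accessor, modelled on the arm64
__get_user_asm() fixup pattern; the name mte_test_read() is hypothetical, and
the fixup only fires if the tag check fault is actually routed through
__do_kernel_fault()/fixup_exception():

static inline int mte_test_read(unsigned long *val, const unsigned long *addr)
{
	int err = 0;
	unsigned long x;

	/*
	 * Sketch only: if the load at 1: takes a synchronous tag check
	 * fault, the fixup at 3: sets -EFAULT and zeroes the value, so
	 * the caller sees an error instead of a kernel die().
	 */
	asm volatile(
	"1:	ldr	%1, [%2]\n"
	"2:\n"
	"	.section .fixup, \"ax\"\n"
	"	.align	2\n"
	"3:	mov	%w0, %3\n"
	"	mov	%1, #0\n"
	"	b	2b\n"
	"	.previous\n"
	_ASM_EXTABLE(1b, 3b)
	: "+r" (err), "=&r" (x)
	: "r" (addr), "i" (-EFAULT)
	: "memory");

	*val = x;
	return err;
}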
Evgenii Stepanov Aug. 27, 2020, 7:14 p.m. UTC | #7
On Thu, Aug 27, 2020 at 7:56 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Aug 27, 2020 at 03:34:42PM +0200, Andrey Konovalov wrote:
> > On Thu, Aug 27, 2020 at 3:10 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> > > > On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> > > > <catalin.marinas@arm.com> wrote:
> > > > > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > > > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > > > > +                        struct pt_regs *regs)
> > > > > > +{
> > > > > > +     report_tag_fault(addr, esr, regs);
> > > > > > +
> > > > > > +     /* Skip over the faulting instruction and continue: */
> > > > > > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > > > >
> > > > > Ooooh, do we expect the kernel to still behave correctly after this? I
> > > > > thought the recovery means disabling tag checking altogether and
> > > > > restarting the instruction rather than skipping over it.
> [...]
> > > > Can we disable MTE, reexecute the instruction, and then reenable MTE,
> > > > or something like that?
> > >
> > > If you want to preserve the MTE enabled, you could single-step the
> > > instruction or execute it out of line, though it's a bit more convoluted
> > > (we have a similar mechanism for kprobes/uprobes).
> > >
> > > Another option would be to attempt to set the matching tag in memory,
> > > under the assumption that it is writable (if it's not, maybe it's fine
> > > to panic). Not sure how this interacts with the slub allocator since,
> > > presumably, the logical tag in the pointer is wrong rather than the
> > > allocation one.
> > >
> > > Yet another option would be to change the tag in the register and
> > > re-execute but this may confuse the compiler.
> >
> > Which one of these would be simpler to implement?
>
> Either 2 or 3 would be simpler (re-tag the memory location or the
> pointer) with the caveats I mentioned. Also, does the slab allocator
> need to touch the memory on free with a tagged pointer? Otherwise slab
> may hit an MTE fault itself.

Changing the memory tag can cause faults in other threads, and that
could be very confusing.
Probably the safest thing is to retag the register, single step and
then retag it back, but be careful with the instructions that change
the address register (like ldr x0, [x0]).

>
> > Perhaps we could somehow only skip faulting instructions that happen
> > in the KASAN test module?.. Decoding stack trace would be an option,
> > but that's a bit weird.
>
> If you want to restrict this to the KASAN tests, just add some
> MTE-specific accessors with a fixup entry similar to get_user/put_user.
> __do_kernel_fault() (if actually called) will invoke the fixup code
> which skips the access and returns an error. This way KASAN tests can
> actually verify that tag checking works, I'd find this a lot more
> useful.
>
> --
> Catalin
Catalin Marinas Aug. 28, 2020, 9:56 a.m. UTC | #8
On Thu, Aug 27, 2020 at 12:14:26PM -0700, Evgenii Stepanov wrote:
> On Thu, Aug 27, 2020 at 7:56 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Aug 27, 2020 at 03:34:42PM +0200, Andrey Konovalov wrote:
> > > On Thu, Aug 27, 2020 at 3:10 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> > > > > On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> > > > > <catalin.marinas@arm.com> wrote:
> > > > > > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > > > > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > > > > > +                        struct pt_regs *regs)
> > > > > > > +{
> > > > > > > +     report_tag_fault(addr, esr, regs);
> > > > > > > +
> > > > > > > +     /* Skip over the faulting instruction and continue: */
> > > > > > > +     arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > > > > >
> > > > > > Ooooh, do we expect the kernel to still behave correctly after this? I
> > > > > > thought the recovery means disabling tag checking altogether and
> > > > > > restarting the instruction rather than skipping over it.
> > [...]
> > > > > Can we disable MTE, reexecute the instruction, and then reenable MTE,
> > > > > or something like that?
> > > >
> > > > If you want to preserve the MTE enabled, you could single-step the
> > > > instruction or execute it out of line, though it's a bit more convoluted
> > > > (we have a similar mechanism for kprobes/uprobes).
> > > >
> > > > Another option would be to attempt to set the matching tag in memory,
> > > > under the assumption that it is writable (if it's not, maybe it's fine
> > > > to panic). Not sure how this interacts with the slub allocator since,
> > > > presumably, the logical tag in the pointer is wrong rather than the
> > > > allocation one.
> > > >
> > > > Yet another option would be to change the tag in the register and
> > > > re-execute but this may confuse the compiler.
> > >
> > > Which one of these would be simpler to implement?
> >
> > Either 2 or 3 would be simpler (re-tag the memory location or the
> > pointer) with the caveats I mentioned. Also, does the slab allocator
> > need to touch the memory on free with a tagged pointer? Otherwise slab
> > may hit an MTE fault itself.
> 
> Changing the memory tag can cause faults in other threads, and that
> could be very confusing.

It could indeed trigger a chain of faults. It's not even just other threads;
it could be the same thread in a different function.

> Probably the safest thing is to retag the register, single step and
> then retag it back, but be careful with the instructions that change
> the address register (like ldr x0, [x0]).

This gets complicated if you have to parse the opcode. If you can
single-step, just set PSTATE.TCO for the instruction. But the
single-step machinery gets more complicated and probably interacts
badly with kprobes.
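
For illustration, the PSTATE.TCO toggle itself (tag checks suppressed around a
single access), assuming the SET_PSTATE_TCO() helper from the MTE series; the
single-step machinery mentioned above is not shown:

static inline unsigned long load_no_tag_check(const unsigned long *addr)
{
	unsigned long val;

	asm volatile(SET_PSTATE_TCO(1));	/* tag checks off */
	val = READ_ONCE(*addr);
	asm volatile(SET_PSTATE_TCO(0));	/* tag checks back on */

	return val;
}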

I think the best option is to disable the MTE checks in TCF on an
_unhandled_ kernel fault, report and continue. For the KASAN tests, add
accessors similar to get_user/put_user which are able to handle the
fault and return an error. Such accessors, since they have a fixup
handler, would not lead to the MTE checks being disabled.
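
For illustration, a sketch of what "disable the MTE checks in TCF, report and
continue" could look like in do_tag_recovery(); it assumes the SCTLR_ELx_TCF_*
definitions from the MTE series and is not the code that was posted:

static int do_tag_recovery(unsigned long addr, unsigned int esr,
			   struct pt_regs *regs)
{
	report_tag_fault(addr, esr, regs);

	/*
	 * Sketch only: turn off EL1 tag check faults on this CPU and
	 * return, so the faulting instruction re-executes unchecked
	 * instead of being skipped.
	 */
	sysreg_clear_set(sctlr_el1, SCTLR_ELx_TCF_MASK, SCTLR_ELx_TCF_NONE);
	isb();

	return 0;
}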

Patch

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 5e832b3387f1..c62c8ba85c0e 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -33,6 +33,7 @@ 
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kprobes.h>
+#include <asm/mte.h>
 #include <asm/processor.h>
 #include <asm/sysreg.h>
 #include <asm/system_misc.h>
@@ -222,6 +223,20 @@  int ptep_set_access_flags(struct vm_area_struct *vma,
 	return 1;
 }
 
+static bool is_el1_mte_sync_tag_check_fault(unsigned int esr)
+{
+	unsigned int ec = ESR_ELx_EC(esr);
+	unsigned int fsc = esr & ESR_ELx_FSC;
+
+	if (ec != ESR_ELx_EC_DABT_CUR)
+		return false;
+
+	if (fsc == ESR_ELx_FSC_MTE)
+		return true;
+
+	return false;
+}
+
 static bool is_el1_instruction_abort(unsigned int esr)
 {
 	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_CUR;
@@ -294,6 +309,18 @@  static void die_kernel_fault(const char *msg, unsigned long addr,
 	do_exit(SIGKILL);
 }
 
+static void report_tag_fault(unsigned long addr, unsigned int esr,
+			     struct pt_regs *regs)
+{
+	bool is_write = ((esr & ESR_ELx_WNR) >> ESR_ELx_WNR_SHIFT) != 0;
+
+	pr_alert("Memory Tagging Extension Fault in %pS\n", (void *)regs->pc);
+	pr_alert("  %s at address %lx\n", is_write ? "Write" : "Read", addr);
+	pr_alert("  Pointer tag: [%02x], memory tag: [%02x]\n",
+			mte_get_ptr_tag(addr),
+			mte_get_mem_tag((void *)addr));
+}
+
 static void __do_kernel_fault(unsigned long addr, unsigned int esr,
 			      struct pt_regs *regs)
 {
@@ -317,12 +344,16 @@  static void __do_kernel_fault(unsigned long addr, unsigned int esr,
 			msg = "execute from non-executable memory";
 		else
 			msg = "read from unreadable memory";
+	} else if (is_el1_mte_sync_tag_check_fault(esr)) {
+		report_tag_fault(addr, esr, regs);
+		msg = "memory tagging extension fault";
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
 	} else {
 		msg = "paging request";
 	}
 
+
 	die_kernel_fault(msg, addr, esr, regs);
 }
 
@@ -658,10 +689,27 @@  static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 0;
 }
 
+static int do_tag_recovery(unsigned long addr, unsigned int esr,
+			   struct pt_regs *regs)
+{
+	report_tag_fault(addr, esr, regs);
+
+	/* Skip over the faulting instruction and continue: */
+	arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
+
+	return 0;
+}
+
+
 static int do_tag_check_fault(unsigned long addr, unsigned int esr,
 			      struct pt_regs *regs)
 {
-	do_bad_area(addr, esr, regs);
+	/* The tag check fault (TCF) is per TTBR */
+	if (is_ttbr0_addr(addr))
+		do_bad_area(addr, esr, regs);
+	else
+		do_tag_recovery(addr, esr, regs);
+
 	return 0;
 }